Modeling of Structured 3-D Environments from Monocular Image Sequences


The vision sensor developed in this research simultaneously estimates its own three-dimensional position with respect to the environment and builds a model of that environment. Extreme accuracy is not the goal. Instead, the estimates are refined recursively and become more accurate as objects are approached and observed from different viewpoints.

The modeling process starts by extracting interesting tokens, such as lines and corners, from the first image. These features are then tracked in subsequent image frames. Previously taught patterns can also be used in tracking. Only a few features are extracted from each image, and only from small regions, so the processing can be done at video frame rate. Newly appearing features can also be added to the environment structure.
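
The idea of searching only a small region around each feature can be sketched as follows. This is a minimal illustration, not the tracker used in this research: the function name track_patch, the sum-of-squared-differences score, and the plain nested-list image format are all assumptions made for the example.

```python
def track_patch(image, template, prev_row, prev_col, radius):
    """Find the template near its previous position.

    Only a (2 * radius + 1)-wide window around (prev_row, prev_col) is
    searched, which keeps the cost per feature small. The score is the
    sum of squared differences (an assumed, illustrative choice).
    """
    t_rows, t_cols = len(template), len(template[0])
    row_lo = max(0, prev_row - radius)
    row_hi = min(len(image) - t_rows, prev_row + radius)
    col_lo = max(0, prev_col - radius)
    col_hi = min(len(image[0]) - t_cols, prev_col + radius)

    best_score, best_pos = None, (prev_row, prev_col)
    for r in range(row_lo, row_hi + 1):
        for c in range(col_lo, col_hi + 1):
            score = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(t_rows)
                for j in range(t_cols)
            )
            if best_score is None or score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos


# Hypothetical 8x8 frame with a 2x2 corner pattern placed at row 3, column 4.
template = [[9, 5], [3, 7]]
frame = [[0] * 8 for _ in range(8)]
for i in range(2):
    for j in range(2):
        frame[3 + i][4 + j] = template[i][j]

new_position = track_patch(frame, template, prev_row=2, prev_col=3, radius=2)
```

The cost grows with the window size rather than the image size, which is why restricting the search to small regions helps reach frame-rate processing.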

Kalman filtering is used for estimation. The motion parameters are location and orientation and their velocities. The environment is treated as a rigid moving object with a centroid. The environment structure, which consists of the 3-D coordinates of the tracked features, is estimated with respect to this point. The initial model lacks depth information. Relative depth is obtained by exploiting the fact that, during translational motion, points close to the camera move faster in the image plane than more distant ones. Additional information is needed to obtain absolute coordinates.
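
A state made of a pose parameter and its velocity can be illustrated with a one-dimensional constant-velocity Kalman filter. This is only a sketch under stated assumptions: the real estimator handles six degrees of freedom, and the function name kalman_step and the noise values q and r are hypothetical choices for the example.

```python
def kalman_step(state, cov, z, dt=1.0, q=1e-4, r=0.01):
    """One predict/update cycle for a 1-D constant-velocity model.

    state = (position, velocity), cov = 2x2 covariance as nested lists,
    z = measured position. q and r are illustrative noise levels.
    """
    p, v = state
    (a, b), (c, d) = cov

    # Predict: position advances by velocity * dt, velocity is unchanged.
    p_pred = p + v * dt
    a_pred = a + dt * (b + c) + dt * dt * d
    b_pred = b + dt * d
    c_pred = c + dt * d
    d_pred = d + q  # process noise lets the velocity drift slowly

    # Update: only the position is observed (H = [1, 0]).
    s = a_pred + r
    k_p, k_v = a_pred / s, c_pred / s
    innovation = z - p_pred
    new_state = (p_pred + k_p * innovation, v + k_v * innovation)
    new_cov = [
        [(1 - k_p) * a_pred, (1 - k_p) * b_pred],
        [c_pred - k_v * a_pred, d_pred - k_v * b_pred],
    ]
    return new_state, new_cov


# Start with no knowledge of the velocity and feed positions of a
# target that actually moves 2 units per frame.
state, cov = (0.0, 0.0), [[100.0, 0.0], [0.0, 100.0]]
for t in range(1, 21):
    state, cov = kalman_step(state, cov, z=2.0 * t)
```

Although only positions are measured, the velocity component of the state converges toward the true rate, and the covariance shrinks as more frames are processed.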

Special attention has been paid to modeling uncertainties. Measurements with high uncertainty get less weight when the motion and environment models are updated. The rigidity assumption is exploited by giving the initial structure uncertainties the shape of a thin pencil. By continuously observing the motion uncertainties, the performance of the modeler can be monitored.
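
How uncertainty controls the weight of a measurement can be shown with a scalar example. The function fuse below is a hypothetical helper that applies the standard variance-weighted update to a single value; it is meant as an illustration of the principle, not as the update used in this research.

```python
def fuse(estimate, est_var, measurement, meas_var):
    """Blend an estimate with a measurement, weighting by variance.

    The gain approaches 1 when the measurement is much more certain than
    the estimate and approaches 0 when it is much less certain.
    """
    gain = est_var / (est_var + meas_var)
    fused = estimate + gain * (measurement - estimate)
    fused_var = (1.0 - gain) * est_var
    return fused, fused_var


# Same measurement value, different measurement uncertainty.
trusted, trusted_var = fuse(0.0, 1.0, 10.0, 1.0)    # equally certain
doubtful, doubtful_var = fuse(0.0, 1.0, 10.0, 99.0)  # very uncertain
```

With equal variances the estimate moves halfway toward the measurement, while the highly uncertain measurement shifts it by only one percent of the difference.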

In contrast to the usual solution, motion and structure are estimated in separate state vectors, which allows motion and 3-D structure to be estimated asynchronously. In addition to giving a more distributed solution, this technique provides an efficient failure detection mechanism. Several trackers can estimate motion simultaneously, and only those with the most confident estimates are allowed to update the common environment model. Tests showed that motion with six degrees of freedom can be estimated in an unknown environment while the 3-D structure of the environment is estimated simultaneously. The achieved accuracies were on the order of millimeters at a distance of 1-2 meters, in tests with both simple toy scenes and more demanding industrial pallet scenes. This is accurate enough for manipulating objects when the modeler is used to provide visual feedback.
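
The selection of confident trackers might be organized roughly as below. Everything here is an assumption made for illustration: the TrackerEstimate class, the use of a single scalar uncertainty (for instance the trace of the motion covariance), the threshold, and the number of trackers kept are not taken from the actual system.

```python
from dataclasses import dataclass


@dataclass
class TrackerEstimate:
    name: str
    motion: tuple       # hypothetical 6-DOF motion estimate
    uncertainty: float  # assumed scalar summary of the motion covariance


def select_confident(estimates, max_uncertainty, keep=2):
    """Split trackers into those allowed to update the model and failures.

    Trackers above the threshold are reported as failed; of the rest,
    only the `keep` most confident ones are returned.
    """
    healthy = [e for e in estimates if e.uncertainty <= max_uncertainty]
    failed = [e.name for e in estimates if e.uncertainty > max_uncertainty]
    healthy.sort(key=lambda e: e.uncertainty)
    return healthy[:keep], failed


estimates = [
    TrackerEstimate("corner-tracker", (0.0,) * 6, 0.02),
    TrackerEstimate("line-tracker", (0.0,) * 6, 0.50),
    TrackerEstimate("pattern-tracker", (0.0,) * 6, 0.01),
    TrackerEstimate("lost-tracker", (0.0,) * 6, 7.00),
]
updaters, failed = select_confident(estimates, max_uncertainty=1.0, keep=2)
```

Because each tracker keeps its own motion state, a tracker with growing uncertainty can be excluded or restarted without disturbing the shared environment model.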


Some application areas


Some test results


Future goals


More detailed information

Tapio Repo ( tre@ee.oulu.fi )