Geometric model from silhouette

Motivation

In images of an object, the occluding contour, or silhouette, is the border between that object and the background. If these contours are available (through background subtraction or blue-screening), a significant amount of shape information can be extracted from them. Using silhouette information and calibrated cameras, a shape known as the visual hull of the object can be recovered. Shape from silhouette (SFS) is one of the most robust methods for extracting shape from visual information; it is also cheap and easy to implement. Depending on the object and the number of camera views used, the captured model can sometimes be quite accurate, but some parts of the shape can never be captured correctly. For example, the inside of a bowl curves inward, but there is never a contour where this concavity is visible. Similarly, small indentations, such as the eyes on a face, cannot be captured exactly with this method. These inaccuracies in the geometry are dealt with by an image-based method, as described in the Dynamic Texture section.

Theory & Implementation

To extract the visual hull from a set of k images, each silhouette is extended to a cone-like volume by back-projecting the line segments that make up its contour. The intersection of all k volumes is then the most accurate approximation of the visual hull obtainable from those camera views. The computational difficulty lies in performing this intersection.
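As a compact statement (notation ours, not from the original method description): writing C_i for the cone back-projected from the i-th silhouette, the reconstructed volume is

    V = \bigcap_{i=1}^{k} C_i

Since the object projects inside every silhouette, V always contains the true object, and adding views can only shrink V toward the visual hull.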

Exact intersection of polygonal volumes is generally very slow, so most SFS algorithms convert the volumes to a representation in which intersections can be computed more efficiently. Usually a regular grid of binary voxels is used, each voxel indicating the occupancy of a small cube of space. Such volumes are easily intersected with a logical "and" operation. This approach, however, leads to large memory usage and "blocky" models when a polygonal surface is extracted from the volume with the marching cubes algorithm.
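To make the voxel approach concrete, here is a minimal Python sketch (names and array conventions are ours, not the system's actual code) that carves an n x n x n occupancy grid: a voxel survives only if its center projects inside the silhouette in every view, which is exactly the per-view "and".

    import numpy as np

    def carve_voxels(masks, cameras, grid_min, grid_max, n):
        # masks   : list of k binary silhouette images (H x W, True = object)
        # cameras : list of k 3x4 projection matrices (assumed calibrated)
        axes = [np.linspace(grid_min[d], grid_max[d], n) for d in range(3)]
        X, Y, Z = np.meshgrid(*axes, indexing="ij")
        pts = np.stack([X, Y, Z, np.ones_like(X)], axis=-1).reshape(-1, 4)

        occupied = np.ones(len(pts), dtype=bool)
        for mask, P in zip(masks, cameras):
            proj = pts @ P.T                          # homogeneous image coords
            u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
            v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
            inside = (u >= 0) & (u < mask.shape[1]) & (v >= 0) & (v < mask.shape[0])
            hit = np.zeros(len(pts), dtype=bool)
            hit[inside] = mask[v[inside], u[inside]]
            occupied &= hit                           # logical "and" across views
        return occupied.reshape(n, n, n)

A surface would then be extracted from the returned grid with marching cubes, which is where the "blocky" appearance mentioned above comes from.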

Rather than a discrete voxel grid, we compute volume intersections using a data structure called marching intersections. This structure stores occupancy as rays instead of cubes: the bounding cube is covered by three sets of n² rays, aligned with the x, y, and z axes respectively, and each ray records every point where it enters or exits the volume. This effectively keeps track of the same information as a complete voxel grid using less memory (O(n²) rays versus O(n³) voxels) while also storing exact intersection points, which can be used to render a smooth polygonal mesh with a modified marching cubes implementation. The volumes are intersected efficiently by projecting all rays onto each image, finding the points where the projected rays cross the silhouette, and back-projecting those points to clip the extents of each ray in the structure. The figure on the right shows a model recovered with SFS and one slice of the marching intersections data structure. We have implemented a capture system for conveniently capturing and processing SFS-generated models.
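The core per-ray operation is a 1D interval intersection. Below is a minimal sketch (our own illustrative representation, storing each ray as a sorted list of (t_enter, t_exit) pairs; the actual marching intersections implementation may differ):

    def intersect_ray(a, b):
        # a, b: sorted, disjoint (t_enter, t_exit) intervals recording where
        # the ray is inside each of the two volumes being intersected.
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            lo = max(a[i][0], b[j][0])
            hi = min(a[i][1], b[j][1])
            if lo < hi:                 # the ray is inside both volumes here
                out.append((lo, hi))
            if a[i][1] < b[j][1]:       # advance the interval that ends first
                i += 1
            else:
                j += 1
        return out

    # Example: a silhouette cone clips a ray that previously spanned (0, 4):
    # intersect_ray([(0.0, 4.0)], [(1.0, 2.5), (3.0, 5.0)])
    # -> [(1.0, 2.5), (3.0, 4.0)]

The full structure is then just three n x n arrays of such interval lists, one per axis, and intersecting two whole volumes amounts to applying this sweep ray by ray.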


Example: Model built from images of a carved stone elephant


Three of the images used to build the model above