New technology: Image-based Modeling and Rendering

These web pages showcase new technology for capturing, modeling, and rendering virtual heritage. The 3D models of the seal hunt statues were captured using only 2D images. The objective of our research is to make this type of 3D capture and modeling as easy as taking and editing conventional 2D photos.

By contrast, in conventional modeling a so-called 3D modeling program is used to build an object or scene by adding geometric primitives (e.g. lines, planes, spheres) one by one, in a fashion similar to how a 2D picture is drawn in, say, MacDraw or xfig. There are several such modelers, e.g. Maya, 3D Studio Max, Lightwave, and Blender. The first three are commercial programs costing from thousands to tens of thousands of dollars; the last, Blender, can be downloaded for free. Making detailed and realistic models by hand using a modeling program is very tedious and time consuming. It is also difficult to ensure that the manually entered model is a faithful copy of the real heritage object.

Another way to capture the 3D geometry of a real object is to use an active laser scanner. This falls into the general category of active 3D sensing, which also includes structured light, where a light pattern is projected on the object, and mechanical sensing, where a touch probe is traced along the object surfaces. Laser scanning is the most popular and accurate of these methods. There are two measurement principles. For small and medium scale objects, a triangulation-based laser head is used: a line of laser light is beamed onto the object and captured by a camera offset by a known distance (the baseline), and depth is calculated from the resulting image curve. For large scale objects and scenes, a time-of-flight laser is used, and depth is calculated by measuring the time until a light pulse returns from the surface. Either technology is quite expensive, with laser capture "heads" starting at $20,000 and whole capture systems often costing hundreds of thousands of dollars.
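
As a rough illustration of the two measurement principles, here is a minimal sketch of the underlying depth formulas: triangulation recovers depth from the laser-camera baseline and the observed image offset, while time-of-flight converts the round-trip time of a light pulse. All parameter values below are made-up illustrations, not real device specifications.

```python
# Sketch of the two laser depth-measurement principles.
# All numbers below are illustrative assumptions, not real device specs.

SPEED_OF_LIGHT = 3.0e8  # m/s

def depth_from_triangulation(focal_length_px, baseline_m, disparity_px):
    """Pinhole triangulation: depth is inversely proportional to the
    image offset (disparity) of the projected laser line."""
    return focal_length_px * baseline_m / disparity_px

def depth_from_time_of_flight(round_trip_s):
    """Time-of-flight: the pulse travels to the surface and back,
    so depth is half the round-trip distance."""
    return SPEED_OF_LIGHT * round_trip_s / 2.0

if __name__ == "__main__":
    # Hypothetical triangulation head: 1000 px focal length, 10 cm baseline,
    # laser line observed 50 px from its reference position.
    print(depth_from_triangulation(1000.0, 0.1, 50.0))  # -> 2.0 m
    # Hypothetical time-of-flight pulse returning after 100 ns.
    print(depth_from_time_of_flight(100e-9))            # -> 15.0 m
```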

Laser scanning is mostly applied in industrial and engineering applications, where high model accuracy is needed. Manual modeling is mostly used for movie special effects and computer games, where complete creative freedom in model creation matters most. Both methods require rather large budgets and skilled users, and have therefore had little penetration in broader applications. An additional problem in using these methods for real-world visual modeling is that they produce geometry only; for photorealistic rendering, photos of the object's appearance have to be separately (often manually) registered with the geometric models.

Introduction to Image-based techniques

Recently, many "image-based" techniques have been popular research topics in both computer graphics and computer vision. One technique, developed mostly in computer graphics, uses cameras to sample all the light rays (the so-called "plenoptic function") in a 3D volume. An interesting aspect is that 3D geometry is not explicitly represented, yet new viewpoints can be rendered. The most well known implementations are the lightfield and lumigraph systems. Both require numerous carefully calibrated camera images, and hence have found little practical use.

On the other end of the spectrum, so-called structure-from-motion (SFM) methods in computer vision (which are similar to photogrammetry) compute explicit geometry from camera images by identifying points in two or more images that correspond to the same physical 3D point. Triangulation is then used to determine the location of the 3D point. The main challenge is that it is non-trivial to automatically and reliably find the corresponding points. Hence the resulting models have relatively few points, and the modeled geometry is at best a coarse representation of the real-world object.
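
To make the triangulation step concrete, here is a minimal sketch of linear (DLT) triangulation of one point from two views. It assumes two calibrated 3x4 projection matrices and one matched point pair; this is a standard textbook method, not a description of any particular SFM system.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2 : 3x4 camera projection matrices (assumed calibrated).
    x1, x2 : corresponding 2D image points (u, v) in each view.
    Returns the 3D point that best satisfies x ~ P X in a least-squares sense.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the null vector of A: the last right singular vector.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # convert from homogeneous coordinates

if __name__ == "__main__":
    # Two hypothetical cameras: identity, and a 1-unit translation along x.
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
    X_true = np.array([0.5, 0.2, 4.0, 1.0])
    x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
    x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
    print(triangulate_point(P1, P2, x1, x2))  # ~ [0.5, 0.2, 4.0]
```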

In shape-from-silhouette (SFS), instead of using corresponding points, a 3D shape is computed from the 2D image projections of the object silhouette in several images. When capturing objects we need to isolate or "segment" the object from the background, and in the process we get the silhouette. In practice this can be done with techniques ranging from blue screening to background subtraction. The geometry computed with SFS is not in general the true object geometry, but an enveloping geometry called the visual hull. This is because not all object features appear on the silhouette (for instance, indentations in the object can hide detail). Yet, while the SFS model is only approximate, it is easily and quickly computed, and it is therefore our method of choice for capturing object geometry.
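
A toy voxel-carving version of SFS (not our production code) illustrates the idea: a voxel belongs to the visual hull only if its projection falls inside the silhouette in every image. Voxels outside any silhouette are carved away, leaving an envelope of the true shape.

```python
import numpy as np

def visual_hull(silhouettes, projections, grid_points):
    """Toy shape-from-silhouette by voxel carving.

    silhouettes : list of HxW boolean masks (True = object pixel).
    projections : list of 3x4 projection matrices, one per silhouette.
    grid_points : Nx3 array of candidate voxel centers.
    Returns a boolean mask over grid_points: True = inside the visual hull.
    """
    n = grid_points.shape[0]
    homog = np.hstack([grid_points, np.ones((n, 1))])  # Nx4 homogeneous
    inside = np.ones(n, dtype=bool)
    for mask, P in zip(silhouettes, projections):
        h, w = mask.shape
        proj = homog @ P.T                              # Nx3 image points
        u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
        v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
        # A voxel survives only if it projects inside this silhouette too.
        in_image = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(n, dtype=bool)
        hit[in_image] = mask[v[in_image], u[in_image]]
        inside &= hit
    return inside
```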

Components of our image-based model

Our image-based model consists of two parts. The first is a conventional coarse 3D geometry captured using SFS. Since this model is only approximate, we augment it with a fine-scale model representing both regular surface appearance (as conventional textures do) and how the appearance changes across viewpoints due to fine-scale geometric detail and lighting. This variation is represented as differential or derivative images. Mathematically these form a basis for the texture's viewpoint variation, and for any viewpoint the basis elements can be blended into a texture image.
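
In spirit (the details of our implementation differ), such an appearance basis can be computed with a principal component analysis of the texture-space images, here via the SVD; the top components capture the dominant viewpoint- and lighting-dependent variation.

```python
import numpy as np

def texture_basis(textures, k):
    """Sketch: compute a k-dimensional appearance basis from texture images.

    textures : M x P array, one flattened texture image per row
               (each input view warped into texture coordinates).
    Returns (mean, basis), where basis is k x P; any view's texture is
    then approximated as mean + (blending coefficients @ basis).
    """
    mean = textures.mean(axis=0)
    # SVD of the mean-centered stack; the leading right singular vectors
    # span the dominant variation across viewpoints.
    _, _, Vt = np.linalg.svd(textures - mean, full_matrices=False)
    return mean, Vt[:k]

def blend(mean, basis, coeffs):
    """Reconstruct a texture image from k blending coefficients."""
    return mean + coeffs @ basis
```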

In model capture, an input image sequence is decomposed into these two parts: the coarse geometry and the fine-scale appearance basis. For rendering, the 3D geometry is reprojected into a 2D image just as in regular graphics, but unlike regular graphics we don't texture (paint) the model with a static texture image. An intuitive way to visualize the dynamic texture is that it plays a small movie on each model facet as the viewpoint of the rendering changes. The content of the movie is designed to hide the coarseness of the underlying geometric model.
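
How the blending coefficients are chosen per viewpoint is an implementation detail; as one simple illustration (not our actual scheme), the coefficients fitted to the input views can be interpolated over a viewpoint parameter, here assumed to be a turntable angle:

```python
import numpy as np

def render_texture(mean, basis, view_coeffs, views, new_view):
    """Sketch: synthesize a texture for a novel viewpoint.

    mean, basis : appearance basis from the capture stage.
    view_coeffs : M x k coefficients fitted to the M input views.
    views       : M viewpoint parameters (assumed: turntable angles).
    new_view    : viewpoint angle to render.
    Coefficients for the new view are interpolated from the two nearest
    captured views, then blended with the basis -- the per-facet "movie".
    """
    order = np.argsort(views)
    angles = np.asarray(views, dtype=float)[order]
    coeffs = np.asarray(view_coeffs, dtype=float)[order]
    # Linear interpolation of blending coefficients over viewpoint angle.
    k = min(max(int(np.searchsorted(angles, new_view)), 1), len(angles) - 1)
    t = (new_view - angles[k - 1]) / (angles[k] - angles[k - 1])
    c = (1.0 - t) * coeffs[k - 1] + t * coeffs[k]
    return mean + c @ basis
```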


Geometry + dynamic texture = photo-realistic rendering of novel views

Principles and steps in extracting a model

The process to acquire a digital model involves the following steps:
  1. Input images: Record several (50-100) digital images of the object from different viewpoints. This is done by streaming video directly into the program while the object moves on a turntable.
  2. Calibration: Calculate where the camera was with respect to the object for each of the images.
  3. Segmentation: Remove the blue background behind the object to isolate the object and compute its silhouette (a minimal sketch of this step is shown below, after the list).
  4. Calculate geometry: The shape-from-silhouette method computes a visual hull using all the silhouettes.
  5. Texture coordinates: To relate the convoluted surface of the 3D object to a 2D texture space, the object surface is first split into several somewhat flat pieces; each piece is then fully flattened and assigned a spot in the texture (see texture map above).
  6. Dynamic texture basis: Using the 3D model and the 3D-to-2D texture coordinate transform, all the input images are warped into texture coordinates, and from their variation the dynamic texture basis is calculated.
  7. Model export: The model consisting of geometry and texture basis is saved to file.
You can learn more about the detailed stages of the modeling in our capture system user's guide.
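
As an illustration of the segmentation step (step 3), here is a minimal chroma-key sketch. It assumes an RGB image with a roughly uniform blue backdrop and a hand-picked threshold; our capture system's segmentation is more robust, but the principle is the same.

```python
import numpy as np

def blue_screen_silhouette(image, blue_margin=30):
    """Toy chroma-key segmentation against a blue backdrop.

    image       : H x W x 3 uint8 RGB array.
    blue_margin : how much the blue channel must dominate red and green
                  for a pixel to count as background (assumed threshold).
    Returns an H x W boolean silhouette mask (True = object pixel).
    """
    rgb = image.astype(np.int16)  # avoid uint8 wrap-around in subtraction
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    background = (b - np.maximum(r, g)) > blue_margin
    return ~background
```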

Rendering and Animation

The final models can be rendered either using our stand-alone renderer, or combined with other models and e.g. an appropriate background scene using a rendering plug-in we wrote for the standard modeler Maya.


Credits
Information and artifacts provided by the University of Alberta museum services.

Model capture performed by University of Alberta's computer vision research group as a WISEST project involving two High School interns, Katie and Tammy, under the supervision of Cleo Espiritu and Martin Jagersand.

Site design, content and graphics by Cleo Espiritu

Software by Keith Yerex and Neil Birkbeck.

Maya modeler provided by Alias, for which we wrote an image-based rendering plug-in used to render the videos and scene images.

Background music of the Hunt video: "Wind Dancing" by Soaring Eagle.

Seal hunt information taken from:
The Inuit
Aboriginal Innovations in Art, Science and Technology
Native Peoples and Cultures of Canada by Alan D. McMillan
Sacred Hunt by David F. Pelly