Dynamic Textures

Motivation

Texturing an object or scene model with one or more images taken from the real world is a common way to approach photo-realism. Turning an arbitrary image into a texture image involves transforming the image from 2D camera coordinates into texture coordinates using the 3D model, as illustrated in the figure below. However, if we use several input images of the real scene, each one yields a slightly different texture image. This can be seen by comparing the upper and lower texture images.
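As a rough illustration of that transformation, the following sketch (not the project's actual code, and assuming a simple pinhole camera and per-triangle affine warps) extracts the texture for one model triangle by projecting its 3D vertices into the camera image and warping the covered pixels into fixed texture coordinates:

```python
import numpy as np
import cv2

def project(P, X):
    """Project 3D points X (N, 3) to 2D pixels with a 3x4 camera matrix P."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])   # homogeneous coordinates
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]                      # perspective division

def extract_patch(image, P, tri_3d, tri_uv, tex_size=256):
    """Warp the image region covered by one model triangle into texture space.
    tri_3d: (3, 3) vertex positions; tri_uv: (3, 2) texture coordinates in [0, 1]."""
    src = project(P, tri_3d).astype(np.float32)      # triangle in camera coordinates
    dst = (tri_uv * tex_size).astype(np.float32)     # triangle in texture coordinates
    A = cv2.getAffineTransform(src, dst)             # 2D affine approximation of the warp
    return cv2.warpAffine(image, A, (tex_size, tex_size))
```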

Some reasons why the texture looks different are:

  • Model geometry: Objects are modeled with planar surface patches (triangles), but real-world surfaces are seldom perfectly planar, which introduces parallax errors (see e.g. the windows).
  • Tracking camera pose: Errors in the correspondence between camera and texture coordinates introduce shifts in the texture image (see the house edge on the right side).
  • Motion: Temporal changes to the texture image, such as waves on a water surface.
  • Light: The real image captures the combination of the scene and its illumination, and the light reflection may depend on the viewing angle.
  • Quantization: The foreshortening of nearly oblique surfaces in the real image can cause aliasing and upsampling problems.

Theory

Early approaches to the view-dependency problem of photo-realistic texturing usually involved storing many input images and, at rendering time, texturing with the real image shot closest to the virtual camera pose, or blending together the two or three closest real images. A drawback of this approach is the large database of textures needed, and despite this, switching textures produces jumping images, while blending causes blur.
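A small sketch of that classic scheme, under assumed names and a simple inverse-angle weighting, picks the reference textures whose viewing directions are closest to the virtual view and blends them:

```python
import numpy as np

def blend_nearest(textures, ref_dirs, view_dir, k=3):
    """textures: list of HxWx3 arrays; ref_dirs: (N, 3) unit viewing directions of the
    reference shots; view_dir: (3,) unit direction of the virtual camera."""
    ref_dirs = np.asarray(ref_dirs, dtype=float)
    view_dir = np.asarray(view_dir, dtype=float)
    angles = np.arccos(np.clip(ref_dirs @ view_dir, -1.0, 1.0))
    nearest = np.argsort(angles)[:k]                 # the k closest reference views
    weights = 1.0 / (angles[nearest] + 1e-6)         # closer views weigh more
    weights /= weights.sum()
    return sum(w * textures[i].astype(float) for w, i in zip(weights, nearest))
```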

Our solution instead constructs a spatial basis, B, which correctly represents the view dependency of the texture image, and at rendering time modulates this basis with a function y that varies with time, k, and pose, X, so that a time-varying dynamic texture is composed as T(k) = By(X,k). The basis B can be expressed analytically in terms of image derivatives and geometry. Depending on the complexity of the scene and the viewing conditions, a varying number of basis vectors is needed: for simple cases (i.e. only illumination variation) a handful of vectors is sufficient, while most real scenes, with both geometry- and tracking-induced variation, need a few dozen. The details and proofs of the mathematical derivation can be found in our IEEE VR2003 tutorial text.
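A minimal numerical sketch of the composition T(k) = By(X,k): each column of B holds one flattened basis texture, and the pose- and time-dependent coefficient vector y modulates the columns into the texture for frame k. The shapes and the stand-in values below are illustrative assumptions, not the project's code:

```python
import numpy as np

def compose_texture(B, y, tex_shape):
    """T(k) = B y(X, k): combine the basis columns with the current coefficients."""
    return (B @ y).reshape(tex_shape)

# Toy example: a handful of basis vectors, as for a simple illumination-only scene.
H, W, n_basis = 64, 64, 6
B = np.random.default_rng(0).standard_normal((H * W, n_basis))  # stand-in basis
y = np.full(n_basis, 1.0 / n_basis)                             # stand-in y(X, k)
T_k = compose_texture(B, y, (H, W))                             # texture for frame k
```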

An example of three basis vectors from the house scene above illustrates how the texture basis represents not images, but the corrections needed to compensate for, e.g., the non-planarity of the windows and the mis-tracked right edge of the house.

Implementation

In the rendering stage, modulating a texture basically amounts to a large matrix product with the modulation function. This can be performed efficiently either on graphics hardware or in software using SIMD-accelerated MMX instructions, as described in the renderer section. The basis and modulation function are computed beforehand. Conveniently, it turns out that for most of the variation they can be computed using only the geometric and intensity information extracted from the real image sequence. The mathematics is described in the detailed tutorial text.
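As a hedged sketch of the precomputation and rendering stages: one straightforward stand-in for obtaining a spatial basis from the per-frame textures is a truncated SVD (PCA); the project's own basis is derived analytically from image derivatives and geometry as described in the tutorial text, so this is only an illustration of the data flow, with assumed names and shapes:

```python
import numpy as np

def fit_basis(texture_frames, n_basis=12):
    """texture_frames: (n_frames, H, W) textures extracted from the real sequence."""
    n, h, w = texture_frames.shape
    T = texture_frames.reshape(n, h * w).astype(float)
    mean = T.mean(axis=0)
    U, S, Vt = np.linalg.svd(T - mean, full_matrices=False)
    B = Vt[:n_basis].T                 # (H*W, n_basis) spatial basis
    Y = (T - mean) @ B                 # per-frame modulation coefficients
    return mean, B, Y

def render(mean, B, y, shape):
    """Rendering amounts to one large matrix product, B @ y, plus the mean texture."""
    return (mean + B @ y).reshape(shape)
```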

Experiments


  • Modulated texture vs. traditional texture while rendering a rotating quadrilateral with a wreath
  • Rendering a flower
  • Changing illumination on a face