Simultaneous 3D Reconstruction for Water Surface and Underwater Scene

This paper presents the first approach for simultaneously recovering the 3D shape of both the wavy water surface and the moving underwater scene. A portable camera array system is constructed, which captures the scene from multiple viewpoints above the water. The correspondences across these cameras are estimated using an optical flow method and are used to infer the shape of the water surface and the underwater scene. We assume that there is only one refraction occurring at the water interface. Under this assumption, two estimates of the water surface normals should agree: one from Snell's law of light refraction and another from local surface structure. The experimental results using both synthetic and real data demonstrate the effectiveness of the presented approach.
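The refraction-based normal estimate at the heart of this consistency check can be sketched directly from Snell's law in vector form; the refractive indices and orientation convention below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def snell_normal(d_air, d_water, n_air=1.0, n_water=1.33):
    """Surface normal implied by Snell's law for one incident/refracted ray pair.

    Vector form of Snell's law: n_water * d_water = n_air * d_air + c * n,
    so the normal is parallel to (n_air * d_air - n_water * d_water).
    Both ray directions are assumed to be unit vectors pointing along the
    direction of light travel (from the camera down into the water).
    """
    d_air = d_air / np.linalg.norm(d_air)
    d_water = d_water / np.linalg.norm(d_water)
    n = n_air * d_air - n_water * d_water
    n = n / np.linalg.norm(n)
    # Orient the normal against the incoming ray, i.e. pointing up out of the water.
    if np.dot(n, d_air) > 0:
        n = -n
    return n
```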

Reference

Y. Qian, Y. Zheng, M. Gong, and Y.H. Yang, "Simultaneous 3D reconstruction for water surface and underwater scene," ECCV, 2018.

Stereo-Based 3D Reconstruction of Dynamic Fluid Surfaces by Global Optimization

3D reconstruction of dynamic fluid surfaces is an open and challenging problem in computer vision. Unlike previous approaches that reconstruct each surface point independently and often return noisy depth maps, we propose a novel global optimization-based approach that recovers both depths and normals of all 3D points simultaneously. Using the traditional refraction stereo setup, we capture the wavy appearance of a pre-generated random pattern, and then estimate the correspondences between the captured images and the known background by tracking the pattern. Assuming that the light is refracted only once through the fluid interface, we minimize an objective function that incorporates both the cross-view normal consistency constraint and the single-view normal consistency constraints. The key idea is that the normals required for light refraction based on Snell's law from one view should agree with not only the ones from the second view, but also the ones estimated from local 3D geometry. Moreover, an effective reconstruction error metric is designed for estimating the refractive index of the fluid. We report experimental results on both synthetic and real data demonstrating that the proposed approach is accurate and shows superiority over the conventional stereo-based method.
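A minimal sketch of the per-point consistency terms, assuming the three normals have already been computed; the paper's actual penalty and weighting may differ:

```python
import numpy as np

def normal_residuals(n_snell_view1, n_snell_view2, n_geometry):
    """Consistency residuals for one surface point.

    n_snell_view1/2: unit normals required by Snell's law for the refracted
    rays of camera 1 and camera 2; n_geometry: unit normal estimated from
    the local 3D surface patch. All three should coincide at the correct
    depths, so disagreement (1 - cosine similarity here) is penalized.
    """
    cross_view = 1.0 - np.dot(n_snell_view1, n_snell_view2)
    single_view1 = 1.0 - np.dot(n_snell_view1, n_geometry)
    single_view2 = 1.0 - np.dot(n_snell_view2, n_geometry)
    return cross_view + single_view1 + single_view2
```

Summing these residuals over all surface points gives the kind of global objective the abstract describes, which a generic nonlinear least-squares solver can then minimize over the unknown depths.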

Reference

Y. Qian, M. Gong, and Y.H. Yang, "Stereo-based 3D reconstruction of dynamic fluid surfaces by global optimization," CVPR, 2017.

Two-view underwater 3D reconstruction for cameras with unknown poses under flat refractive interfaces

In an underwater imaging system, a refractive interface is introduced when a camera looks into the water-based environment, resulting in distorted images due to the refraction of light. Simply ignoring the refraction effect or using the lens radial distortion model causes erroneous 3D reconstruction. This paper deals with a general underwater imaging setup using two cameras, of which each camera is placed in a separate waterproof housing with a flat glass window. In order to handle refraction properly, a simplified refractive camera model is used in this paper. Based on two new concepts, namely the Ellipse of Refrax (EoR) and the Refractive Depth (RD) of a scene point, we derive two new formulations of the underwater known rotation structure and motion (SaM) problem. One gives a globally optimal solution and the other is robust to outliers. The constraint of known rotation is further relaxed by incorporating the robust known rotation SaM into a new hybrid optimization framework. Our method is able to simultaneously perform underwater camera calibration and 3D reconstruction automatically without using any calibration object or additional calibration device. In order to evaluate the performance and practical applicability of our method, extensive experiments using synthetic data, synthetically rendered images and real underwater images were carried out. The experimental results demonstrate that the proposed method can significantly improve the accuracy of the reconstructed 3D structure (within 0.78 mm for an object of dimension over 200 mm compared with the ground truth model captured by a land-based system) and of the system parameters for underwater applications. Compared with bundle adjustment using the refractive camera model initialized with traditional 3D reconstruction methods, our proposed optimization method has significantly better completeness and accuracy and lower 3D errors in the reconstructed models.

Reference

L. Kang, L. Wu, Y. Wei, S. Lao, and Y.H. Yang, "Two-view underwater 3D reconstruction for cameras with unknown poses under flat refractive interfaces," Pattern Recognition, Vol. 69, 2017, pp. 251-269.

A Closed-form Solution to Single Underwater Camera Calibration using Triple Wavelength Dispersion and its Application to Single Camera 3D Reconstruction

In this paper, we present a new method to estimate the housing parameters of an underwater camera by making full use of triple wavelength dispersion. Our method is based on an important finding that there is a closed-form solution to the distance from the camera center to the refractive interface once the refractive normal is known. The correctness of this finding is mathematically proved in this paper. To the best of our knowledge, such a finding has not been studied or reported, and hence has never been proved theoretically. In addition, the refractive normal can be estimated by solving a set of linear equations using wavelength dispersion. Our method does not require any calibration target, such as a checkerboard pattern, which may be difficult to manipulate when the camera is deployed deep undersea. Extensive experiments have been carried out, including simulations to verify the correctness and noise robustness of our method, as well as real experiments whose results show that our method works as expected. The accuracy of our results is evaluated against the ground truth in both simulated and real experiments. Finally, we also show how dispersion can be applied to compute the 3D shape of an object using one single camera.
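The linear estimation of the refractive normal can be illustrated as follows. This sketch uses two wavelengths and a generic SVD solve, whereas the paper uses three wavelengths and its own derivation:

```python
import numpy as np

def refractive_normal(rays_red, rays_blue):
    """Estimate the flat-interface normal from wavelength dispersion.

    rays_red, rays_blue: (N, 3) camera-side ray directions of the same N
    scene points observed in two colour channels. Because dispersion bends
    the two rays of a point by different amounts inside the same plane of
    incidence, and that plane contains the interface normal, each cross
    product cross(r_red, r_blue) is perpendicular to the normal. Stacking
    them gives a homogeneous linear system m_i . n = 0, solved by SVD.
    """
    m = np.cross(rays_red, rays_blue)   # (N, 3), each row is perpendicular to n
    _, _, vt = np.linalg.svd(m)
    n = vt[-1]                          # right singular vector of the smallest singular value
    return n / np.linalg.norm(n)
```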

Reference

X. Chen and Y.H. Yang, "A closed-form solution to single underwater camera calibration using triple wavelength dispersion and its application to single camera 3D reconstruction," IEEE Trans. on Image Processing, 2017.

3D Reconstruction of Transparent Objects with Position-Normal Consistency

Estimating the shape of transparent and refractive objects is one of the few open problems in 3D reconstruction. Under the assumption that the rays refract only twice when traveling through the object, we present the first approach to simultaneously reconstructing the 3D positions and normals of the object's surface at both refraction locations. Our acquisition setup requires only two cameras and one monitor, which serves as the light source. After acquiring the ray-ray correspondences between each camera and the monitor, we solve an optimization function which enforces a new position-normal consistency constraint. That is, the 3D positions of surface points shall agree with the normals required to refract the rays under Snell's law. Experimental results using both synthetic and real data demonstrate the robustness and accuracy of the proposed approach.
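The "position" side of the constraint needs a normal estimated from reconstructed 3D points; a generic PCA plane fit, shown below, is one standard way to obtain it (not necessarily the paper's exact estimator):

```python
import numpy as np

def normal_from_neighbors(points):
    """Estimate a surface normal from nearby reconstructed 3D points.

    points: (N, 3) array holding a surface point and its reconstructed
    neighbours. A plane is fitted by PCA: the eigenvector of the smallest
    eigenvalue of the covariance matrix is the plane normal. This normal
    can then be compared against the one demanded by Snell's law at the
    same location, which is the position-normal consistency idea.
    """
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, 0]                     # eigenvector of the smallest eigenvalue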

Reference

Y. Qian, M. Gong, and Y.H. Yang, "3D Reconstruction of Transparent Objects with Position-Normal Consistency," IEEE Conf. on Computer Vision and Pattern Recognition, June 26-July 1, 2016, Las Vegas, USA.


Quasi-Dense Correspondence in Stereo Images using Multiple Coupled Snakes

In this project, we develop a new method to establish quasi-dense correspondence between a pair of stereo images without camera calibration. Our proposed method is based on the traditional snake formulation using an energy function. The energy function incorporates a new matching term; when the energy function is minimized, the control points along the curves in the stereo pair are matched. Moreover, a penalty term is applied to prevent two snakes in the same image from overlapping. In our method, snakes in both images are coupled and can change their shapes simultaneously. In particular, the control points on the curves are matched and evolved at the same time. Compared to conventional stereo methods, which require camera parameters in order to employ the epipolar constraint to locate correspondences, ours does not. Compared to traditional feature-based stereo methods such as those using SIFT and SURF, the number of correspondences established by our method is significantly higher. Our method is especially suitable for scenes with many textureless regions, in which SIFT/SURF can find few matches. In order to evaluate the accuracy of different methods, the fundamental matrix is computed using the correspondences established by each method. The experimental results from both synthetic and real images are compared to the ground truth and to the conventional sparse matching method to demonstrate that our method has significant improvement over existing methods.
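A toy version of such a coupled energy is sketched below; the terms and weights are illustrative, and the paper's overlap penalty between snakes in the same image is omitted:

```python
import numpy as np

def coupled_snake_energy(c1, c2, img1, img2, alpha=1.0, beta=1.0, gamma=1.0):
    """Illustrative energy for one pair of coupled snakes (not the paper's exact terms).

    c1, c2: (N, 2) control points of the matched snakes in image 1 and 2.
    img1, img2: grayscale images as 2D arrays. The energy sums a standard
    internal smoothness term for each snake plus a matching term that
    compares image intensity at corresponding control points, which is
    what couples the two snakes: moving a point in one image changes the
    cost of its partner in the other.
    """
    def internal(c):
        d1 = np.diff(c, axis=0)        # elasticity (first differences)
        d2 = np.diff(c, n=2, axis=0)   # bending (second differences)
        return alpha * (d1 ** 2).sum() + beta * (d2 ** 2).sum()

    def sample(img, c):
        ij = np.clip(np.round(c).astype(int), 0, np.array(img.shape) - 1)
        return img[ij[:, 0], ij[:, 1]]

    matching = gamma * ((sample(img1, c1) - sample(img2, c2)) ** 2).sum()
    return internal(c1) + internal(c2) + matching
```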

Reference

X. Chen and Y.H. Yang, "Quasi-Dense Correspondence in Stereo Images using Multiple Coupled Snakes," CRV, May 28-31, 2013, Regina, Saskatchewan.



Recovering Stereo Depth Maps using a Single Gaussian Blurred Structured Light Pattern

In this project, we develop a new single-shot structured light method to recover dense depth maps. Contrary to most temporal coding methods, which require projecting a series of patterns, our method needs one color pattern only. Unlike most single-shot spatial coding methods, which establish correspondence along the edges of the captured images, our method produces a dense set of correspondences. Our method is built on an important new observation that a Gaussian blurred De Bruijn pattern preserves the desirable windowed uniqueness property. A Gaussian blurred De Bruijn pattern is used so that the color of every illuminated pixel is used to its fullest advantage. The simulated experiments show that the proposed method establishes a correspondence set whose density and accuracy are close to those of a temporal coding method. We also demonstrate the robustness of our approach by applying it to several real-world datasets.
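The windowed uniqueness observation can be probed numerically: generate a De Bruijn sequence, blur it, and check that all windows remain pairwise distinct. The alphabet size, window length, and blur width below are arbitrary choices for illustration:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def de_bruijn(k, n):
    """Standard de Bruijn sequence B(k, n) over the alphabet {0..k-1}."""
    a = [0] * k * n
    seq = []
    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)
    db(1, 1)
    return np.array(seq)

# Every length-n window of a de Bruijn sequence is unique by construction;
# the check below measures how well that survives a mild Gaussian blur.
pattern = de_bruijn(k=5, n=3).astype(float)
blurred = gaussian_filter1d(pattern, sigma=1.0, mode='wrap')
windows = np.stack([np.roll(blurred, -i)[:3] for i in range(len(blurred))])
dists = np.linalg.norm(windows[:, None] - windows[None, :], axis=-1)
np.fill_diagonal(dists, np.inf)
print("min pairwise window distance:", dists.min())  # > 0 means windows stay distinct
```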

Reference

X. Chen and Y.H. Yang, "Recovering Stereo Depth Maps using a Single Gaussian Blurred Structured Light Pattern," CRV, May 28-31, 2013, Regina, Saskatchewan.


Underwater Camera Calibration Using Wavelength Triangulation

In underwater imagery, the image formation process includes refractions that occur when light passes from water into the camera housing, typically through a flat glass port. We extend the existing work on physical refraction models by considering the dispersion of light, and derive new constraints on the model parameters for use in calibration. This leads to a novel calibration method that achieves improved accuracy compared to existing work. We describe how to construct a novel calibration device for our method and evaluate the accuracy of the method through synthetic and real experiments.

Reference

T. Yau, M. Gong, and Y.H. Yang, "Underwater Camera Calibration using Wavelength Triangulation," CVPR, June 25-27, 2013, Portland, Oregon, oral presentation.


An Experimental Study of the Influence of Refraction on Underwater 3D Reconstruction using the SVP Camera Model

In an underwater imaging system, a perspective camera is often placed outside a tank or in waterproof housing with a flat glass window. The refraction of light occurs when a light ray passes through the water-glass and air-glass interfaces, rendering the conventional multiple view geometry based on the single viewpoint (SVP) camera model invalid. While most recent underwater vision studies mainly focus on the challenging topic of calibrating such systems, no previous work has systematically studied the influence of refraction on underwater three-dimensional (3D) reconstruction. This paper demonstrates the possibility of using the SVP camera model in underwater 3D reconstruction through theoretical analysis of refractive distortion and simulations. Then, the performance of the SVP camera model in multi-view underwater 3D reconstruction is quantitatively evaluated. The experimental results reveal a rather surprising and useful yet overlooked fact that the SVP camera model with radial distortion correction and focal length adjustment can compensate for refraction and achieve high accuracy in multi-view underwater 3D reconstruction (within 0.7 mm for an object of dimension 200 mm) compared with the results of land-based systems. Such an observation justifies the use of the SVP camera model in underwater applications for reconstructing reliable 3D scenes. Our results can be used to guide the selection of system parameters in the design of underwater 3D imaging setups.

Reference

L. Kang, L. Wu, and Y.H. Yang, "An Experimental Study of the Influence of Refraction on Underwater 3D Reconstruction using the SVP Camera Model," Applied Optics, Vol. 51, Issue 31, pp. 7591-7603. (http://dx.doi.org/10.1364/AO.51.007591)


Practical Structure and Motion Recovery from Two Uncalibrated Images using epsilon Constrained Adaptive Differential Evolution

Metric 3D reconstruction from two uncalibrated images involves estimating both the camera parameters and the 3D structure of a scene, which is known to be sensitive to the quality of image correspondences. In this paper, the above problem is recast as a single constrained optimization problem, which can be efficiently solved by a new modified ε-constrained Adaptive Differential Evolution (εADE) optimizer, within which noticeable acceleration of the convergence rate has been achieved by incorporating geometrically meaningful evolutionary operations. The proposed approach avoids solving the inverse 3D reconstruction problem by directly searching for the globally optimal 3D structure while satisfying the epipolar geometry and the cheirality constraints. Given a set of outlier-affected noisy image correspondences, the camera calibration and scene structure can be simultaneously estimated in our global optimization framework. Extensive experimental validation on both synthetic data and real images was carried out. The performance of our proposed method is compared with that of four well-known fundamental matrix estimation methods, each of which is combined with analytical focal length estimation, optimal triangulation and bundle adjustment optimization. Statistical analysis of the results demonstrates that our method can significantly improve the accuracy of camera calibration and scene reconstruction. The stable, accurate numerical performance as well as the fast convergence rate make our method a practical one.
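A generic ε-constrained DE loop is sketched below for orientation; the paper's geometrically meaningful operators and adaptive ε schedule are not reproduced here:

```python
import numpy as np

def epsilon_de(objective, violation, bounds, eps=1e-3, pop_size=40,
               generations=200, f=0.7, cr=0.9, rng=np.random.default_rng(0)):
    """Minimal epsilon-constrained differential evolution (DE/rand/1/bin).

    objective(x): scalar cost; violation(x): total constraint violation >= 0.
    Comparison rule: when both candidates' violations are within eps they
    are compared by objective, otherwise the smaller violation wins.
    bounds: list of (low, high) pairs, one per variable.
    """
    lo, hi = np.array(bounds).T
    dim = len(lo)
    pop = rng.uniform(lo, hi, size=(pop_size, dim))

    def better(xa, xb):
        va, vb = violation(xa), violation(xb)
        if va <= eps and vb <= eps:
            return objective(xa) < objective(xb)
        return va < vb

    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
            mutant = np.clip(a + f * (b - c), lo, hi)        # DE/rand/1 mutation
            mask = rng.random(dim) < cr                      # binomial crossover
            mask[rng.integers(dim)] = True                   # keep at least one mutant gene
            trial = np.where(mask, mutant, pop[i])
            if better(trial, pop[i]):
                pop[i] = trial
    # Prefer near-feasible members, then the lowest objective.
    return min(pop, key=lambda x: (violation(x) > eps, objective(x)))
```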

Reference

L. Kang, L. Wu, X. Chen and Y.H. Yang, "Practical Structure and Motion Recovery from Two Uncalibrated Images using Epsilon Constrained Adaptive Differential Evolution," Pattern Recognition, to appear. (http://dx.doi.org/10.1016/j.patcog.2012.10.028)


Two-view Underwater Structure and Motion for Cameras under Flat Refractive Interfaces

In an underwater imaging system, a refractive interface is introduced when a camera looks into the water-based environment, resulting in distorted images due to refraction. Simply ignoring the refraction effect or using the lens radial distortion model causes erroneous 3D reconstruction. This paper deals with a general underwater imaging setup using two cameras, of which each camera is placed in a separate waterproof housing with a flat window. The impact of refraction is explicitly modeled in the refractive camera model. Based on two new concepts, namely the Ellipse of Refrax (EoR) and the Refractive Depth (RD) of a scene point, we show that provably optimal underwater structure and motion under the L1-norm can be estimated given known rotation. The constraint of known rotation is further relaxed by incorporating two-view geometry estimation into a new hybrid optimization framework. Experiments using both synthetic data and real underwater images demonstrate that the proposed method can significantly improve the accuracy of camera motion and 3D structure estimation for underwater applications.

Reference

L. Kang, L. Wu, and Y. H. Yang, "Two-view Underwater Structure and Motion for Cameras under Flat Refractive Interfaces," ECCV, 2012. (http://dx.doi.org/10.1007/978-3-642-33765-9_22)

Refractive Epipolar Geometry for Underwater Stereo Matching

A fundamental component of stereo vision is epipolar geometry. It tells us that the corresponding point of a pixel in one image is restricted to a line in the other image. This constraint reduces both the complexity of finding stereo correspondences and the chance of making incorrect matches. When a refractive surface is introduced, as is the case in underwater imaging, this constraint no longer holds. Instead, the corresponding point of a pixel in one image is now restricted to a curve, not a line. In this project, we investigate the impact that a planar refractive interface has on underwater imaging and stereo matching. We address the issue of 3D point projection in a refractive medium, including cases where the refractive interface is not parallel with the camera's imaging plane. A novel method for calibrating the parameters of a planar refractive interface in the local image space is proposed. We show how to compute the refractive epipolar curve for a pixel, which allows us to generate a matching cost volume that compensates for the effects of refraction. Our experimental results show that our new approach can significantly improve the results of underwater stereo matching over previous approaches.
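Back-projecting a pixel through a flat interface, the building block behind the refractive epipolar curve, can be sketched as follows. This sketch assumes the interface is parallel to the image plane, a case the project generalizes:

```python
import numpy as np

def refracted_ray(pixel, K, interface_depth, n_air=1.0, n_water=1.33):
    """Back-project a pixel through a flat refractive interface.

    Simplifying assumption for this sketch: the interface is a plane
    z = interface_depth, parallel to the image plane. Returns a point on
    the interface and the refracted ray direction in water. Sampling
    depths along that ray and projecting the 3D points into the second
    view traces out the refractive epipolar curve.
    """
    d = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    d /= np.linalg.norm(d)                    # in-air ray from the camera centre
    p = d * (interface_depth / d[2])          # intersection with the interface plane
    n = np.array([0.0, 0.0, 1.0])             # interface normal, along the optical axis
    cos_i = np.dot(d, n)
    r = n_air / n_water
    cos_t = np.sqrt(1.0 - r ** 2 * (1.0 - cos_i ** 2))
    t = r * d + (cos_t - r * cos_i) * n       # Snell's law in vector form
    return p, t / np.linalg.norm(t)
```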

Reference

J. Gedge, M. Gong, and Y.H. Yang, "Refractive Epipolar Geometry for Underwater Stereo Matching," Canadian Conference on Computer and Robot Vision 2011, May 25-27, St. John's, Newfoundland. (http://dx.doi.org/10.1109/CRV.2011.26)



A New Multiview Spacetime-Consistent Depth Recovery Framework for Free Viewpoint Video Rendering

In this project, we develop a new framework for recovering spacetime-consistent depth maps from multiple video sequences captured by stationary, synchronized and calibrated cameras for depth based free viewpoint video rendering. Our two-pass approach is generalized from the recently proposed region-tree based binocular stereo matching method. In each pass, to enforce temporal consistency between successive depth maps, the traditional region-tree is extended into a temporal one by including connections to "temporal neighbor regions" in previous video frames, which are identified using estimated optical flow information. For enforcing spatial consistency, multi-view geometric constraints are used to identify inconsistencies between depth maps among different views, which are captured in an inconsistency map for each view. Iterative optimizations are performed to progressively correct inconsistencies through inconsistency-map-based depth hypothesis pruning and visibility reasoning. Furthermore, the background depth and color information is generated from the results of the first pass and is used in the second pass to enforce sequence-wise temporal consistency and to aid in identifying and correcting spatial inconsistencies. Extensive experimental evaluations show that our proposed approach is very effective in producing spatially and temporally consistent depth maps.
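The spatial-consistency test can be illustrated by warping one view's depth map into another and flagging disagreements; the threshold and out-of-view handling below are simplifications of the paper's inconsistency maps:

```python
import numpy as np

def inconsistency_map(depth_a, K_a, K_b, R, t, depth_b, tau=0.02):
    """Flag depth estimates in view A that disagree with view B.

    R, t map view-A camera coordinates to view-B camera coordinates.
    A pixel is marked inconsistent when the depth predicted by warping
    view A's estimate into view B differs from view B's own estimate by
    more than a relative threshold tau. Out-of-view pixels are treated
    as inconsistent in this sketch.
    """
    h, w = depth_a.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # 3 x N homogeneous pixels
    pts_a = (np.linalg.inv(K_a) @ pix) * depth_a.reshape(1, -1)         # 3D points in view A
    pts_b = R @ pts_a + t.reshape(3, 1)                                 # same points in view B
    proj = K_b @ pts_b
    ub = np.round(proj[0] / proj[2]).astype(int)
    vb = np.round(proj[1] / proj[2]).astype(int)
    inside = (ub >= 0) & (ub < w) & (vb >= 0) & (vb < h) & (pts_b[2] > 0)
    incons = np.ones(h * w, dtype=bool)
    z_pred = pts_b[2][inside]                                           # depth predicted from view A
    z_obs = depth_b[vb[inside], ub[inside]]                             # depth estimated in view B
    incons[inside] = np.abs(z_pred - z_obs) > tau * z_obs
    return incons.reshape(h, w)
```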

References

C. Lei and Y.H. Yang, "A Region Tree based Image Discrete Labeling Framework," (Invited paper), Handbook of Pattern Recognition and Computer Vision, 4th edition, Editor C.H. Chen, World Scientific Publishing, 2010.

C. Lei, X. Chen, and Y.H. Yang, "A New Multiview Spacetime-Consistent Depth Recovery Framework for Free Viewpoint Video Rendering," International Conference on Computer Vision, Kyoto, Japan, September 27-October 2, 2009. (http://dx.doi.org/10.1109/ICCV.2009.5459357)

Supplemental materials (21MB)

(code)


Local Stereo Matching with 3D Adaptive Cost Aggregation for Slanted Surface Modeling and Sub-Pixel Accuracy

In this project, we develop a new local binocular stereo algorithm which takes into consideration plane fitting at the per-pixel level. Two disparity calculation passes are used. The first pass assumes that the surfaces are fronto-parallel and generates an initial disparity map, from which the disparity plane orientations of all pixels are extracted and refined. In the second pass, the cost aggregation for each pixel is computed along the estimated disparity plane orientations, rather than the fronto-parallel ones. The experimental results show the validity of the approach.
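As a rough sketch of the second pass, cost can be aggregated along each pixel's estimated disparity plane rather than a fronto-parallel one. The box window below stands in for the paper's adaptive aggregation, and the array shapes are assumptions:

```python
import numpy as np

def slanted_aggregation(cost, a, b, radius=3):
    """Aggregate matching cost along per-pixel slanted disparity planes.

    cost: (H, W, D) matching-cost volume; a, b: (H, W) disparity-plane
    slopes in x and y, extracted from the first (fronto-parallel) pass.
    For window offset (dx, dy) the sampled disparity is d + a*dx + b*dy,
    read from the cost volume with linear interpolation in d, which also
    gives the sub-pixel behaviour the title refers to.
    """
    h, w, dmax = cost.shape
    agg = np.zeros_like(cost)
    ys, xs = np.mgrid[0:h, 0:w]
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yn = np.clip(ys + dy, 0, h - 1)
            xn = np.clip(xs + dx, 0, w - 1)
            shift = a * dx + b * dy                    # disparity offset along the plane
            for d in range(dmax):
                ds = np.clip(d + shift, 0, dmax - 1)
                d0 = np.floor(ds).astype(int)
                d1 = np.minimum(d0 + 1, dmax - 1)
                f = ds - d0
                agg[:, :, d] += (1 - f) * cost[yn, xn, d0] + f * cost[yn, xn, d1]
    return agg
```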

References

M. Gong, Y. Zhang, and Y.H. Yang, "Near-Real-Time Stereo Matching with Slanted Surface Modeling and Sub-pixel Accuracy," Pattern Recognition, Vol. 44, Issues 10-11, pp. 2701-2710. (http://dx.doi.org/10.1016/j.patcog.2011.03.028)

Y. Zhang, M. Gong, and Y.H. Yang, "Local stereo matching with 3D adaptive cost aggregation for slanted surface modeling and sub-pixel accuracy," International Conference on Pattern Recognition, Tampa, Florida, December, 2008. Oral Presentation. (http://dx.doi.org/10.1109/ICPR.2008.4761101)

 

 

Real-time Multi-view Stereo Algorithm using Adaptive Weight Parzen Window and Local Winner-take-all Optimization

In this project, we develop a real-time multi-view stereo algorithm, which is based on local winner-take-all optimization. When computing the disparity maps for a given view, the algorithm performs 3 steps: cost volume generation, cost volume merging, and disparity selection. The main focus of this project is on the second step and a new cost volume merging method is proposed, which combines the adaptive weight and the Parzen window approaches. The proposed method can deal with noise and visibility problems effectively, even with poorly generated cost volumes as input. The experimental results demonstrate the validity of our presented approach.
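One way to realize the merging step, offered as an illustration rather than the paper's exact scheme, is to let each view cast a Parzen-window vote around its own winner-take-all disparity:

```python
import numpy as np

def parzen_merge(cost_volumes, sigma=1.0):
    """Merge per-view cost volumes with a Parzen-window vote (illustrative).

    cost_volumes: list of (H, W, D) volumes, one per supporting view.
    Each view casts a Gaussian-kernel vote centred on its own best
    disparity, so a view that is occluded or noisy at a pixel spreads
    little mass onto the wrong candidate. The paper additionally weights
    the contributions adaptively, which this sketch omits. Returns the
    winner-take-all disparity of the merged density.
    """
    h, w, dmax = cost_volumes[0].shape
    d_axis = np.arange(dmax)
    density = np.zeros((h, w, dmax))
    for cv in cost_volumes:
        best = np.argmin(cv, axis=2)                   # per-view WTA disparity
        density += np.exp(-((d_axis - best[..., None]) ** 2) / (2 * sigma ** 2))
    return np.argmax(density, axis=2)                  # local winner-take-all selection
```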

Reference

Y. Zhang, M. Gong, and Y.H. Yang, "Real-time multi-view stereo algorithm using adaptive weight Parzen window and local winner-take-all optimization," Canadian Conference on Computer and Robotic Vision, Windsor, ON, 2008. Oral Presentation. (http://dx.doi.org/10.1109/CRV.2008.41)

 

photometric stereo

Using a Raster Display for Photometric Stereo

In this project, we develop a new controlled lighting apparatus which uses a raster display device as a light source. The setup has the advantage over other alternatives in that it is relatively inexpensive and uses commonly available components. The apparatus is studied through application to shape recovery using photometric stereo. Experiments on synthetic and real images demonstrate how the depth map of an object can be recovered using only a camera and a computer monitor.
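The shape-recovery step can be illustrated with the standard Lambertian photometric-stereo solve; the display calibration that yields the per-image light directions is assumed to have been done already:

```python
import numpy as np

def photometric_stereo(images, lights):
    """Lambertian photometric stereo from >= 3 known light directions.

    images: (K, H, W) grayscale images, one per monitor-generated light;
    lights: (K, 3) unit light directions. Under the Lambertian model
    I = L @ (rho * n), so a single least-squares solve per pixel recovers
    the albedo-scaled normals, from which a depth map can be integrated.
    """
    k, h, w = images.shape
    i = images.reshape(k, -1)                      # K x N intensity matrix
    g = np.linalg.lstsq(lights, i, rcond=None)[0]  # 3 x N, g = rho * n
    rho = np.linalg.norm(g, axis=0)                # per-pixel albedo
    n = g / np.maximum(rho, 1e-12)                 # unit normals
    return n.reshape(3, h, w), rho.reshape(h, w)
```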

Reference

Funk, N. and Y.H. Yang, “Using a Raster Display for Photometric Stereo,” Canadian Conference on Computer and Robotic Vision, Montreal, 2007, pp. 201-207. Oral Presentation. (http://dx.doi.org/10.1109/CRV.2007.66)


Evaluation of Constructable Match Cost Measures for Stereo Correspondence Using Cluster Ranking

The most popular, and arguably the de facto standard, method for evaluating stereo correspondence algorithms is the Middlebury online evaluation supplied by Scharstein and Szeliski. To use this tool, a researcher submits a single disparity map calculated for each of four different stereo pairs. For each stereo pair, the tool calculates the percentage of pixels in the submitted disparity map that differ from the ground truth by more than a threshold in three different evaluation regions. An overall ranking is obtained from the average of all 12 ranks for each algorithm. Since its introduction as a standardized test set, the Middlebury online evaluation has helped foster stereo correspondence research by greatly simplifying the process of comparing a new technique to existing techniques. A problem with this online ranking method, which our proposed cluster ranking method addresses, is that using only one disparity map per stereo pair does not allow one to determine whether two or more algorithms produce statistically similar results on the stereo pair. Furthermore, the overall ranking provided by the Middlebury online evaluation does not identify which, if any, algorithms have statistically similar performance overall. Our proposed cluster ranking evaluation method uses statistical significance tests combined with a greedy clustering algorithm to rank stereo algorithms such that only those that produce statistically dissimilar results, according to the statistical test employed, are assigned different ranks. When ranking algorithms by their results from a single stereo image pair, we use the error rates from both images combined with an analysis of variance (ANOVA) test to identify the algorithms that produce statistically similar results. When combining the rankings of algorithms from many different stereo pairs into a single overall ranking, the Friedman test is used instead of ANOVA.
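A minimal sketch of the greedy clustering idea, using scipy's one-way ANOVA; the input format and significance level are assumptions for illustration:

```python
import numpy as np
from scipy.stats import f_oneway

def cluster_rank(error_samples, alpha=0.05):
    """Greedy statistical clustering of stereo algorithms (illustrative).

    error_samples: dict mapping algorithm name -> 1D array of per-image
    (or per-region) error rates on one stereo pair. Algorithms are visited
    in order of mean error; each joins the current cluster while a one-way
    ANOVA cannot distinguish it from the cluster's members, otherwise a
    new (lower-ranked) cluster is opened. The paper uses the Friedman test
    instead of ANOVA when combining rankings across many stereo pairs.
    """
    order = sorted(error_samples, key=lambda name: np.mean(error_samples[name]))
    clusters = [[order[0]]]
    for name in order[1:]:
        members = [error_samples[a] for a in clusters[-1]]
        _, p = f_oneway(*members, error_samples[name])
        if p > alpha:                       # cannot reject "same performance"
            clusters[-1].append(name)       # statistically similar: same rank
        else:
            clusters.append([name])         # dissimilar: next rank
    return clusters                         # clusters[0] holds the rank-1 algorithms
```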

See the U of Alberta Stereo Vision Site for more results.

References

D. Neilson and Y.H. Yang, "A Component-wise Analysis of Constructible Match Cost Functions for Global Stereopsis," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 33, Issue 11, pp. 2147-2159, 2011. (http://dx.doi.org/10.1109/TPAMI.2011.67)

D. Neilson and Yee-Hong Yang, "Evaluation of constructable match cost measures for stereo correspondence using cluster ranking," CVPR, Anchorage, Alaska, June 22-28, 2008. Poster Presentation. (http://dx.doi.org/10.1109/CVPR.2008.4587692)



Region-tree-based Stereo Using Dynamic Programming Optimization

In this project, we propose a new stereo matching algorithm which combines the strengths of the region-based approach and the 2D DP optimization framework. In particular, instead of optimizing a global energy function defined on a 2D pixel-tree structure using DP, a region-tree built on over-segmented image regions is used. As expected, by incorporating the unique advantages of region-based stereo matching, better performance is achieved. Currently, the performance of this algorithm is ranked within the top 10 on the Middlebury Stereo Vision site.

References

C. Lei, J. Selzer, and Y.H. Yang, "Region-Tree based Stereo using Dynamic Programming Optimization," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, New York, NY, June 17-22, 2006, pp. 2378-2385. Poster Presentation. (Ranked 6 on the Middlebury Stereo Website in 2006.) (http://dx.doi.org/10.1109/CVPR.2006.251)

C. Lei and Y.H. Yang, " A Region Tree based Image Discrete Labeling Framework," (Invited paper), Handbook of Pattern Recognition and Computer Vision, 4th edition, Editor C.H. Chen, World Scientific Publishing, 2010.


Near Real-Time Reliable Stereo Matching Using Programmable Graphics Hardware

A near-real-time stereo matching technique is presented in this paper, which is based on the reliability-based dynamic programming algorithm we proposed earlier. The new algorithm can generate semi-dense disparity maps using only two dynamic programming passes, while our previous approach requires 20~30 passes. We also implement the algorithm on programmable graphics hardware, which further improves the processing speed. The experiments on the four Middlebury stereo datasets show that the new algorithm can produce dense (>85% of the pixels) and reliable (error rate <0.3%) matches in near real-time (0.05~0.1 sec). If needed, it can also be used to generate dense disparity maps. Based on the evaluation conducted by the Middlebury Stereo Vision Research website, the new algorithm is ranked between the variable window and the Graph Cuts approaches and is currently the most accurate dynamic-programming-based technique. When more than one reference image is available, the accuracy can be further improved with little extra computation time.

Reference

M. Gong and Y.-H. Yang, “Near Real-Time Reliable Stereo Matching Using Programmable Graphics Hardware,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition, San Diego, CA, June 20-25, 2005, pp. 924-931. Poster Presentation. (http://dx.doi.org/10.1109/CVPR.2005.246)


Fast Unambiguous Stereo Matching Using Reliability-based Dynamic Programming

An efficient unambiguous stereo matching technique is presented in this paper. Our main contribution is to introduce a new reliability measure to dynamic programming approaches in general. For stereo vision application, the reliability of a proposed match on a scanline is defined as the cost difference between the globally best disparity assignment that includes the match, and the globally best assignment that does not include the match. A reliability-based dynamic programming algorithm is derived accordingly, which can selectively assign disparities to pixels when the corresponding reliabilities exceed a given threshold. The experimental results show that the new approach can produce dense (>70% of the unoccluded pixels) and reliable (error rate<0.5%) matches efficiently (<0.2 sec on a 2GHz P4) for the four Middlebury stereo datasets.
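The reliability measure can be sketched on a single scanline with forward/backward dynamic programming; the linear smoothness model and the threshold below are illustrative, not the paper's exact choices:

```python
import numpy as np

def reliability_dp(cost, lam=1.0, threshold=5.0):
    """Reliability-based DP on one scanline (a compact sketch of the idea).

    cost: (X, D) float matching costs along a scanline. Forward/backward DP
    with smoothness penalty lam*|d - d'| gives, for every (x, d), the cost
    of the best full-scanline assignment passing through that match. The
    reliability of the best match at x is the margin between the best
    assignment that avoids it and the best assignment that uses it;
    disparities are committed only where the margin exceeds the threshold.
    """
    x_len, d_len = cost.shape
    d = np.arange(d_len)
    trans = lam * np.abs(d[:, None] - d[None, :])          # D x D smoothness costs
    fwd = np.zeros_like(cost)
    bwd = np.zeros_like(cost)
    fwd[0], bwd[-1] = cost[0], cost[-1]
    for x in range(1, x_len):
        fwd[x] = cost[x] + np.min(fwd[x - 1][:, None] + trans, axis=0)
        bwd[x_len - 1 - x] = cost[x_len - 1 - x] + np.min(
            bwd[x_len - x][:, None] + trans, axis=0)
    through = fwd + bwd - cost                             # best assignment through (x, d)
    best = np.argmin(through, axis=1)
    sorted_costs = np.sort(through, axis=1)
    reliability = sorted_costs[:, 1] - sorted_costs[:, 0]  # margin over the runner-up
    disparity = np.where(reliability > threshold, best, -1)  # -1 marks unassigned pixels
    return disparity, reliability
```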

References

M. Gong and Y.H. Yang, “Fast Unambiguous Stereo Matching Using Reliability-based Dynamic Programming,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 27, 2005, pp. 998-1003. (http://dx.doi.org/10.1109/TPAMI.2005.120)

M. Gong and Y.H. Yang, “Fast stereo matching using reliability-based dynamic programming and consistency constraints,” Proceedings of International Conference on Computer Vision, Nice, France, October 13-16, 2003, pp. 610-617. Poster Presentation. (http://dx.doi.org/10.1109/ICCV.2003.1238404)


Genetic-based Stereo and Disparity Map Evaluation

In this project, a new genetic-based stereo algorithm is presented. Our motivation is to improve the accuracy of the disparity map by removing the mismatches caused by both occlusions and false targets. In our approach, the stereo matching problem is considered as an optimization problem. The algorithm first takes advantage of multi-view stereo images to detect occlusions, and therefore removes mismatches caused by visibility problems. By optimizing the compatibility between corresponding points and the continuity of the disparity map using a genetic algorithm, mismatches caused by false targets are removed. The quadtree structure is used to implement the multi-resolution framework. Since nodes at different levels of the quadtree cover different numbers of pixels, selecting nodes at different levels has a similar effect to adjusting the window size at different locations of the image. The experimental results show that our approach can generate more accurate disparity maps than two existing approaches. In addition, we introduce a new disparity map evaluation technique, which is developed based on a similar technique employed in the image segmentation area. Compared with two existing evaluation approaches, the new technique can evaluate the disparity maps generated without additional knowledge of the scene, such as the correct depth information or novel views.

Reference

M. Gong and Y.H. Yang, "Genetic-based stereo and disparity map evaluation," International Journal of Computer Vision, Vol. 47, 2002, pp. 63-77. (http://dx.doi.org/10.1023/A:1014529404956)