Reading group in computer vision and robotics


Meetings:

Thursdays at 1pm, CSC 349

Contact:

Martin Jagersand, jag[at]cs.ualberta.ca
Dana Cobzas, dana[at]cs.ualberta.ca


The reading group will cover theory and practical applications related to robotics and computer vision. At each meeting one paper will be presented and discussed in about an hour. Anyone interested is welcome to attend. Send us an e-mail if you would like to attend and we will include you on the list.

Date | Topic | Papers | Presenter
Thu, July 12, 2007 | Tracking: SSD | [Benhimane, Malis 07] | Martin Jagersand
Thu, July 19, 2007 | Tracking: SSD | [Baker TR04] | Dana Cobzas
Thu, July 26, 2007 | Tracking: Forward linear motion model | [Jurie, Dhome 02], [Lapreste 04] | Cam Upright, Azad Shademan
Thu, August 2, 2007 | Tracking: Feature based | [Pressigout, Marchand 07] | Azad Shademan
Thu, August 9, 2007 | Monocular SLAM | [Williams 07] | Adam Rachmielowski
Thu, August 16, 2007 | Fast SLAM | [Eade Drummond 06] | Kevin Olsen
Thu, August 23, 2007 | Lie Algebra in Visual Tracking and Servoing | [Drummond & Cipolla 99] | Neil Birkbeck
Thu, October 4, 2007 | Radiometry, light and reflectance models | [Dana's Radiometry Tutorial] | Dana Cobzas
Thu, October 11, 2007 | Inverse light | [Okabe CVPR04] | Cam Upright
Thu, October 18, 2007 | AI in robotics | [Ziemke98] | Amir massoud Farahmand
!Wed, Oct 24, 2007, 4:30pm, CSC 349! | Learning Visual-Motor functions | IROS practice talk | Azad Shademan
!Wed, Oct 24, 2007, 4:30pm, CSC 349! | Robots in Human environments | [Edsinger06] | Martin Jagersand
Thu, Nov 1, 2007 | Human pose | [Balan CVPR07] | Neil Birkbeck

Visual Tracking

In visual tracking, motion information from a video sequence is distilled to determine the pose parameters of a moving camera or object. One way to classify tracking methods is into: (a) registration-based tracking, (b) feature-based tracking, and (c) segmentation-based tracking.

Registration-based tracking uses only image intensity information and estimates the motion or deformation parameters by comparing the current image with a reference template. Often a sum-of-squared-differences error is minimized, giving the technique the popular name of SSD tracking. While early approaches used inefficient brute-force search, modern methods are based on numerical optimization, where a search direction is obtained from image derivatives. The first such methods require spatial image derivatives to be recomputed for each frame when 'forward' warping the reference patch to the current image. More efficient 'inverse' algorithms have been developed, which allow real-time tracking of a 6D affine patch [Hager PAMI98] or an 8D homography patch [Baker IJCV04]. [Benhimane, Malis 07] proposed an SSD tracking algorithm based on a second-order minimization method (the ESM method) that has a high convergence rate like Newton's method but does not require computing the Hessian. A related approach was proposed by [Jurie, Dhome 02]: instead of using spatial image derivatives, a linear basis of test image movements is used to explain the current frame. It has proven as efficient as the inverse methods during tracking, but suffers from much longer initialization times to compute the basis and from a heuristic choice of the particular test movements. A discussion of possible extensions to 3D data (3D warps) or projections of 3D data in images (3D-2D warps) is presented in [Baker TR04].
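
The inverse-compositional idea is concrete enough to sketch. Below is a minimal version for the simplest case, a pure 2D translation warp, in the spirit of [Baker IJCV04]; the function name and the NumPy/SciPy choices are illustrative, and the 6D affine or 8D homography cases follow the same pattern with a larger warp Jacobian:

    import numpy as np
    from scipy.ndimage import map_coordinates

    def ic_translation_track(I, T, p, n_iters=30, tol=1e-3):
        """Inverse-compositional SSD tracking for a translation warp
        W(x; p) = x + p. I: current frame, T: template, p: (dy, dx)."""
        # Precomputed once per template: the gradients double as the
        # steepest-descent images, since dW/dp is the identity here.
        Tf = T.astype(float)
        If = I.astype(float)
        gy, gx = np.gradient(Tf)
        sd = np.stack([gy.ravel(), gx.ravel()], axis=1)   # N x 2
        H_inv = np.linalg.inv(sd.T @ sd)                  # Gauss-Newton Hessian

        ys, xs = np.mgrid[0:T.shape[0], 0:T.shape[1]]
        p = np.asarray(p, dtype=float)
        for _ in range(n_iters):
            # Warp the current image back to the template frame.
            warped = map_coordinates(If, [ys + p[0], xs + p[1]], order=1)
            err = (warped - Tf).ravel()                   # SSD residual
            dp = H_inv @ (sd.T @ err)
            p -= dp        # inverse composition reduces to subtraction here
            if np.linalg.norm(dp) < tol:
                break
        return p

Because the steepest-descent images and the Hessian depend only on the template, they are computed once per template rather than once per frame, which is exactly what makes the inverse methods cheaper than the forward ones.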

In feature-based tracking, a feature detector locates the image projections of either special markers or natural image features. A 3D pose can then be computed by relating the 2D image feature positions to their 3D model. Many approaches use image contours (edges or curves) that are matched against an a priori given CAD model of the object [Drummond, Cipolla 02]. Most systems compute pose parameters by linearizing with respect to object motion [Pressigout, Marchand 07]. A characteristic of these algorithms is that feature detection is relatively decoupled from the pose computation; they are therefore quite sensitive to feature detection errors and hard to apply to complex images.
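
To make the decoupling concrete, the sketch below computes pose from already-detected 2D-3D correspondences with OpenCV's solvePnP; the model points, image points, and camera intrinsics are made-up placeholders, and any detector could have produced the matches:

    import numpy as np
    import cv2

    # Hypothetical data: 3D model points (e.g. CAD corners, object frame)
    # and their detected 2D projections in the current frame.
    model_pts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
                          [0, 0, 1], [1, 0, 1]], dtype=np.float64)
    image_pts = np.array([[320, 240], [400, 238], [405, 320], [322, 318],
                          [318, 160], [398, 158]], dtype=np.float64)

    K = np.array([[800, 0, 320],           # assumed pinhole intrinsics
                  [0, 800, 240],
                  [0, 0, 1]], dtype=np.float64)

    # The pose computation never sees the images, only the matches,
    # which is why detection errors propagate directly into the pose.
    ok, rvec, tvec = cv2.solvePnP(model_pts, image_pts, K, None)
    R, _ = cv2.Rodrigues(rvec)             # rotation vector -> 3x3 matrix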

In segmentation-based tracking, some pixel or area-based property (e.g., color, texture) is used to binarize an image, and the centroid and possibly higher moments of the connected regions are then computed. While the centroid and moments suffice to measure 2D image properties, they are typically not precise enough for 3D tracking on their own, but can be used to initialize more precise tracking modalities [Toyama IJCV99].
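
A minimal sketch of the measurement step, assuming plain intensity thresholding as the binarization property:

    import numpy as np

    def centroid_and_moments(image, threshold):
        """Segmentation-based 2D tracking measurement: binarize on a pixel
        property (here intensity), then take moments of the foreground."""
        ys, xs = np.nonzero(image > threshold)      # binarize + collect pixels
        assert len(xs) > 0, "empty foreground"
        m00 = len(xs)                               # zeroth moment: area
        cx, cy = xs.mean(), ys.mean()               # first moments: centroid
        # Second central moments capture the blob's orientation and extent.
        mu20 = ((xs - cx) ** 2).mean()
        mu02 = ((ys - cy) ** 2).mean()
        mu11 = ((xs - cx) * (ys - cy)).mean()
        return (cx, cy), m00, (mu20, mu11, mu02)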

SSD Tracking

[Baker IJCV04] S. Baker and I. Matthews: Lucas-Kanade 20 Years On: A Unifying Framework, IJCV 2004

[Baker TR04] S. Baker, R. Patil, G. Cheung and I. Matthews: Lucas-Kanade 20 Years On: Part 5, CMU-RI-TR-04-64, 2004

[Hager PAMI98] Hager, G.D. and Belhumeur, P.N.: Efficient Region Tracking with Parametric Models of Geometry and Illumination, PAMI 1998

[Benhimane, Malis 07] Benhimane, S. and Malis, E.: Homography-based 2D Visual Tracking and Servoing, International Journal of Robotics Research, 2007

[Jurie,Dhome 02] Jurie, F. and Dhome, M.: Hyperplane Approximation for Template Matching, PAMI, 2002

[Lepetit 05] V. Lepetit, P. Lagger and P. Fua: Randomized Trees for Real-Time Keypoint Recognition, CVPR 2005 (a similar idea to [Jurie, Dhome 02], applied to feature matching in the context of object recognition)

Feature-based tracking

[Pressigout, Marchand 07] M. Pressigout, E. Marchand: Real-time hybrid tracking using edge and texture information, International Journal of Robotics Research, 2007 (model-free and model-based)

[Drummond, Cipolla 02] Drummond, T. and Cipolla, R.: Real-time visual tracking of complex structures, PAMI, 2002 (model-based, edge features)

Segmentation-based tracking

[Toyama IJCV99] Toyama, K. and Hager, G.D.: Incremental Focus of Attention for Robust Vision-Based Tracking, IJCV 1999

Monocular SLAM

In simultaneous localisation and mapping (SLAM), a mobile robot's pose is estimated while a map of the environment it is navigating is built at the same time. The problem is formulated in a Bayesian framework where noisy measurements are integrated over time into a probability distribution over the state (landmark positions and pose/motion parameters). In general, SLAM may rely on various sensors, including lasers, sonar, stereo vision, GPS (global positioning system), IMUs (inertial measurement units), and odometry. Monocular SLAM tackles the problem with only a single camera (typically hand-held) as the sensor.

One of the first systems to implement real-time monocular SLAM [Davison 03] used an Extended Kalman Filter (EKF) to propagate estimates and uncertainty over time. Maintaining the complete covariance matrix, including correlations between feature positions, allows a small number of measurements to update the entire state, leading to a stable system with few measurements and enabling implicit loop closing (an important sub-topic in SLAM). However, the multivariate Gaussian distribution used in the EKF lacks resilience to the erratic camera motions common in hand-held systems. Moreover, the cost of filter updates is quadratic in the size of the state (the number of camera and feature parameters), resulting in poor scaling as the number of mapped features increases.
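
A sketch of the generic EKF measurement update over the joint state (camera plus all mapped features); the variable names are illustrative. The last line is the full covariance update whose quadratic cost in the state size limits scaling:

    import numpy as np

    def ekf_update(x, P, z, h, H, R):
        """x: full SLAM state (camera + feature parameters), P: covariance,
        z: measurement, h: predicted measurement h(x),
        H: measurement Jacobian dh/dx, R: measurement noise covariance."""
        y = z - h                            # innovation
        S = H @ P @ H.T + R                  # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        x = x + K @ y                        # every entry of x can move,
                                             # via the cross-correlations in P
        P = P - K @ S @ K.T                  # O(N^2) in the state size
        return x, P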

To provide resilience to erratic motion, a particle-filter-based system was proposed [Pupilli Calway 06], in which the camera and each feature have a set of particles representing their distributions. This approach models multi-modal distributions, allowing implicit multiple hypotheses of the state estimate, resulting not only in resilience to erratic motion but also in improved handling of occlusions. Although the complexity of filter updates is linear, in practice a very large number of particles is required to ensure convergence, and current real-time systems are limited to a small number of features. Moreover, updates occur only for those features that are measured.
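
A minimal sketch of one sampling-importance-resampling step for a generic state, with a user-supplied measurement likelihood; this is the standard loop such systems build on, not the exact [Pupilli Calway 06] filter:

    import numpy as np

    def particle_filter_step(particles, weights, z, motion_noise, likelihood):
        """particles: (n, d) array of state hypotheses, weights: (n,),
        z: measurement, likelihood(x, z): how well z fits particle x."""
        n = len(particles)
        # Predict: diffuse with motion noise (a hand-held camera gives
        # no odometry, so the noise must cover erratic motion).
        particles = particles + np.random.randn(*particles.shape) * motion_noise
        # Weight: multi-modal posteriors survive because each mode keeps
        # some particles, but covering the modes is what drives n up.
        weights = weights * np.array([likelihood(x, z) for x in particles])
        weights = weights / weights.sum()
        # Resample when the effective sample size degenerates.
        if 1.0 / np.sum(weights ** 2) < n / 2:
            idx = np.random.choice(n, size=n, p=weights)
            particles, weights = particles[idx], np.full(n, 1.0 / n)
        return particles, weights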

To provide good scaling without relying on submapping or postponement methods, the popular FastSLAM 2.0 algorithm was applied to the monocular SLAM problem [Eade Drummond 06]. Complexity is reduced from O(N^2) to O(M log N) by modeling the camera distribution with a set of M particles and each feature with an independent EKF. This approach provides the most scalable system; experiments have shown hundreds of features mapped in real time. Although these particle filter approaches provide advantages over the EKF approach, they do not propagate correlations between features. Only measured features are updated, so loop closing must be handled explicitly.
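
A sketch of the data layout this implies, with hypothetical class names: each particle owns a pose hypothesis and its own map of small, independent per-feature filters, instead of one joint covariance:

    import numpy as np
    from dataclasses import dataclass, field
    from typing import Dict

    @dataclass
    class FeatureEKF:
        """Per-feature estimate: a mean and a small covariance, with no
        cross-feature correlations (unlike the joint-covariance EKF)."""
        mean: np.ndarray                     # 3D feature position
        cov: np.ndarray                      # 3x3 covariance

    @dataclass
    class Particle:
        """One camera-trajectory hypothesis together with its own map.
        FastSLAM keeps updates at O(M log N) by sharing map storage
        across the M particles in a balanced tree."""
        pose: np.ndarray                     # camera pose parameters
        features: Dict[int, FeatureEKF] = field(default_factory=dict)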

[Davison 03] Andrew J. Davison: Real-Time Simultaneous Localisation and Mapping with a Single Camera, ICCV 2003

[Pupilli Calway 06] Pupilli, M. and Calway, A.: Real-time Visual SLAM with Resilience to Erratic Motion, CVPR 2006

[Eade Drummond 06] Eade, E., Drummond, T. : Scalable Monocular SLAM, CVPR 2006

[Williams 07] B. Williams, P. Smith, I. Reid: Automatic Relocalisation for a Single-Camera Simultaneous Localisation and Mapping System, ICRA 2007

[Montiel 06] J.M.M. Montiel, Javier Civera and A.J. Davison: Unified Inverse Depth Parametrization for Monocular SLAM, RSS 2006

Robotics

[Ziemke98] Tom Ziemke, "Adaptive Behavior in Autonomous Agents," Presence, Vol. 7, No. 6, 1998

[Edsinger06] Edsinger, Aaron and Kemp, Charles: Manipulation in Human Environments, Proceedings of the IEEE/RSJ International Conference on Humanoid Robots, 2006 (Best Paper Award)

Visual Servoing

[Lapreste 04] JT. Lapreste, F. Jurie, M. Dhome, F. Chaumette. An Efficient Method to Compute the Inverse Jacobian Matrix in Visual Servoing. In IEEE Int. Conf. on Robotics and Automation, ICRA'04, Volume 1, Pages 727-732, New Orleans, LA, April 2004.

Lie Algebra in Visual Servoing and Visual Tracking

Common transformation groups (e.g., Euclidean, similarity, full homography) can be parameterized locally using their Lie algebra. This parameterization is beneficial when a local method is used to optimize a cost function with respect to the parameters of the transformation while requiring the resulting transformation to remain in the group. Drummond & Cipolla use this parameterization in a visual servoing application, showing that it improves the convergence range when using a fixed Jacobian [Drummond & Cipolla 99], and others have shown its effectiveness for registration-based tracking [Benhimane, Malis 07].
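
A sketch of the parameterization for the homography group SL(3), assuming one common (but not unique) basis for its Lie algebra sl(3); any basis of traceless 3x3 matrices works:

    import numpy as np
    from scipy.linalg import expm

    # Basis for sl(3), ordered as: x/y translation, rotation, scale,
    # aspect, shear, and the two projective degrees of freedom.
    A = [np.array(m, dtype=float) for m in [
        [[0, 0, 1], [0, 0, 0], [0, 0, 0]],
        [[0, 0, 0], [0, 0, 1], [0, 0, 0]],
        [[0, -1, 0], [1, 0, 0], [0, 0, 0]],
        [[1, 0, 0], [0, 1, 0], [0, 0, -2]],
        [[1, 0, 0], [0, -1, 0], [0, 0, 0]],
        [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
        [[0, 0, 0], [0, 0, 0], [1, 0, 0]],
        [[0, 0, 0], [0, 0, 0], [0, 1, 0]],
    ]]

    def homography_from_params(p):
        """Map 8 local parameters to a homography via the exponential map.
        det(expm(X)) = exp(trace(X)) = 1 since X is traceless, so the
        result always remains in the group, however large the step."""
        X = sum(pi * Ai for pi, Ai in zip(p, A))
        return expm(X)

    # p = 0 gives the identity; small p gives homographies near it,
    # which is what a local Gauss-Newton step optimizes over.
    assert np.allclose(homography_from_params(np.zeros(8)), np.eye(3))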

[Drummond & Cipolla 99] T. Drummond and R. Cipolla. Visual tracking and control using Lie algebras. In IEEE Int. Conf. on Computer Vision and Pattern Recognition, volume 2, pages 652-657, Fort Collins, Colorado, June 1999.

[Drummond & Cipolla 00] Tom Drummond and Roberto Cipolla. Application of lie algebras to visual servoing. International Journal of Computer Vision, 37(1):21--41, June 2000.

Radiometry and inverse light

[Okabe CVPR04] Okabe, T., Sato, I. and Sato, Y.: Spherical Harmonics vs. Haar Wavelets: Basis for Recovering Illumination from Cast Shadows, CVPR 2004