Modular Tracking Framework

A Highly Efficient and Extensible Library for Registration based Visual Tracking

Main Features

Fully modular implementation that is easy to extend
Based on Eigen for speed and simplicity
Python and MATLAB interfaces for ease of use
Ability to run as a stand alone library
Seamless integration with ROS
Cross platform support

Fast and high precision visual tracking is crucial to the success of several robotics and virtual reality applications such as SLAM, autonomous navigation and visual servoing. In recent years, online learning and detection based trackers have become more popular in the vision community because their robustness to changes in the object's appearance makes them better suited to long term tracking. However, these are often unsuitable for the aforementioned applications for two reasons. Firstly, they are too slow to allow real time execution of tasks where multiple trackers have to be run simultaneously, or where tracking is only a small part of a larger system with more computationally intensive modules that use its result to make higher level deductions about the environment. Secondly, they are not precise enough to give the exact object pose with the sub pixel alignment required for these tasks, being usually limited to estimating simple transformations of the target patch such as translation and scaling. Registration based trackers are therefore more suitable for these applications, since they are several times faster and can estimate more complex transformations such as affine and homography.

Though several major advances have been made in this domain since the original Lucas Kanade tracker was introduced almost thirty five years ago, efficient open source implementations of recent trackers are surprisingly difficult to find. In fact, the only such tracker offered by the popular OpenCV library uses a pyramidal implementation of the original algorithm. In the absence of good open source implementations of modern trackers, most robotics and VR research groups either use these outdated trackers or implement their own custom trackers. These, in turn, are often not made publicly available or are tailored to very specific needs and so require significant reprogramming to be useful for an unrelated project. To address this need, we introduce the Modular Tracking Framework (MTF), a generic system for registration based tracking that provides highly efficient implementations of a large subset of the trackers introduced in the literature to date and is designed to be easily extensible with additional methods.

Each tracker within this framework comprises the following 3 modules:

Please refer to this paper for more details on the system design and this one for some preliminary results. The latter was published at CRV 2016 while the former has been accepted at IROS 2017. There is also this newer results paper that was published at WACV 2017. Finally, the complete thesis based on this framework is available here if even more details are needed. It can also serve as the official documentation until the Doxygen version is completed. A complete list of related papers is also given below.

The library is implemented entirely in C++, though interfaces for Python and MATLAB are also provided to aid its use in research applications. A simple interface for ROS is likewise provided for seamless integration with robotics projects. Finally, MTF comes bundled with several state of the art learning and detection based trackers whose C++ implementations are publicly available, including DSST, KCF, CMT, TLD, RCT, MIL, Struck, FragTrack, GOTURN and DFT. As a result, combined with the datasets provided below, MTF can also serve as a test bed for general purpose tracking. We are always looking to add more such trackers to MTF, so please let us know if there is a tracker with an open source C++ implementation that you would like to see integrated.

MTF supports both Unix and Windows platforms. Though it has been tested comprehensively only under Linux, specifically Ubuntu 14.04, it should work on Macintosh systems too. The Windows build system is in its early stages and needs some manual setting of variables but it is quite usable and we are working on making it more user friendly.

MTF is provided under the BSD license and so is free for research and commercial applications. We do request, however, that this paper be cited by any publications resulting from projects that use MTF so that more people can get to know about it and benefit from it.

Git Repository:


Following is an extended version of the IROS supplementary video showing several usage examples:


  • Abhineet Singh and Martin Jagersand, "Modular Tracking Framework: A Fast Library for High Precision Tracking", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2017 [pdf][video]
  • Xuebin Qin, Shida He, Camilo Alfonso Perez Quintero, Abhineet Singh, Masood Dehghan, Martin Jagersand, "Real-Time Salient Closed Boundary Tracking via Line Segments Perceptual Grouping", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2017 [pdf][video]
  • Lin Chen, Fan Zhou, Yu Shen, Xiang Tian, Haibin Ling and Yaowu Chen, "Illumination Insensitive Efficient Second-order Minimization for Planar Object Tracking", in the IEEE International Conference on Robotics and Automation (ICRA), June 2017 [pdf]
  • Mennatullah Siam, Abhineet Singh, Camilo Perez and Martin Jagersand, "4-DoF Tracking for Robot Fine Manipulation Tasks", in the 14th Conference on Computer and Robot Vision (CRV), May 2017 [pdf]
  • Abhineet Singh, "Modular Tracking Framework: A Unified Approach to Registration based Tracking", MSc Thesis, March 2017 [pdf] [ppt]
  • Abhineet Singh, Mennatullah Siam and Martin Jagersand, "Unifying Registration based Tracking: A Case Study with Structural Similarity", in the Winter Conference on Applications of Computer Vision (WACV), March 2017 [pdf] [supplementary] [ppt] [poster]
  • Vincent Zhang, "PCA based appearance model for tracking", Project Report, 2016 [pdf]
  • Mennatullah Siam, "CNN Based Appearance Model with Approximate Nearest Neighbour Search", Project Report, 2016 [pdf]
  • Abhineet Singh, Ankush Roy, Xi Zhang and Martin Jagersand, "Modular Decomposition and Analysis of Registration based Trackers", in the 13th Conference on Computer and Robot Vision (CRV), pp. 85-92, June 2016 [pdf] [ppt]
  • Abhineet Singh, "Hessian after Convergence: A New Perspective on Lucas Kanade Tracking", Report, November 2015 [pdf]
  • Xi Zhang, Abhineet Singh and Martin Jagersand, "RKLT: 8 DOF Real-Time Robust Video Tracking Combining Coarse RANSAC Features and Accurate Fast Template Registration", in the 12th Conference on Computer and Robot Vision (CRV), pp. 70-77, June 2015 [pdf]


Several publicly available tracking datasets with full ground truth formatted to work with MTF out of the box are also made available here for convenience:

  • TMT: 109 sequences with 70592 frames - Download (6.64 GB), Only Ground Truth (1472 KB)
  • UCSB: 96 sequences with 6889 frames - Download (211 MB), Only Ground Truth (188 KB)
  • LinTrack: 3 sequences with 12477 frames - Download (379 MB), Only Ground Truth (288 KB)
  • PAMI: 28 sequences with 16511 frames - Download (1.70 GB), Only Ground Truth (474 KB)
  • TFMT: 24 sequences with 3682 frames (ground truth only for eye-in-hand sequences) - Download (466 MB), Only Ground Truth (47 KB)
  • PTW: 210 sequences with 105210 frames - Download (6.24 GB), Only Ground Truth (2.78 MB)
    • Sequences are in the form of avi video files rather than jpg images to save space; set img_source to m and seq_fmt to avi in mtf.cfg before using this dataset in MTF
    • The creators of this dataset have only provided ground truth for every other frame (all even numbered frames along with the first frame that is used for initialization)
    • A very high performing tracker has been used to fill in the ground truth for unlabeled frames, but some frames that feature too much occlusion or too large inter-frame motion are virtually untrackable, so the ground truth is not correct for these frames; we are working to correct these manually
  • METAIO: 40 sequences with 48000 frames (incomplete ground truth) - Download (451 MB)
  • CMT: 20 sequences with 25525 frames - Download (885 MB), Only Ground Truth (359 KB)
  • VOT 2014: 25 sequences with 10442 frames - Download (433 MB), Only Ground Truth (197 KB)
  • VOT 2016: 60 sequences with 21455 frames - Download (1.25 GB), Only Ground Truth (550 KB)
  • VTB: 100 sequences with 59305 frames - Download (2.61 GB), Only Ground Truth (664 KB)
  • VIVID: 9 sequences with 16277 frames - Download (493.6 MB) (Ground truth only available for every 10th frame)
  • UAV: 1 sequence with 420 frames taken from a UAV flying over an area along with a large satellite image of the same area - Download (12.2 MB)
  • Synthetic Datasets: generated using the generateSyntheticSeq application in MTF; they feature 25 different objects, 2, 3, 4, 6 and 8 DOF warping, Gaussian noise and RBF illumination changes:
    • 8 DOF warping: 750 sequences with 300000 frames (both with and without noise and illumination change) - Download (8.2 GB)
    • 6 DOF warping: 250 sequences with 100000 frames (all with illumination change and noise) - Download (2.1 GB)
    • 4 DOF warping: 250 sequences with 100000 frames (all with illumination change and noise) - Download (2.1 GB)
    • 3 DOF warping: 250 sequences with 100000 frames (all with illumination change and noise) - Download (2.1 GB)
    • 2 DOF warping: 250 sequences with 100000 frames (all with illumination change and noise) - Download (2.1 GB)
    • RBF illumination change: 1250 sequences with 500000 frames (2, 3, 4, 6 and 8 DOF warping) - Download (12 GB)
Just download the zip files and extract them inside the folder set in the db_root_path configuration parameter and you are good to go!
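
If several of the archives above have been downloaded, the extraction step can be scripted. The following is only a minimal sketch in Python using the standard library; the function name extract_datasets and its two arguments are hypothetical and not part of MTF, and the folder passed as db_root_path is assumed to be the same one that the db_root_path configuration parameter points to. The archives are unpacked as they are, so no assumption is made about their internal folder layout.

```python
import zipfile
from pathlib import Path


def extract_datasets(zip_dir, db_root_path):
    """Extract every .zip archive found in zip_dir into db_root_path.

    Hypothetical helper, not part of MTF. Returns the names of the
    archives that were extracted, in sorted order.
    """
    zip_dir = Path(zip_dir)
    db_root = Path(db_root_path)
    # Create the dataset root folder if it does not exist yet
    db_root.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(zip_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # Unpack the archive contents unchanged under the dataset root
            zf.extractall(db_root)
        extracted.append(archive.name)
    return extracted
```

Calling extract_datasets("downloads", "Datasets"), for instance, would unpack every zip file in a local downloads folder into a Datasets folder, which would then need to match the value of db_root_path.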