Modular Tracking Framework

A Highly Efficient and Extensible Library for Registration based Visual Tracking

Main Features

Fully modular implementation that is easy to extend
Based on Eigen for speed and simplicity
Provides a Python interface for ease of use
Ability to run as a stand alone library
Seamless integration with ROS
Cross platform support

Fast and high precision visual tracking is crucial to the success of several robotics and virtual reality applications such as SLAM, autonomous navigation and visual servoing. In recent years, online learning and detection based trackers have become more popular in the vision community because their robustness to changes in the object's appearance makes them better suited to long term tracking. However, they are often unsuitable for the aforementioned applications for two reasons. Firstly, they are too slow to allow real time execution of tasks where multiple trackers have to be run simultaneously, or where tracking is only a small part of a larger system whose more computationally intensive modules use its result to make higher level deductions about the environment. Secondly, they are not precise enough to give the exact object pose with the sub pixel alignment required for these tasks, being usually limited to estimating simple transformations of the target patch such as translation and scaling. Registration based trackers are therefore more suitable for these applications, being several times faster and capable of estimating more complex transformations such as affine and homography.

Though several major advances have been made in this domain since the original Lucas Kanade tracker was introduced almost thirty five years ago, efficient open source implementations of recent trackers are surprisingly difficult to find. In fact, the only such tracker offered by the popular OpenCV library uses a pyramidal implementation of the original algorithm. In the absence of good open source implementations of modern trackers, most robotics and VR research groups either use these outdated trackers or implement their own custom trackers. These, in turn, are often not made publicly available or are tailored to suit very specific needs and so require significant reprogramming to be useful for an unrelated project. To address this need, we introduce the Modular Tracking Framework (MTF), a generic system for registration based tracking that provides highly efficient implementations of a large subset of the trackers introduced in the literature to date and is designed to be easily extensible with additional methods.

Each tracker within this framework comprises the following 3 modules:

  • Search Method (SM)
  • Appearance Model (AM)
  • State Space Model (SSM)
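
To make the three-module decomposition concrete (search method, appearance model and state space model, as described in the MTF papers), here is a minimal Python sketch. All class and function names are hypothetical and are not MTF's API. Brute-force search over integer translations stands in for the gradient based search methods that real registration based trackers use, so this only illustrates how the modules plug together.

```python
# Illustrative only: these classes are hypothetical and are NOT MTF's API.

class TranslationSSM:
    """State space model: 2 DOF translation (x, y) of the patch's top-left corner."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def candidates(self, radius):
        # All integer translations within `radius` pixels of the current state.
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                yield self.x + dx, self.y + dy

    def set_state(self, x, y):
        self.x, self.y = x, y


class SSDAppearanceModel:
    """Appearance model: sum of squared differences against a fixed template."""
    def __init__(self, template):
        self.template = template
        self.h, self.w = len(template), len(template[0])

    def distance(self, image, x, y):
        if x < 0 or y < 0 or y + self.h > len(image) or x + self.w > len(image[0]):
            return float("inf")  # patch falls outside the image
        return sum(
            (image[y + r][x + c] - self.template[r][c]) ** 2
            for r in range(self.h) for c in range(self.w)
        )


class ExhaustiveSearch:
    """Search method: brute-force minimisation (a stand-in for LK-style optimisation)."""
    def __init__(self, am, ssm, radius=2):
        self.am, self.ssm, self.radius = am, ssm, radius

    def update(self, image):
        best = min(self.ssm.candidates(self.radius),
                   key=lambda s: self.am.distance(image, *s))
        self.ssm.set_state(*best)
        return best


def make_frame(size, x, y, patch):
    """Synthetic frame: zero background with `patch` pasted at (x, y)."""
    frame = [[0] * size for _ in range(size)]
    for r, row in enumerate(patch):
        for c, v in enumerate(row):
            frame[y + r][x + c] = v
    return frame


patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
tracker = ExhaustiveSearch(SSDAppearanceModel(patch), TranslationSSM(2, 2))
print(tracker.update(make_frame(10, 3, 4, patch)))  # prints (3, 4)
```

Because each module only talks to the others through a small interface, swapping the SSD model for another similarity measure, or the translation model for one with more degrees of freedom, would not require touching the search code. That is the kind of extensibility the modular design is aiming for.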

Please refer to this paper for more details on the system design and this one for some preliminary results. The latter was published at CRV 2016 and the former has been submitted to IROS 2017. There is also this newer results paper that has been accepted at WACV 2017. Finally, the complete thesis based on this framework is available here if even more details are needed; it can also serve as the official documentation until the Doxygen version is completed. A complete list of related papers is also given below.

The library is implemented entirely in C++, though a Python interface called pyMTF also exists and works seamlessly with our Python Tracking Framework. A Matlab interface similar to Mexvision is also under development.

We also provide a simple interface for ROS called mtf_bridge (also present in the ROS sub folder) for seamless integration with robotics applications. Finally, MTF comes bundled with several state of the art learning and detection based trackers whose C++ implementations are publicly available, including DSST, KCF, CMT, TLD, RCT, MIL, Struck, FragTrack, GOTURN and DFT. As a result, combined with the datasets provided below, MTF can serve as a great test bed for general purpose tracking too. We are always looking to add more such trackers to MTF, so please let us know if there is a tracker with an open source C++ implementation that you would like to see integrated.

MTF supports both Unix and Windows platforms. Though it has been tested comprehensively only under Linux, specifically Ubuntu 14.04, it should work on Macintosh systems too. The Windows build system is in its early stages and needs some manual setting of variables but it is quite usable and we are working on making it more user friendly.

MTF is provided under the BSD license and so is free for research and commercial applications. We do request, however, that this paper be cited by any publications resulting from projects that use MTF so that more people can get to know about and benefit from it.

Git Repository:


Following is an extended version of the IROS supplementary video showing several usage examples:


  • Xi Zhang, Abhineet Singh and Martin Jagersand, "RKLT: 8 DOF Real-Time Robust Video Tracking Combing Coarse Ransac Features and Accurate Fast Template Registration," in 12th Conference on Computer and Robot Vision (CRV), 2015, pp.70-77, June 2015 [pdf]
  • Abhineet Singh, "Hessian after Convergence: A New Perspective on Lucas Kanade Tracking", Report, November 2015 [pdf]
  • Abhineet Singh, Ankush Roy, Xi Zhang and Martin Jagersand, "Modular Decomposition and Analysis of Registration based Trackers", in 13th Conference on Computer and Robot Vision (CRV), 2016, pp.85-92, June 2016 [pdf] [ppt]
  • Mennatullah Siam, "CNN Based Appearance Model with Approximate Nearest Neighbour Search", Project Report, 2016 [pdf]
  • Vincent Zhang, "PCA based appearance model for tracking", Project Report, 2016 [pdf]
  • Abhineet Singh, Mennatullah Siam and Martin Jagersand, "Unifying Registration based Tracking: A Case Study with Structural Similarity", in the Winter Conference on Applications of Computer Vision (WACV), March 2017 [pdf] [supplementary] [ppt] [poster]
  • Mennatullah Siam, Abhineet Singh, Camilo Perez and Martin Jagersand, "4-DoF Tracking for Robot Fine Manipulation Tasks", in the 14th Conference on Computer and Robot Vision (CRV), May 2017 [pdf]
  • Abhineet Singh, "Modular Tracking Framework: A Unified Approach to Registration based Tracking", MSc Thesis, March 2017 [pdf] [ppt]
  • Abhineet Singh and Martin Jagersand, "Modular Tracking Framework: A Fast Library for High Precision Tracking", submitted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2017 [pdf]


Several publicly available tracking datasets with full ground truth formatted to work with MTF out of the box are also made available here for convenience:

  • TMT: 109 sequences with 70592 frames - Download (6.64 GB), Only Ground Truth (1472 KB)
  • UCSB: 96 sequences with 6889 frames - Download (211 MB), Only Ground Truth (188 KB)
  • LinTrack: 3 sequences with 12477 frames - Download (379 MB), Only Ground Truth (288 KB)
  • PAMI: 28 sequences with 16511 frames - Download (1.70 GB), Only Ground Truth (474 KB)
  • TFMT: 24 sequences with 3682 frames (ground truth only for eye-in-hand sequences) - Download (466 MB), Only Ground Truth (47 KB)
  • METAIO: 40 sequences with 48000 frames (incomplete ground truth) - Download (451 MB)
  • CMT: 20 sequences with 25525 frames - Download (885 MB), Only Ground Truth (359 KB)
  • VOT 2014: 25 sequences with 10442 frames - Download (433 MB), Only Ground Truth (197 KB)
  • VOT 2016: 60 sequences with 21455 frames - Download (1.25 GB), Only Ground Truth (550 KB)
  • VTB: 100 sequences with 59305 frames - Download (2.61 GB), Only Ground Truth (664 KB)
  • VIVID: 9 sequences with 16277 frames - Download (493.6 MB) (Ground truth only available for every 10th frame)
  • Synthetic Datasets: generated using the generateSyntheticSeq application in MTF; these feature 25 different objects, 2, 3, 4, 6 and 8 DOF warping, Gaussian noise and RBF illumination changes:
    • 8 DOF warping: 750 sequences with 300000 frames (both with and without noise and illumination change) - Download (8.2 GB)
    • 6 DOF warping: 250 sequences with 100000 frames (all with illumination change and noise) - Download (2.1 GB)
    • 4 DOF warping: 250 sequences with 100000 frames (all with illumination change and noise) - Download (2.1 GB)
    • 3 DOF warping: 250 sequences with 100000 frames (all with illumination change and noise) - Download (2.1 GB)
    • 2 DOF warping: 250 sequences with 100000 frames (all with illumination change and noise) - Download (2.1 GB)
    • RBF illumination change: 1250 sequences with 500000 frames (2, 3, 4, 6 and 8 DOF warping) - Download (12 GB)
Just download the zip files and extract them inside the folder set in the db_root_path configuration parameter and you are good to go!
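
If several archives have been downloaded, the extraction step can be scripted. The sketch below uses only the Python standard library. The helper name extract_datasets and the folder names are assumptions made for illustration and are not part of MTF; the destination should be whatever folder db_root_path is set to in your MTF configuration.

```python
import zipfile
from pathlib import Path


def extract_datasets(zip_dir, db_root_path):
    """Extract every .zip archive found in zip_dir into db_root_path.

    Hypothetical helper, not part of MTF. Returns the names of the
    archives that were extracted, in sorted order.
    """
    root = Path(db_root_path)
    root.mkdir(parents=True, exist_ok=True)  # create the dataset root if needed
    extracted = []
    for archive in sorted(Path(zip_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)  # keeps the folder layout stored in the archive
        extracted.append(archive.name)
    return extracted
```

For example, `extract_datasets("downloads", "/path/to/datasets")` would unpack every archive in a hypothetical downloads folder, with the second argument replaced by the value of db_root_path.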