Tracking Manipulation Tasks (TMT)


This dataset contains 100 videos of everyday human tasks. Its purpose is to provide a standard dataset covering the wide range of challenges a tracker would face when used in a manipulation setup.

We record videos using both a human user and a robot arm. The videos show oriented motion (object motion parallel to the image plane, motion around object axis and rotation around object axis). All videos are recorded at 30 fps, and each is tagged with the challenge(s) it presents. All videos have publicly available ground truth files. Under "downloads" you can download the .jpg files and a tracking data file for the speed and light condition you need.

We evaluate several trackers from the literature: IVT [David Ross et al.], TLD [Kalal et al.], L1 [Mei et al.], ESM [Malis et al.], IC [Baker et al.] and NNIC [Travis et al.]. We chose 3 state-of-the-art online learned trackers and 3 registration-based trackers, and analyse their results. The wide range of motion in this dataset brings out how usable a tracker is for manipulation. Evaluation scripts and code for the trackers are also provided.

Test Sequences and Ground Truth

Oriented Motion Tasks

  • Eight Video Categories
  • Simple structured motion
  • Tasks come in five different speeds plus one with increasing speed during motion.
  • The videos were recorded in two light conditions:
    • Normal light (unmodified office light)
    • Diffuse light (a screen kept out direct light), with fewer reflections on the objects.
  • [Detailed Dataset Page] [Sample Videos]

Composite Motion Tasks

  • Four video categories
  • Complex motion
  • Tasks have varying speed
  • The videos were recorded in two light conditions:
    • Normal light (unmodified office light)
    • Diffuse light (a screen kept out direct light), with fewer reflections on the objects.
  • [Detailed Dataset Page] [Sample Videos]

Robot Recorded Oriented Motion Tasks

Fine Manipulation Tasks

  • This includes sequences for the following fine manipulation tasks in both eye-in-hand and eye-to-hand configurations:
    1) Inserting a thread into a fishing lure.
    2) Inserting a rivet into an industrial part.
    3) Inserting a key into a lock.
  • Each task is performed at two speeds and each configuration has two cameras, resulting in a total of 24 sequences (3 tasks × 2 speeds × 2 configurations × 2 cameras).
  • Ground truth is currently available only for the eye-in-hand sequences.
  • Recorded using a 4 d.o.f. WAM Arm with a hand gimbal tele-operating another 7 d.o.f. WAM Arm using a Barrett Hand.
  • [Paper with detailed description] [Download]
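
The sequence count above can be checked by enumerating the combinations. The following is a minimal sketch only: the identifier strings (task, speed, configuration and camera names) are hypothetical labels chosen for illustration and are not the file or folder names used in the download.

```python
from itertools import product

# Hypothetical labels; the actual naming in the dataset may differ.
TASKS = ["thread_into_lure", "rivet_into_part", "key_into_lock"]
SPEEDS = ["speed_1", "speed_2"]
CONFIGS = ["eye_in_hand", "eye_to_hand"]
CAMERAS = ["cam_1", "cam_2"]


def fine_manipulation_sequences():
    """Return every (task, speed, configuration, camera) combination."""
    return list(product(TASKS, SPEEDS, CONFIGS, CAMERAS))


def with_ground_truth(sequences):
    """Keep only eye-in-hand sequences, the ones listed as having ground truth."""
    return [seq for seq in sequences if seq[2] == "eye_in_hand"]


if __name__ == "__main__":
    sequences = fine_manipulation_sequences()
    print(len(sequences), "sequences in total")
    print(len(with_ground_truth(sequences)), "sequences with ground truth")
```

Under this reading, half of the sequences are eye-in-hand, so ground truth would cover half of the total.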

Codes and Usage Information

  • Code for the following measures, with usage instructions, is posted:
    • Overall Success
    • Average Drift
    • Speed Sensitivity
  • [CODES]
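
As a rough illustration of how such measures could be computed, here is a minimal sketch. It is not the posted evaluation code, and the definitions are assumptions made for this example: the per-frame error is taken as the mean distance between corresponding tracked and ground-truth corners, a frame counts as successful when its error is below a threshold, drift is the mean error over successful frames, and speed sensitivity is viewed simply as the success rate at each speed setting. The technical report gives the authoritative definitions.

```python
import math


def corner_error(tracked, truth):
    """Mean Euclidean distance (pixels) between corresponding corners.

    Assumed error measure; both arguments are equal-length lists of (x, y).
    """
    return sum(math.dist(p, q) for p, q in zip(tracked, truth)) / len(truth)


def overall_success(errors, threshold):
    """Fraction of frames whose error is below the threshold (assumed definition)."""
    return sum(e < threshold for e in errors) / len(errors)


def average_drift(errors, threshold):
    """Mean error over successful frames; None if no frame is successful."""
    ok = [e for e in errors if e < threshold]
    return sum(ok) / len(ok) if ok else None


def success_by_speed(errors_by_speed, threshold):
    """Success rate per speed setting, as one way to inspect speed sensitivity."""
    return {speed: overall_success(errs, threshold)
            for speed, errs in errors_by_speed.items()}


if __name__ == "__main__":
    errors = [1.0, 2.0, 10.0, 3.0]  # made-up per-frame errors
    print(overall_success(errors, threshold=5.0))
    print(average_drift(errors, threshold=5.0))
```

The threshold is a free parameter in this sketch; a tracker's ranking can change with it, so it is worth reporting results at the same threshold for every tracker.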

Evaluation Results

Further details about recording the dataset and analysis are contained in the technical report. [TECHNICAL REPORT]

If you are using this dataset or the associated codes, please cite the following:
  title={Tracking Benchmark and Evaluation for Manipulation Tasks},
  author={Roy, Ankush and Zhang, Xi and Wolleb, Nina and Perez, Quenterio, Camilo and Jagersand, Martin},
  booktitle={International Conference on Robotics and Automation},

CONTACT: Martin Jagersand