MOTS: Multi-Object Tracking and Segmentation


News: We are hosting a workshop with 3 challenges at CVPR 2020


Video showing annotations and baseline results


mots_tools on GitHub

TrackR-CNN code on GitHub

Annotation Format

We provide two alternative, equivalent formats: one encoded as png images and one encoded as txt files. The txt files are smaller and faster to read, but the cocotools are needed to decode the masks. For code that reads the annotations, see also mots_tools/blob/master/mots_common/io.py

Note that in both formats an id value of 10,000 denotes an ignore region and 0 is background. The class id can be obtained by floor division of the object id by 1000 (class_id = obj_id // 1000), and the instance id is the object id modulo 1000 (instance_id = obj_id % 1000). The object ids are consistent over time.

The class ids are the following:

car 1
pedestrian 2
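The id scheme above can be sketched as a small helper. This is only an illustration: the function name describe_object_id and the constant names are hypothetical and not part of mots_tools. It checks background and ignore ids first, because applying the floor division to 10000 would yield 10, which is not one of the listed class ids.

```python
# Hypothetical helper illustrating the id scheme described above.
BACKGROUND_ID = 0    # id value of background pixels
IGNORE_ID = 10000    # id value of ignore regions

# Class ids as listed above.
CLASS_NAMES = {1: "car", 2: "pedestrian"}


def describe_object_id(obj_id):
    """Split an object id into (label, class_id, instance_id)."""
    obj_id = int(obj_id)
    # Special ids carry no class or instance information.
    if obj_id == BACKGROUND_ID:
        return ("background", None, None)
    if obj_id == IGNORE_ID:
        return ("ignore", None, None)
    class_id = obj_id // 1000      # floor division gives the class id
    instance_id = obj_id % 1000    # modulo gives the instance id
    return (CLASS_NAMES.get(class_id, "unknown"), class_id, instance_id)
```

For example, describe_object_id(1005) splits into class id 1 (car) and instance id 5.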

png format

The png format has a single color channel with 16 bits and can for example be read like this:

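import numpy as np
import PIL.Image as Image

# Load the 16-bit single-channel annotation image as an array of object ids
img = np.array(Image.open("000005.png"))
# All ids present in this frame (may include 0 = background and 10000 = ignore)
obj_ids = np.unique(img)
# to correctly interpret the id of a single object
obj_id = obj_ids[0]
class_id = obj_id // 1000
obj_instance_id = obj_id % 1000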
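Building on the snippet above, a per-object binary mask can be obtained by comparing the id image against each object id. The following is a minimal sketch, assuming the image has already been loaded into a NumPy array; the function name object_masks is hypothetical, and a tiny synthetic array stands in for a real annotation file.

```python
import numpy as np


def object_masks(img):
    """Return {obj_id: boolean mask}, skipping background (0) and ignore (10000)."""
    masks = {}
    for obj_id in np.unique(img):
        obj_id = int(obj_id)
        if obj_id in (0, 10000):
            continue
        # Pixels belonging to this object are exactly those carrying its id.
        masks[obj_id] = img == obj_id
    return masks


# Synthetic stand-in for np.array(Image.open("000005.png")).
demo = np.array([[0, 1005, 1005],
                 [0, 2001, 10000]], dtype=np.uint16)
masks = object_masks(demo)
```

Each mask has the same shape as the annotation image, so it can be used directly to index the corresponding video frame.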

When using a TensorFlow input pipeline for reading the annotations, you can use

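import tensorflow as tf
# tf.io.read_file is the current name; TensorFlow 1.x also exposed it as tf.read_file
ann_data = tf.io.read_file(ann_filename)
ann = tf.image.decode_image(ann_data, dtype=tf.uint16, channels=1)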

txt format

Each line of an annotation txt file is structured like this (where rle means run-length encoding from COCO):

time_frame id class_id img_height img_width rle

An example line from a txt file:

52 1005 1 375 1242 WSV:2d;1O10000O10000O1O100O100O1O100O1000000000000000O100O102N5K00O1O1N2O110OO2O001O1NTga3

This means:

time frame 52
object id 1005 (meaning class id is 1, i.e. car and instance id is 5)
class id 1
image height 375
image width 1242
rle WSV:2d;1O10000O10000O1O100O100O1O100O1000000000000000O100O...1O1N

The image height, image width, and rle can be used together to decode a mask using the cocotools.
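The line structure described above can be parsed roughly as follows. This is a sketch under stated assumptions, not the reference reader (see mots_common/io.py in mots_tools for that): the names MotsAnnotation, parse_mots_line, and decode_mask are hypothetical, and decoding assumes the pycocotools package is installed. The import is placed inside decode_mask so that parsing alone works without it.

```python
from collections import namedtuple

# Hypothetical container for one line of a txt annotation file.
MotsAnnotation = namedtuple(
    "MotsAnnotation",
    "time_frame obj_id class_id img_height img_width rle",
)


def parse_mots_line(line):
    """Parse 'time_frame id class_id img_height img_width rle'."""
    fields = line.strip().split(" ", 5)
    if len(fields) != 6:
        raise ValueError("expected 6 space-separated fields, got %d" % len(fields))
    time_frame, obj_id, class_id, height, width = (int(f) for f in fields[:5])
    return MotsAnnotation(time_frame, obj_id, class_id, height, width, fields[5])


def decode_mask(ann):
    """Decode the RLE into a binary mask of shape (img_height, img_width)."""
    # Imported lazily: only needed when a mask is actually decoded.
    import pycocotools.mask as rletools
    rle = {"size": [ann.img_height, ann.img_width],
           "counts": ann.rle.encode("utf-8")}
    return rletools.decode(rle)


# The example line from above.
line = ("52 1005 1 375 1242 WSV:2d;1O10000O10000O1O100O100O1O100O1000000000"
        "000000O100O102N5K00O1O1N2O110OO2O001O1NTga3")
ann = parse_mots_line(line)
```

With pycocotools available, decode_mask(ann) should return an array whose nonzero pixels belong to the object with id ann.obj_id.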



Images (hosted on original KITTI webpage, train+val+test)

Annotations in png format (train+val)

Annotations in txt format (train+val)

TrackR-CNN detections (train+val)

TrackR-CNN tracking result (val)

Split/seqmap into train, val, test, and fulltrain (train+val). You may also train on the validation data when producing test set results. Note that the test set sequence ids also start from 0, but they refer to different sequences.


Recommended: Images+annotations bundled

Images (hosted on original MOTChallenge webpage)

Annotations in png format

Annotations in txt format

TrackR-CNN detections

TrackR-CNN tracking result (leaving 1 out cross validation)

Detections for Tracking Only Challenge 2020

Tracking Only Challenge Detections

Copyright for the Annotations

Creative Commons License

Annotations on this page are published under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. This means that you must attribute the work in the manner specified by the authors, you may not use this work for commercial purposes and if you alter, transform, or build upon this work, you may distribute the resulting work only under the same license.


If you use our annotations, please cite

author = {Paul Voigtlaender and Michael Krause and Aljo\v{s}a O\v{s}ep and Jonathon Luiten
           and Berin Balachandar Gnana Sekar and Andreas Geiger and Bastian Leibe},
title = {{MOTS}: Multi-Object Tracking and Segmentation},
booktitle = {CVPR},
year = {2019},

and also the original datasets:

  author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
  title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2012}

    title = {{MOT}16: {A} Benchmark for Multi-Object Tracking},
    shorttitle = {MOT16},
    url = {http://arxiv.org/abs/1603.00831},
    journal = {arXiv:1603.00831 [cs]},
    author = {Milan, A. and Leal-Taix\'{e}, L. and Reid, I. and Roth, S. and Schindler, K.},
    month = mar,
    year = {2016},
    note = {arXiv: 1603.00831},
    keywords = {Computer Science - Computer Vision and Pattern Recognition}


If you have questions, please contact Paul Voigtlaender via voigtlaender@vision.rwth-aachen.de or Michael Krause via michael.krause@rwth-aachen.de
