Welcome to the Computer Vision Group at RWTH Aachen University!

The Computer Vision group was established at RWTH Aachen University in the context of the Cluster of Excellence "UMIC - Ultra High-Speed Mobile Information and Communication" and is associated with the Chair Computer Sciences 8 - Computer Graphics, Computer Vision, and Multimedia. The group focuses on computer vision applications for mobile devices and robotic or automotive platforms. Our main research areas are visual object recognition, tracking, self-localization, and 3D reconstruction, and in particular combinations of these topics.

We offer lectures and seminars about computer vision and machine learning.

You can browse through all our publications and the projects we are working on.

We have two papers accepted at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017.

June 15, 2017

We have two papers accepted at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017. One oral and one spotlight.

Feb. 28, 2017

We have two papers accepted at the IEEE Winter Conference on Applications of Computer Vision (WACV) 2017.

Jan. 4, 2017

We have a paper on Scene Flow Propagation for Semantic Mapping and Object Discovery in Dynamic Street Scenes at IROS 2016.

Aug. 19, 2016

We have three papers accepted at the British Machine Vision Conference (BMVC) 2016.

Aug. 19, 2016

We have a paper on Joint Object Pose Estimation and Shape Reconstruction in Urban Street Scenes Using 3D Shape Priors at GCPR 2016.

June 19, 2016

Recent Publications

Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes

IEEE Conference on Computer Vision and Pattern Recognition (CVPR'17), Oral

Semantic image segmentation is an essential component of modern autonomous driving systems, as an accurate understanding of the surrounding scene is crucial to navigation and action planning. Current state-of-the-art approaches in semantic image segmentation rely on pre-trained networks that were initially developed for classifying images as a whole. While these networks exhibit outstanding recognition performance (i.e., what is visible?), they lack localization accuracy (i.e., where precisely is something located?). Therefore, additional processing steps have to be performed in order to obtain pixel-accurate segmentation masks at the full image resolution. To alleviate this problem, we propose a novel ResNet-like architecture that exhibits strong localization and recognition performance. We combine multi-scale context with pixel-level accuracy by using two processing streams within our network: one stream carries information at the full image resolution, enabling precise adherence to segment boundaries. The other stream undergoes a sequence of pooling operations to obtain robust features for recognition. The two streams are coupled at the full image resolution using residuals. Without additional processing steps and without pre-training, our approach achieves an intersection-over-union score of 71.8% on the Cityscapes dataset.
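
For readers unfamiliar with the intersection-over-union score mentioned above, the following is a minimal sketch of how a class-averaged IoU can be computed from predicted and ground-truth labels. The function name mean_iou and the toy label lists are illustrative assumptions. This is not the official Cityscapes evaluation code, which accumulates counts over the whole dataset and handles ignored labels.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes that occur in pred or target.

    pred and target are flat sequences of integer class labels, one per pixel.
    (Hypothetical helper for illustration only.)
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, 3)
print(f"mean IoU: {score:.3f}")
```

In this toy example the per-class scores are 1/3, 2/3, and 1/2, so the mean is 0.5.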


Semi-Supervised Deep Learning for Monocular Depth Map Prediction

IEEE Conference on Computer Vision and Pattern Recognition (CVPR'17), Spotlight

Supervised deep learning often suffers from the lack of sufficient training data. Specifically, in the context of monocular depth map prediction, it is barely possible to determine dense ground truth depth images in realistic dynamic outdoor environments. When using LiDAR sensors, for instance, noise is present in the distance measurements, the calibration between sensors cannot be perfect, and the measurements are typically much sparser than the camera images. In this paper, we propose a novel approach to depth map prediction from monocular images that learns in a semi-supervised way. While we use sparse ground-truth depth for supervised learning, we also require our deep network to produce photoconsistent dense depth maps in a stereo setup using a direct image alignment loss. In experiments, we demonstrate superior performance in depth map prediction from single images compared to state-of-the-art methods.
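
The idea of combining a sparse supervised term with a photoconsistency term can be illustrated on a single rectified stereo scanline. The sketch below is a toy under stated assumptions and not the loss used in the paper: it works on disparities rather than depth, uses plain absolute differences for both terms, and the names sample_linear, semi_supervised_loss, and weight are hypothetical.

```python
def sample_linear(row, x):
    """Linearly interpolate a 1D intensity row at position x (clamped to the row)."""
    x = min(max(x, 0.0), len(row) - 1.0)
    x0 = int(x)
    x1 = min(x0 + 1, len(row) - 1)
    w = x - x0
    return (1.0 - w) * row[x0] + w * row[x1]


def semi_supervised_loss(disp, left, right, sparse_gt, weight=1.0):
    """Toy loss: sparse supervised term plus dense photoconsistency term.

    disp      -- predicted disparity per left-image pixel
    left      -- left scanline intensities
    right     -- right scanline intensities
    sparse_gt -- ground-truth disparity per pixel, or None where unmeasured
    weight    -- assumed trade-off factor between the two terms
    """
    # Supervised term: only pixels that have a (sparse) measurement contribute.
    labeled = [(d, g) for d, g in zip(disp, sparse_gt) if g is not None]
    supervised = sum(abs(d - g) for d, g in labeled) / len(labeled) if labeled else 0.0

    # Photoconsistency term: in a rectified pair, left pixel x should look
    # like the right image sampled at x - disparity.
    photo = sum(
        abs(left[x] - sample_linear(right, x - disp[x])) for x in range(len(left))
    ) / len(left)

    return supervised + weight * photo


# Toy scanlines: the left row is the right row shifted by one pixel.
right = [10, 20, 30, 40, 50, 60]
left = [10, 10, 20, 30, 40, 50]
sparse_gt = [None, 1.0, None, None, 1.0, None]

good = semi_supervised_loss([1] * 6, left, right, sparse_gt)
bad = semi_supervised_loss([0] * 6, left, right, sparse_gt)
print(good, bad)
```

The correct disparity of one pixel gives zero loss here, while a wrong disparity is penalized both at the two labeled pixels and, through the image comparison, at the unlabeled ones.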


Combined Image- and World-Space Tracking in Traffic Scenes

IEEE Int. Conference on Robotics and Automation (ICRA'17), to appear

Tracking in urban street scenes plays a central role in autonomous systems such as self-driving cars. Most of the current vision-based tracking methods perform tracking in the image domain. Other approaches, e.g. based on LIDAR and radar, track purely in 3D. While some vision-based tracking methods invoke 3D information in parts of their pipeline, and some 3D-based methods utilize image-based information in components of their approach, we propose to use image- and world-space information jointly throughout our method. We present our tracking pipeline as a 3D extension of image-based tracking. From enhancing the detections with 3D measurements to the reported positions of every tracked object, we use world-space 3D information at every stage of processing. We accomplish this with our novel coupled 2D-3D Kalman filter, combined with a conceptually clean and extendable hypothesize-and-select framework. Our approach matches the current state of the art on the official KITTI benchmark, which performs evaluation in the 2D image domain only. Further experiments show significant improvements in 3D localization precision by enabling our coupled 2D-3D tracking.
