M.Sc. István Sárándi
Room 127
Phone: +49 241 80 20 769
Fax: +49 241 80 22 731
Office hours: email me


I'm a PhD student at the Computer Vision Group of RWTH Aachen University. My research interests lie in automated visual analysis of humans, especially for applications such as human-robot interaction and collaborative robotics. I am currently focusing on estimating articulated human body pose using deep learning methods. My position is primarily funded by a scholarship from the Bosch Research Foundation.



István Sárándi, Timm Linder, Kai Oliver Arras, Bastian Leibe
IEEE/RSJ Int. Conference on Intelligent Robots and Systems (IROS'18) Workshops

Occlusion is commonplace in realistic human-robot shared environments, yet its effects are not considered in standard 3D human pose estimation benchmarks. This leaves the question open: how robust are state-of-the-art 3D pose estimation methods against partial occlusions? We study several types of synthetic occlusions over the Human3.6M dataset and find a method with state-of-the-art benchmark performance to be sensitive even to low amounts of occlusion. Addressing this issue is key to progress in applications such as collaborative and service robotics. We take a first step in this direction by improving occlusion-robustness through training data augmentation with synthetic occlusions. This also turns out to be an effective regularizer that is beneficial even for non-occluded test cases.
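
As a rough illustration of training data augmentation with synthetic occlusions, the sketch below pastes one randomly placed, randomly sized filled rectangle onto an image. The function name `occlude_with_rectangle`, the `max_fraction` and `fill_value` parameters, and the choice of a plain rectangle are assumptions made for this example; the occluder types and sizes studied in the paper are not reproduced here.

```python
import numpy as np


def occlude_with_rectangle(image, rng, max_fraction=0.5, fill_value=0):
    """Return a copy of `image` (H x W x C) with one random rectangle filled in.

    Hypothetical helper for illustration only, not the paper's code.
    """
    height, width = image.shape[:2]
    # Side lengths: at least 1 pixel, at most `max_fraction` of the image side.
    rect_h = int(rng.integers(1, max(2, int(height * max_fraction) + 1)))
    rect_w = int(rng.integers(1, max(2, int(width * max_fraction) + 1)))
    # Top-left corner chosen so the rectangle lies fully inside the image.
    top = int(rng.integers(0, height - rect_h + 1))
    left = int(rng.integers(0, width - rect_w + 1))
    occluded = image.copy()
    occluded[top:top + rect_h, left:left + rect_w] = fill_value
    return occluded


rng = np.random.default_rng(0)
image = np.full((64, 48, 3), 255, dtype=np.uint8)
occluded = occlude_with_rectangle(image, rng)
```

Applying such a function to each training image with some probability would give an augmentation pipeline in the spirit of the one described above.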

BibTeX:

title={How Robust is 3D Human Pose Estimation to Occlusion?},
author={S{\'a}r{\'a}ndi, Istv{\'a}n and Linder, Timm and Arras, Kai O and Leibe, Bastian},
booktitle={IROS Workshop - Robotic Co-workers 4.0},

István Sárándi, Timm Linder, Kai Oliver Arras, Bastian Leibe
Extended abstract for the ECCV PoseTrack Workshop 2018

In this paper we present our winning entry at the 2018 ECCV PoseTrack Challenge on 3D human pose estimation. Using a fully convolutional backbone architecture, we obtain volumetric heatmaps per body joint, which we convert to coordinates using soft-argmax. Absolute person center depth is estimated by a 1D heatmap prediction head. The coordinates are back-projected to 3D camera space, where we minimize the L1 loss. Key to our good results is the training data augmentation with randomly placed occluders from the Pascal VOC dataset. In addition to reaching first place in the Challenge, our method also surpasses the state of the art on the full Human3.6M benchmark when considering methods that use no extra pose datasets in training. Code for applying synthetic occlusions is available at

BibTeX:

author = {S{\'a}r{\'a}ndi, Istv{\'a}n and Linder, Timm and Arras, Kai O and Leibe, Bastian},
title = {Synthetic Occlusion Augmentation with Volumetric Heatmaps for the 2018 {E}{C}{C}{V} {P}ose{T}rack Challenge on 3{D} Human Pose Estimation},
year = {2018}
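
Two steps named in the abstract above, soft-argmax over a volumetric heatmap and back-projection to camera space, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the challenge entry's code: the function names, the single-joint `D x H x W` layout, and the pinhole camera model with `focal_length` and `principal_point` are assumptions for the example, and the mapping from voxel indices to image pixels and metric depth is left out.

```python
import numpy as np


def soft_argmax_3d(logits):
    """Expected (x, y, z) voxel index under a softmax over a D x H x W volume."""
    depth, height, width = logits.shape
    # Numerically stable softmax over all voxels.
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Marginalize over the other two axes, then take the expected index.
    z = (weights.sum(axis=(1, 2)) * np.arange(depth)).sum()
    y = (weights.sum(axis=(0, 2)) * np.arange(height)).sum()
    x = (weights.sum(axis=(0, 1)) * np.arange(width)).sum()
    return np.array([x, y, z])


def back_project(xy_pixels, depth, focal_length, principal_point):
    """Pinhole back-projection of a pixel with known depth to camera space."""
    offset = np.asarray(xy_pixels, dtype=float) - np.asarray(principal_point, dtype=float)
    xy = offset * depth / focal_length
    return np.array([xy[0], xy[1], depth], dtype=float)


# A sharply peaked volume yields (almost exactly) the peak's index.
peaked = np.full((4, 5, 6), -1e9)
peaked[2, 3, 1] = 0.0
peak_xyz = soft_argmax_3d(peaked)

# A uniform volume yields the center of the grid.
center_xyz = soft_argmax_3d(np.zeros((4, 5, 6)))

point_3d = back_project([420.0, 240.0], 2000.0, 1000.0, (320.0, 240.0))
```

Because the output is an expectation rather than a hard maximum, it varies smoothly with the heatmap values, which is what allows a coordinate loss such as L1 to be applied to the resulting positions.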

István Sárándi
Master Thesis

In this thesis we examine the task of estimating how many pedestrians cross a given line in a surveillance video, in the presence of high occlusion and dense crowds. We show that a prior, blob-based pedestrian line counting method fails on our newly annotated private dataset, which is more challenging than those used in the literature.

We propose a new spatiotemporal slice-based method that works with simple low-level features based on optical flow, background subtraction and edge detection, and show that it produces good results on the new dataset. Furthermore, presumably due to the very simple and general nature of the features we use, the method also performs well on the popular UCSD vidd dataset without additional hyperparameter tuning, which shows the robustness of our approach.
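
To illustrate what a spatiotemporal slice is, the sketch below stacks the pixels lying on one image row from every frame of a video into a time-by-width image. It assumes a grayscale video stored as a `(T, H, W)` array and a horizontal counting line; the function name `spatiotemporal_slice` is made up for this example, and the thesis method's actual line geometry and feature computation are not shown.

```python
import numpy as np


def spatiotemporal_slice(video, row):
    """Stack the pixels on image row `row` from every frame: (T, H, W) -> (T, W)."""
    return video[:, row, :]


# Toy video with 2 frames of size 3 x 4.
video = np.arange(2 * 3 * 4).reshape(2, 3, 4)
line_slice = spatiotemporal_slice(video, row=1)
```

In such a slice, a pedestrian crossing the line leaves a trace whose extent along the time axis depends on how long the crossing takes.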

We design new evaluation measures that generalize the precision and recall used in information retrieval and binary classification to continuous, instantaneous pedestrian flow estimations and we argue that they are better suited to this task than currently used measures.

We also consider the relation between pedestrian region counting and line counting by comparing the output of a region counting method with the counts that we derive from line counting. Finally, we report a negative result: a probabilistic method for combining line and region counter outputs does not lead to the hoped-for mutual improvement of the counters.
