Exploring Spatial Context for 3D Semantic Segmentation of Point Clouds

Francis Engelmann†, Theodora Kontogianni†, Alexander Hermans, Bastian Leibe
Computer Vision Group, RWTH Aachen University
(†: equal contribution)

Deep learning approaches have made tremendous progress in the field of semantic segmentation over the past few years. However, most current approaches operate in the 2D image space. Direct semantic segmentation of unstructured 3D point clouds is still an open research problem. The recently proposed PointNet architecture is an interesting step forward in that it can operate on unstructured point clouds, achieving decent segmentation results. However, it subdivides the input points into a grid of blocks and processes each block individually. In this paper, we investigate the question of how such an architecture can be extended to incorporate larger-scale spatial context. We build upon PointNet and propose two extensions that enlarge the receptive field over the 3D scene. We evaluate the proposed strategies on challenging indoor and outdoor datasets and show improved results in both scenarios.
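To make the block-wise processing described above concrete, here is a minimal Python sketch that splits a point cloud into square blocks on the horizontal plane and gathers the points of one block together with its neighboring blocks. It is an illustration under stated assumptions, not the authors' code: the 1 m block size, the function names `partition_into_blocks` and `context_indices`, and the square-window neighborhood rule are hypothetical, and the paper's two extensions are not reproduced here.

```python
import numpy as np


def partition_into_blocks(points: np.ndarray, block_size: float = 1.0) -> dict:
    """Group points into square blocks on the horizontal (x, y) plane.

    points: (N, D) array with D >= 2; columns 0 and 1 are x and y.
    block_size: edge length of a block (1.0 is an assumed default).
    Returns a dict mapping integer (i, j) block coordinates to point indices.
    """
    xy = points[:, :2]
    # Shift so the smallest x and y fall into block (0, 0), then bin.
    cells = np.floor((xy - xy.min(axis=0)) / block_size).astype(np.int64)
    blocks: dict = {}
    for idx, (i, j) in enumerate(cells.tolist()):
        blocks.setdefault((i, j), []).append(idx)
    return {key: np.asarray(ids, dtype=np.int64) for key, ids in blocks.items()}


def context_indices(blocks: dict, key: tuple, radius: int = 1) -> np.ndarray:
    """Sorted point indices of block `key` and of every occupied block within
    `radius` blocks of it along x and y (a (2r+1) x (2r+1) window)."""
    i, j = key
    gathered = [
        blocks[(i + di, j + dj)]
        for di in range(-radius, radius + 1)
        for dj in range(-radius, radius + 1)
        if (i + di, j + dj) in blocks
    ]
    if not gathered:
        return np.empty(0, dtype=np.int64)
    return np.sort(np.concatenate(gathered))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.uniform(0.0, 4.0, size=(1000, 3))  # synthetic x, y, z
    blocks = partition_into_blocks(cloud)
    print(len(blocks), "blocks")
    print(context_indices(blocks, (1, 1)).size, "points in the 3x3 window around block (1, 1)")
```

With `radius=0` each block is seen in isolation, as in the per-block processing criticized above; increasing `radius` is one simple way to see how pooling nearby blocks widens the spatial context available to a prediction.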




  • engelmann@vision.rwth-aachen.de
  • kontogianni@vision.rwth-aachen.de


We evaluated our method on the following datasets:

  • Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS)
  • Virtual KITTI (VKITTI)

For S3DIS, we followed the evaluation methodology of PointNet, using the same six-fold cross-validation. For VKITTI, we split the original sequences into non-overlapping sub-sequences so that we can perform the same six-fold cross-validation. The details of this split are given below. Additionally, from each sub-sequence we selected 15 scenes (frames) at uniform intervals to avoid overlapping data. We provide the precomputed training and test files under Downloads.

VKITTI splits (original sequence, our sub-sequence, frame range, selected train/test frames):

Original  Ours  Range     Selected train/test frames
1         1     0-170     0,12,24,36,48,60,72,85,97,109,121,133,145,157,170
1         2     230-420   230,243,257,270,284,297,311,325,338,352,365,379,392,406,420
2         3     0-232     0,15,31,47,63,79,95,111,127,143,159,175,191,207,223
18        4     30-338    30,52,74,96,118,140,162,184,206,228,250,272,294,316,338
20        5     80-444    80,106,132,158,184,210,236,262,288,314,340,366,392,418,444
20        6     500-800   500,521,542,564,585,607,628,650,671,692,714,735,757,778,800
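The frame lists in the VKITTI table look evenly spaced. The following Python sketch is our reconstruction, not the authors' released tooling: `uniform_frames` picks 15 evenly spaced frames over a range, rounding intermediate positions down with integer division, and reproduces the lists of sub-sequences 1, 2, 4, 5 and 6 exactly; sub-sequence 3 (range 0-232, last selected frame 223) does not follow this rule. `leave_one_out_folds` assumes each of the six folds holds out one sub-sequence for testing and trains on the other five, by analogy with the area-wise folds used for S3DIS. Both function names are hypothetical.

```python
def uniform_frames(first: int, last: int, count: int = 15) -> list[int]:
    """Pick `count` frame indices spread evenly over [first, last], both ends included.

    Intermediate positions are rounded down with integer division, which
    avoids floating-point surprises such as 84.999... becoming 84.
    """
    span = last - first
    return [first + (k * span) // (count - 1) for k in range(count)]


def leave_one_out_folds(n_subsequences: int = 6) -> list[tuple[list[int], int]]:
    """Return (train_ids, test_id) pairs in which every sub-sequence,
    numbered 1..n, serves as the test set exactly once."""
    ids = list(range(1, n_subsequences + 1))
    return [([s for s in ids if s != test], test) for test in ids]


if __name__ == "__main__":
    for train_ids, test_id in leave_one_out_folds():
        print(f"test on {test_id}, train on {train_ids}")
```

For example, `uniform_frames(230, 420)` returns exactly the frame list given for sub-sequence 2.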


  author    = {Francis Engelmann and
               Theodora Kontogianni and
               Alexander Hermans and
               Bastian Leibe},
  title     = {Exploring Spatial Context for 3D Semantic Segmentation of Point Clouds},
  booktitle = {{IEEE} International Conference on Computer Vision, 3DRMS Workshop, {ICCV}},
  year      = {2017}
