M2F3D: Mask2Former for 3D Instance Segmentation

Jonas Schult, Alexander Hermans, Francis Engelmann, Siyu Tang, Otmar Hilliges, Bastian Leibe
Transformers for Vision Workshop at CVPR 2022 (Spotlight)

In this work, we show that the top-performing Mask2Former approach for image-based segmentation tasks works surprisingly well when adapted to the 3D scene understanding domain. Current 3D semantic instance segmentation methods rely largely on predicting centers followed by clustering, and little progress has been made in applying transformer-based approaches to this task. We show that, with small modifications to the 2D Mask2Former approach, we can create a 3D instance segmentation approach without the need for highly 3D-specific components or carefully hand-engineered hyperparameters. Initial experiments with our M2F3D model on the ScanNet benchmark are very promising and set a new state of the art on the ScanNet test set (+0.4 mAP50).

Please see our extended work, Mask3D: Mask Transformer for 3D Instance Segmentation, accepted at ICRA 2023.
