M2F3D: Mask2Former for 3D Instance Segmentation

Jonas Schult, Alexander Hermans, Francis Engelmann, Siyu Tang, Otmar Hilliges, Bastian Leibe
Transformers For Vision Workshop
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022 (Spotlight)

In this work, we show that the top-performing Mask2Former approach for image-based segmentation tasks works surprisingly well when adapted to the 3D scene understanding domain. Current 3D semantic instance segmentation methods largely rely on predicting centers followed by clustering, and little progress has been made in applying transformer-based approaches to this task. We show that with small modifications to the 2D Mask2Former approach, we can create a 3D instance segmentation approach without the need for highly 3D-specific components or carefully hand-engineered hyperparameters. Initial experiments on the ScanNet benchmark are very promising and set a new state of the art on the ScanNet test set (+0.4 mAP50).
