This page showcases our team project for 3D volumetric segmentation from egocentric views using the Microsoft HoloLens 2. We implemented the method from Theodora Kontogianni et al.'s paper "Interactive Object Segmentation in 3D Point Clouds" on the HoloLens 2.


3D semantic segmentation is important for various applications in general scene understanding tasks. However, annotating ground truth datasets is a time-consuming and costly process. We propose an auto-labeling tool using the Microsoft HoloLens 2 for interactive egocentric object segmentation. We implement a tool on the HoloLens that can visualize point clouds and segmentation results, iteratively refine segmentations, and render a 3D mesh of the final segmentation.

The backbone of the segmentation model is a sparse convolutional neural network built on the NVIDIA Minkowski Engine and trained for volumetric point-cloud segmentation on the ScanNet dataset. The network takes two inputs: user labels, gathered by pointing a finger at objects in the scene, and the full point cloud of the scene. A binary classification head outputs a segmentation mask for the desired object.
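The input pipeline can be illustrated with a small PyTorch sketch. This is not the project's actual network: the Minkowski Engine backbone is replaced by a placeholder MLP, and the click encoding (Gaussian distance maps to positive/negative clicks, a common scheme in interactive segmentation) is an assumption for illustration.

```python
import torch
import torch.nn as nn

def build_click_features(points, pos_clicks, neg_clicks, sigma=0.1):
    """Encode user clicks as per-point features: one channel each for
    positive and negative clicks (encoding scheme is an assumption)."""
    def click_map(clicks):
        if clicks.numel() == 0:
            return torch.zeros(points.shape[0], 1)
        # distance from every point to its nearest click, squashed to (0, 1]
        d = torch.cdist(points, clicks).min(dim=1, keepdim=True).values
        return torch.exp(-d ** 2 / (2 * sigma ** 2))
    return torch.cat([click_map(pos_clicks), click_map(neg_clicks)], dim=1)

class BinarySegHead(nn.Module):
    """Stand-in for the sparse-conv backbone plus binary classification head:
    maps per-point features to a foreground/background logit."""
    def __init__(self, in_dim=5):  # xyz + 2 click channels
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, points, click_feats):
        return self.mlp(torch.cat([points, click_feats], dim=1)).squeeze(-1)

# Toy usage: 1000 scene points, one positive click on the first point
points = torch.rand(1000, 3)
feats = build_click_features(points, points[:1], torch.empty(0, 3))
logits = BinarySegHead()(points, feats)
mask = torch.sigmoid(logits) > 0.5  # boolean segmentation mask per point
```

In the real system these per-point logits would come from the Minkowski Engine's sparse convolutions over a voxelized point cloud rather than a dense MLP.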

This interactive first-person semantic segmentation tool significantly reduces the time and effort required for labeling tasks in custom environments and enables faster workflows for other research endeavors. Furthermore, it allows real-time verification and correction of segmentations with minimal user annotation.


Demo Video

Tools Used

  • C#, Python, and Unity.
  • Libraries used:
    • PyTorch for the segmentation model
    • Python socket library for the TCP client-server connection between an off-board Linux computer and the HoloLens 2
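The off-board link can be sketched with the standard-library socket module. The length-prefixed message framing below is an assumption, not the project's actual wire protocol; TCP is a byte stream, so some framing is needed to delimit point-cloud and mask payloads.

```python
import socket
import struct

def send_msg(sock, payload: bytes):
    """Send one framed message: 4-byte big-endian length, then the payload."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def _recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes, looping because recv may return partial data."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf

def recv_msg(sock) -> bytes:
    """Receive one framed message (length header, then payload)."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return _recv_exact(sock, length)

# Loopback demo; in the real system the HoloLens client sends clicks and
# point-cloud data, and the Linux server replies with a segmentation mask.
client, server = socket.socketpair()
send_msg(client, b"point-cloud-bytes")
roundtrip = recv_msg(server)
```

On the HoloLens side the same framing would be mirrored in C# with `System.Net.Sockets`.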