Weakly supervised segmentation in industrial processes using non-conventional annotations and multimodal data.
In industrial processes such as waste sorting and food quality control, human operators manually remove anomalous items from a stream of cluttered objects. Automating these processes requires instance segmentation of the anomalous items. However, because the specific tasks in this setting vary widely, tailor-made solutions based on fully supervised methods are impractical: each would need a large number of manual annotations for training. Our research aims to develop a weakly supervised alternative that exploits the visual differences between images collected before and after the manual removal step. "Before" scenes contain the anomalies to be removed, while "after" scenes contain only valid items, so the supervision comes directly from the operator's work. Our goal is to build a scalable segmentation pipeline that learns selection criteria from these human interventions, leveraging multimodal data such as video and 3D. Specifically, we collect videos from the "before" and "after" cameras and exploit temporal consistency to re-identify items across the two scenes, which reveals which items have been removed. Moreover, we combine single-camera video with depth estimation to perform 3D reconstruction, which increases the number of features known for each item and improves the model's ability to re-identify it under different positions or lighting conditions.
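To make the re-identification idea concrete, the following is a minimal sketch and not the project's actual method. It assumes each item seen by the "before" and "after" cameras has already been reduced to a feature vector (how those features are obtained is outside this sketch), and it flags as removed every "before" item that finds no sufficiently similar "after" item. The names `cosine` and `removed_items`, the greedy one-to-one matching, and the `threshold` value are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def removed_items(before, after, threshold=0.9):
    """Return the sorted ids of 'before' items with no match in 'after'.

    before, after: dicts mapping an item id to its feature vector
    (hypothetical inputs). threshold is an assumed similarity cut-off.
    """
    # Score every before/after pair, most similar first.
    pairs = sorted(
        ((cosine(fb, fa), ib, ia)
         for ib, fb in before.items()
         for ia, fa in after.items()),
        key=lambda p: p[0],
        reverse=True,
    )
    matched_before, matched_after = set(), set()
    # Greedy one-to-one matching: each item is used in at most one pair.
    for score, ib, ia in pairs:
        if score < threshold:
            break  # remaining pairs are even less similar
        if ib in matched_before or ia in matched_after:
            continue
        matched_before.add(ib)
        matched_after.add(ia)
    # Unmatched 'before' items are presumed to have been removed.
    return sorted(set(before) - matched_before)


if __name__ == "__main__":
    before = {"b0": [1.0, 0.0, 0.0], "b1": [0.0, 1.0, 0.0], "b2": [0.0, 0.0, 1.0]}
    after = {"a0": [0.9, 0.1, 0.0], "a1": [0.0, 0.1, 0.9]}
    print(removed_items(before, after))
```

In this toy input, items b0 and b2 each have a close counterpart in the "after" scene, while b1 does not, so b1 is the one reported as removed. In a weakly supervised setting, such flagged items could then serve as pseudo-labels for the anomaly class; greedy matching is used here only for brevity, and an optimal assignment method would be a natural alternative.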