Hello everyone,
I’m currently working on a computer vision project where I need to detect and segment walls and floors from images or video frames, primarily in indoor environments. My end goal is to identify these structural components for scene understanding, robotic navigation, or AR applications.
I would like to ask the following:
- Are there any state-of-the-art pretrained models specifically fine-tuned for detecting walls, floors, ceilings, and possibly other architectural components? (The first sketch after this list shows the kind of off-the-shelf model I have tried so far.)
- Is it advisable to use general-purpose semantic segmentation models (e.g., DeepLabV3+ or HRNet, or an instance-level model such as Mask R-CNN) and fine-tune them on a suitable dataset? (The second sketch below shows what I had in mind.) If so, which datasets would be best suited for this task (e.g., SUN RGB-D, ScanNet, Matterport3D)?
- Are there any tools or pipelines (possibly in PyTorch or TensorFlow) that can speed up training or deployment of such models?
- How well do monocular (RGB-only) approaches perform compared with depth-based (RGB-D) approaches in this context?
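
For context on the first two questions, here is roughly what I have been experimenting with. This first sketch just runs an off-the-shelf SegFormer checkpoint fine-tuned on ADE20K (whose 150-class label set includes wall, floor, and ceiling) through Hugging Face `transformers`. The class indices 0/3/5 for wall/floor/ceiling are what I believe the ADE20K label map uses, so I print them from `model.config.id2label` to double-check rather than trusting my memory; `room.jpg` is just a placeholder for one of my frames.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import AutoImageProcessor, SegformerForSemanticSegmentation

# SegFormer-B0 checkpoint fine-tuned on ADE20K (150 classes, incl. wall/floor/ceiling).
CKPT = "nvidia/segformer-b0-finetuned-ade-512-512"

processor = AutoImageProcessor.from_pretrained(CKPT)
model = SegformerForSemanticSegmentation.from_pretrained(CKPT).eval()

image = Image.open("room.jpg").convert("RGB")           # placeholder input frame
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                     # (1, 150, H/4, W/4)

# Upsample the logits to the original resolution and take the per-pixel argmax.
logits = F.interpolate(logits, size=image.size[::-1],   # PIL size is (W, H)
                       mode="bilinear", align_corners=False)
pred = logits.argmax(dim=1)[0]                          # (H, W) class ids

# I *think* wall/floor/ceiling are ids 0/3/5 in this label map; printing to verify.
print({i: model.config.id2label[i] for i in (0, 3, 5)})
wall_mask, floor_mask, ceiling_mask = (pred == 0), (pred == 3), (pred == 5)
print("wall px:", wall_mask.sum().item(), "floor px:", floor_mask.sum().item())
```

And this is what I imagined the fine-tuning route would look like with a general-purpose model: torchvision ships DeepLabV3 (not V3+, as far as I know), so I swap its classifier and auxiliary heads for my own label set and run a single dummy training step to sanity-check shapes. The four-class scheme (background, wall, floor, ceiling) and the hyperparameters are only working assumptions on my part, which is exactly why I am asking about datasets and pipelines.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50, DeepLabV3_ResNet50_Weights

NUM_CLASSES = 4  # working assumption: background, wall, floor, ceiling

# Start from the COCO/VOC-pretrained backbone and heads that torchvision provides.
model = deeplabv3_resnet50(weights=DeepLabV3_ResNet50_Weights.DEFAULT)

# Replace the final 1x1 convs of the main and auxiliary heads with my class count.
model.classifier[4] = torch.nn.Conv2d(256, NUM_CLASSES, kernel_size=1)
model.aux_classifier[4] = torch.nn.Conv2d(256, NUM_CLASSES, kernel_size=1)

criterion = torch.nn.CrossEntropyLoss(ignore_index=255)  # 255 = unlabeled pixels
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One training step on a dummy batch, purely to sanity-check shapes;
# real training would iterate over an indoor dataset (SUN RGB-D, ScanNet, ...).
images = torch.randn(2, 3, 256, 256)
targets = torch.randint(0, NUM_CLASSES, (2, 256, 256))

model.train()
optimizer.zero_grad()
outputs = model(images)                      # dict with "out" and "aux" logits
loss = criterion(outputs["out"], targets) + 0.4 * criterion(outputs["aux"], targets)
loss.backward()
optimizer.step()
print("dummy step loss:", loss.item())
```

Any pointers on whether these are sensible starting points, or on better-suited datasets and pipelines, would be much appreciated.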