Instance segmentation failing even on the training images

While working on this project, I tested the segmentation on some of the annotated images from the training set (e.g. this one), and it fails completely: it detects just one of the categories, with ultra-low confidence. Testing other images gives the same result, or worse, a completely wrong category.

Since the original dataset had 41 images with 18 classes, and around 800 images after augmentation, I was expecting low confidence, but not a completely unusable result.
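As a rough sanity check on those numbers, assuming the classes were spread evenly across the original images (which the health check may not confirm), the arithmetic works out as follows. Augmentation creates variants of the same source images rather than new, independent examples.

```latex
\frac{41\ \text{source images}}{18\ \text{classes}} \approx 2.3\ \text{source images per class},
\qquad
\frac{800\ \text{augmented images}}{41\ \text{source images}} \approx 19.5\ \text{variants per source image}
```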

I wanted to ask whether something might be wrong in the training setup, or whether my expectations were simply wrong and I need a more balanced dataset (this is the health check result).
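One way to double-check the class balance outside the health check is to count annotated instances per class directly from the label files. The sketch below assumes the dataset was exported in YOLO format, where each line of a `.txt` label file starts with an integer class id. The directory name and the helper functions are hypothetical, not part of any tool's API. Classes with zero instances will not appear in the result at all, which is itself worth noticing.

```python
from collections import Counter
from pathlib import Path


def count_instances(label_dir: str) -> Counter:
    """Count annotated instances per class id in YOLO-format .txt label files.

    Assumption: each non-empty line begins with an integer class id,
    followed by box or polygon coordinates.
    """
    counts: Counter = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if parts:  # skip blank lines
                counts[int(parts[0])] += 1
    return counts


def imbalance_ratio(counts: Counter) -> float:
    """Ratio of the most to the least frequent class (1.0 = perfectly balanced)."""
    if not counts:
        return 0.0
    return max(counts.values()) / min(counts.values())
```

For example, `counts = count_instances("train/labels")` followed by `print(sorted(counts.items()), imbalance_ratio(counts))`. Running it on the labels of the original 41 images, before augmentation, gives a clearer picture, because augmented copies repeat the same annotations.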

This is the segmentation produced by version v1 of the model on the linked image:

So: only two classes detected, no segmentation masks, and 6% accuracy.

These are the annotations for the very same image:
Brenner-base-tunnel-Mules.jpg > Tunneling (

