Questions about Model Sizes and Resolutions

I noticed that there are new options for model sizes when training RF-DETR-Seg in the Roboflow UI. From my understanding the new sizes are L (33M params, 504x resolution), XL (126M, 624x resolution), and 2XL (126M, 768x resolution).

Considering my dataset of 1,400 images, avg resolution 1800x1000, with instance segmentation masks as small as 40x90 px, would the 2XL model give a significant improvement in accuracy on small objects?

Also, I have been using the RF-DETR-Seg model at the previous default size in the Roboflow UI and getting effective results, but mAP@50 is struggling on small objects: 45% vs. 94% on medium objects. Previously I trained on dataset versions with no resizing step to get these results. I am wondering if it is better to add a resizing step, and if so, whether to go for a default model size like 624x or keep the high resolution for accuracy on small objects?

I am only interested in improving model performance and am willing to increase training time and resources. I'm also curious about latency.

  • Project Type: Instance Segmentation
  • Operating System & Browser: macOS, Chrome
  • Project Universe Link or Workspace/Project ID: player_detection_seg
  • Do you grant Roboflow Support permission to access your Workspace for troubleshooting? (Yes/No): Yes

Hi! For the new segmentation releases all the sizes are refreshed for higher accuracy and lower latency. The larger sizes are especially differentiated versus the preview offering. In our experiments most of the improvement from the preview version to the 2XL is on small objects. If latency is not a concern and small objects are of specific interest I would highly recommend trying it out!

I would encourage you to add an explicit resize step to the optimized resolutions. In the future it will become more clear when the resolution selected is not optimal for the model, but in the meantime it is likely safer to be explicit about it.

For these models you can always increase the resolution if you need even more accuracy. However, RF-DETR models are created via Neural Architecture Search, which chooses the optimal architecture of the model given the target dataset and target hardware, and resolution is part of what it considers when doing that search. For more details please see our paper (which was just accepted to ICLR!). That means that if you’re training an RF-DETR-Seg small model at the recommended resolution, you’ll probably get better performance for the latency by training RF-DETR-Seg medium than running the small at a higher resolution. Having said that, we don’t have a 3XL version so scaling resolution on the 2XL is the only way for users to get even better performance than the 2XL gets out of the box!
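To see why resolution matters so much for masks this small, here is a back-of-envelope calculation of how a 40x90 px object shrinks at each training resolution mentioned in this thread. The stretch-to-square resize mode is an assumption for illustration; Roboflow offers several resize options, and a different mode would change the numbers:

```python
# Approximate size of the smallest masks after stretch-resizing a
# 1800x1000 image to each square training resolution from this thread.
# Resize mode (stretch to square) is an assumption for illustration.

SRC_W, SRC_H = 1800, 1000   # average source image size
OBJ_W, OBJ_H = 40, 90       # smallest instance masks in the dataset

def scaled_object(target: int) -> tuple[int, int]:
    """Object size in pixels after stretching the image to target x target."""
    return round(OBJ_W * target / SRC_W), round(OBJ_H * target / SRC_H)

for res in (504, 624, 768, 1288):
    w, h = scaled_object(res)
    print(f"{res}x{res}: smallest object becomes ~{w}x{h} px")
```

At 504x504 the smallest masks end up only ~11 px wide, while at 1288x1288 they keep ~29 px, which is one rough way to see why the higher-resolution configurations tend to help small-object accuracy.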


I appreciate your answer. After reading your explanation and the NAS paper, I understand the architecture and resolution tradeoffs much better. I also found the comparison tables towards the end to be a good resource. I decided to train the 2XL at 1288x to see what that accuracy is like for my case.