It appears CNN training involves images that are square and all the same size. This may be convenient for optimal GPU memory usage, but it is not practical for the realistic images used in training. It appears that Roboflow handles this issue internally. I have noticed that YOLOv5 has an argument called --imgsz for the maximum image dimension, and that they have two sets of models, i.e., one for 640 and another for 1280 image sizes.
- Should one scale all images to the same size while keeping the aspect ratio, so that object distortion does not occur?
- I assume some type of tiling occurs in Roboflow (and YOLOv5), but I cannot find much detail on this. Can you provide additional guidance?
- Many models have used a 640x640 image size for training. Does this not restrict the smallest object that can be detected, especially when an image is 1920x1080?
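To make the last concern concrete, here is a quick back-of-the-envelope calculation (the object widths are made-up examples, not numbers from any dataset) showing how much detail is lost when a 1920-wide frame is downscaled to 640:

```python
# Effect of resizing a 1920x1080 frame down to 640 on the long side:
# every object shrinks by the same scale factor.
orig_w, target_w = 1920, 640
scale = target_w / orig_w  # 1/3

# Hypothetical object widths (in pixels) in the original frame
for obj_px in (12, 24, 48):
    print(f"{obj_px}px object -> {obj_px * scale:.1f}px after resizing")
# A 12px object becomes ~4px, which is near the limit of what a
# detector can reliably localize; this is why tiling or a larger
# --imgsz is often used for small objects.
```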
As long as you sample images from your deployment environment and continue labeling and resizing them the same way, your model will get better at inference over time.
The images are resized for inference as well, so be sure the Resize setting you choose in Roboflow's preprocessing matches the image size you set for train.py and detect.py.
There are multiple Resize options in Roboflow; for example, you can select "Fit Within" or "Stretch To" depending on your preferences. Just keep it consistent across training and inference.
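To illustrate the aspect-ratio-preserving option, here is a minimal letterbox-style resize sketch in NumPy (nearest-neighbor scaling for simplicity; YOLOv5's own letterbox uses OpenCV interpolation, and 114 is its default gray fill value). The function name and signature are illustrative, not an actual Roboflow or YOLOv5 API:

```python
import numpy as np

def letterbox(img: np.ndarray, size: int = 640, pad_value: int = 114) -> np.ndarray:
    """Scale the longer side down to `size` while keeping the aspect
    ratio, then pad the shorter side so the output is square."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # Nearest-neighbor resize via integer index maps
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]
    # Pad symmetrically with the fill value so objects are not distorted
    out = np.full((size, size) + img.shape[2:], pad_value, dtype=img.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # a typical HD frame
square = letterbox(frame, 640)
print(square.shape)  # (640, 640, 3)
```

Note that a 1920x1080 frame ends up as a 640x360 image inside a 640x640 canvas, so objects keep their proportions but lose resolution, which connects back to the small-object question above.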
Tiling your images is a preprocessing step in Roboflow. For details on YOLOv5's augmentations, see the GitHub repository: Architecture Summary
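For intuition, tiling amounts to cutting a large frame into overlapping crops of the training resolution, so small objects occupy more pixels in each crop. Here is a minimal sketch of that idea (the function and its parameters are illustrative, not Roboflow's actual implementation):

```python
import numpy as np

def tile_image(img: np.ndarray, tile: int = 640, overlap: int = 64):
    """Split an image into overlapping square tiles.
    Returns (x0, y0, tile_array) triples so that detections made on a
    tile can be mapped back to full-image coordinates."""
    h, w = img.shape[:2]
    stride = tile - overlap
    tiles = []
    for y0 in range(0, max(h - overlap, 1), stride):
        for x0 in range(0, max(w - overlap, 1), stride):
            y1, x1 = min(y0 + tile, h), min(x0 + tile, w)
            # Shift edge windows back so every tile keeps the full size
            y0a, x0a = max(0, y1 - tile), max(0, x1 - tile)
            tiles.append((x0a, y0a, img[y0a:y1, x0a:x1]))
    return tiles

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
tiles = tile_image(frame, tile=640, overlap=64)
print(len(tiles), tiles[0][2].shape)  # 8 (640, 640, 3)
```

The trade-off is that each full frame now costs several inference passes, and overlapping detections must be merged (e.g., with NMS) when mapping boxes back to the original image.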
You can also add augmentation during testing and inference with YOLOv5: Test-Time Augmentation (TTA) Tutorial · Issue #303 · ultralytics/yolov5 · GitHub
For more on how to work with small objects: Small Object Detection Guide