Resizing large ~16:9 (2688x1520) images for use with yolov5?

Hi, I’m new to Roboflow. I’m labeling images of animals caught on our cameras so we can detect them better. The images are all large, roughly 16:9, at 2688x1520 pixels.

The Ultralytics tutorial suggests resizing them to YOLOv5’s default 640x640 size, which takes the nice, identifiable images of the animals and squishes them. I read through the two articles on resizing (Preprocess Images - Roboflow Docs and You Might Be Resizing Your Images Incorrectly), so I’m guessing the crop option will be better than the stretch/resize option. Padding is also an option, but I’m wondering if the animals might end up too small.
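
If I’m understanding the docs right, the three options boil down to roughly this (just a Pillow sketch of my mental model, with a made-up filename; not Roboflow’s actual code, and it assumes the image is wider than it is tall):

```python
from PIL import Image

img = Image.open("capture.jpg")   # one of my 2688x1520 camera frames
target = 640

# Option 1: Stretch/resize -- distorts the aspect ratio (the deer get squished)
stretched = img.resize((target, target))

# Option 2: Center crop to a square, then resize -- no distortion,
# but anything outside the central 1520x1520 window is thrown away
w, h = img.size
left = (w - h) // 2
cropped = img.crop((left, 0, left + h, h)).resize((target, target))

# Option 3: Pad (letterbox) to a square, then resize -- no distortion,
# nothing cropped, but the animals end up smaller inside the 640x640 frame
padded = Image.new("RGB", (w, w), (114, 114, 114))  # grey bars top and bottom
padded.paste(img, (0, (w - h) // 2))
padded = padded.resize((target, target))
```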

My question is, would it be better if I did some editing on them first? Cropped or resized them in Photoshop? Depending on where the bounding box is, it would be unfortunate if Roboflow just cropped it right out lol

Thanks

Hey there!

Have you taken a look at the small object resources as well?

Specifically, tiling has yielded good results for use cases like this:

Take a look and let me know if that solves it!

Hi, I’m not sure my question was clear, so I’m going to add some images here to illustrate my point. (It’s not easy because of new user restrictions…)

This is the original capture from the camera:


If I upload it as-is to Roboflow and add bounding boxes, this is the result:

Then, when I generate a dataset with the “Resize: Stretch to 640x640” option (as recommended by the Ultralytics tutorial), the model is going to be learning what very squished-looking deer look like:


Am I correct so far?

But what if I manually crop the edges off the longer dimension (the width), keeping the shorter dimension untouched? I.e.:

Adding bounding boxes, I get:


Finally, using the “Resize: Stretch to 640x640” option again results in:

Now, just for argument’s sake, say I go back and test my newly generated model (of 1 image lol) on my original image. Won’t I get a near-100% label/match detection with the pre-cropped, resized-to-640x640 version? Whereas if the model was trained on the squished 640x640 version, it may not recognize that the original image contains any deer at all. Which is just a long way of saying: I would think keeping the correct aspect ratio is important?
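
To put rough numbers on the squish (my own back-of-the-envelope math, nothing official):

```python
# How differently the two axes get scaled by "Stretch to 640x640"
orig_w, orig_h = 2688, 1520
sx, sy = 640 / orig_w, 640 / orig_h          # ~0.238 horizontal vs ~0.421 vertical
print(f"horizontal scale {sx:.3f}, vertical scale {sy:.3f}")
print(f"a deer comes out ~{sy / sx:.2f}x narrower relative to its height")
# Cropping to 1520x1520 first makes both scales 640/1520 ≈ 0.421,
# so the proportions are preserved.
```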


Hi, I’m very interested to know what you ended up doing about your aspect ratio concern.

As I described in the pictures above, I manually crop the wide image to a square (without touching the shorter dimension). It’s a time-consuming process, but it can be made quicker by loading dozens of images into Photoshop at once, locking the crop aspect ratio to 1:1, and using keyboard shortcuts to save/close the images.
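
(For anyone who wants to skip Photoshop, the same 1:1 center crop can be scripted. A quick Pillow sketch with made-up folder names; the labeling still happens in Roboflow afterwards:)

```python
from pathlib import Path
from PIL import Image

src = Path("captures")          # original 2688x1520 frames (hypothetical folder)
dst = Path("captures_square")
dst.mkdir(exist_ok=True)

for path in src.glob("*.jpg"):
    img = Image.open(path)
    w, h = img.size
    left = (w - h) // 2                       # trim equal amounts off each side
    img.crop((left, 0, left + h, h)).save(dst / path.name)
```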

One of the developers got back to me elsewhere and said that manually cropping is a bad idea because it can’t be replicated at runtime. But what really can’t be replicated at runtime is taking my widescreen camera image and squeezing it down to a 1:1 aspect ratio. :roll_eyes: (I.e., training a model on widescreen-source images using “Stretch to 640x640”.)

FWIW, looking at these pictures again, the bounding boxes I used in the examples are terrible and way too big. I’m pretty proud of my model, though; it can be very accurate with only a couple hundred images per class. Not exactly “degraded performance” imo.

Hey there @EvanVanVan

Congrats on your well-performing model.

I think there might have been a bit of miscommunication here. The reason our platform recommends resizing your images to 1:1 aspect ratio squares (without cropping) is that most object detection architectures (including but not limited to YOLOv5) use square input images, both for training and inference.

While it’s likely that there are some exceptions out there, computer vision models do best when they are trained on the images they will infer on “in the wild”:

  • If you run inference on a model that takes a 1:1 aspect ratio input (which is most models), even an image with a different aspect ratio will end up going into the model “squished”. (This is a preprocessing step that happens automatically; see the sketch below.)
  • Therefore, to give the model training data that most resembles the images it will infer on, we stretch (or “shrink”) the images to a 1:1 aspect ratio.
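
To make the first point concrete, that automatic step is roughly equivalent to something like this (a simplified sketch, not our exact pipeline):

```python
from PIL import Image
import numpy as np

def preprocess(path, input_size=640):
    """Whatever aspect ratio comes in, the model only ever sees an
    input_size x input_size array, so a 16:9 frame gets 'squished' here."""
    img = Image.open(path).resize((input_size, input_size))
    return np.asarray(img, dtype=np.float32) / 255.0   # normalized square input

x = preprocess("2688x1520_capture.jpg")   # shape: (640, 640, 3)
```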

Nevertheless, we’re happy you were able to train an accurate model. Feel free to follow up or create a new topic if you have any other issues or questions.