Seeking Advice on Optimal Inference Image Size

Hi everyone,

I’ve been working on developing a model to detect stairs in architectural plans, inspired by this existing model:

How to Use the stairs detection Object Detection API

Since the weights of the original model are not available, I’ve trained my own model using the same dataset. After comparing the graphs of both models, they appear to be very similar in structure. Below are the graphs of my model for reference:

The model performs reasonably well, successfully detecting stairs in most architectural plans. However, I’m running into an issue where the results vary significantly depending on the inference image size used during testing.

For example, when processing a floor plan of 10,459 × 7,483 pixels, I’ve tested various inference sizes (a rough sketch of the sweep is below the list):

  • Using the largest dimension (either height or width) takes too much time due to the image size.
  • Using half of the largest dimension yields no results.
  • Using a 2368px inference size produces good results.
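
A minimal sketch of the kind of sweep I mean, assuming the checkpoint is an Ultralytics YOLO model (my setup may differ); the weights path and the size list are placeholders:

```python
from ultralytics import YOLO

model = YOLO("stairs_best.pt")  # hypothetical path to my trained weights

# Run the same plan at several inference sizes and compare how many stairs are found
for imgsz in (640, 1280, 2368, 3712, 5248):
    results = model.predict("floor_plan.png", imgsz=imgsz, conf=0.4, verbose=False)
    print(imgsz, len(results[0].boxes))
```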

My question is: How should I determine the optimal image size for inference to achieve consistent and accurate results?

Do you have a specific time you need a result in?

One thing to keep in mind is that each model is trained on a given image size/resolution, and it’s best to match the inference image size to the model. In your case, with irregular image sizes, I recommend resizing to a square before training using our Fit (black edges) preprocessing step.
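
In case it helps to see what that step does, here is a rough equivalent in OpenCV (a minimal sketch; padding placement and interpolation may differ from Roboflow’s implementation):

```python
import cv2
import numpy as np

def fit_black_edges(image: np.ndarray, size: int = 640) -> np.ndarray:
    """Scale the longest side to `size` and pad the rest with black to make a square.

    Assumes a 3-channel (BGR) image.
    """
    h, w = image.shape[:2]
    scale = size / max(h, w)
    resized = cv2.resize(image, (round(w * scale), round(h * scale)))
    canvas = np.zeros((size, size, 3), dtype=resized.dtype)
    # Top-left placement; Roboflow may center the image instead
    canvas[: resized.shape[0], : resized.shape[1]] = resized
    return canvas
```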

It sounds like you’re thinking about things the right way - it’s best to test and play around. However, my gut is you’re best off focusing on data quality (are all images labeled exactly the way you want the model to respond? are all images resized to the same resolution? do you have enough data?).

One other thing that could help out is the SAHI block in our workflow feature which helps you detect small objects in large images.
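
Outside of Workflows, the open-source sahi package does roughly the same thing if you want to experiment locally (the model type, weights path, and slice sizes below are placeholders):

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Hypothetical locally trained detector; model_type and model_path are placeholders
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov8",
    model_path="stairs_best.pt",
    confidence_threshold=0.4,
)

# Cut the large plan into overlapping tiles, detect on each tile,
# then merge the tile-level boxes back into full-image coordinates
result = get_sliced_prediction(
    "floor_plan.png",
    detection_model,
    slice_height=1024,
    slice_width=1024,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
print(len(result.object_prediction_list))
```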

Hi Jacob, thanks for your response.

For now I’m focusing on quality over speed.

I understand that it’s best to match the training image size to the input image size during inference. Perhaps that is my problem.

I have a dataset of over 800 images of different sizes (the majority are larger than 2000px). I don’t think my problem is the quality of the dataset, which is basically fine.

For large images, how should I define the image size for training? I suppose that if I have an image of, say, 3000px and I resize it to 640px, I’m going to lose quality.
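
To put rough numbers on that: the downscale factor would be 640 / 3000 ≈ 0.21, so a stair that spans, say, 150 px in the original plan would end up around 32 px after resizing, which leaves the model much less detail to work with.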

When you create a version in Roboflow, you can use the Resize preprocessing step to set the image size.

In my experience, every dataset has some labeling issues. I highly recommend using our detailed model evaluation tool just to double-check where you have issues.
