Solid Dataset for Food Detection Model

Hello all

I am relatively new to ML and Object Detection and would love to get some insights on what is a good dataset for my desired model.

I am trying to train an object detection model for the following products:

I will be using a camera that is filming a flat surface from above at a distance of about 0.5-1 metre. It should simulate a checkout, meaning that “customers” can purchase any combination of these products.

Since I am quite unsure about the best approach for a solid dataset I would appreciate any response on the following questions:

  1. Is it better to only provide pictures where all of the products are visible, or should I also provide images with only one product in it? Or should I mix it up with a specific ratio of images with all products in it / images with only one product in it?

  2. If the objects to detect are always placed on the same flat surface, is it needed/a good idea to provide images to the dataset that show the objects on different backgrounds?

  3. How many images will I roughly need? I was told that a good rule of thumb is to provide 1-2k pictures per product. This somewhat interferes with my first question. Should I provide pictures per product or pictures with all products in them?

  4. Right now I am using a Roboflow trial account that is locked to a duplication of 5x for a dataset. It states that more duplications require an upgrade. What upgrade do I need to unlock these? Is it the normal monthly subscription?

  5. Is it better to provide more “raw” images, or does duplicating the uploaded images do a better job at producing useful images for the dataset than uploading more images on my own?

  6. My plan is to train the model on my local machine, since I have read that there is no way to download a trained model directly from Roboflow. Is that correct? The model needs to run on an offline device.

I would really appreciate it if you could give me a rough estimate of what / how many raw pictures I should provide to the dataset. Maybe there is even a good choice of preprocessing / augmentation steps I should use for this specific use case?

Any help is appreciated
Thanks a lot

Greetings

  • Project Type: Object Detection

Hi @FoodDetection - thanks for posting!

Responses below:

  1. Is it better to only provide pictures where all of the products are visible, or should I also provide images with only one product in it? Or should I mix it up with a specific ratio of images with all products in it / images with only one product in it?

The two golden rules of computer vision: 1) If a human can see it, so can a model. 2) Your training data should look like your production data. To that end, I suggest you use the same set of cameras / camera angles / locations as you expect your model to see in the wild. Generally, more variance in your training images (lighting, placement, combinations of products) is good.

  2. If the objects to detect are always placed on the same flat surface, is it needed/a good idea to provide images to the dataset that show the objects on different backgrounds?

If the objects will always be on the same flat surface, you don’t need to vary the backgrounds. However, doing so may still improve the robustness of the model (e.g., if the lighting or other context changes).
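If you want some of that lighting robustness without re-shooting, one option is to jitter brightness during training. A minimal NumPy sketch, assuming frames are H x W x 3 uint8 arrays; `jitter_brightness` and `max_delta` are hypothetical names, not a Roboflow setting:

```python
import numpy as np

def jitter_brightness(image: np.ndarray, max_delta: float = 0.2, rng=None) -> np.ndarray:
    """Scale all pixel intensities by one random factor in [1 - max_delta, 1 + max_delta].

    `image` is assumed to be an H x W x 3 uint8 array.
    """
    rng = np.random.default_rng() if rng is None else rng
    factor = rng.uniform(1.0 - max_delta, 1.0 + max_delta)
    # Work in float to avoid uint8 overflow, then clip back to the valid 0..255 range.
    out = image.astype(np.float32) * factor
    return np.clip(out, 0, 255).astype(np.uint8)

# Usage: three differently lit copies of one (tiny, synthetic) frame.
rng = np.random.default_rng(0)
frame = np.full((4, 4, 3), 128, dtype=np.uint8)
variants = [jitter_brightness(frame, rng=rng) for _ in range(3)]
```

Brightness changes leave bounding boxes untouched, so the labels can be reused as-is. It does not replace photographing the real surface under the lighting you actually expect.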

  3. How many images will I roughly need? I was told that a good rule of thumb is to provide 1-2k pictures per product. This somewhat interferes with my first question. Should I provide pictures per product or pictures with all products in them?

If you are only looking at these objects on the same consistent background, I could see ~2,000 well-varied images total being fine. Make sure to use different lighting and contexts. My advice is always to start small (try training on ~200 images) to get a sense of how your model improves with increased sample size. You can also use your early models to pre-label new images.
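One way to reconcile "per product" counts with images that contain several products is to count labeled boxes per class rather than images: a checkout image showing every product adds one instance to each class. A short sketch, assuming YOLO-style text labels (one `class_id cx cy w h` row per box); the class and file names are hypothetical, since the product list isn't shown in the thread:

```python
from collections import Counter

# Hypothetical class names; substitute the products you actually label.
CLASS_NAMES = ["product_a", "product_b", "product_c"]

def class_stats(labels_per_image):
    """Count boxes per class and the number of images containing each class.

    labels_per_image maps an image name to the text of its YOLO-style label
    file: one 'class_id cx cy w h' row per bounding box (assumed format).
    """
    instances = Counter()    # total boxes per class id
    images_with = Counter()  # number of images in which each class id appears
    for text in labels_per_image.values():
        ids = [int(line.split()[0]) for line in text.splitlines() if line.strip()]
        instances.update(ids)
        images_with.update(set(ids))
    return instances, images_with

# Hypothetical labels: one full checkout scene and two smaller scenes.
labels = {
    "checkout_001.jpg": "0 0.2 0.3 0.1 0.1\n1 0.6 0.4 0.2 0.1\n2 0.5 0.8 0.1 0.2\n",
    "single_a.jpg": "0 0.5 0.5 0.2 0.2\n",
    "two_b.jpg": "1 0.3 0.5 0.2 0.1\n1 0.7 0.5 0.2 0.1\n",
}
instances, images_with = class_stats(labels)
for cid, name in enumerate(CLASS_NAMES):
    print(f"{name}: {instances[cid]} box(es) in {images_with[cid]} image(s)")
```

A class that lags far behind the others is a hint about which product needs more photos.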

  4. Right now I am using a Roboflow trial account that is locked to a duplication of 5x for a dataset. It states that more duplications require an upgrade. What upgrade do I need to unlock these? Is it the normal monthly subscription?

We can provide more variants per image after you purchase a starter plan; just shoot us an email at starter-plan@roboflow.com once you upgrade and we’ll let you test it out. Generally, going more than 5x will overfit your model. However, if you have an extremely constrained environment you want to run the model in, overfitting might not be horrible.

  5. Is it better to provide more “raw” images, or does duplicating the uploaded images do a better job at producing useful images for the dataset than uploading more images on my own?

Raw images will always improve model performance more than duplicates / augmentations.

  6. My plan is to train the model on my local machine, since I have read that there is no way to download a trained model directly from Roboflow. Is that correct? The model needs to run on an offline device.

You can run models from Roboflow on your own device! Check out inference.roboflow.com

I would really appreciate it if you could give me a rough estimate of what / how many raw pictures I should provide to the dataset. Maybe there is even a good choice of preprocessing / augmentation steps I should use for this specific use case?

Again, my advice is to start with 200 images, train a model quickly, and add more images as you need them. You may need to get to ~2,000 well-labeled images to get a model that works without common errors. This assumes you are only detecting the objects in question from the angle / background you’re showing.
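If you already have a labeled pool, a learning-curve style plan for "start at 200 and add more" can be sketched as below. `plan_rounds`, the 20% validation share and the round sizes are all hypothetical choices; the training itself happens in whatever tool you use:

```python
import random

def plan_rounds(image_ids, sizes, val_fraction=0.2, seed=0):
    """Hold out a fixed validation set, then build nested training subsets.

    Each larger subset contains every smaller one, and all rounds are scored
    on the same validation images, so changes in accuracy reflect the added
    training data rather than a different split.
    """
    order = sorted(image_ids)            # deterministic starting order
    random.Random(seed).shuffle(order)   # reproducible shuffle
    n_val = int(len(order) * val_fraction)
    val, pool = order[:n_val], order[n_val:]
    # Skip any requested size larger than the available training pool.
    rounds = {n: pool[:n] for n in sorted(sizes) if n <= len(pool)}
    return val, rounds

# Hypothetical pool of 2,000 labeled images.
ids = [f"img_{i:04d}.jpg" for i in range(2000)]
val, rounds = plan_rounds(ids, [200, 400, 800, 1600])
for n, train in rounds.items():
    print(n, "training images,", len(val), "validation images")
```

Keeping the validation set fixed is the design choice that matters here: if each round drew a fresh split, you could not tell whether a better score came from more data or an easier validation set.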

Hey @Jacob_Witt

Thank you for your response. Your explanations really helped a lot.

Just for clarification:
I should provide roughly 2,000 raw images myself, correct?
And then apply 5x augmentation, so that I end up with ~10k images?

Thank you for your help.

I wouldn’t even count the augmented images together with your source images - they do not provide nearly as much incremental value to your model.
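To make the arithmetic concrete, here is a sketch under two assumptions worth checking against your own dataset version settings: augmented copies are generated only for the training split (with "5x" meaning five output versions per training image), and the split is 70/20/10. `dataset_counts` is a hypothetical helper:

```python
def dataset_counts(raw_images, multiplier, split=(0.7, 0.2, 0.1)):
    """Estimate image counts after augmentation.

    Assumes only the training split is multiplied and that `multiplier`
    output versions are produced per training image. The 70/20/10 split
    is an assumed default.
    """
    train = round(raw_images * split[0])
    valid = round(raw_images * split[1])
    test = raw_images - train - valid  # remainder, so the three parts always sum to raw_images
    return {
        "unique_source_images": raw_images,
        "train_after_augmentation": train * multiplier,
        "valid": valid,
        "test": test,
    }

counts = dataset_counts(2000, 5)
print(counts)
```

Under these assumptions the total lands below 10k, and the number of distinct photos, which is the count that matters, stays at 2,000.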

Yes, you should aim for 2,000 raw images, but start training on ~200. I suggest experimenting with no augs / some augs / many augs to see what works best.
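As a concrete "some augs" variant for a top-down camera, flips are a common starting point, and the only subtlety is moving the boxes with the pixels. A NumPy sketch assuming normalized YOLO-style boxes (rows `[class_id, cx, cy, w, h]`); `hflip` and `vflip` are hypothetical helper names:

```python
import numpy as np

def hflip(image, boxes):
    """Mirror an H x W x C image left-to-right and update normalized boxes.

    boxes: float array of shape (N, 5), rows [class_id, cx, cy, w, h] in [0, 1].
    Only the x-center changes under a horizontal flip; width and height stay the same.
    """
    flipped = image[:, ::-1].copy()
    new_boxes = boxes.copy()
    new_boxes[:, 1] = 1.0 - new_boxes[:, 1]
    return flipped, new_boxes

def vflip(image, boxes):
    """Mirror top-to-bottom; only the y-center changes."""
    flipped = image[::-1].copy()
    new_boxes = boxes.copy()
    new_boxes[:, 2] = 1.0 - new_boxes[:, 2]
    return flipped, new_boxes

# Usage on a tiny synthetic image with one bright pixel in the top-left corner.
img = np.zeros((4, 6, 3), dtype=np.uint8)
img[0, 0] = 255
boxes = np.array([[0, 0.25, 0.5, 0.2, 0.4]])
h_img, h_boxes = hflip(img, boxes)
v_img, v_boxes = vflip(img, boxes)
```

If a product's packaging has text or artwork that never appears mirrored at the checkout, a flipped image no longer looks like production data, which is exactly why comparing against a no-augmentation run is worthwhile.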