What is the recommended ratio of background images (negative samples) for training a YOLOv11 model? Why?
I have about 800 positive samples now.
I saw that Ultralytics has a blog saying 10% is recommended, but many other forums say 50%. I wonder why, and how do I know which number is suitable for me? Thanks!
I have always heard 1–10% is the best mix. I've not heard 50%; that would likely be for unusual cases where your end deployment will rarely see detections. I think it's mostly a matter of what works best as you test. I would generally start around 5% and then increase if I still get a lot of false positives. If you start missing detections, you may have reached the point of too many nulls, which makes the model more conservative.
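To make the percentages concrete, here's a small sketch of how many background images each ratio implies for your 800 positives. Note the ratio is usually quoted as a share of the *total* dataset, though some people mean a share of the positive set; both readings are shown (the function names are just mine, not any standard API):

```python
# How many background (null) images a given mix implies for 800 positives.
# Two common interpretations of "X% backgrounds" are shown.

def nulls_as_share_of_total(positives: int, ratio: float) -> int:
    """Background count so nulls make up `ratio` of the final dataset."""
    return round(positives * ratio / (1 - ratio))

def nulls_as_share_of_positives(positives: int, ratio: float) -> int:
    """Background count taken as `ratio` of the positive set."""
    return round(positives * ratio)

positives = 800
for ratio in (0.05, 0.10, 0.50):
    total = nulls_as_share_of_total(positives, ratio)
    of_pos = nulls_as_share_of_positives(positives, ratio)
    print(f"{ratio:.0%}: {total} (of total) vs {of_pos} (of positives)")
# 5%: 42 (of total) vs 40 (of positives)
# 10%: 89 (of total) vs 80 (of positives)
# 50%: 800 (of total) vs 400 (of positives)
```

At 5–10% the two readings barely differ, so in practice the distinction only matters at the high end like 50%. Also worth knowing: in the YOLO dataset format, a background image is just an image with an empty (or absent) label file, so adding nulls doesn't require any annotation work.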
More thoughts FYI. Why nulls: just to set the background on the why (as in, why go with 5% vs 10%, etc.), in general you are trying to avoid false positives. So you put in some nulls to tell the model "you might think this is a truck, but there is no truck in this image, so when you see stuff like this don't flag it as a truck". When you are trying to decide between 5%, 10%, etc., a primary check is "do I have a lot of false positives?" If yes, then you want more nulls to train the model away from items that only look like your desired classes.
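One simple way to run that "do I have a lot of false positives?" check is to keep a held-out set of background-only images and measure how often the model fires on them. The detection counts below would come from running inference (e.g. Ultralytics' `model.predict` and counting boxes per image); here they're just a hypothetical list, so this is a sketch of the bookkeeping rather than a full inference script:

```python
# Sketch: fraction of background-only images on which the model fired.
# `detections_per_image` is the number of predicted boxes for each image
# in a held-out set of nulls (values here are made up for illustration).

def background_false_positive_rate(detections_per_image: list[int]) -> float:
    """Fraction of background-only images with at least one detection."""
    if not detections_per_image:
        return 0.0
    fired = sum(1 for n in detections_per_image if n > 0)
    return fired / len(detections_per_image)

counts = [0, 0, 2, 0, 1, 0, 0, 0]  # hypothetical per-image box counts
rate = background_false_positive_rate(counts)
print(f"{rate:.0%} of null images triggered a detection")
```

If that rate is high after training, it's a signal to push the null ratio up; if it's near zero but you're missing real detections, you've probably got enough (or too many) nulls already.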
Why false positives: the other side of this is why some setups yield more false positives than others. Every model is trained on different datasets. Then users pick that model up and adapt it to new tasks, like maybe detecting a trampoline. The base training may have randomly ended up with a model that's not great for trampolines, so when someone goes to fine-tune on trampolines they might find they need more null images to guide the model. The point being that the number of null images needed will vary depending on your use case and how well the model works for you out of the box. I believe this is part of what Roboflow was working on with their RF-DETR model when they say "This gives us exceptional ability to adapt to novel domains based on the knowledge stored in the pre-trained DINOv2 backbone." The model starts in a better place out of the box as you fine-tune for your given use case. You can read more about that in this post.
So there are a few thoughts to get you started. One of the experts out here might have a more concrete answer on the technical why of inserting null images as well. Good luck!