PROJECT TYPE: Object detection
Hi, I would like to build a model with a lot of classes, for example 200.
Do you think it is better to have several more specific models instead of one that recognizes 200 objects?
I mean, would running a lot of models on a server be too much computation?
What is the best practice for this kind of problem?
Hey there Releow!
I have used YOLOv4 and YOLOv5 to create models with more than 100 different classes.
I don’t know whether you will be able to detect all of those classes on the screen at once, but I do know it is possible to get a model that can detect a lot of different classes while 3-10 objects are on the screen.
My use case was building a robust wildlife detection model that can be used for all the common birds of the world. It is similar to the Merlin project, but the model wasn’t split by region. It was successful, but there are some nuances.
Our team only began to see significant accuracy when we used a lot of training data covering all types of lighting conditions. I recommend that you have at least 1,000 annotations per class if you are going to try to build a model that can recognize hundreds of classes. Currently our wildlife dataset sits at around 1.2M photos, with approximately 5,000 photos per class (we may be able to scale down the number of images required with YOLOv7).
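If it helps, here is a rough sketch of how you could check a dataset against that 1,000-annotations-per-class target. It is not the script our team used. It assumes YOLO-style text labels (one box per line: `class_id x_center y_center width height`), and the function names and the `labels/` directory in the usage note are made up for illustration.

```python
from collections import Counter
from pathlib import Path


def count_annotations(label_texts):
    """Count bounding boxes per class id across YOLO-format label file contents."""
    counts = Counter()
    for text in label_texts:
        for line in text.splitlines():
            fields = line.split()
            # Assumed box format: class_id x_center y_center width height.
            # Blank lines and lines with a different field count are skipped.
            if len(fields) == 5:
                counts[int(fields[0])] += 1
    return counts


def underrepresented(counts, num_classes, min_per_class=1000):
    """Return {class_id: count} for every class below the target, including empty ones."""
    return {
        class_id: counts.get(class_id, 0)
        for class_id in range(num_classes)
        if counts.get(class_id, 0) < min_per_class
    }


def read_label_dir(label_dir):
    """Yield the text of every .txt file under label_dir (hypothetical layout)."""
    for path in Path(label_dir).rglob("*.txt"):
        yield path.read_text()
```

For example, `underrepresented(count_annotations(read_label_dir("labels/")), num_classes=200)` would list the classes that still need more annotations before training.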
Another thing to consider is the training time to build a model like this. Our average training time was 3 days to get through 40 epochs. We used 4x NVIDIA 1080 GPUs; the training time could probably be improved with faster GPUs or a dedicated machine learning server. There is always a speed-versus-cost trade-off when building models of this scale, so you either have to budget extra training time or spend cash on a really big training server.
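To put numbers on that trade-off, here is a back-of-the-envelope sketch. The only measured figure is the one above (3 days for 40 epochs, which works out to about 1.8 hours per epoch). The `speedup` parameter is a hypothetical factor for faster hardware, and the estimate assumes time scales linearly with epoch count, which real runs will only approximate.

```python
def hours_per_epoch(total_days, epochs):
    """Average wall-clock hours per epoch from a completed training run."""
    return total_days * 24 / epochs


def estimate_training_days(epochs, measured_hours_per_epoch, speedup=1.0):
    """Estimate days of training, assuming linear scaling in epochs.

    speedup is a hypothetical factor (for example 2.0 for hardware assumed
    to be twice as fast); it is a guess to plug in, not a measured value.
    """
    return epochs * measured_hours_per_epoch / speedup / 24


# Rate from the run described above: 3 days for 40 epochs.
rate = hours_per_epoch(3, 40)

# Rough budget for a longer run on the same hardware, and on hardware
# assumed (not measured) to be twice as fast.
same_hardware_days = estimate_training_days(100, rate)
faster_hardware_days = estimate_training_days(100, rate, speedup=2.0)
```

The same two functions work with your own measurements: time a few epochs on your dataset, then plug that rate in before deciding whether to wait longer or pay for bigger hardware.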