Hello everyone, I'm currently working on a project that detects the promotion on bottle caps and counts them. The counting happens live: the app counts the bottle caps in the camera feed and reports how many there are. I'm using a GoPro Hero 12 as a webcam. The problem is that in my app the labels are often wrong and some caps aren't detected at all. I trained in Roboflow with the first image; the second image shows the result in my application. I didn't adjust the lighting for the second image, but even with good lighting the labels are incorrect and some caps are missed. There are 93 caps (93 labels/promotions) in my project. So, instead of training with one cap per image like the first image, should I train with many caps per image like the second one? What is the best way to get accurate and correct labels for this project?
Second, about performance: when the app isn't detecting, the frame rate is around 30 FPS or above, but when detection runs it drops to 5-15 FPS and the motion looks bad. My current laptop has an i7-8750H and a GTX 1050 4GB. What specs do I need for smooth motion?
The #1 rule of computer vision is that your training data should look like your production data. So, if you are going to send data like the second image to your model, your training data should be images like the second image.
I've seen some users build 'framing' models in the past, where the model just tells the photo-taker whether the angle/lighting is appropriate - that might be helpful here, as some of those caps look very blurry.
Additionally, adding more images will help a lot with accuracy. I’m not sure if you have 93 images or 93 objects in your dataset, but both sound small. ~500 images will significantly improve performance for this type of project.
Finally, on performance: you can improve the frame rate by not running inference on every single frame. You will have some detection lag, but the rest of the app will run more smoothly. Are you using our hosted API or running the model locally? Our inference repo is a great way to run models locally, which should give you a good performance speedup.
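For reference, here's a rough sketch of what "don't run inference on every frame" can look like with OpenCV. The `detect_caps` function below is just a placeholder for whatever inference call you're already making (hosted API or a local model); `INFER_EVERY_N_FRAMES` and the camera index are example values you'd tune for your setup.

```python
import cv2

INFER_EVERY_N_FRAMES = 5  # tune this: higher = smoother video, more detection lag


def detect_caps(frame):
    """Placeholder for your actual inference call (hosted API or local model).

    Expected to return a list of (label, confidence, (x1, y1, x2, y2)) tuples.
    """
    raise NotImplementedError


cap = cv2.VideoCapture(0)  # GoPro exposed as a webcam
last_detections = []
frame_idx = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Only run the (slow) model every Nth frame...
    if frame_idx % INFER_EVERY_N_FRAMES == 0:
        last_detections = detect_caps(frame)
    frame_idx += 1

    # ...but draw the most recent detections on every frame,
    # so the preview stays smooth even on skipped frames.
    for label, conf, (x1, y1, x2, y2) in last_detections:
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f"{label} {conf:.2f}", (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    cv2.putText(frame, f"caps: {len(last_detections)}", (10, 25),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)

    cv2.imshow("caps", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```

With this structure the display loop keeps running at camera speed, and only every Nth frame pays the inference cost, so the counts update slightly behind the video but the motion stays smooth.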
I see, so I should train with images that look like my production images. I have 93 caps, each with a different promotion. As an analogy, if the labels were (Dog, Cat, People), there would be 3 classes; my project has 93 classes, one per cap promotion. So should I take training images with all 93 caps in one image, or can I train with about 10 caps per image?
Will you eventually have 93 different caps in one image in production? If so, you should train on that. Otherwise, I recommend training on a range of images: some that contain one cap, some that contain ~10, and a couple that contain no caps.