I have attached my detection results after 150 epochs. As you can see, some of the classes are detected properly but some are not, the ‘NIK’ field in particular. While the other classes have a mAP50-95 of roughly 0.8 to 0.9, NIK has only about 0.62. This is my project: KTP Scanner Final Object Detection Dataset by Distant Space.
My question is: what is wrong with my dataset, and why can't it detect all the data perfectly? I need the detection to be perfect because it needs to capture all the letters.
Thanks in advance. If you need any further information, please let me know.
Interesting project! Looking at it, I have a few suggestions that could help:
Looking at your generated dataset, you have images like these (below) where you are using a bounding-box-level rotation augmentation.
The most important property of training data for computer vision models is that it looks as similar as possible to the data the model will be predicting on. Bounding-box-level rotation makes the text appear in the image in orientations that would likely never occur in real prediction data. I recommend trying out different augmentations (I think whole-image-level augmentations will work best): create several augmented versions and train your model on images that resemble the actual images your model will encounter.
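To make the distinction concrete, here is a minimal pure-Python sketch of what a whole-image rotation does to the annotations. The function names are mine, not from any augmentation library: every box rotates around the image centre together, so the layout (and the orientation of the text inside it) stays consistent, unlike rotating each box in isolation.

```python
import math

def rotate_point(x, y, cx, cy, angle_deg):
    """Rotate a point around (cx, cy) by angle_deg, counter-clockwise."""
    a = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))

def rotate_bbox_whole_image(bbox, img_w, img_h, angle_deg):
    """Whole-image rotation: rotate all four corners of the box around the
    image centre, then take the axis-aligned envelope. Because every box in
    the image is transformed with the same rotation, the relative layout of
    the KTP fields is preserved."""
    x1, y1, x2, y2 = bbox
    cx, cy = img_w / 2, img_h / 2
    corners = [rotate_point(x, y, cx, cy, angle_deg)
               for x in (x1, x2) for y in (y1, y2)]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return (min(xs), min(ys), max(xs), max(ys))

# Example: a box left of centre in a 200x100 image, rotated 90 degrees.
# The whole layout moves together (boxes may need clipping to the image).
print(rotate_bbox_whole_image((10, 40, 50, 60), 200, 100, 90))
```

In a real pipeline you would also rotate the image pixels with the same transform and clip boxes to the image bounds; this sketch only shows the geometry.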
While your current approach, combined with the feedback above, might work well, the similarity between the text classes might still confuse the model.
It might be worth trying a different approach, such as annotating the entire section containing the information as one larger class, then using text processing to extract the individual fields.
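If you go that route, the downstream text processing can stay quite simple. Here is a hypothetical sketch, assuming you have already run OCR (e.g. Tesseract, which is omitted here) over the cropped section: the field names and patterns are my assumptions about the KTP layout, such as the NIK being a 16-digit number and other fields following a `LABEL : value` line format.

```python
import re

def extract_ktp_fields(ocr_text):
    """Pull structured fields out of raw OCR text from one large annotated
    KTP section. Patterns are illustrative: a NIK is assumed to be a
    16-digit number, other fields a 'LABEL : value' line."""
    fields = {}
    nik = re.search(r"\b(\d{16})\b", ocr_text)
    if nik:
        fields["NIK"] = nik.group(1)
    for label in ("Nama", "Alamat"):
        m = re.search(rf"{label}\s*:\s*(.+)", ocr_text, re.IGNORECASE)
        if m:
            fields[label] = m.group(1).strip()
    return fields

# Made-up sample OCR output, not real ID data.
sample = "NIK : 3173051234567890\nNama : BUDI SANTOSO\nAlamat : JL. MERDEKA NO. 1"
print(extract_ktp_fields(sample))
# -> {'NIK': '3173051234567890', 'Nama': 'BUDI SANTOSO', 'Alamat': 'JL. MERDEKA NO. 1'}
```

One detector class per section plus this kind of parsing can be more robust than one class per field, because the model only has to localize the section, not distinguish visually similar text blocks.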
Hi @leo. Thank you for your quick response. I have tried whole-image augmentation and unfortunately saw no improvement. However, I think the second part of your explanation, about the simpler annotation, is the key. I will try to re-annotate and see what happens.