How can I improve my dataset for increased mAP

[Attached image: Frame (23)]

Hi all,

I want to use the YOLOv4 object detector to detect LED matrices like the one in the attached picture. The goal of my project is automated region-of-interest (RoI) detection for this type of LED matrix, mainly in vehicular scenarios.

Unfortunately, this type of object is not very common, and I could not find a way to produce a good dataset for training. I’ve tried training the YOLOv4 algorithm with different cfg parameters, but one of two things always happens:

  1. Overfitting
  2. The algorithm does not converge and no detection is performed.

Do you have any tips on how I can improve my dataset? This kind of object is not very common.

Hi @mfa

Sounds like an interesting project. I’ve also experimented with some uncommon object detection use cases before. The easiest way to make your dataset less prone to overfitting is to augment it via Roboflow’s augmentation settings when generating a dataset version.
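To make the idea concrete, here is a minimal sketch of what a single augmentation (a horizontal flip) does to an image and its YOLO-format labels. It is only an illustration and not Roboflow's implementation: the function names are hypothetical, the image is assumed to be a plain list of pixel rows, and each box is assumed to be `(class_id, x_center, y_center, width, height)` normalized to the 0-1 range.

```python
def hflip_image(rows):
    """Mirror an image given as a list of pixel rows (left becomes right)."""
    return [list(reversed(row)) for row in rows]


def hflip_yolo_boxes(boxes):
    """Mirror normalized YOLO boxes to match a horizontally flipped image.

    Only x_center changes (x -> 1 - x); y_center, width, and height stay the same.
    """
    return [(cls, 1.0 - x, y, w, h) for (cls, x, y, w, h) in boxes]


# Hypothetical example: one LED matrix box in the left half of the frame.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
boxes = [(0, 0.25, 0.5, 0.1, 0.2)]

flipped_image = hflip_image(image)
flipped_boxes = hflip_yolo_boxes(boxes)
```

The point is that every augmented copy must keep its labels consistent with the transformed pixels, which is what the augmentation settings take care of for you.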

Also, are you collecting the data yourself?
If so, it might be worth looking into automating your data collection. For my use case, I was able to automate a lot of my data collection and annotation by using the upload API and the annotation API, or using model-assisted labeling.


Great answer @stellasphere!!

@mfa - A few more questions to get to the root of the issue:

  1. What are your current mAP/precision/recall scores?
  2. How many images do you have labeled?
  3. How many classes are in the dataset?
  4. How many labels per class?

Questions 2-4 can be answered by viewing the “Health Check” page on your project.

Also be sure that you are drawing tight bounding boxes around each object you are labeling, and that you label every instance of that object in every image used for model training.
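If you also keep a local export of your labels, questions 2-4 above can be cross-checked with a few lines of Python. This is a rough sketch under assumptions: the labels are in YOLO text format (one `class_id x_center y_center width height` line per box), `label_stats` is a hypothetical helper name, and the input is a dict mapping each label file name to its text content.

```python
from collections import Counter


def label_stats(labels_by_image):
    """Summarize a YOLO-format dataset: image count, unlabeled images, labels per class."""
    per_class = Counter()
    unlabeled = 0
    for text in labels_by_image.values():
        lines = [ln for ln in text.splitlines() if ln.strip()]
        if not lines:
            unlabeled += 1  # image with no boxes at all
        for ln in lines:
            class_id = int(ln.split()[0])  # first field is the class index
            per_class[class_id] += 1
    return {
        "images": len(labels_by_image),
        "unlabeled_images": unlabeled,
        "labels_per_class": dict(per_class),
    }


# Hypothetical two-image dataset: one image with two boxes, one with none.
sample = {
    "frame_01.txt": "0 0.50 0.50 0.20 0.20\n0 0.10 0.10 0.05 0.05\n",
    "frame_02.txt": "",
}
stats = label_stats(sample)
```

A large number of unlabeled images, or very few labels for a class, is usually worth looking at before changing any training parameters.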


Hi stellasphere,

Thanks for the answer. And yes, I am collecting the data myself, and I am already using these tools to help with the preprocessing step. However, I am finding it hard to come up with ways to get samples from different angles and views, as data augmentation does not seem to be enough to eliminate overfitting.

My dataset has 370 images. Is that enough, or should I collect more?

Hi Mohamed,

Thanks for the answer and sorry for the delay. Answering your questions, I am attaching the Health Check statistics:

  1. There is only one class, the LED Matrix, as I want to detect it and remove other objects that may be near it. My main goal is to attach it to a car’s backlight and be able to detect / recognize the LED Matrix from a camera mounted on another car while driving.

It appears you’re making one label for the entire set of LED lights, correct?

Your dataset is just going to need more labeled examples, in this case. Once you get to around 300-500 labeled examples and generate a version to train, you’ll see some better results.

Active Learning is another great process to try to improve future model performance on newly trained models: Active Learning - Roboflow

Sorry for the delay again.

Answering your questions:

Yes, I have only one label for the entire set. I will try to increase my dataset diversity. The main problem is that I don’t have too many samples to work with.

Do you have any tips on how I can make this dataset diverse enough that it does not produce overfitting after training?



You could try synthetic data:
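One simple form of synthetic data is to paste a crop of the object onto different background images at random positions and write out the matching label. The sketch below only illustrates that idea under assumptions: images are plain lists of pixel rows, the patch is smaller than the background, `paste_patch` is a hypothetical helper, and the label is a YOLO-style `(class_id, x_center, y_center, width, height)` tuple normalized to 0-1. A real pipeline would use actual image files and would likely also vary scale, rotation, and lighting.

```python
import random


def paste_patch(background, patch, rng, class_id=0):
    """Paste `patch` at a random position on a copy of `background`.

    Returns the new image and the normalized YOLO label of the pasted patch.
    """
    bg_h, bg_w = len(background), len(background[0])
    p_h, p_w = len(patch), len(patch[0])
    top = rng.randint(0, bg_h - p_h)    # inclusive bounds keep the patch inside
    left = rng.randint(0, bg_w - p_w)
    image = [row[:] for row in background]  # leave the original background untouched
    for r in range(p_h):
        image[top + r][left:left + p_w] = patch[r]
    label = (
        class_id,
        (left + p_w / 2) / bg_w,  # x_center
        (top + p_h / 2) / bg_h,   # y_center
        p_w / bg_w,               # width
        p_h / bg_h,               # height
    )
    return image, label


# Hypothetical example: a 2x3 "LED matrix" patch on a 6x8 empty background.
rng = random.Random(0)  # seeded so the sketch is repeatable
background = [[0] * 8 for _ in range(6)]
patch = [[1] * 3 for _ in range(2)]
image, label = paste_patch(background, patch, rng)
```

Calling this many times with different backgrounds gives labeled images for free, although synthetic samples usually work best mixed with real ones rather than replacing them.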