Does omitting a class eliminate the images from the dataset?

Hello, I've already used the Aquarium dataset with YOLOv8, but I would like to remove the penguin and puffin classes. I tried omitting the starfish class in my first dataset version, but when I evaluate the model on the test set, those images are still in my dataset. That could also be useful for teaching the model that not all images have to contain the desired objects.
After omitting those classes (penguins and puffins), should I leave those images in my dataset, delete some of them manually, or instead use more random images with no class assigned?
Also, how many null images should I keep in my dataset? I have 1200 images in total right now.

Hello @edoardo

The Modify Classes preprocessing step only omits the annotations for a given class, not the images themselves.

Are you trying to create a dataset for a scenario that has no penguins or puffins at all? If you are trying to create a model that identifies certain classes while ignoring others, it may be beneficial to keep those images in the dataset and omit the classes in preprocessing, so the model isn't confused by something it has never seen.

As for keeping null images, there is no set rule as to how many null images you should keep in a dataset. It is important, as you said, to train the model that not all images have the desired objects. That is something you’ll want to play around with.
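If you do want to experiment with the amount of null images, here is a minimal sketch of one way to cap them at a chosen share of the dataset. The function name `cap_null_images` and the `max_null_fraction` parameter are made up for illustration, and the fraction itself is an assumption you would tune, not a recommended value.

```python
import random

def cap_null_images(annotated, nulls, max_null_fraction, seed=0):
    """Return a subset of `nulls` so that null images make up at most
    `max_null_fraction` of the combined dataset.

    `annotated` and `nulls` are lists of image identifiers (hypothetical).
    """
    if not 0 <= max_null_fraction < 1:
        raise ValueError("max_null_fraction must be in [0, 1)")
    # kept / (len(annotated) + kept) <= f  =>  kept <= f * len(annotated) / (1 - f)
    limit = int(max_null_fraction * len(annotated) / (1 - max_null_fraction))
    if len(nulls) <= limit:
        return list(nulls)
    # Sample with a fixed seed so the selection is reproducible.
    return random.Random(seed).sample(nulls, limit)
```

You could then train a few versions with different fractions and compare the results on the same test set.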

Thank you @leo, so images with omitted classes are basically null images? Or should I mark them null afterwards?

You’re very welcome @edoardo

Images with omitted classes aren't necessarily null images. If they also contain classes that aren't omitted, they keep those annotations, so they are not null. If they only contained omitted classes, they are effectively null images.

If the images you are talking about only contain the classes you omitted, then there’s no need to mark them null.
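To make that concrete, here is a rough sketch of the same logic applied to YOLO-format label lines (`class_id x_center y_center width height`). The helper names `drop_classes` and `is_null` are hypothetical, and the class ids in the usage note are assumptions; check your own `data.yaml` for the real ones. This is not how Roboflow implements it, just an illustration.

```python
def drop_classes(label_lines, omitted_ids):
    """Return the YOLO label lines whose class id is not in `omitted_ids`."""
    kept = []
    for line in label_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if int(parts[0]) not in omitted_ids:
            kept.append(line.strip())
    return kept

def is_null(label_lines):
    """An image with no remaining annotations is effectively a null image."""
    return len(label_lines) == 0
```

For example, assuming penguin is id 2 and puffin is id 3, an image labelled `["2 0.5 0.5 0.1 0.1", "0 0.3 0.3 0.2 0.2"]` keeps its second annotation and is not null, while an image labelled only `["3 0.5 0.5 0.1 0.1"]` ends up with no annotations and behaves like a null image.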

Here are some resources that might help:

Thanks @leo, I have another question: I split my new dataset version into (70, 20, 10), applied some preprocessing and augmentation operations, and when I generate the dataset I get a different split (82, 12, 6). I don't understand why the split is different from what I set.

Hi @edoardo

I was confused about that at first too when I was getting used to Roboflow.

Augmentations "create new images based on existing images in your project training set" (from Generate Image Augmentations with Roboflow).

Augmentations are only applied to images in the training set. So the number of images in the valid and test sets doesn't change, but the training set gets bigger, which shifts the percentages.
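Here is the arithmetic as a small sketch. It assumes 1200 source images split 70/20/10 and that each training image yields `multiplier` output images; the 2x multiplier is my guess because it reproduces the numbers in the question, it wasn't stated in the thread. The function name is made up for illustration.

```python
def split_after_augmentation(train, valid, test, multiplier):
    """Return the (train, valid, test) percentages after only the
    training set is multiplied by `multiplier`."""
    augmented_train = train * multiplier
    total = augmented_train + valid + test
    return tuple(round(100 * n / total) for n in (augmented_train, valid, test))

# 1200 images split 70/20/10 -> 840 train, 240 valid, 120 test
print(split_after_augmentation(840, 240, 120, multiplier=2))  # (82, 12, 6)
```

With 2x augmentation the training set grows from 840 to 1680 images while valid and test stay at 240 and 120, so the split becomes roughly 82/12/6.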

Hope this helps
