I’m trying to export a dataset I have created on Roboflow for local training with Apple’s CreateML application to create a CoreML model.
The dataset is quite large; it’s based on this COCO dataset from Universe. The only difference is that I keep only the following classes in the preprocessing step:
backpack
book
cell phone
chair
clock
cup
dining table
handbag
keyboard
laptop
microwave
mouse
person
remote
refrigerator
potted plant
scissors
sofa
suitcase
tv monitor
When I add the data to CreateML, I see that there are 3 classes in the Training Data section and 20 in the Validation Data section. Both splits should contain the same classes.
When I look at the respective _annotations.createml.json files, they do reflect this class difference.
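For anyone who wants to reproduce this check outside CreateML, here is a minimal sketch that counts the labels in each split's annotation file. It assumes the usual CreateML object detection layout (a JSON list with one entry per image, each holding an `annotations` list of objects with a `label` key) and that the export has `train` and `valid` folders next to the script; the folder names and the `count_labels` helper are my own, not something Roboflow ships.

```python
import json
from collections import Counter
from pathlib import Path


def count_labels(annotation_path):
    """Count bounding boxes per label in one CreateML annotation file."""
    entries = json.loads(Path(annotation_path).read_text(encoding="utf-8"))
    counts = Counter()
    for entry in entries:
        # Images without objects have an empty (or missing) annotations list.
        for box in entry.get("annotations", []):
            counts[box["label"]] += 1
    return counts


if __name__ == "__main__":
    # Assumed folder names; adjust to match the unzipped export.
    for split in ("train", "valid"):
        path = Path(split) / "_annotations.createml.json"
        if not path.exists():
            continue
        counts = count_labels(path)
        print(f"{split}: {len(counts)} classes")
        for label, n in counts.most_common():
            print(f"  {label}: {n}")
```

If the class counts printed here match what CreateML shows, the difference is already in the exported files rather than in how CreateML reads them.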
There are many underrepresented classes, as shown here, but I’m only using a few compared to the total number. This is again from the original Microsoft COCO dataset I linked above.
Are you filtering null images as well? Can you confirm that there’s definitely more than three classes tagged in the training dataset images?
I’m sure it’s highly unlikely, but if you didn’t filter out null images, I can imagine a scenario where only images with three of the classes you kept, plus images that no longer contain any of them, made it into your training dataset.
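A quick way to see how many null images each split contains is to count the entries with no annotations. This is only a sketch under the same assumption about the CreateML JSON layout (a list of per-image entries with an `annotations` list); `count_null_images` and the folder names are hypothetical.

```python
import json
from pathlib import Path


def count_null_images(annotation_path):
    """Return (total_images, null_images) for one CreateML annotation file."""
    entries = json.loads(Path(annotation_path).read_text(encoding="utf-8"))
    # A null image is an entry whose annotations list is empty or missing.
    null_images = sum(1 for entry in entries if not entry.get("annotations"))
    return len(entries), null_images


if __name__ == "__main__":
    # Assumed folder names; adjust to match the unzipped export.
    for split in ("train", "valid"):
        path = Path(split) / "_annotations.createml.json"
        if path.exists():
            total, nulls = count_null_images(path)
            print(f"{split}: {nulls} of {total} images have no annotations")
```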
@leo It looks like I’m not filtering out null images.
I can confirm, weirdly, that classes that are present in the valid set are not present in the train set; see the difference in the two screenshots below:
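The same comparison can be made directly from the exported files by listing which labels appear in one split but not the other. This is a rough sketch, again assuming the standard CreateML JSON layout; `labels_in`, `compare_splits`, and the `train`/`valid` folder names are illustrative.

```python
import json
from pathlib import Path


def labels_in(annotation_path):
    """Return the set of labels used in one CreateML annotation file."""
    entries = json.loads(Path(annotation_path).read_text(encoding="utf-8"))
    return {
        box["label"]
        for entry in entries
        for box in entry.get("annotations", [])
    }


def compare_splits(train_path, valid_path):
    """Return (labels only in valid, labels only in train), each sorted."""
    train, valid = labels_in(train_path), labels_in(valid_path)
    return sorted(valid - train), sorted(train - valid)


if __name__ == "__main__":
    # Assumed folder names; adjust to match the unzipped export.
    train_file = Path("train") / "_annotations.createml.json"
    valid_file = Path("valid") / "_annotations.createml.json"
    if train_file.exists() and valid_file.exists():
        only_valid, only_train = compare_splits(train_file, valid_file)
        print("In valid but missing from train:", only_valid)
        print("In train but missing from valid:", only_train)
```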
Could you share the project Universe link (if public) or the workspace and project ID? Is it the COCO dataset Universe link you shared? (I can’t seem to click on it)
Was it the same number of classes that were getting added to a specific split, or a different number?
I’m still working on your issue and haven’t found a clear cause or solution yet. In the meantime, I’ve filed a bug report so the team can take a deeper look.