Discrepancy in CreateML annotations JSON file

I’m trying to export a dataset I have created on Roboflow for local training with Apple’s CreateML application to create a CoreML model.

The dataset is quite large; it’s based on this COCO dataset from Universe. The only difference is that I’m keeping just the following classes in the preprocessing step:

  • backpack
  • book
  • cell phone
  • chair
  • clock
  • cup
  • dining table
  • handbag
  • keyboard
  • laptop
  • microwave
  • mouse
  • person
  • remote
  • refrigerator
  • potted plant
  • scissors
  • sofa
  • suitcase
  • tv monitor

When I add the data to CreateML, I see that there are 3 classes in the Training Data section, and 20 in the Validation Data section. These should be the same number of classes.

When I look at the respective _annotations.createml.json files, they do reflect this class difference.
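One way to double-check this outside of CreateML is to count the labels in each file directly. The sketch below assumes the usual CreateML object-detection layout (a JSON list of image entries, each with an "annotations" list whose items carry a "label") and assumes the export sits in train/ and valid/ folders next to the script; the function names are made up for illustration.

```python
import json
from collections import Counter
from pathlib import Path


def count_labels(entries):
    """Count bounding-box labels across CreateML annotation entries."""
    counts = Counter()
    for entry in entries:
        # An image with no boxes has an empty (or missing) annotations list.
        for box in entry.get("annotations", []):
            counts[box["label"]] += 1
    return counts


def load_entries(path):
    """Read one _annotations.createml.json file (a JSON list of image entries)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Assumed export layout: train/ and valid/ folders in the working directory.
    for split in ("train", "valid"):
        path = Path(split) / "_annotations.createml.json"
        if path.exists():
            counts = count_labels(load_entries(path))
            print(split, len(counts), "classes:", dict(counts.most_common()))
```

If the class counts printed here match what CreateML shows, the mismatch is already present in the exported files rather than introduced by CreateML.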

There are many underrepresented classes, as shown here, but I’m only using a few compared to the total number. This is again from the original Microsoft COCO dataset I linked above.

Hi Nick,

Are you filtering null images as well? Can you confirm that there are definitely more than three classes tagged in the training dataset images?

I’m sure it’s highly unlikely, but if you didn’t filter out null images, I can imagine a scenario where only three of the classes you kept made it into your training dataset, with the rest of the training images left null.

@leo It looks like I’m not filtering out null images.

I can confirm, weirdly, that classes that are present in the Valid set are not present in the Train set - see the difference in the two screenshots below:
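The same comparison can be made from the annotation files themselves. This is a minimal sketch, assuming the CreateML layout where each image entry has an "annotations" list of items with a "label"; the helper names and the tiny in-memory sample are hypothetical stand-ins for the real train and valid files.

```python
def label_set(entries):
    """Collect the distinct labels used anywhere in a split."""
    return {
        box["label"]
        for entry in entries
        for box in entry.get("annotations", [])
    }


def compare_splits(train_entries, valid_entries):
    """Report labels that appear in one split but not the other."""
    train = label_set(train_entries)
    valid = label_set(valid_entries)
    return {
        "only_in_train": sorted(train - valid),
        "only_in_valid": sorted(valid - train),
    }


if __name__ == "__main__":
    # Hypothetical sample data; in practice, load both JSON files with json.load.
    train_entries = [{"image": "t1.jpg", "annotations": [{"label": "person"}]}]
    valid_entries = [
        {"image": "v1.jpg", "annotations": [{"label": "person"}, {"label": "cup"}]}
    ]
    print(compare_splits(train_entries, valid_entries))
```

A non-empty "only_in_valid" list would name exactly the classes that never reached the Train set.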

Hi Nick,

Could you try filtering out null images and rebalancing your dataset to see if that solves your issues?

@leo dumb question, but how do I filter out the null images?

Hi @Nick_Arner

No worries at all. Null filtering is a preprocessing feature. Here’s how to enable it:

  1. When generating a new version, during step 3 “Preprocessing”, add a new preprocessing step

  2. Select Filter Null and select the percentage of null images you’d like to remove from that version of your dataset.
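To see whether the filter actually changed anything, the share of null images in an exported split can be measured before and after. This is only a sketch, assuming a null image appears in the CreateML export as an entry whose "annotations" list is empty or missing; the function name is made up for illustration.

```python
def null_image_share(entries):
    """Return the fraction of image entries that carry no annotations."""
    if not entries:
        return 0.0
    # Assumption: a null image has an empty or missing "annotations" list.
    nulls = sum(1 for entry in entries if not entry.get("annotations"))
    return nulls / len(entries)


if __name__ == "__main__":
    # Hypothetical sample; in practice, pass the list loaded from
    # a split's _annotations.createml.json file.
    sample = [
        {"image": "a.jpg", "annotations": [{"label": "person"}]},
        {"image": "b.jpg", "annotations": []},
    ]
    print(f"null share: {null_image_share(sample):.0%}")
```

A very high share in the training split would support the idea that most training images lost all of their annotations when the classes were filtered.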

Thank you @leo

I tried both of those things, but I’m still getting the same result as before.

Hi Nick,

Could you share the project Universe link (if public) or the workspace and project ID? Is it the COCO dataset Universe link you shared? (I can’t seem to click on it)

Was it the same number of classes that were getting added to a specific split, or a different number?

Hey @leo; workspace is “Stitch”; project is “stitch-coco”

“Was it the same number of classes that were getting added to a specific split, or a different number?”
I think it was the same number.

Hey @Nick_Arner, let me look into this for you and I’ll let you know when I have an update.

Awesome; thank you kindly

Hey @leo just wanted to see if you had any news on your end - thank you!

Hi @Nick_Arner

We’re still working on it. Can you confirm that you already tried to rebalance your dataset?

@leo thank you, Leo
Yes, I did try filtering out null images and rebalancing the dataset

Hi @Nick_Arner

Still working on your issue and haven’t found a clear cause or solution yet. In the meantime, I’ve filed a bug report so the team can take a deeper look.