After Merging Datasets, Re-balancing (Train/Val/Test) excludes multiple classes in VAL split

Sparrowtech · July 18, 2022, 8:48pm

After merging several object datasets (already uploaded to Roboflow) into one, I am unable to get a fairly balanced split between classes when generating a new version. After reviewing similar topics, a suggestion to manually move more images from one class into the “VAL” split is not feasible with ~17,500 total images.

Currently, we have to export the entire “Merged” dataset, properly shuffle and split (e.g. 75/20/5) all objects into new Train/Val/Test splits, THEN upload the properly balanced dataset to Roboflow AGAIN while keeping the existing splits. I have no idea what happens when you add augmentations, or if for some reason we needed to adjust the balance between Train/Val/Test.

I previously asked about the same issue and don’t believe I ever received any further update after this response:

Please see below export stats and screenshots:

Train Object Stats:
5347 handgun
4648 rifle
4748 knife
1437 hammer

Val Object Stats:
0 handgun
0 rifle
995 knife
2587 hammer
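
The gap in these stats can also be checked programmatically. The following is a minimal sketch, not a Roboflow feature: the counts are copied from the export stats above, and the helper names `missing_classes` and `split_share` are hypothetical.

```python
# Object counts copied from the export stats above (no Test stats were posted).
export_stats = {
    "train": {"handgun": 5347, "rifle": 4648, "knife": 4748, "hammer": 1437},
    "val": {"handgun": 0, "rifle": 0, "knife": 995, "hammer": 2587},
}


def missing_classes(stats):
    """Return, per split, the classes that have zero objects in that split."""
    all_classes = sorted({cls for counts in stats.values() for cls in counts})
    return {
        split: [cls for cls in all_classes if counts.get(cls, 0) == 0]
        for split, counts in stats.items()
    }


def split_share(stats, split, cls):
    """Fraction of a class's objects (across the listed splits) that sit in `split`."""
    total = sum(counts.get(cls, 0) for counts in stats.values())
    return stats[split].get(cls, 0) / total if total else 0.0


missing = missing_classes(export_stats)
print(missing)  # → {'train': [], 'val': ['handgun', 'rifle']}

for cls in ("handgun", "rifle", "knife", "hammer"):
    print(cls, f"{split_share(export_stats, 'val', cls):.1%} of objects in val")
```

With a balanced split, every class would show roughly the same share in Val; here handgun and rifle have none, while hammer has most of its objects in Val.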

josephofiowa · July 19, 2022, 2:34pm

Thanks for the note here.

If understood correctly, the desired outcome is a merged dataset with classes randomly shuffled among training, validation, and testing. Is that correct?

Sparrowtech · July 19, 2022, 8:36pm

Yes, almost. There is a lot of terminology for what we are trying to ensure, but generally speaking we want a random “sampling” of each class so that objects are balanced across the respective Train, Val, and Test splits. Each split would then be shuffled so there is no perceived order among classes.

E.G.

5 classes of objects in a merged Roboflow dataset, each with 100 images, for 500 images in total.

Tiger - 100 images
Lion - 100 images
Elephant - 100 images
Zebra - 100 images
Monkey - 100 images

Seek to split 500 images into Train/Val/Test with 80/15/5 ratio.

OUTCOME WOULD BE:

Train:
Tiger 80
Lion 80
Elephant 80
Zebra 80
Monkey 80

Val:
Tiger 15
Lion 15
Elephant 15
Zebra 15
Monkey 15

Test:
Tiger 5
Lion 5
Elephant 5
Zebra 5
Monkey 5

In each of those balanced splits, all of the images would be shuffled.
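
The outcome described above is usually called a stratified split. Here is a minimal sketch of doing it locally in plain Python, assuming each image belongs to exactly one class (images containing several classes would need a different strategy). The function name `stratified_split`, the file names, and the `labels` mapping are made up for illustration and are not part of Roboflow.

```python
import random
from collections import Counter


def stratified_split(labels, ratios=(0.80, 0.15, 0.05), seed=0):
    """Split image ids into train/val/test so each class keeps the same ratios.

    labels: dict mapping image id -> class name (one class per image assumed).
    ratios: (train, val, test) fractions; test receives whatever is left over.
    """
    rng = random.Random(seed)

    # Group image ids by class.
    by_class = {}
    for image_id, cls in labels.items():
        by_class.setdefault(cls, []).append(image_id)

    splits = {"train": [], "val": [], "test": []}
    for cls in sorted(by_class):
        ids = sorted(by_class[cls])
        rng.shuffle(ids)  # random sampling within the class
        n_train = round(len(ids) * ratios[0])
        n_val = round(len(ids) * ratios[1])
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]

    # Shuffle each split so there is no class ordering inside it.
    for ids in splits.values():
        rng.shuffle(ids)
    return splits


# The 5-class example from this post: 100 images per class, 80/15/5 split.
classes = ["Tiger", "Lion", "Elephant", "Zebra", "Monkey"]
labels = {f"{c}_{i:03d}.jpg": c for c in classes for i in range(100)}
splits = stratified_split(labels)
for name, ids in splits.items():
    print(name, dict(Counter(labels[i] for i in ids)))
```

Run on the example above, every class ends up with 80 images in train, 15 in val, and 5 in test. Fixing `seed` keeps the split reproducible between runs.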

Hope this makes sense. As you can see from the exported stats above, the splits are not balanced at all and even exclude entire object classes from the Val split.

Thanks.

Sparrowtech · July 25, 2022, 7:09pm

Any update?

brad · July 26, 2022, 2:03am

You can accomplish this by uploading each class separately so you can choose the split per class. (Or if you upload them already organized in train, valid, and test folders, we’ll parse those from the file path and let you keep those choices.)
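
For the folder-based option, the local layout could be prepared with a short script. This is only a sketch under assumptions: the helper `write_split_folders` is hypothetical, it assumes any annotation file sits next to its image and shares its file stem, and the folder names `train`, `valid`, and `test` are taken from the post above, so check them against the Roboflow docs before relying on them.

```python
import shutil
import tempfile
from pathlib import Path


def write_split_folders(splits, dest_root):
    """Copy each listed file into dest_root/<split>/.

    Files in the same folder that share the listed file's stem (for example a
    label file next to its image) are copied too; that pairing is an assumption.
    Returns the number of files copied per split.
    """
    dest_root = Path(dest_root)
    written = {}
    for split_name, files in splits.items():
        out_dir = dest_root / split_name
        out_dir.mkdir(parents=True, exist_ok=True)
        count = 0
        for f in files:
            src = Path(f)
            for sibling in sorted(src.parent.iterdir()):
                if sibling.is_file() and sibling.stem == src.stem:
                    shutil.copy2(sibling, out_dir / sibling.name)
                    count += 1
        written[split_name] = count
    return written


# Tiny demo with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    export_dir = Path(tmp) / "export"
    export_dir.mkdir()
    for name in ("tiger_001.jpg", "tiger_001.txt", "lion_001.jpg", "lion_001.txt"):
        (export_dir / name).write_text("placeholder")

    splits = {
        "train": [export_dir / "tiger_001.jpg"],
        "valid": [export_dir / "lion_001.jpg"],
        "test": [],
    }
    upload_dir = Path(tmp) / "upload"
    written = write_split_folders(splits, upload_dir)
    layout = sorted(
        p.relative_to(upload_dir).as_posix()
        for p in upload_dir.rglob("*")
        if p.is_file()
    )

print(written)  # → {'train': 2, 'valid': 2, 'test': 0}
print(layout)
```

The `splits` mapping would normally come from whatever shuffle/split step was run locally; here it is filled in by hand to keep the demo small.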

Sparrowtech · July 26, 2022, 5:06pm

Thanks Brad. We generally upload each class to Roboflow by itself and put everything into train so we can re-balance later upon export, depending on the split needed. It is pretty cumbersome to export already-uploaded datasets to a local machine, shuffle-split, reload them back to Roboflow, merge, and finally generate a new export, don’t you think? It kind of makes your “re-balance” feature in Generate obsolete. Any chance you can incorporate a “shuffle-sample” feature into re-balance dataset? Either automatically behind the scenes, which I think everyone would want, or as a button or slider to equally balance classes when choosing the re-balance function?

Please advise and thanks.
Sean

25benjaminli · February 19, 2023, 4:12am

I agree - will this be fixed? The train val split feature is somewhat obsolete right now due to it being so imbalanced. Also, is there a way to re-split images after labeling is finished?

Jeff_Cochran · March 11, 2024, 5:18pm

As far as I can tell, they haven’t done anything to address this. Really not sure why the rebalancing wouldn’t be random by default; seems like a silly design choice.
