Need Ability to randomly shuffle "merged dataset" when allocating Train/Val/Test Sets

First off… Roboflow is an awesome platform!!

An issue I discovered concerns the distribution of images when creating or re-balancing the Train/Val/Test splits. It appears there is no random shuffle, so in most cases we end up with a Validation set consisting of only one object class. For example: 5 object classes (car, bus, van, truck, boat) uploaded to Roboflow as independent datasets, then “merged” and re-balanced (80/15/5) when creating a new version, results in a Validation set consisting solely of e.g. “bus” images and nothing from any other class. The same goes for the Test set, which isn’t as big a deal, but we clearly want a good balance of all classes when performing validation.
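To illustrate what seems to be happening (a minimal sketch in plain Python with made-up filenames, not Roboflow’s actual code), a sequential 80/15/5 slice over class-grouped images will always leave a single class in the Validation set:

```python
import math

# Hypothetical merged dataset: images arrive grouped by source dataset,
# so every image of one class precedes every image of the next.
classes = ["car", "van", "truck", "boat", "bus"]
images = [f"{cls}_{i:03d}.jpg" for cls in classes for i in range(100)]

n = len(images)                              # 500 images total
train_end = math.floor(n * 0.80)             # first 80% -> Train
val_end = train_end + math.floor(n * 0.15)   # next 15% -> Validation

train, val, test = images[:train_end], images[train_end:val_end], images[val_end:]

# Without a shuffle, the Validation slice is one contiguous block,
# so it contains only the last class in upload order.
print({name.split("_")[0] for name in val})  # -> {'bus'}
```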

I don’t think I have missed a function inside of Roboflow, but if so, please advise. Having to export the 5 merged classes, apply a random shuffle before splitting into Train/Val/Test, and then re-upload the properly balanced dataset before generating a formatted export seems counterproductive.
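For reference, that workaround boils down to something like the sketch below (plain Python; the export path and ratios are hypothetical), which is exactly the round trip I’d like to avoid:

```python
import random
from pathlib import Path

# Hypothetical path to the exported, merged dataset.
EXPORT_DIR = Path("merged_export/images")

images = sorted(EXPORT_DIR.glob("*.jpg"))
random.seed(42)         # fixed seed so the split is reproducible
random.shuffle(images)  # the shuffle the platform currently skips

n = len(images)
train = images[: int(n * 0.80)]
val = images[int(n * 0.80) : int(n * 0.95)]
test = images[int(n * 0.95) :]
# ...then re-upload each split to Roboflow to get a balanced version.
```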

Please advise and as always, keep up the great work.

I also discovered the same issue yesterday evening. Have you found a solution for it? Thanks in advance!

Thanks for reporting.

This sounds like a recent regression stemming from us keeping uploaded images in sequential order to make it easier for annotators to label sequences of video frames.

Let me look at the code and see what I can do to restore the randomness for the model while preserving the ordering for humans in the tool.
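One common way to get both (just a sketch of the general technique, not a commitment to a specific implementation) is to keep images stored in sequential order but derive each image’s split from a stable hash of its ID, so the assignment is effectively random yet reproducible:

```python
import hashlib

def assign_split(image_id: str, train: float = 0.80, val: float = 0.15) -> str:
    """Map an image ID to a pseudo-random but stable bucket in [0, 1)."""
    digest = hashlib.sha256(image_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    if bucket < train:
        return "train"
    if bucket < train + val:
        return "valid"
    return "test"

# Sequential frame IDs stay in order for annotators, but their split
# assignments are scattered across train/valid/test.
for frame in [f"video1_frame_{i:04d}" for i in range(10)]:
    print(frame, assign_split(frame))
```

Because the bucket depends only on the image ID, regenerating a version keeps each image in the same split, which also avoids leakage across versions.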

What happened here? I’m still seeing the same behavior.