What method is used in roboflow to perform the train, valid, and test split on the dataset?

MheadHero · April 14, 2022, 1:04am

I see there is no mention of what method Roboflow uses to perform the train, valid, and test split on the dataset. Does anyone have an idea?

Mohamed · April 14, 2022, 2:27pm

Hi,

Your train/valid/test split is set when assigning the images for labeling during the initial dataset upload (you can toggle the settings to set your split during upload).

You can also change your split when generating a new dataset version. Just note that augmenting your images will rebalance the train/valid/test split, as augmented images will be generated in your training set.
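To see why augmentation shifts the ratios, here is a small arithmetic sketch. It assumes, hypothetically, that augmentation multiplies only the training images by a factor (called `multiplier` here) while valid and test stay unchanged; this thread does not spell out exactly how Roboflow counts augmented outputs, and the function name is made up for illustration.

```python
def split_after_augmentation(train, valid, test, multiplier):
    """Return (train, valid, test) percentages after augmenting only the training set.

    Assumption (hypothetical): each training image ends up as `multiplier` images
    in total, while valid and test images are left untouched.
    """
    new_train = train * multiplier
    total = new_train + valid + test
    # Percentages rounded to one decimal place
    return tuple(round(100 * n / total, 1) for n in (new_train, valid, test))


# 1,000 images split 70/20/10, with the training images tripled by augmentation
print(split_after_augmentation(700, 200, 100, 3))  # (87.5, 8.3, 4.2)
```

Under that assumption, a 70/20/10 split turns into roughly 87.5/8.3/4.2 once the training set is tripled, even though no image moved between subsets.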

MheadHero · April 17, 2022, 12:25pm

I mean, if we split it manually, there are methods such as a random split or a stratified split. What about Roboflow? What method does it use to split the dataset? How do I know it splits randomly, so that there is no bias?
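For readers unsure of the difference between the two methods named in the question, here is a minimal sketch in plain Python. It is not Roboflow's code; the function names, the integer percentage ratios, and the `label_of` callback are illustrative assumptions.

```python
import random
from collections import defaultdict


def random_split(items, ratios=(70, 20, 10), seed=0):
    """Shuffle all items together, then cut the list by the given percentages."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    # Integer arithmetic avoids float rounding surprises in the cut points
    n_train = len(items) * ratios[0] // total
    n_valid = len(items) * ratios[1] // total
    return (items[:n_train],
            items[n_train:n_train + n_valid],
            items[n_train + n_valid:])


def stratified_split(items, label_of, ratios=(70, 20, 10), seed=0):
    """Split each class separately so every subset keeps the class proportions."""
    by_label = defaultdict(list)
    for item in items:
        by_label[label_of(item)].append(item)
    train, valid, test = [], [], []
    for label in sorted(by_label):
        tr, va, te = random_split(by_label[label], ratios, seed)
        train += tr
        valid += va
        test += te
    return train, valid, test
```

With a rare class (say 10 "cat" images among 90 "dog" images), `random_split` can leave the test set with no cats at all, while `stratified_split` guarantees 7/2/1 cats across train/valid/test.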

Mohamed · April 20, 2022, 3:17pm

I can say that our system is written to split randomly.

MheadHero · April 21, 2022, 7:48am

Where did you guys write this? I did search, though maybe not as thoroughly as I thought.

Mohamed · April 21, 2022, 1:05pm

I’m not sure that it’s written explicitly, but I can attest that it is randomized in our system and duplicate image uploads are also left out to reduce the chances of train/test bleed.
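The thread does not say how duplicates are detected. As a minimal sketch, assuming "duplicate" means a byte-identical file, deduplicating before the split could look like this; the function name and the use of SHA-256 are illustrative assumptions, not Roboflow's documented behavior.

```python
import hashlib


def drop_exact_duplicates(images):
    """Keep the first occurrence of each image, comparing raw bytes via SHA-256.

    `images` is a list of (name, data_bytes) pairs. Only byte-identical files are
    caught; resized or re-encoded copies of the same photo would slip through.
    """
    seen = set()
    unique = []
    for name, data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((name, data))
    return unique
```

Running this before any split means the same file cannot land in both train and test, which is the "train/test bleed" mentioned above.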

Mohamed · April 26, 2022, 2:07am

Hi, after talking to the team and digging deeper to confirm how the system works, this is what I learned:

It is still random which images we move when you change the split, but we try to do it deterministically, so that if you go from 70/20/10 to 70/10/20 and then back to 70/20/10, you end up with the same 70/20/10 assignment you started with.
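One common way to get "random but deterministic" behavior is hash-based bucketing: each image gets a fixed pseudo-random score from its ID, and the split ratios only move the cut points. This is a swapped-in technique for illustration; the thread does not say Roboflow works this way, and `unit_score` and `assign_split` are hypothetical names.

```python
import hashlib


def unit_score(image_id):
    """Map an image ID to a stable pseudo-random number in [0, 1)."""
    digest = hashlib.sha256(image_id.encode("utf-8")).digest()
    # Keep 53 bits so the division is exact and the result stays below 1.0
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def assign_split(image_id, ratios=(70, 20, 10)):
    """Compare the image's fixed score against cumulative cut points."""
    score = unit_score(image_id) * sum(ratios)
    if score < ratios[0]:
        return "train"
    if score < ratios[0] + ratios[1]:
        return "valid"
    return "test"
```

Because an image's score never changes, going from 70/20/10 to 70/10/20 only moves the valid images whose scores fall between the 80 and 90 cut points into test, and switching back restores exactly the original assignment. The trade-off is that subset sizes are only approximately 70/20/10, especially on small datasets.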

Notably, you can also set your own train/test split for specific batches of images on upload: https://blog.roboflow.com/train-test-split-with-roboflow/

Related topics

Topic	Category / tags	Replies	Views	Activity
How to Quickly Reallocate "Test train and Valid"	Community Help	1	1656	July 26, 2023
How to shuffle images?	Community Help; split-after-upload	2	1220	November 20, 2023
Need Ability to randomly shuffle "merged dataset" when allocating Train/Val/Test Sets	Feedback; split-after-upload, bugs	3	829	October 9, 2023
One dataset for Testing after merging with another	Community Help; split-after-upload	3	633	May 2, 2022
Problem face when using Train/Test Split	Community Help; split-after-upload, bugs	5	1656	March 10, 2023