So basically I want to output the class label distribution for each of the train/validation/test sets. But with the YOLOv5 PyTorch format, I have no idea how to code it.
Hi @mheadhero, you’d need to write a script to do this count. I probably wouldn’t use the YOLOv5 PyTorch format because then you have to deal with mapping numeric class identifiers back to strings via the labelmap.
The YOLOv5 Oriented Bounding Boxes format is similar but includes the labels in the annotations directly, so it makes this task easier.
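If you’re stuck with the YOLOv5 PyTorch export anyway, the mapping step can be sketched roughly like this. It assumes the numeric class ID is the first column of each label line, and that you’ve copied the class names out of the labelmap into a plain text file, one name per line in ID order. That names file and the `count_by_id` function are made up for illustration, not something the export gives you.

```shell
# count_by_id NAMES_FILE LABEL_DIR   (hypothetical helper, not part of any tool)
# Assumes: NAMES_FILE has one class name per line in class-ID order (line 1 = ID 0),
# and every .txt file in LABEL_DIR has the numeric class ID as its first column.
count_by_id() {
  names_file=$1
  label_dir=$2
  cat "$label_dir"/*.txt |
    awk -v names="$names_file" '
      # Load the names file into an array indexed by class ID
      BEGIN { while ((getline line < names) > 0) name[n++] = line }
      # Tally the class ID on every non-empty label line
      NF    { count[$1]++ }
      # Print "count name", falling back to id_<N> for IDs without a name
      END   { for (id in count) print count[id], ((id in name) ? name[id] : "id_" id) }
    ' |
    sort -nr
}
```

You’d call it as `count_by_id names.txt path/to/labels`. The OBB approach below skips this lookup entirely, which is why I’d go that way.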
Here’s an example of doing the count in bash with the playing cards dataset from Roboflow Universe.
From inside the Playing Cards.v1-v1.yolov5-obb/train/labelTxt directory, running this command:
ls | grep '\.txt$' | xargs cat | cut -d" " -f9 | sort | uniq -c | sort -nr
Does the following:
- List all the files
- Find all the .txt files
- Output their contents
- Split by space & grab the 9th column (the class name)
- Sort them so all the same classes are next to each other
- Count how many identical lines appear in a row
- Sort the counts in descending order, so the class with the most examples comes first
So for the train set this gives me the following output:
1374 QS
1367 5C
1335 4D
1329 7S
1320 4C
1308 6S
1296 4H
1295 9D
1286 5D
1281 QD
1278 3C
1274 5S
1271 10S
1269 10H
1268 AD
1266 9H
1262 JC
1261 KS
1261 3S
1260 7H
1259 KD
1257 6D
1253 AS
1232 7C
1227 AC
1225 10D
1221 8D
1217 2D
1215 QC
1214 2S
1211 KH
1203 KC
1196 QH
1188 8H
1182 2H
1180 3H
1174 9S
1173 4S
1166 AH
1162 10C
1157 JH
1149 2C
1145 8C
1143 JS
1143 7D
1140 5H
1134 3D
1121 6H
1121 6C
1112 JD
1108 9C
1091 8S
Meaning there are 1374 Queen of Spades, 1367 Five of Clubs, etc.
You can do this in the validation and test label directories as well to get counts for those.
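If you’d rather get every split in one go, the same pipeline can be wrapped in a loop. This is only a sketch: `split_counts` is a made-up name, and it assumes the other splits sit in `valid` and `test` folders with the same `<split>/labelTxt` layout as the train directory above, so adjust the names if your export differs.

```shell
# split_counts DATASET_ROOT   (hypothetical helper)
# Assumes each split lives at DATASET_ROOT/<split>/labelTxt and that the class
# name is the 9th space-separated column, as in the command above.
split_counts() {
  root=$1
  for split in train valid test; do
    dir="$root/$split/labelTxt"
    [ -d "$dir" ] || continue   # skip splits that are not present
    echo "== $split =="
    # Same steps as before, but reading the .txt files directly with a glob
    cat "$dir"/*.txt | cut -d" " -f9 | sort | uniq -c | sort -nr
  done
}
```

For the dataset here you’d run `split_counts "Playing Cards.v1-v1.yolov5-obb"` from the folder that contains it. The quotes matter because of the space in the directory name.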