I am working with YOLOv8 and SAM, and I am trying to use the same annotations for both models.
We wanted to convert the annotations we had done on your platform into masks to train SAM. However, when I run the code below, which uses the supervision library, my masks come out crooked and not like how they were annotated.
## treat all labels as masks/polygons (vs how ultralytics converts some to bboxes)
# import supervision (open source set of cv utils) and roboflow
import supervision as sv
from roboflow import Roboflow
# grab our data
rf = Roboflow(api_key="")
project = rf.workspace("").project("")
dataset = project.version(3).download("yolov8")
# for each subset, load the YOLO annotations, force mask format, and write them back out
for subset in ["train", "test", "valid"]:
    ds = sv.DetectionDataset.from_yolo(
        images_directory_path=f"{dataset.location}/{subset}/images",
        annotations_directory_path=f"{dataset.location}/{subset}/labels",
        data_yaml_path=f"{dataset.location}/data.yaml",
        force_masks=True,
    )
    ds.as_yolo(annotations_directory_path=f"{dataset.location}/{subset}/labels")
After doing this, I compared the label txt files before and after.
This is before:
0 0.24367703857421874 0.5947080708007813 0.32577574609375 0.5946408608398438 0.3256521215820313 0.584001533203125 0.38616494580078126 0.5838778208007812 0.38616494580078126 0.44245848974609375 0.24366789990234375 0.4424227666015625 0.24367703857421874 0.5947080708007813
0 0.4681246420898437 0.576236357421875 0.5384933212890625 0.5761529790039063 0.5382433774414063 0.4423616875 0.3912994501953125 0.4423616875 0.39114250390625 0.594635390625 0.4681246420898437 0.5945798715820313 0.4681246420898437 0.576236357421875
0 0.19015809912109374 0.532453453125 0.179838498046875 0.5323361000976562 0.17960396533203124 0.49474487890625 0.049611380859375 0.49474487890625 0.04761782568359375 0.4960357807617187 0.02815130517578125 0.49615313818359374 0.028172552734375 0.6419591596679688 0.19027537451171875 0.6423729399414062 0.19015809912109374 0.532453453125
0 0.20617587744140625 0.6494069682617187 0.11866647216796875 0.649729455078125 0.11880823681640625 0.6924327993164062 0.09655086279296875 0.6925746689453125 0.0968343916015625 0.65015507421875 0.0251399677734375 0.6496627353515625 0.024852654296875 0.8023033012695312 0.24352058984375 0.8023752939453125 0.24407509423828125 0.6602119057617187 0.20595053466796875 0.6604566748046875 0.20617587744140625 0.6494069682617187
0 0.11568 0.6509393481445312 0.100116763671875 0.6509393481445312 0.10020867333984375 0.6883252021484375 0.1157992568359375 0.6883736821289063 0.11568 0.6509393481445312
This is after:
0 0.24365 0.44238 0.24365 0.59424 0.32568 0.59424 0.32568 0.58936 0.32520 0.58887 0.32520 0.58447 0.32568 0.58398 0.35547 0.58398 0.35596 0.58350 0.38574 0.58350 0.38574 0.44238
0 0.39111 0.44189 0.39111 0.59424 0.46777 0.59424 0.46777 0.57666 0.46826 0.57617 0.50293 0.57617 0.50342 0.57568 0.53809 0.57568 0.53809 0.44189
0 0.04932 0.49463 0.04883 0.49512 0.04834 0.49512 0.04785 0.49561 0.03809 0.49561 0.03760 0.49609 0.02783 0.49609 0.02783 0.64160 0.10889 0.64160 0.10938 0.64209 0.18994 0.64209 0.18994 0.53223 0.18018 0.53223 0.17969 0.53174 0.17969 0.51367 0.17920 0.51318 0.17920 0.49463
0 0.16260 0.64893 0.16211 0.64941 0.11865 0.64941 0.11865 0.69189 0.11816 0.69238 0.09668 0.69238 0.09619 0.69189 0.09619 0.67139 0.09668 0.67090 0.09668 0.64990 0.06104 0.64990 0.06055 0.64941 0.02490 0.64941 0.02490 0.72559 0.02441 0.72607 0.02441 0.80225 0.24316 0.80225 0.24316 0.73145 0.24365 0.73096 0.24365 0.66016 0.20605 0.66016 0.20557 0.65967 0.20557 0.65479 0.20605 0.65430 0.20605 0.64893
0 0.10010 0.65088 0.10010 0.68799 0.11572 0.68799 0.11572 0.66992 0.11523 0.66943 0.11523 0.65088
0 0.24854 0.64941 0.24854 0.80176 0.39014 0.80176 0.39014 0.73145 0.39062 0.73096 0.39062 0.66016 0.33008 0.66016 0.32959 0.65967 0.32959 0.64941
You can see that the values change quite a lot. The real problem, however, shows up when I convert these into masks.
The right side is the mask visualized before passing through the supervision library; the left side is after passing through the library. You can see that the mask on the left is crooked rather than straight. My guess is that this comes from integer rounding, but I am not sure.
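To illustrate what I mean by rounding: if the annotations get rasterized onto the pixel grid and then traced back out, every coordinate ends up snapped to a whole pixel before being re-normalized. A minimal sketch of that round trip (the 2048 px width is just a made-up example value, not my actual image size):

# round-trip one normalized x coordinate through an integer pixel grid
# the 2048 px width is a hypothetical value, not taken from my dataset
width = 2048
x_before = 0.24367703857421874        # first x value from the "before" label file
x_pixel = round(x_before * width)     # snap to a whole pixel
x_after = x_pixel / width             # re-normalize
print(x_before, x_pixel, x_after)
# the error from this round trip is at most half a pixel: abs(x_before - x_after) <= 0.5 / width

A round trip like that would only move the values in the fourth or fifth decimal place, which is roughly the size of change I see above, but I am not sure it explains the crooked edges.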
I wanted some clarity on why the annotation values in the txt files change so much after using supervision. I can also generate the masks without passing through sv, which is good enough for the SAM use case. However, since we are also going to feed these annotations to YOLOv8 (I need to use sv because of an Ultralytics mismatch problem), the model could suffer from the 1-2 pixel error that is creeping in.
Please advise.
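For context, this is roughly what I mean by generating the masks myself without sv: read a YOLO segmentation label file and fill each polygon into a binary mask. A minimal sketch, where "label.txt" and the 640x640 image size are placeholders rather than my real data:

import cv2
import numpy as np

# rasterize the polygons from one YOLO-format label file into a binary mask
# "label.txt" and the 640x640 size are placeholders, not my real files
height, width = 640, 640
mask = np.zeros((height, width), dtype=np.uint8)
with open("label.txt") as f:
    for line in f:
        values = line.split()
        coords = np.array(values[1:], dtype=np.float32).reshape(-1, 2)  # skip the class id
        coords[:, 0] *= width    # de-normalize x
        coords[:, 1] *= height   # de-normalize y
        cv2.fillPoly(mask, [coords.astype(np.int32)], 255)
cv2.imwrite("mask.png", mask)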