COCO-pretrained YOLOv8 debugging (class index issues)

I’m using a YOLOv8 model pretrained on COCO on my own dataset, which focuses on 3 classes that also exist in COCO. Using the Grounding DINO annotator in the Roboflow web app, I annotated a dataset of bicycles, boats, and cars. After export these classes are indexed 0, 1, 2 respectively, because I exported in YOLOv8 format. I need the YOLOv8 format because, after evaluating the pretrained model like this, I will fine-tune on that dataset.

This does not match COCO, where those 3 classes are indexed 1, 8, and 2 respectively. Now I’m running into issues when validating against my test dataset labels. Prediction itself runs fine, and the ground-truth labels for my test data are being located correctly:

image 28/106 test-127-_jpg.rf.08a36d5a3d959b4abe0e5a267f293f59.jpg: Predicted: 1 boat [GT: 1 boat]
image 29/106 test-128-_jpg.rf.bf3f57e995e27e68da74691a1c30effd.jpg: Predicted: 1 boat [GT: 1 boat]
image 30/106 test-129-_jpg.rf.01163a19c5b241dcd9fbb765afae533c.jpg: Predicted: 4 boat [GT: 2 boat]
image 31/106 test-13-_jpg.rf.40a610771968be6fda3931ec1063182f.jpg: Predicted: 2 boat [GT: 1 boat]
image 32/106 test-130-_jpg.rf.296913d2a5cb563a4e81f7e656adac59.jpg: Predicted: 7 boat [GT: 3 boat]
image 33/106 test-14-_jpg.rf.b53326d248c7e0bb309ea45292d49102.jpg: Predicted: 3 bicycle [GT: 1 bicycle]

The GT values show that the predicted class matches the ground truth class. However, the validation summary tells a different story:

                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)
                all        106         86      0.381      0.377      0.384      0.287
           bicycle         21         25          0          0   0.000833    0.00066
               car         54         61      0.762      0.754      0.767      0.572

Speed: 6.1ms preprocess, 298.4ms inference, 0.0ms loss, 4.9ms postprocess per image
Results saved to runs/detect/val16

— Evaluation Metrics —
mAP50: 0.3837555367935218
mAP50-95: 0.28657243641136704
These statistics show that boat was not validated at all and bicycle was scored as if its index were wrong. But again, the model is finding the right labels (according to the GT values).
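For reference, this is roughly how I run the validation (standard Ultralytics API; the checkpoint name and data.yaml path are placeholders for my actual files):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # COCO-pretrained checkpoint (placeholder)
metrics = model.val(data="data.yaml", split="test")  # data.yaml points at my dataset
print("mAP50:", metrics.box.map50)
print("mAP50-95:", metrics.box.map)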

Does anyone know how to fix this?

Hi @T_O!
I am sorry you have run into this issue! To resolve this, you’ll need to remap your dataset’s class indices to match the original COCO indexing, since your model was pretrained on COCO.

Here’s a sample starter script that remaps your class IDs and updates your YOLO label files (assuming your label files are stored locally and in the standard YOLO directory structure).

import os

# Mapping: Roboflow export index → COCO index
# bicycle: 0 → 1, boat: 1 → 8, car: 2 → 2 (unchanged)
# Note: run this once only; a second pass would remap already-remapped IDs (e.g. 1 → 8)
index_map = {0: 1, 1: 8, 2: 2}

# List of label directories to update
label_dirs = [
    "path/to/train/labels",
    "path/to/val/labels",
    "path/to/test/labels"
]

for label_dir in label_dirs:
    if not os.path.exists(label_dir):
        print(f"Directory not found: {label_dir}")
        continue

    for fname in os.listdir(label_dir):
        if not fname.endswith(".txt"):
            continue

        fpath = os.path.join(label_dir, fname)
        with open(fpath, "r") as f:
            lines = f.readlines()

        new_lines = []
        for line in lines:
            parts = line.strip().split()
            if not parts:
                continue
            old_cls = int(parts[0])
            new_cls = index_map.get(old_cls, old_cls)
            new_line = " ".join([str(new_cls)] + parts[1:])
            new_lines.append(new_line)

        with open(fpath, "w") as f:
            f.write("\n".join(new_lines) + "\n")

        print(f"Updated: {fpath}")

Thank you for contributing to the Roboflow community, happy building!!

I already did this; it didn’t help. That remapping is how I matched the ground truth to the predictions in the first place. I also don’t mind exporting the dataset in COCO format just for this model, but that simply doesn’t work either.

Hi @T_O!
The Roboflow Platform will be able to handle this issue for you.

Unfortunately, I am unable to troubleshoot custom code outside of the Roboflow Platform.

Thank you for contributing to the Roboflow Community!!

I think I found the problem. A COCO-pretrained YOLOv8 expects you to validate against all 80 COCO classes (and it does); afterwards you can filter out the rest. This means your data.yaml has to include all of them, which also means it’s not a true comparison against, for example, models fine-tuned on just those classes (performance improves the fewer classes you fine-tune on).
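To make the indices line up, the data.yaml used for validation has to carry the full 80-entry COCO name map, with my three classes sitting at their COCO positions. Rather than typing all 80 names by hand, a sketch like this (checkpoint name and dataset paths are placeholders) can generate it from the pretrained checkpoint itself:

from ultralytics import YOLO
import yaml

# Pull the full 80-class COCO name map from the pretrained checkpoint
model = YOLO("yolov8n.pt")  # placeholder: whichever COCO checkpoint you validate with
names = {int(k): v for k, v in model.names.items()}  # {0: 'person', 1: 'bicycle', ...}

data = {
    "path": "path/to/dataset",  # placeholder dataset root
    "train": "train/images",
    "val": "valid/images",
    "test": "test/images",
    "names": names,
}

with open("data_coco80.yaml", "w") as f:
    yaml.safe_dump(data, f, sort_keys=False)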

I have not found a way around this, so I will just work with it, I guess.
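At prediction time the filtering is straightforward, since Ultralytics predict accepts a classes argument (a sketch; the checkpoint and source path are placeholders):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # COCO-pretrained weights (placeholder)
# Keep only bicycle (1), car (2), and boat (8) in the predictions
results = model.predict(source="path/to/test/images", classes=[1, 2, 8])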
