Assistance Needed: Invalid RLE Mask Representation in New Dataset

Faris-Faiz · December 30, 2024, 7:33pm

Hi everyone,

I’m currently facing an issue with a dataset I generated using Roboflow, and I’m hoping someone here can help me resolve it. Here’s a summary of the problem and what I’ve done so far to debug it:

Issue:

I have a segmentation dataset that I generated about a month ago (original_dataset.json, generated on Nov 29, 2024), and it works perfectly fine. However, I recently generated a new version of the dataset (new_dataset.json, generated on Dec 31, 2024), and when I try to use it with SAM2.1, I get the following error:
ValueError: Invalid RLE mask representation

This error occurs specifically with the new dataset, and I’ve confirmed that the issue lies in the RLE encoding of the segmentation masks in new_dataset.json.

What I’ve Done to Debug:

Compared the Datasets:
- I compared the original_dataset.json and new_dataset.json files and noticed differences in the RLE encoding format of the segmentation field.
Attempted to Fix the RLE Encoding:
- I wrote a Python script that uses pycocotools.mask.decode and pycocotools.mask.encode to re-encode the masks in the new dataset. However, some annotations (IDs 27, 29, 32, and 33) still fail to decode and raise the same Invalid RLE mask representation error.
Skipped Problematic Annotations:
- I updated my script to skip problematic annotations and log them for further inspection. The problematic annotations are IDs 27, 29, 32, and 33.
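The skip-and-log step can also be done without pycocotools, which makes it easier to see why a given mask is rejected. Below is a minimal sketch, assuming standard COCO JSON (an annotations list whose segmentation is either a polygon list or an RLE dict with size and counts). The decoder follows my reading of rleFrString in pycocotools' maskApi.c; the function names are hypothetical, and the rule "runs must be non-negative and add up to exactly height × width" is my assumption of what a valid mask looks like (every mask encoded by pycocotools satisfies it), not necessarily the exact check behind the Invalid RLE mask representation error.

```python
import json
import logging
from typing import Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rle_check")


def counts_from_string(s: str) -> list:
    """Decode a compressed COCO RLE 'counts' string into signed run lengths.

    Follows the scheme used by rleFrString in pycocotools' maskApi.c: each
    character minus 48 holds 5 data bits (0x1F) plus a continuation flag (0x20);
    bit 0x10 of the final chunk marks a negative value; from the 4th value on,
    each value is stored as a delta against the value two positions earlier.
    """
    counts = []
    p = 0
    while p < len(s):
        x, k, more = 0, 0, True
        while more:
            if p >= len(s):
                raise ValueError("truncated counts string")
            c = ord(s[p]) - 48
            x |= (c & 0x1F) << (5 * k)
            more = bool(c & 0x20)
            p += 1
            k += 1
            if not more and (c & 0x10):
                x |= -1 << (5 * k)  # sign-extend negative values
        if len(counts) > 2:
            x += counts[-2]  # undo the delta encoding
        counts.append(x)
    return counts


def rle_problem(seg: dict) -> Optional[str]:
    """Return a description of what looks wrong with an RLE segmentation, or None."""
    h, w = seg["size"]
    counts = seg["counts"]
    if isinstance(counts, str):  # compressed RLE; list counts are uncompressed RLE
        counts = counts_from_string(counts)
    if any(c < 0 for c in counts):
        return "negative run length"
    total = sum(counts)
    if total != h * w:
        return f"run lengths sum to {total}, expected {h * w}"
    return None


def scan_annotations(coco: dict):
    """Split annotations into (kept, bad_ids), logging every suspicious RLE mask."""
    kept, bad_ids = [], []
    for ann in coco.get("annotations", []):
        seg = ann.get("segmentation")
        if not isinstance(seg, dict):  # polygons are not RLE; keep them unchanged
            kept.append(ann)
            continue
        try:
            problem = rle_problem(seg)
        except (KeyError, TypeError, ValueError) as exc:
            problem = str(exc)
        if problem is None:
            kept.append(ann)
        else:
            log.warning("annotation %s: %s", ann.get("id"), problem)
            bad_ids.append(ann.get("id"))
    return kept, bad_ids


def check_file(path: str) -> list:
    """Load a COCO JSON file and return the IDs of annotations with suspicious RLE."""
    with open(path) as f:
        coco = json.load(f)
    return scan_annotations(coco)[1]
```

Calling check_file("new_dataset.json") returns the IDs this sketch considers broken, and the log lines say whether a run was negative or the runs did not add up to 1024 × 1024 for this dataset.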

Request for Assistance:

Could anyone help me understand why the RLE encoding in the new dataset is invalid? Specifically:

Are there known issues with RLE encoding in newer versions of Roboflow?
How can I ensure that the RLE encoding in the new dataset matches the format expected by pycocotools?
Is there a way to regenerate the dataset with correct RLE encoding?
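On matching the format pycocotools expects: the safest route is usually to rebuild each mask as a binary uint8 array and let pycocotools encode it (pycocotools.mask.encode(np.asfortranarray(mask)), then decode the returned counts bytes to a str before writing JSON). To show what that encoding actually produces, here is a minimal pure-Python sketch that follows my reading of rleEncode and rleToString in pycocotools' maskApi.c. It assumes the mask is a list of rows of 0/1 values; the function names are hypothetical and this is not a Roboflow or pycocotools API.

```python
def mask_to_counts(mask: list) -> list:
    """Run lengths of a binary mask (list of rows) in column-major order.

    Like pycocotools, the first run counts 0-pixels, so a mask whose first
    (top-left) pixel is 1 starts with a run of length 0.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    counts, prev, run = [], 0, 0
    for x in range(w):  # column-major: walk down each column in turn
        for y in range(h):
            v = 1 if mask[y][x] else 0
            if v != prev:
                counts.append(run)
                run, prev = 0, v
            run += 1
    counts.append(run)
    return counts


def counts_to_string(counts: list) -> str:
    """Compress run lengths into a COCO RLE 'counts' string."""
    chars = []
    for i, cnt in enumerate(counts):
        x = cnt - counts[i - 2] if i > 2 else cnt  # delta-encode from the 4th value on
        more = True
        while more:
            c = x & 0x1F
            x >>= 5  # arithmetic shift, so negative deltas terminate at -1
            more = (x != -1) if (c & 0x10) else (x != 0)
            if more:
                c |= 0x20  # continuation flag
            chars.append(chr(c + 48))
    return "".join(chars)


def encode_mask(mask: list) -> dict:
    """Build a JSON-ready COCO RLE segmentation dict for one binary mask."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    return {"size": [h, w], "counts": counts_to_string(mask_to_counts(mask))}
```

Runs produced this way always add up to height × width, which is a quick property to check against any new export.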

Here’s a snippet of the problematic RLE encoding from new_dataset.json for annotation ID 27:

"segmentation": {
    "counts": "QoQ3o0Qo0PPWMPPi2PPi2PPWM0000000000000000PPnLPPR3PPR3PPnL0000000000000000PPeLPP[3PP[3PPeL0000000000000000PP\\LXPd3hoc3PP\\L0000000000000000PPSLPPm3PPm3PPSL0000000000000000PPjKPPV4PPV4PPjK000000000000PPUKPPk4PPk4PPUKPPaKPP_4PP_4PPaK0000000000000000PPXKPPh4PPh4PPXK0000000000000000PPoJPPQ5PPQ5PPoJ0000000000000000PPfJPPZ5PPZ5PPfJ0006J00000000000PP]JPPc5PPc5PP]J0000000000000000PPTJPPl5PPl5PPTJ0000000000000000PPkIPPU6PPU6PPkI0000000O10000000PPbIPP^6PP^6PPbI0000000000000000PPYIPPg6PPg6PPYI0000000000000000PPPIPPP7PPP7PPPI000002N5K0000000QPgHooX7ooX7QPgH0000000000000000QP^Hooa7ooa7QP^H0000000000000000QPUHooj7ooj7QPUHPPZGPPf8PPf8PPZG000000000000PPlGPPT8PPT8PPlG00000N2000000000PPcGPP]8PP]8PPcG0000000000000000PPZGPPf8PPf8PPZG000000000004L000PPQGPPo8PPo8PPQG0000000000000000PPhFPPX9PPX9PPhF0000000000000000PP_FPPa9PPa9PP_F0000000000000000PPVFPPj9PPj9PPVF0000000000000000PPmEPPS:PPS:PPmE0000000000000000PPdEPP\\:PP\\:PPdE0000000000000000PP[EPPe:PPe:PP[E0000000000000000PPREPPn:PPn:PPRE0000000000PPhCPPX<PPX<PPhC00PPiDPPW;PPW;PPiD0000000000000000PP`DPP`;PP`;PP`D0000000000000000PPWDPPi;PPi;PPWD0000000000000000PPnCPPR<PPR<PPnC0000000000PPhCWOiPX<0000000PPdNPP\\1PP\\1PPdN0000PP\\NPPd1PPd1PP\\N00000000PP[NPPe1PPe1PP[N0001O0000002N000PPRNPPn1PPn1PPRN0000000000000000PPiMPPW2PPW2PPiM0000000000000000PP`MPP`2PP`2PP`M0000000000000nPnl0",
    "size": [1024, 1024]
}

Additional Context:

The original_dataset.json works perfectly fine with the same code and tools.
I’m using pycocotools for decoding and encoding the masks.
The problematic annotations seem to have corrupted or improperly formatted RLE encoding.

Required Info:

Project Type: Instance Segmentation
Operating System & Browser: Windows / Google Chrome
Project ID: black-parts-panorama-image

I also made sure the new dataset has exactly the same augmentations applied as the original dataset, and the error still persists.

I'm using pycocotools version 2.0.8.

Any guidance or assistance would be greatly appreciated!

Thanks in advance,
Faris

Karan_Santra · December 30, 2024, 9:23pm

I am facing a similar issue.

komsstudent · December 31, 2024, 9:41am

Same issue here; I thought it was my fault. It turns out my masks are valid, since exporting in COCO format works fine. I first ran into this issue on December 28.

How did you fix the broken masks? I would be grateful if you could share your approach.

iurisilvio · December 31, 2024, 2:47pm

Hi all! Thanks for the detailed reports! I found that a recent optimization was generating incorrect RLE masks.

I reverted the change and I believe you’ll be able to generate good masks again.

Please generate new versions and export from them. If you export the same version again, it will just serve the file that was already generated.

Let me know if it works!

Faris-Faiz · January 2, 2025, 7:03pm

I've tested the new version, and from basic testing it works. I'm not sure about training SAM2.1 yet, though, since I'm now running into issues with the YAML config file.

system · January 23, 2025, 7:04pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.