Assistance Needed: Invalid RLE Mask Representation in New Dataset

Hi everyone,

I’m currently facing an issue with a dataset I generated using Roboflow, and I’m hoping someone here can help me resolve it. Here’s a summary of the problem and what I’ve done so far to debug it:

Issue:

I have a segmentation dataset that I generated about a month ago (original_dataset.json, generated on Nov 29, 2024), which works perfectly fine. However, I recently generated a new version of the dataset (new_dataset.json, generated on Dec 31, 2024), and when I try to use it with SAM2.1, I get the following error:
ValueError: Invalid RLE mask representation

This error occurs specifically with the new dataset, and I’ve confirmed that the issue lies in the RLE encoding of the segmentation masks in new_dataset.json.

What I’ve Done to Debug:

  1. Compared the Datasets:

    • I compared the original_dataset.json and new_dataset.json files and noticed differences in the RLE encoding format of the segmentation field.
  2. Attempted to Fix the RLE Encoding:

    • I wrote a Python script using pycocotools.mask.decode and encode to re-encode the masks in the new dataset. However, some annotations (IDs 27, 29, 32, and 33) still fail to decode, throwing the same Invalid RLE mask representation error.
  3. Skipped Problematic Annotations:

    • I updated my script to skip problematic annotations and log them for further inspection. The problematic annotations are: 27, 29, 32, and 33.
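
For anyone who wants to reproduce step 3, here is a minimal sketch of the skip-and-log idea. It is not my exact script: the function name split_annotations and the decode_fn parameter are made up for illustration, and it assumes the usual COCO layout where RLE segmentations are dicts with "counts" and "size" while polygon segmentations are lists.

```python
def split_annotations(annotations, decode_fn):
    """Split COCO annotations into decodable ones and the IDs that fail.

    decode_fn is any callable that decodes one RLE dict and raises on
    failure (for example pycocotools.mask.decode). Hypothetical helper.
    """
    good, bad_ids = [], []
    for ann in annotations:
        seg = ann.get("segmentation")
        # Polygon segmentations are lists; only dicts carry RLE data.
        if not isinstance(seg, dict):
            good.append(ann)
            continue
        try:
            decode_fn(seg)
        except (ValueError, TypeError) as err:
            # Log the failure and keep going instead of aborting the run.
            print(f"skipping annotation {ann.get('id')}: {err}")
            bad_ids.append(ann.get("id"))
        else:
            good.append(ann)
    return good, bad_ids
```

With pycocotools installed, the call would look like split_annotations(data["annotations"], pycocotools.mask.decode), where data is the parsed new_dataset.json. Passing the decoder in as an argument keeps the helper usable with any other decoder.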

Request for Assistance:

Could anyone help me understand why the RLE encoding in the new dataset is invalid? Specifically:

  • Are there known issues with RLE encoding in newer versions of Roboflow?
  • How can I ensure that the RLE encoding in the new dataset matches the format expected by pycocotools?
  • Is there a way to regenerate the dataset with correct RLE encoding?

Here’s a snippet of the problematic RLE encoding from new_dataset.json for annotation ID 27:

"segmentation": {
    "counts": "QoQ3o0Qo0PPWMPPi2PPi2PPWM0000000000000000PPnLPPR3PPR3PPnL0000000000000000PPeLPP[3PP[3PPeL0000000000000000PP\\LXPd3hoc3PP\\L0000000000000000PPSLPPm3PPm3PPSL0000000000000000PPjKPPV4PPV4PPjK000000000000PPUKPPk4PPk4PPUKPPaKPP_4PP_4PPaK0000000000000000PPXKPPh4PPh4PPXK0000000000000000PPoJPPQ5PPQ5PPoJ0000000000000000PPfJPPZ5PPZ5PPfJ0006J00000000000PP]JPPc5PPc5PP]J0000000000000000PPTJPPl5PPl5PPTJ0000000000000000PPkIPPU6PPU6PPkI0000000O10000000PPbIPP^6PP^6PPbI0000000000000000PPYIPPg6PPg6PPYI0000000000000000PPPIPPP7PPP7PPPI000002N5K0000000QPgHooX7ooX7QPgH0000000000000000QP^Hooa7ooa7QP^H0000000000000000QPUHooj7ooj7QPUHPPZGPPf8PPf8PPZG000000000000PPlGPPT8PPT8PPlG00000N2000000000PPcGPP]8PP]8PPcG0000000000000000PPZGPPf8PPf8PPZG000000000004L000PPQGPPo8PPo8PPQG0000000000000000PPhFPPX9PPX9PPhF0000000000000000PP_FPPa9PPa9PP_F0000000000000000PPVFPPj9PPj9PPVF0000000000000000PPmEPPS:PPS:PPmE0000000000000000PPdEPP\\:PP\\:PPdE0000000000000000PP[EPPe:PPe:PP[E0000000000000000PPREPPn:PPn:PPRE0000000000PPhCPPX<PPX<PPhC00PPiDPPW;PPW;PPiD0000000000000000PP`DPP`;PP`;PP`D0000000000000000PPWDPPi;PPi;PPWD0000000000000000PPnCPPR<PPR<PPnC0000000000PPhCWOiPX<0000000PPdNPP\\1PP\\1PPdN0000PP\\NPPd1PPd1PP\\N00000000PP[NPPe1PPe1PP[N0001O0000002N000PPRNPPn1PPn1PPRN0000000000000000PPiMPPW2PPW2PPiM0000000000000000PP`MPP`2PP`2PP`M0000000000000nPnl0",
    "size": [1024, 1024]
}
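
To inspect a string like this without pycocotools, a pure-Python sketch along the following lines may help. It mirrors, as far as I understand it, the compressed "counts" string scheme used by the COCO API (5 data bits per character, a continuation bit, and counts after the third stored as a difference from the count two positions back). The function names are hypothetical, and the validity rule (non-negative runs that sum to height × width) is my assumption about what a well-formed RLE should satisfy, not necessarily the exact check pycocotools performs.

```python
def decode_counts(s):
    """Decode a COCO-style compressed RLE string into a list of run lengths."""
    counts, p = [], 0
    while p < len(s):
        x, k, more = 0, 0, True
        while more:
            if p >= len(s):
                raise ValueError("string ends in the middle of a count")
            c = ord(s[p]) - 48
            if not 0 <= c < 64:
                raise ValueError(f"unexpected character {s[p]!r} at {p}")
            x |= (c & 0x1F) << (5 * k)   # low 5 bits carry data
            more = bool(c & 0x20)        # bit 5 says another chunk follows
            p += 1
            k += 1
            if not more and (c & 0x10):
                x |= -1 << (5 * k)       # sign-extend negative values
        if len(counts) > 2:
            x += counts[-2]              # undo the delta encoding
        counts.append(x)
    return counts


def encode_counts(counts):
    """Inverse of decode_counts, included for round-trip checks."""
    out = []
    for i, cnt in enumerate(counts):
        x = cnt - counts[i - 2] if i > 2 else cnt
        more = True
        while more:
            c = x & 0x1F
            x >>= 5
            more = (x != -1) if (c & 0x10) else (x != 0)
            if more:
                c |= 0x20
            out.append(chr(c + 48))
    return "".join(out)


def rle_looks_valid(seg):
    """Assumed rule: runs are non-negative and cover exactly height * width."""
    counts = seg["counts"]
    if isinstance(counts, bytes):
        counts = counts.decode("ascii")
    try:
        runs = decode_counts(counts) if isinstance(counts, str) else list(counts)
    except ValueError:
        return False
    height, width = seg["size"]
    return all(r >= 0 for r in runs) and sum(runs) == height * width
```

Running rle_looks_valid on each annotation's segmentation dict from both JSON files (after json.load, so the escaped backslashes are already unescaped) should show whether the failing IDs differ from the working ones in total run length or in the characters used.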

Additional Context:

  • The original_dataset.json works perfectly fine with the same code and tools.
  • I’m using pycocotools for decoding and encoding the masks.
  • The problematic annotations seem to have corrupted or improperly formatted RLE encoding.

Required Info:

  • Project Type: Instance Segmentation
  • Operating System & Browser: Windows / Google Chrome
  • Project ID: black-parts-panorama-image

Also, I made sure the new dataset has exactly the same augmentations applied as the original dataset, and the error still persists.

The pycocotools version is 2.0.8.

Any guidance or assistance would be greatly appreciated!

Thanks in advance,
Faris

I am facing a similar issue.

Same issue here; I thought it was my mistake. It turns out my masks are valid, since exporting in COCO format works fine. I first ran into this issue on 28.12.

How did you fix the wrong masks? I would be grateful if you could share your approach.

Hi all! Thanks for all the detailed reports! I found a recent optimization was generating wrong RLE masks. :disappointed:

I reverted the change and I believe you’ll be able to generate good masks again.

Please generate new versions and export from them. If you export again from the same version, it will just serve the file that was already generated.

Let me know if it works!

I’ve tested the new version and, from basic testing, it works. I’m not sure about training SAM2.1 yet, though, as I’m now facing issues with the YAML config file.