Hi everyone,
I’m currently facing an issue with a dataset I generated using Roboflow, and I’m hoping someone here can help me resolve it. Here’s a summary of the problem and what I’ve done so far to debug it:
Issue:
I have a segmentation dataset that I generated about a month ago (original_dataset.json, generated on Nov 29, 2024), which works perfectly fine. However, I recently generated a new version of the dataset (new_dataset.json, generated on Dec 31, 2024), and when I try to use it with SAM2.1, I encounter the following error:
ValueError: Invalid RLE mask representation
This error occurs only with the new dataset, and I’ve confirmed that the issue lies in the RLE encoding of the segmentation masks in new_dataset.json.
What I’ve Done to Debug:
- Compared the Datasets:
  - I compared the original_dataset.json and new_dataset.json files and noticed differences in the RLE encoding format of the segmentation field.
- Attempted to Fix the RLE Encoding:
  - I wrote a Python script using pycocotools.mask.decode and encode to re-encode the masks in the new dataset. However, some annotations (IDs 27, 29, 32, and 33) still fail to decode, throwing the same "Invalid RLE mask representation" error.
- Skipped Problematic Annotations:
  - I updated my script to skip problematic annotations and log them for further inspection. The problematic annotations are: 27, 29, 32, and 33.
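The comparison and skip-and-log steps above can be sketched roughly as follows. This is a minimal illustration, not the exact script: the function names (`segmentation_format`, `summarize_formats`, `split_by_decodability`) are hypothetical, and the decoder is passed in as a parameter so the same loop works with pycocotools.mask.decode or any other callable that raises on an invalid mask.

```python
from collections import Counter


def segmentation_format(seg):
    """Classify a COCO-style segmentation field by its shape."""
    if isinstance(seg, list):
        return "polygon"
    if isinstance(seg, dict) and "counts" in seg:
        counts = seg["counts"]
        if isinstance(counts, (str, bytes)):
            return "compressed_rle"
        if isinstance(counts, list):
            return "uncompressed_rle"
    return "unknown"


def summarize_formats(coco):
    """Count how many annotations use each segmentation format."""
    return Counter(
        segmentation_format(ann.get("segmentation"))
        for ann in coco["annotations"]
    )


def split_by_decodability(coco, decode):
    """Return (good_annotations, bad_ids).

    `decode` is any callable that raises on an invalid mask
    (assumption: pycocotools.mask.decode behaves this way).
    """
    good, bad_ids = [], []
    for ann in coco["annotations"]:
        try:
            decode(ann["segmentation"])
        except Exception as exc:
            # Log and skip instead of aborting the whole run.
            print(f"skipping annotation {ann['id']}: {exc}")
            bad_ids.append(ann["id"])
        else:
            good.append(ann)
    return good, bad_ids
```

Running `summarize_formats` on both files (each loaded with `json.load`) shows whether the two exports use the same segmentation format at all. `split_by_decodability(coco, pycocotools.mask.decode)` assumes every annotation is an RLE dict; polygon annotations would need converting first.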
Request for Assistance:
Could anyone help me understand why the RLE encoding in the new dataset is invalid? Specifically:
- Are there known issues with RLE encoding in newer versions of Roboflow?
- How can I ensure that the RLE encoding in the new dataset matches the format expected by pycocotools?
- Is there a way to regenerate the dataset with correct RLE encoding?
Here’s a snippet of the problematic RLE encoding from new_dataset.json for annotation ID 27:
"segmentation": {
"counts": "QoQ3o0Qo0PPWMPPi2PPi2PPWM0000000000000000PPnLPPR3PPR3PPnL0000000000000000PPeLPP[3PP[3PPeL0000000000000000PP\\LXPd3hoc3PP\\L0000000000000000PPSLPPm3PPm3PPSL0000000000000000PPjKPPV4PPV4PPjK000000000000PPUKPPk4PPk4PPUKPPaKPP_4PP_4PPaK0000000000000000PPXKPPh4PPh4PPXK0000000000000000PPoJPPQ5PPQ5PPoJ0000000000000000PPfJPPZ5PPZ5PPfJ0006J00000000000PP]JPPc5PPc5PP]J0000000000000000PPTJPPl5PPl5PPTJ0000000000000000PPkIPPU6PPU6PPkI0000000O10000000PPbIPP^6PP^6PPbI0000000000000000PPYIPPg6PPg6PPYI0000000000000000PPPIPPP7PPP7PPPI000002N5K0000000QPgHooX7ooX7QPgH0000000000000000QP^Hooa7ooa7QP^H0000000000000000QPUHooj7ooj7QPUHPPZGPPf8PPf8PPZG000000000000PPlGPPT8PPT8PPlG00000N2000000000PPcGPP]8PP]8PPcG0000000000000000PPZGPPf8PPf8PPZG000000000004L000PPQGPPo8PPo8PPQG0000000000000000PPhFPPX9PPX9PPhF0000000000000000PP_FPPa9PPa9PP_F0000000000000000PPVFPPj9PPj9PPVF0000000000000000PPmEPPS:PPS:PPmE0000000000000000PPdEPP\\:PP\\:PPdE0000000000000000PP[EPPe:PPe:PP[E0000000000000000PPREPPn:PPn:PPRE0000000000PPhCPPX<PPX<PPhC00PPiDPPW;PPW;PPiD0000000000000000PP`DPP`;PP`;PP`D0000000000000000PPWDPPi;PPi;PPWD0000000000000000PPnCPPR<PPR<PPnC0000000000PPhCWOiPX<0000000PPdNPP\\1PP\\1PPdN0000PP\\NPPd1PPd1PP\\N00000000PP[NPPe1PPe1PP[N0001O0000002N000PPRNPPn1PPn1PPRN0000000000000000PPiMPPW2PPW2PPiM0000000000000000PP`MPP`2PP`2PP`M0000000000000nPnl0",
"size": [1024, 1024]
}
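One way to inspect a `counts` string like this without pycocotools is to decode it by hand and check whether the run lengths add up to height × width. The sketch below follows the compressed-RLE string format as implemented in the COCO API C source (`rleFrString`); the function names are hypothetical. It also rests on an assumption worth verifying: that recent pycocotools releases raise "Invalid RLE mask representation" when the decoded runs cover more pixels than the mask has.

```python
def decode_compressed_counts(counts):
    """Decode a COCO compressed-RLE `counts` string into run lengths."""
    if isinstance(counts, bytes):
        counts = counts.decode("ascii")
    runs = []
    pos = 0
    while pos < len(counts):
        value, k, more = 0, 0, True
        while more:
            if pos >= len(counts):
                raise ValueError("counts string ends in the middle of a value")
            c = ord(counts[pos]) - 48
            if not 0 <= c < 64:
                raise ValueError(f"unexpected character {counts[pos]!r}")
            value |= (c & 0x1F) << (5 * k)  # low 5 bits carry data
            more = bool(c & 0x20)           # bit 5 is the continuation flag
            pos += 1
            k += 1
            if not more and (c & 0x10):
                value |= -1 << (5 * k)      # sign-extend negative deltas
        if len(runs) > 2:
            # From the 4th run on, values are stored as a delta
            # against the run two positions earlier.
            value += runs[-2]
        runs.append(value)
    return runs


def check_rle(segmentation):
    """Return a list of problems found in one RLE segmentation dict."""
    height, width = segmentation["size"]
    runs = decode_compressed_counts(segmentation["counts"])
    problems = []
    if any(r < 0 for r in runs):
        problems.append("negative run length")
    total = sum(runs)
    if total != height * width:
        problems.append(
            f"runs cover {total} pixels, expected {height * width}"
        )
    return problems
```

Pass in the segmentation dict as returned by `json.load`, so that each `\\` in the file has already become a single backslash. An empty list from `check_rle` means the runs are consistent with the declared size. Any reported problem suggests the string was corrupted or produced by a different encoder.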
Additional Context:
- The original_dataset.json works perfectly fine with the same code and tools.
- I’m using pycocotools for decoding and encoding the masks.
- The problematic annotations seem to have corrupted or improperly formatted RLE encoding.
Required Info:
- Project Type: Instance Segmentation
- Operating System & Browser: Windows / Google Chrome
- Project ID: black-parts-panorama-image
Also, I made sure the original dataset has exactly the same augmentations applied as the new dataset, and the error still persists.
The pycocotools version is 2.0.8.
Any guidance or assistance would be greatly appreciated!
Thanks in advance,
Faris