How to upload multimodal (vision language) dataset

Samuel_Lima_Braz · February 3, 2025, 7:57pm

I have a dataset of invoice images with the extracted data in JSON format, similar to OCR output. I created a folder containing a subfolder with all the images in JPG format and a file named annotations.jsonl, where each line contains:

{
'image': "path/to/image.jpg",
'prefix': "",
'suffix': "extracted data"
}

I’m trying to upload this to Roboflow, but it’s not interpreting the annotations. Either I keep only the images and annotate them manually, or the upload gets stuck.

Should I upload these files somewhere and replace the ‘image’ values with their URLs? Or should I use another format?

leandro_roboflow · February 3, 2025, 9:09pm

Hello @Samuel_Lima_Braz, welcome to the forum, and thank you very much for contacting us.

I would like to ask two things so that we can validate this. Could you please provide the workspace and the project so that we can get more details?

But looking at your JSON format, it seems that the problem is there. You used single quotes instead of double quotes, which is invalid JSON. If that is the case, we can easily solve the problem.
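
One way to check for this kind of problem is to parse every line with a strict JSON parser before uploading. The sketch below is only an illustration: find_invalid_lines is a hypothetical helper name, and it assumes the file holds one JSON object per line.

```python
import json


def find_invalid_lines(lines):
    """Return (line_number, error_message) for every line that is not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
    return problems


sample = [
    '{"image": "a.jpg", "prefix": "", "suffix": "extracted data"}',
    "{'image': 'b.jpg', 'prefix': '', 'suffix': 'extracted data'}",  # single quotes
]
for number, message in find_invalid_lines(sample):
    print(f"line {number}: {message}")
```

For a real file, an open file object can be passed directly, since iterating over it yields one line at a time.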

But if it still doesn’t work, please contact me again and I will be happy to help you.

Best regards,
Leandro Rosemberg

Samuel_Lima_Braz · February 3, 2025, 9:25pm

Now I was able to upload using the CLI. Actually, I was using double quotes, but I mistyped them in the question.

The format I used was:

{
  "image": "20210322_172436.jpg",
  "prefix": "<JSON>",
  "suffix": "{\"company\": \"intermarche\", \"date\": \"2019/06/20\", \"address\": \"armação de pera\", \"total\": \"24.99\", \"invoice_number\": \"0EAA061219/134\", \"buyer_nif\": \"514407395\", \"vat_value\": \"4.67\", \"seller_nif\": \"508162294\"}"
}
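
The suffix here is itself a JSON document stored as a string, which is why its inner quotes are escaped. A minimal sketch of producing such a line, assuming the extracted fields are held in a Python dict (make_record is a hypothetical helper name, not part of any Roboflow tool):

```python
import json


def make_record(image_name, fields, prefix="<JSON>"):
    """Build one annotations.jsonl line; `fields` is serialized into the suffix string."""
    record = {
        "image": image_name,
        "prefix": prefix,
        # The inner dumps turns the dict into a string; the outer dumps escapes its quotes.
        "suffix": json.dumps(fields, ensure_ascii=False),
    }
    # ensure_ascii=False keeps characters such as "ç" and "ã" readable in the file.
    return json.dumps(record, ensure_ascii=False)


line = make_record(
    "20210322_172436.jpg",
    {"company": "intermarche", "date": "2019/06/20", "total": "24.99"},
)
print(line)
```

Each record is written on a single line in the .jsonl file; the object is shown across several lines above only for readability.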

However, it didn’t work through the interface, or at least there was no indication that any upload was happening. With 814 files, the CLI command roboflow import -w tech-ysdkk -p brazilian-documents data/invoices/ worked perfectly.
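
Before running the import on a folder this size, it can help to confirm that every image named in the annotations file actually exists. The sketch below assumes the layout described earlier in the thread (a folder containing annotations.jsonl plus the images, possibly in a subfolder) and matches images by file name only. Both missing_images and import_command are hypothetical helpers; the command list simply mirrors the CLI call quoted above.

```python
import json
from pathlib import Path


def missing_images(folder, annotations_name="annotations.jsonl"):
    """List image names referenced in the annotations file but not found under `folder`."""
    root = Path(folder)
    # Assumption: images are matched by file name anywhere below the folder.
    present = {p.name for p in root.rglob("*") if p.is_file()}
    missing = []
    with open(root / annotations_name, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                name = Path(json.loads(line)["image"]).name
                if name not in present:
                    missing.append(name)
    return missing


def import_command(workspace, project, folder):
    """Argument list for the CLI call from the post, suitable for subprocess.run."""
    return ["roboflow", "import", "-w", workspace, "-p", project, str(folder)]


if __name__ == "__main__":
    folder = "data/invoices/"
    if Path(folder, "annotations.jsonl").exists():
        print("missing:", missing_images(folder))
    print(" ".join(import_command("tech-ysdkk", "brazilian-documents", folder)))
```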

