Issues with Exporting Florence-2 Batch Workflow Annotations for Training an Object Detection Model

Summary:
I am building a workflow that runs Florence-2 locally to automatically generate annotations for images. I used the batch processing feature to run the workflow on a dataset of images. I would like to export the resulting annotations together with the corresponding images so they can be uploaded back and used to train an object detection model for a new project.

The Workflow: I have successfully configured a Florence-2 model block within Roboflow Workflows to detect humans. Using the Batch Utility, I am able to generate high-volume inference results.

The Problem: I am unable to “bridge” these VLM results back into a Roboflow project as valid ground-truth annotations. My current outputs are either:

  • Visualized Images: The bounding boxes are “burnt-in” to the image pixels, making them uneditable and unusable for training.

  • JSON Metadata: I have the coordinate data, but I cannot find a way to re-upload this JSON alongside my raw images so that Roboflow recognizes them as actual labels.

The Goal: I need a way to ensure the “reasoning” from the VLM (Florence-2) persists as structured data that can be used to train an RF-DETR model. What should I do to turn these batch outputs into editable labels in the Annotate tab?

Workflow Link

Project Type: Object Detection

Operating System & Browser: Windows / Google Chrome

Do you grant Roboflow Support permission to access your Workspace for troubleshooting?: Yes

Hi @aarnavshah12 ,

If you plan to use Florence-2 results as dataset annotations, you need to convert them into a dataset-compatible annotation format (see here) so that you can upload them to Roboflow (CLI docs here).

If you want this automated, the workflow would look like this:

  • Start a batch job (see the batch CLI docs) and wait for it to finish
  • Download the batch results (the Florence-2 outputs) with the same CLI tool
  • Convert the Florence-2 outputs to a dataset-compatible format
  • Upload the images and annotations to Roboflow

Thoughts?

Hi Eric, any idea on how I can convert the Florence2 outputs to match dataset-compatible formats? That’s the main issue I’m facing.

Sure, could you please share an output from Florence2?

Yeah sure. The image I’ve uploaded is one of them. Also, I’m getting CSV files with information like so:

{"output_3": "<deducted_image>", "output_1": "<deducted_image>", "output_2": {"image": "<deducted_image>"}, "output_4": {"image": "<deducted_image>"}, "output_5": false, "output_6": {"image": {"width": 500, "height": 413}, "predictions": [{"width": 228.0, "height": 356.0, "x": 219.0, "y": 234.0, "confidence": 1.0, "class_id": 0, "class": "humans", "detection_id": "44ee815b-a447-4866-ace3-00ca640b8610", "parent_id": "image"}]}, "output_7": "ecd8c9c7-df2b-4e22-8fc9-e088920123f6", "output_8": {"error_status": false, "predictions": {"image": {"width": 500, "height": 413}, "predictions": [{"width": 228.0, "height": 356.0, "x": 219.0, "y": 234.0, "confidence": 1.0, "class_id": 0, "class": "humans", "detection_id": "44ee815b-a447-4866-ace3-00ca640b8610", "parent_id": "image"}]}, "inference_id": "ecd8c9c7-df2b-4e22-8fc9-e088920123f6"}, "output_9": "{\"bboxes\": [[105, 56, 333, 412]], \"bboxes_labels\": [\"humans\"], \"polygons\": [], \"polygons_labels\": []}", "output_10": {"raw_output": "{\"bboxes\": [[105, 56, 333, 412]], \"bboxes_labels\": [\"humans\"], \"polygons\": [], \"polygons_labels\": []}", "parsed_output": {"bboxes": [[105, 56, 333, 412]], "bboxes_labels": ["humans"], "polygons": [], "polygons_labels": []}, "classes": ["humans"]}, "output_11": {"bboxes": [[105, 56, 333, 412]], "bboxes_labels": ["humans"], "polygons": [], "polygons_labels": []}, "output_12": ["humans"]}
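
A note on the sample above: it appears to carry the same box in two conventions. The "predictions" entries use a center point (x, y) plus width and height, while the "bboxes" entries use corner coordinates [x_min, y_min, x_max, y_max]. A minimal Python sketch, using a hypothetical helper name and only the numbers from this one sample, checks that the two agree:

```python
def center_to_corners(pred):
    """Convert a center-based box (x, y, width, height) to [x_min, y_min, x_max, y_max]."""
    half_w = pred["width"] / 2
    half_h = pred["height"] / 2
    return [
        pred["x"] - half_w,
        pred["y"] - half_h,
        pred["x"] + half_w,
        pred["y"] + half_h,
    ]


# Values copied from the sample output above.
prediction = {"width": 228.0, "height": 356.0, "x": 219.0, "y": 234.0}
florence_bbox = [105, 56, 333, 412]

corners = center_to_corners(prediction)
print(corners)                   # [105.0, 56.0, 333.0, 412.0]
print(corners == florence_bbox)  # True
```

Either field can therefore serve as the source for a conversion, as long as the target format's box convention is respected.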

Hi @aarnavshah12 , you can ask an LLM or coding agent to convert the output you sent into a dataset format such as COCO JSON. Two things to note:

  • Map bboxes_labels (“humans”, the output from Florence-2) to the correct class ID in your dataset, e.g. “humans” → class_id=0
  • Map batch outputs to their input images, so you can assign annotations to the correct image IDs

Once you have the COCO JSON annotations and the images, you can use the CLI to upload the dataset to the Roboflow platform.
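
As a rough illustration of the conversion and the two notes above, here is a minimal Python sketch. It assumes each batch result has already been parsed into a dict shaped like the sample earlier in the thread and paired with its image file name; the function name `to_coco`, the choice of the "output_6" field, and the file name "example.jpg" are all assumptions for illustration, not part of any Roboflow API. COCO stores boxes as [x_min, y_min, width, height], so the center-based predictions are shifted by half the width and height.

```python
import json


def to_coco(results, class_ids):
    """Build a COCO-style dict from (file_name, workflow_row) pairs.

    results:   list of (image file name, parsed workflow output dict)
    class_ids: mapping from Florence-2 label to dataset category id
    """
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": cid, "name": name} for name, cid in class_ids.items()],
    }
    ann_id = 1
    for image_id, (file_name, row) in enumerate(results, start=1):
        detections = row["output_6"]  # assumed field, taken from the sample output
        size = detections["image"]
        coco["images"].append({
            "id": image_id,
            "file_name": file_name,
            "width": size["width"],
            "height": size["height"],
        })
        for pred in detections["predictions"]:
            w, h = pred["width"], pred["height"]
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": class_ids[pred["class"]],
                # Center-based (x, y) converted to the top-left corner.
                "bbox": [pred["x"] - w / 2, pred["y"] - h / 2, w, h],
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


# One row trimmed down from the sample output; the file name is hypothetical.
sample_row = {
    "output_6": {
        "image": {"width": 500, "height": 413},
        "predictions": [
            {"width": 228.0, "height": 356.0, "x": 219.0, "y": 234.0, "class": "humans"}
        ],
    }
}
coco = to_coco([("example.jpg", sample_row)], {"humans": 0})
print(json.dumps(coco, indent=2))
```

The resulting dict can be written to a JSON file next to the images before uploading. How each batch result row is matched to its input file name depends on how the batch job names its outputs, so that pairing step is left to the caller here.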

Mhm. Thanks! I understand!
