Question is in the title: I’d like to get paligemma working on a downloaded web model using inference.js. Is this possible? Am I required to fine-tune it to get a model on to roboflow?
I was looking at this resource:
We meet again!
The good news is that there are already PaliGemma base weights uploaded to Roboflow at Paligemma Pretrains Object Detection Dataset and Pre-Trained Model by paligemma.
The bad news is that it’s not technically feasible to have these running in the browser – these are super large models that take a long time to do inference with.
PaliGemma’s smallest size is 3 billion parameters, which is about 1000 times larger than a YOLOv8n with its 3 million parameters. This is due in part to the 2B-parameter language model baked in.
It is possible to get PaliGemma running in real time on a very powerful GPU, as in GitHub - sumo43/loopvlm: run paligemma in real time, but that repo relies on a lot of model-specific tricks and top-end hardware.
You might like to check out our Florence-2 offerings (which are also fine-tunable in app); they come in at about a tenth of the size of PaliGemma, but would still be impossible to run in real time in the browser right now.
You might also be interested to hear that it’s possible to fine-tune PaliGemma 2 in app, as well.
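If you want a feel for what inference with the base weights involves, here is a minimal sketch using Hugging Face transformers (this assumes the public google/paligemma-3b-mix-224 checkpoint, which is separate from the Roboflow deployment path):

```python
# Minimal PaliGemma inference sketch with Hugging Face transformers.
# Assumes the public google/paligemma-3b-mix-224 checkpoint and a GPU
# with enough memory for ~3B parameters (roughly 6 GB at bfloat16).
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg")   # any local test image
prompt = "detect car"               # PaliGemma detection-style prompt

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=64)

# The decoded text contains <locXXXX> tokens encoding the bounding box.
print(processor.decode(output[0], skip_special_tokens=True))
```

Even this single generate call needs roughly 6 GB of GPU memory just for the weights at bfloat16, which is the core of the browser problem.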
Indeed! Thanks so much for this, I had not considered the size of the model.
I see I can deploy PaliGemma onto a device, but can PaliGemma (or Florence-2) be accessed through your hosted API? I see the self-hosted option, but I am not familiar with the hardware necessary for this.
I basically want a website which gives visitors access to VLM/VQA/Object Detection.
Thanks, David
PaliGemma and Florence-2 aren’t currently available through the hosted API, but you can get an endpoint to hit them through our dedicated deployments.
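Once a dedicated deployment is up, calling it looks roughly like this (a sketch using the inference-sdk Python package; the URL and model ID below are placeholders for the values your deployment dashboard gives you):

```python
# Sketch of calling a Roboflow dedicated deployment. The api_url and
# model_id are placeholders; use the endpoint and model ID shown in
# your deployment dashboard.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://your-deployment.roboflow.cloud",  # placeholder endpoint
    api_key="YOUR_ROBOFLOW_API_KEY",
)

result = client.infer("example.jpg", model_id="your-project/1")  # placeholder
print(result)
```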
Hi peter_roboflow,
Do you mean that a powerful GPU is needed to run a fine-tuned PaliGemma model?
Can you tell me how powerful it should be?
On my side, it takes 1 sec on an A100 and 3 sec on a T4 GPU when running my PaliGemma2 for object detection.
But it takes over 14 sec when the prediction is wrong. I think I need to fine-tune the model again with more data.
My aim is to run fine-tuned PaliGemma2 on a Jetson Orin Nano.
As far as I know, a 7B LLM is possible on a Jetson Orin Nano.
But I wonder if PaliGemma2 3B object detection is possible on this edge device.
If you have any thoughts about this, could you please let me know?
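For a rough sense of the memory budget, here is my back-of-envelope math (weights only, ignoring activations and the KV cache):

```python
# Back-of-envelope memory budget for PaliGemma2 3B on a Jetson Orin Nano
# (8 GB RAM shared between CPU and GPU). Weights only; activations,
# KV cache, and the OS all need room on top of this.
params = 3e9

for precision, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{precision}: ~{gib:.1f} GiB of weights")

# fp16: ~5.6 GiB, int8: ~2.8 GiB, int4: ~1.4 GiB
# -> fp16 barely fits in 8 GB once everything else is counted,
#    so quantization seems necessary in practice.
```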
By the way,
I am now having trouble deploying my fine-tuned PaliGemma model to Roboflow.
After finishing fine-tuning and calling version.deploy(), I see a message in the notebook like the one below.
…
Share your model with the world at: How to Use the find bottle2 Object Detection API
…
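For reference, the deploy step I ran looks roughly like this (the workspace, project, model_type, and path values are placeholders for the ones in my notebook):

```python
# Rough sketch of the deploy step from the fine-tuning notebook, using
# the roboflow Python SDK. All names and paths are placeholders.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_ROBOFLOW_API_KEY")
project = rf.workspace("your-workspace").project("find-bottle2")
version = project.version(1)

# model_type and model_path must match what the fine-tuning notebook
# produced; the strings below are placeholders.
version.deploy(model_type="paligemma2-3b", model_path="/content/paligemma2-weights")
```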
But when I go to the model page by clicking the URL, it shows “model loading…” forever.
Could you help me with this?
Could it be related to my free account?
Thanks