If I fine-tune PaliGemma, can I use it with inference.js?

The question is in the title: I’d like to get PaliGemma working as a downloaded web model using inference.js. Is this possible? And am I required to fine-tune it to get a model onto Roboflow?

I was looking at this resource:

We meet again!

The good news is that there are already PaliGemma base weights uploaded to Roboflow at Paligemma Pretrains Object Detection Dataset and Pre-Trained Model by paligemma.

The bad news is that it’s not technically feasible to run these in the browser: they are very large models that take a long time to run inference with.

PaliGemma’s smallest size is 3 billion parameters, about 1,000 times larger than YOLOv8n, which has roughly 3 million. This is due in part to the 2B-parameter language model baked in.

It is possible to get PaliGemma running in real time, as in GitHub - sumo43/loopvlm: run paligemma in real time, but that repo relies on a number of specific tricks and a very powerful GPU.

You might like to check out our Florence-2 offerings (which are also fine-tunable in-app); they come in at about a tenth the size of PaliGemma, but would still be impossible to run in real time in the browser right now.
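
If you want a feel for Florence-2 before committing, a minimal server-side sketch using the public Hugging Face checkpoint (following its model card; this is separate from our in-app flow, and the image path is a placeholder) looks like:

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-base"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, trust_remote_code=True
).to("cuda")
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg")  # placeholder image path
prompt = "<OD>"  # Florence-2's object detection task token

inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda", torch.float16)
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
)
raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
detections = processor.post_process_generation(
    raw, task="<OD>", image_size=(image.width, image.height)
)
print(detections)  # e.g. {'<OD>': {'bboxes': [...], 'labels': [...]}}
```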

You might also be interested to hear that it’s possible to fine-tune PaliGemma 2 in-app as well.


Indeed! Thanks so much for this; I had not considered the size of the model.

I see I can deploy PaliGemma onto a device, but can PaliGemma (or Florence-2) be accessed through your hosted API? I see the self-hosted option, but I am not familiar with the hardware necessary for it.

I basically want a website that gives visitors access to VLM/VQA/object detection.

Thanks, David

PaliGemma and Florence-2 aren’t currently available through the hosted API, but you can get an endpoint to hit them through our dedicated deployments.
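
If it helps while you evaluate, here’s a minimal sketch of calling such an endpoint with the inference_sdk Python client (the api_url and model_id are placeholders for your dedicated deployment, and the exact call for VLM-style tasks may differ from plain object detection):

```python
from inference_sdk import InferenceHTTPClient

# Point the client at your dedicated deployment endpoint (placeholder URL).
client = InferenceHTTPClient(
    api_url="https://your-dedicated-deployment.example.com",
    api_key="YOUR_ROBOFLOW_API_KEY",
)

# Run inference against a deployed model (placeholder model ID).
result = client.infer("path/to/image.jpg", model_id="your-project/1")
print(result)
```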

Hi peter_roboflow,

Do you mean that a powerful GPU is needed to run a fine-tuned PaliGemma model? Can you tell me how powerful it should be?

On my side, inference takes about 1 second on an A100 and 3 seconds on a T4 GPU when running my PaliGemma 2 for object detection. But it takes over 14 seconds when the prediction is wrong. I think I need to fine-tune the model again with more data.

My aim is to run fine-tuned PaliGemma 2 on a Jetson Orin Nano. As far as I know, a 7B LLM can run on the Jetson Orin Nano, but I wonder whether PaliGemma 2 3B object detection is possible on this edge device.
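
Here is the rough back-of-envelope arithmetic I did for the weight memory alone (my own estimate, not a measured number; activations, the KV cache, and OS overhead come on top):

```python
# Weight-only memory footprint of a 3B-parameter model at common precisions.
PARAMS = 3e9

for precision, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{precision}: ~{gib:.1f} GiB")  # fp16 ~5.6, int8 ~2.8, int4 ~1.4
```

Since the 8 GB Orin Nano shares that memory between CPU and GPU, it seems like fp16 would be very tight and quantization would be necessary.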

If you have any thoughts about this, could you please let me know?

By the way, I am now having trouble deploying my fine-tuned PaliGemma model to Roboflow.

After finishing fine-tuning and calling version.deploy(), I see the following message in the notebook:

Share your model with the world at: How to Use the find bottle2 Object Detection API

But when I go to the model page by clicking the URL, it shows “model loading…” forever.
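
For reference, my deploy flow looks roughly like this (the project slug, version number, and weights path are placeholders from my notebook, and the model_type string is my best guess at the PaliGemma identifier):

```python
import roboflow

rf = roboflow.Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace().project("find-bottle2")  # placeholder project slug
version = project.version(1)  # placeholder version number

# Upload the fine-tuned weights saved by the training notebook.
version.deploy(
    model_type="paligemma-3b-pt-224",
    model_path="/content/paligemma-weights",
)
```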

Could you help me?

Could this be related to my free account?

Thanks