Inference takes too long the first time

Hi, I am using the Roboflow Hosted API to deploy a Workflow, but every time I run the code for the first time after a while, I either get a 500 error response or it takes around 20-30 seconds to infer. After that, any new run takes around 3 seconds, and it doesn't matter if I change the image, which is okay. But I am wondering why it takes so long the first time. Is that normal? My code is exactly as it is in the documentation.
I also tried running the inference server locally and I have the same issue. I have a 4060 GPU, and after the first run, which takes some time (one time I waited about 3 minutes), all other runs take around 3 seconds. I would like to know if anyone else has this issue, or if it is not an issue and that is just how it works.
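For context, this is roughly the kind of call involved (a minimal sketch following the docs, assuming the `inference_sdk` Python client; the workspace, workflow ID, and image input name below are placeholders):

```python
from inference_sdk import InferenceHTTPClient

# Hosted API; for a local inference server this would point at http://localhost:9001 instead.
client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="YOUR_API_KEY",
)

result = client.run_workflow(
    workspace_name="your-workspace",        # placeholder
    workflow_id="your-workflow-id",         # placeholder
    images={"image": "path/to/image.jpg"},  # key must match the workflow's image input name
)
print(result)
```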

Hi @Crisaq: in both cases we need to load the Workflow and the models used within it, so I would expect some delay on the first request.
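A quick way to confirm it is a first-request load rather than anything image-specific is to time a few consecutive runs against the same client (a rough sketch, again assuming the `inference_sdk` client and placeholder workspace/workflow names):

```python
import time
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(api_url="http://localhost:9001", api_key="YOUR_API_KEY")

# The first call pays the load cost; later calls should be fast even with different images.
for i, image in enumerate(["first.jpg", "second.jpg", "third.jpg"], start=1):
    start = time.perf_counter()
    client.run_workflow(
        workspace_name="your-workspace",  # placeholder
        workflow_id="your-workflow-id",   # placeholder
        images={"image": image},
    )
    print(f"run {i}: {time.perf_counter() - start:.1f}s")
```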

In the case of the 4060, it might be a poor internet connection? I can check, but 3 minutes sounds far too long.

@Crisaq - one other thing: if you are using models like SAM or Florence, those are quite large models (several GB) that need to be loaded.
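If that cold start becomes a problem in practice, one option is to fire a throwaway warm-up request when your application starts, so the heavy models are already loaded before real traffic arrives. This is just a sketch under the same assumptions as above; the tiny placeholder image and helper name are illustrative, and it assumes the client accepts an in-memory numpy array as it does file paths:

```python
import numpy as np
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(api_url="http://localhost:9001", api_key="YOUR_API_KEY")

def warm_up_workflow(workspace: str, workflow_id: str) -> None:
    # Run the workflow once on a tiny black image so large models (e.g. SAM, Florence)
    # are loaded before the first real request.
    placeholder = np.zeros((64, 64, 3), dtype=np.uint8)
    client.run_workflow(
        workspace_name=workspace,
        workflow_id=workflow_id,
        images={"image": placeholder},
    )

warm_up_workflow("your-workspace", "your-workflow-id")
```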

That makes sense then. It is usually not an issue, but I was wondering if I was the only one, since I couldn't find anything about it.
In the other case I believe I was using a Workflow with both Florence and SAM, so maybe the long wait makes sense.
Thank you very much for your answers!

One thing that might help is our Dedicated Deployments functionality, which lets you spin up a GPU pre-configured with your workflow. It will still have that initial delay, but it will then stay “warm” for as long as you need it.
