Project Type: Keypoint detection workflow via local inference
Operating System & Browser: Windows 11
I don't think my GPU is being used when I run my workflow via the inference server in a Python script. I'm trying to run a keypoint detection model against an .mkv video file. Not sure if this is troubleshootable via a forum like this, but I thought I'd try in case there's an easy answer I'm just not stumbling on.
(I had everything working at one point, but then I upgraded to Windows 11, had to set some things up again, including WSL2, and now I can't quite get back to a working state.)
The workflow does run, but at under 1 frame per second. That, combined with the info below, seems to indicate it's using the CPU instead of the GPU.
Here are some pieces that might help show where I'm currently at with this, or let me know if you need to see something else I'm forgetting. I'll take any and all suggestions. TIA!
4. When I initiated my above container with "inference server start", it did say something to the effect of "GPU recognized", so that seemed positive. My understanding is that it will find a GPU if available and use it automatically, without needing a flag like --gpu.
When I track the GPU usage while inference is running, the GPU appears not to be in use. Here's what I ran and what came back:
    inference server start -e ONNXRUNTIME_EXECUTION_PROVIDERS="[CUDAExecutionProvider]"
    GPU detected. Using a GPU image.
    [Errno 2] No such file or directory: 'ONNXRUNTIME_EXECUTION_PROVIDERS=[CUDAExecutionProvider]'
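(If the CLI's -e flag turns out to be the problem, I suppose I could also start the container with Docker directly and pass the variable there. Untested on my end, and I'm assuming the standard Roboflow GPU image name and default port:)

    docker run --rm -it --gpus all -p 9001:9001 \
      -e ONNXRUNTIME_EXECUTION_PROVIDERS="[CUDAExecutionProvider]" \
      roboflow/roboflow-inference-server-gpu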
And on a related note, my Python script has the following line in it:

    os.environ["ONNXRUNTIME_EXECUTION_PROVIDERS"] = "[CUDAExecutionProvider]"
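(One sanity check, run inside whichever environment actually executes the model, i.e. the container rather than my client script; onnxruntime.get_available_providers() is part of the standard onnxruntime API:)

    # Lists the providers this onnxruntime build ships with; the GPU wheel
    # should include 'CUDAExecutionProvider' alongside 'CPUExecutionProvider'.
    python -c "import onnxruntime; print(onnxruntime.get_available_providers())"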
Thanks. I'll keep playing with this idea of setting it in advance. I'm using PowerShell, so I think I was correct to set an environment variable before running "inference server start" (which sets up a container in Docker). So I did this:
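(Quoting from memory, it was something along these lines in PowerShell:)

    $env:ONNXRUNTIME_EXECUTION_PROVIDERS = "[CUDAExecutionProvider]"
    inference server start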
So maybe it's close? But I ran the Python script and it was still slow. Maybe it's just that slow with my model…? (Seems really slow, though.) OpenAI was even nice enough to launch ChatGPT5 for me today so I could try to troubleshoot, but I'm still not quite there.
If you exec into the running container, can you run env | grep ONNXRUNTIME_EXECUTION_PROVIDERS to confirm the env var is set? While in there, it would also be helpful to grab the output of nvidia-smi (from inside the container).
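(From the host, that's roughly the following; <container> is a placeholder for whatever docker ps lists for the inference server:)

    docker ps                                # find the inference container's name or ID
    docker exec -it <container> env | grep ONNXRUNTIME_EXECUTION_PROVIDERS
    docker exec -it <container> nvidia-smi   # should list the GPU plus driver/CUDA versions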
I have no Windows machine on my desk, so we will have to go through a few rounds of "run this, run that".
No Windows machine - aren't you fancy! Ha ha! I think I've got this all solved finally. TL;DR - CUDAExecutionProvider failed to load because my CUDA/cuDNN didn't match the ONNX Runtime (ORT) GPU wheel.
The reason appeared to be a mess of dependency conflicts. I think the most significant showstopper ended up being NumPy: a recent version of inference-gpu (0.52.0) started to require numpy>=2.x, but my GPU needed onnxruntime-gpu==1.16.3, which required CUDA 11.8, and THAT required numpy<2.x! So every time I fixed one package and got it working, the other would break. Lots of iteration over multiple days on this one. Thanks for being so willing to help as usual @Grzegorz!
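(For anyone who finds this later, the set of pins that follows from the versions above would look roughly like this; exact pins depend on your GPU/driver, and inference-gpu is held back because 0.52.0 is where the numpy>=2 requirement appeared:)

    pip install "inference-gpu<0.52.0" onnxruntime-gpu==1.16.3 "numpy<2"
    # Then confirm the CUDA provider is present:
    python -c "import onnxruntime; print(onnxruntime.get_available_providers())"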