Trying to get GPU usage with inference server and Docker

  • Project Type: Keypoint detection workflow via local inference
  • Operating System & Browser: Windows 11

I don’t think my GPU is being used when I run my workflow via the inference server in a Python script. I’m trying to run a keypoint detection model against an .mkv video file. Not sure if this is troubleshootable via a forum like this, but I thought I’d try in case there’s an easy answer I’m just not stumbling on.
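
For reference, my script is roughly shaped like the sketch below (simplified from memory; the workspace/workflow names and API key are placeholders, and the exact inference_sdk calls may differ slightly from what I actually have). It reads the .mkv with OpenCV and sends each frame to the local inference server:

    # Rough outline: read the .mkv with OpenCV and send each frame to the
    # local inference server. Names in CAPS / "my-..." are placeholders.
    import cv2
    from inference_sdk import InferenceHTTPClient

    client = InferenceHTTPClient(
        api_url="http://localhost:9001",  # the Docker inference server
        api_key="MY_ROBOFLOW_API_KEY",
    )

    cap = cv2.VideoCapture("my_video.mkv")
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Run the keypoint detection workflow on this frame
        result = client.run_workflow(
            workspace_name="my-workspace",
            workflow_id="my-keypoint-workflow",
            images={"image": frame},
        )
        # ... use the keypoints in `result` ...
    cap.release()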

(I had everything working at one point, but then I upgraded to Windows 11, had to set some things up again (including WSL2), and now I can’t quite get back to a working state.)

The workflow does run, but it’s under 1 frame per second. That, combined with the info below, seems to indicate it’s using the CPU instead of the GPU.

Here are some pieces that might help show where I’m currently at with this. Or let me know if you need to see something else I’m forgetting. I’ll take any and all suggestions. TIA!

  1. I have a GPU. :slight_smile:

  2. Docker is running and I think it can see my GPU, though I don’t see it in the stat graphs…

  3. Per Docker’s docs, I checked that it can use the GPU:
    To confirm GPU access is working inside Docker, run the following:

     docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
    

which gave this result (output screenshot not shown here):

  4. When I started the container above with "inference server start", it did say something to the effect of "GPU recognized", so that seemed positive. My understanding is that it will find a GPU if available and use it automatically with a flag like --gpu

  5. When I track the GPU usage as inference is running, it appears not to be in use
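
(For reference, here’s a rough sketch of the kind of utilization check I mean, using the nvidia-ml-py / pynvml package; the GPU index and sampling loop are just illustrative. It reports utilization sitting near zero the whole time the workflow is running.)

    # Poll GPU utilization on the host while the workflow is running.
    # Requires the nvidia-ml-py package (imported as pynvml).
    import time
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first (only) GPU
    print("GPU:", pynvml.nvmlDeviceGetName(handle))

    for _ in range(30):  # sample once per second for ~30 seconds
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"gpu={util.gpu}%  mem_used={mem.used / 1e9:.1f} GB")
        time.sleep(1)

    pynvml.nvmlShutdown()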

Hi @Automatez ,

When starting inference in docker, can you pass -e ONNXRUNTIME_EXECUTION_PROVIDERS="[CUDAExecutionProvider]" and post here any errors that may pop up?

Thanks!

Grzegorz

Thanks! I did get the following:

inference server start -e ONNXRUNTIME_EXECUTION_PROVIDERS="[CUDAExecutionProvider]"
GPU detected. Using a GPU image.
[Errno 2] No such file or directory: 'ONNXRUNTIME_EXECUTION_PROVIDERS=[CUDAExecutionProvider]'

And on a related note, my Python script has the following line in it:
os.environ["ONNXRUNTIME_EXECUTION_PROVIDERS"] = '[CUDAExecutionProvider]'

Does any of that help?

Ah sorry, I meant adding -e when you start docker :slight_smile:

If you export this env var, ONNXRUNTIME_EXECUTION_PROVIDERS="[CUDAExecutionProvider]", before running inference, you instruct it to use only CUDA

Thanks. I’ll keep playing with this idea of setting it in advance. I’m using PowerShell, so I think I was correct to set an env variable before running "inference server start" (which sets up a container in Docker). So I did this:

$Env:ONNXRUNTIME_EXECUTION_PROVIDERS = "['CUDAExecutionProvider']"

And then checked it with this:

docker inspect keen_knuth --format '{{.HostConfig.DeviceRequests}}'

And got this:
[{ 0 [all] [[gpu]] map}]

So maybe it’s close? But I ran the Python script and it was still slow. Maybe it’s just that slow with my model…? (Seems really slow though.) OpenAI was even nice enough to launch ChatGPT-5 for me today so I could try and troubleshoot, but I’m still not quite there. :slight_smile:

If you exec into the running container, can you run env | grep ONNXRUNTIME_EXECUTION_PROVIDERS to confirm the env var is set? While in there, it would also be helpful to grab the output of nvidia-smi (from inside the container)
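
If it’s easier, a small Python check like the one below (an untested sketch on my side), run inside the container, should also show whether ONNX Runtime can even see the CUDA provider:

    # Run inside the inference container (e.g. via `docker exec -it <name> python`).
    # If onnxruntime-gpu and its CUDA/cuDNN dependencies are healthy,
    # CUDAExecutionProvider should appear in the available providers.
    import os
    import onnxruntime as ort

    print("onnxruntime version:", ort.__version__)
    print("available providers:", ort.get_available_providers())
    print("default device:", ort.get_device())
    print("ONNXRUNTIME_EXECUTION_PROVIDERS =",
          os.environ.get("ONNXRUNTIME_EXECUTION_PROVIDERS"))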

I have no Windows machine on my desk, so we will have to go through a few rounds of "run this, run that" :wink:

No Windows machine - aren’t you fancy! Ha ha! :smiley: I think I’ve finally got this all solved. TL;DR: CUDAExecutionProvider failed to load because my CUDA/cuDNN versions didn’t match the ONNX Runtime (ORT) GPU wheel.

The reason appeared to be a mess of dependency conflicts. I think the most significant show-stopper ended up being numpy. A recent version of inference-gpu (0.52.0) started to require numpy>=2.x, but my GPU needed onnxruntime-gpu==1.16.3, which required CUDA 11.8, and THAT required numpy<2.x! So every time I fixed one package and got it working, the other would break. Lots of iteration over multiple days on this one. Thanks for being so willing to help as usual @Grzegorz !
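
In case it helps anyone hitting the same wall, a quick version check along these lines (not exactly what I ran, but the idea) makes it easy to see which combination of packages actually ended up installed:

    # Print the installed versions of the packages that were conflicting
    # in my case: inference-gpu, onnxruntime-gpu, and numpy.
    from importlib.metadata import PackageNotFoundError, version

    for pkg in ("inference-gpu", "onnxruntime-gpu", "numpy"):
        try:
            print(f"{pkg}=={version(pkg)}")
        except PackageNotFoundError:
            print(f"{pkg}: not installed")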


Thanks for sharing your findings!

We updated numpy to >2 around inference==0.45.2; I guess you recently updated the package? In some of our Docker images we still have a hard requirement on numpy<2, and in such cases we reinstall numpy (like here: inference/docker/dockerfiles/Dockerfile.onnx.jetson.5.1.1 at main · roboflow/inference · GitHub)

Thanks again!

Grzegorz

