Project Type: Keypoint detection workflow via local inference
Operating System & Browser: Windows 11
I don't think my GPU is being used when I run my workflow via the inference server in a Python script. I'm trying to run a keypoint detection model against an .mkv video file. Not sure if this is troubleshootable via a forum like this, but I thought I'd try in case there's an easy answer I'm just not stumbling on.
(I had everything working at one point, but then I upgraded to Windows 11, had to set some things up again, including WSL2, and now I can't quite get back to a working state.)
The workflow does run, but at under 1 frame per second. That, combined with the info below, seems to indicate it's using the CPU instead of the GPU.
Here are some pieces that might help show where I'm currently at with this, or let me know if you need to see something else I'm forgetting. I'll take any and all suggestions. TIA!
4. When I initiated my above container with "inference server start", it did say something to the effect of "GPU recognized", so that seemed positive. My understanding is that it will find a GPU if available and use it automatically, without needing a flag like --gpu.
When I track the GPU usage while inference is running, the GPU appears not to be in use. Here's what I ran and what came back:
    inference server start -e ONNXRUNTIME_EXECUTION_PROVIDERS="[CUDAExecutionProvider]"
    GPU detected. Using a GPU image.
    [Errno 2] No such file or directory: 'ONNXRUNTIME_EXECUTION_PROVIDERS=[CUDAExecutionProvider]'
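(If the CLI's -e flag turns out to be the problem, I suppose I could also start the container with Docker directly and pass the variable there. Untested on my end, and I'm assuming the standard Roboflow GPU image name and default port:)

    docker run --rm -it --gpus all -p 9001:9001 \
      -e ONNXRUNTIME_EXECUTION_PROVIDERS="[CUDAExecutionProvider]" \
      roboflow/roboflow-inference-server-gpu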
And on a related note, my Python script has the following line in it:

    os.environ["ONNXRUNTIME_EXECUTION_PROVIDERS"] = "[CUDAExecutionProvider]"
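(One sanity check, run inside whichever environment actually executes the model, i.e. the container rather than my client script; onnxruntime.get_available_providers() is part of the standard onnxruntime API:)

    # Lists the providers this onnxruntime build ships with; the GPU wheel
    # should include 'CUDAExecutionProvider' alongside 'CPUExecutionProvider'.
    python -c "import onnxruntime; print(onnxruntime.get_available_providers())"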
Thanks. I'll keep playing with this idea of setting it in advance. I'm using PowerShell, so I think I was correct to set an environment variable before running "inference server start" (which sets up a container in Docker). So I did this:
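(Quoting from memory, it was something along these lines in PowerShell:)

    $env:ONNXRUNTIME_EXECUTION_PROVIDERS = "[CUDAExecutionProvider]"
    inference server start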
So maybe it's close? But I ran the Python script and it was still slow. Maybe it's just that slow with my model…? (Seems really slow, though.) OpenAI was even nice enough to launch ChatGPT5 for me today so I could try to troubleshoot, but I'm still not quite there.
If you exec into the running container, can you run env | grep ONNXRUNTIME_EXECUTION_PROVIDERS to confirm the env var is set? While in there, it would also be helpful to grab the output of nvidia-smi (from inside the container).
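(From the host, that's roughly the following; <container> is a placeholder for whatever docker ps lists for the inference server:)

    docker ps                                # find the inference container's name or ID
    docker exec -it <container> env | grep ONNXRUNTIME_EXECUTION_PROVIDERS
    docker exec -it <container> nvidia-smi   # should list the GPU plus driver/CUDA versions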
I have no Windows machine on my desk, so we will have to go through a few rounds of "run this, run that".
No Windows machine - aren't you fancy! Ha ha! I think I've got this all solved finally. TL;DR - CUDAExecutionProvider failed to load because my CUDA/cuDNN didn't match the ONNX Runtime (ORT) GPU wheel.
The reason appeared to be a mess of dependency conflicts. I think the most significant showstopper ended up being NumPy: a recent version of inference-gpu (0.52.0) started to require numpy>=2.x, but my GPU needed onnxruntime-gpu==1.16.3, which required CUDA 11.8, and THAT required numpy<2.x! So every time I fixed one package and got it working, the other would break. Lots of iteration over multiple days on this one. Thanks for being so willing to help as usual @Grzegorz!
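(For anyone who finds this later, the set of pins that follows from the versions above would look roughly like this; exact pins depend on your GPU/driver, and inference-gpu is held back because 0.52.0 is where the numpy>=2 requirement appeared:)

    pip install "inference-gpu<0.52.0" onnxruntime-gpu==1.16.3 "numpy<2"
    # Then confirm the CUDA provider is present:
    python -c "import onnxruntime; print(onnxruntime.get_available_providers())"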