Inference workflows process-video ... is always falling back to CPU, not using the GPU

Hi friends, I am having trouble getting a CLI video-processing workflow to use my server's GPU. It always falls back to the CPU, which is far too slow. I spent a day trying the docs and ChatGPT, but haven't had any success. Any ideas or how-tos would be greatly welcome. Thanks and cheers, Thomas

Infrastructure:
inference version: v0.59.0
inference-sdk version: v0.59.0
inference-cli version: v0.59.0
Running on an Ubuntu Linux server, within Docker.
Started with "inference server start" as well as with a shell script; same effect either way.

I have verified that the GPU is working and can be reached through CUDA from the container.

Here is my script:

# 1_aves_del_pozo_detection_nusakan
# inference workflows process-images-directory --help
# script is verified to work, but seems not to use GPU
# updated inference on 27th of Oct. to v50
# updated /etc/docker/daemon.json on 27th of Oct. to include "default-runtime": "nvidia"
# script shall be run on nusakan server

import os
import subprocess

video_path = "/nunki/magellanes/uploads/cam4/2025/04/14/magellanes-cam4_00_20250414153636.mp4"
output_dir = "/nunki/yatina/output/video_output/"
workspace_name = "yatinanet"
workflow_id = "aves-time-in-zone"
api_key = "mykey"
processing_target = "api"
api_url = "http://inference.yatina.net:9001"  # installed on Hermann's nusakan / Docker, verified to run
aggregation_format = "jsonl"
maxfps = "25"
threads = "1"

output_path = os.path.abspath(output_dir)
print(f"Roboflow is writing files to: {output_path}")

# --- force workflow to use GPU ---
os.environ["ROBOFLOW_DEVICE"] = "cuda"
os.environ["ROBOFLOW_TARGET"] = "api"

# --- debug: check GPU before running ---
gpu_test_cmd = [
    "inference", "workflows", "process-video",
    "--help",
]

# Run help command just to ensure the CLI can see the environment
result = subprocess.run(gpu_test_cmd, check=False, capture_output=True, text=True, env=os.environ)
if "cuda" in result.stdout.lower() or "cuda" in result.stderr.lower():
    print("GPU detected by inference CLI :white_check_mark:")
else:
    print(":warning: Warning: GPU may not be detected by inference CLI")
    print("STDOUT:", result.stdout)
    print("STDERR:", result.stderr)

command = [
    "inference", "workflows", "process-video",
    "-v", video_path,
    "-o", output_dir,
    "-pt", processing_target,
    "--workspace_name", workspace_name,
    "--workflow_id", workflow_id,
    "--api-key", api_key,
    "--api_url", api_url,
    "--aggregate",
    "--output_file_type", aggregation_format,
    "--max_fps", maxfps,
    "--save_out_video",
    "--allow_override",
    "--debug_mode",
    "--threads", threads,
]

print("Executing command:")
print(" ".join(command))  # prints the full command as a string

# Run the workflow and print standard output and errors
result = subprocess.run(command, check=False, capture_output=True, text=True)
print("STDOUT:\n", result.stdout)
print("STDERR:\n", result.stderr)


Hi @Thomas,
Great question and happy to help here! To start, can you confirm that you have installed inference for GPU using pip install inference-gpu? Here is a link to our documentation that walks through this process.
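
As a quick sanity check that the GPU variant is actually what ended up installed (nothing Roboflow-specific, just pip), something like this should do; you would typically expect to see inference-gpu and onnxruntime-gpu rather than only the CPU packages:

# list the relevant packages, looking for the -gpu variants
pip show inference-gpu
pip list | grep -i -E "inference|onnxruntime"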

Next, please verify that the GPU is properly attached to your container. NVIDIA’s Container Toolkit Configuration guide walks through this configuration process.
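
If it helps, here is a rough way to confirm that wiring; the CUDA image tag and the container name below are example placeholders, so substitute your own:

# Check that Docker knows about the nvidia runtime
docker info | grep -i runtime

# Check that a container can see the GPU at all (example CUDA image tag)
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# And from inside your running inference container (name is a placeholder)
docker exec -it <your-inference-container> nvidia-smi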

Finally, please verify that CUDA is available inside your container (see CUDA verification), and the ONNX CUDAExecutionProvider is listed (guide).
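
A minimal check for that last point, assuming python is on the PATH inside your container (use python3 if not); it should print "GPU" and a provider list that includes "CUDAExecutionProvider":

# container name is a placeholder
docker exec -it <your-inference-container> python -c "import onnxruntime as ort; print(ort.get_device()); print(ort.get_available_providers())"

If CUDAExecutionProvider is missing, the CPU-only onnxruntime package is likely the one installed in that environment.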

Hi Ford, thanks a lot for offering help and guidance.
Here is my reply/status on your questions:

  1. I have done pip install inference-gpu

  2. container toolkit

  3. CUDA verification

  4. ONNX verification…that could be the issue

What do you suggest I do to fix that, please?
Thank you very much indeed
cheers
Thomas

Meanwhile I found the following: there is a difference between the commands inference workflows process-images-directory (which does use my GPU) and inference workflows process-video (which apparently does NOT use my GPU).

inference workflows process-images-directory uses my server GPU, with:

processing_target = "api"
api_url = "http://localhost:9001"  # installed on my Docker / Ubuntu Linux, verified to run with CUDA (see screenshot below)

inference workflows process-video does NOT use my server GPU; it only uses the CPU, which is inefficient for video.

For whatever reason, your CLI command inference workflows process-video does not allow me to direct the processing target to an API, which means I cannot see where it is processing/inferring the video file. I just see that only the CPU is being consumed, and that it takes quite a long time.

Observations:

  • The command inference workflows process-video does not have a --pt or --api_url option, so it sends the video analysis somewhere by default (see the quick check after this list). Perhaps you can help me direct this to "http://localhost:9001".
  • I tried using the same --pt and --api_url with inference workflows process-video as with inference workflows process-images-directory, but it still stays on the CPU, and I do not get any related errors.
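
Here is the quick check I mean: just grepping the help output of both subcommands for the API-related options (nothing fancy), which is how I concluded those options are only there for the images-directory command:

inference workflows process-images-directory --help | grep -i -E "processing_target|api_url"
inference workflows process-video --help | grep -i -E "processing_target|api_url"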

Here is the script I am using, and the screenshots below.


# script is verified to work, but seems not to use GPU
# updated inference on 27th of Oct. to v50.1
# updated /etc/docker/daemon.json on 27th of Oct. to include "default-runtime": "nvidia"
# verified that workflows process-images-directory does use GPU
# script shall be run on nusakan server

import os
import subprocess

# --- Config ---
video_path = "/nunki/magellanes/uploads/cam4/2025/04/14/magellanes-cam4_00_20250414153636.mp4"
output_dir = "/nunki/yatina/output/video_output/"
workspace_name = "yatinanet"
workflow_id = "aves-time-in-zone"
api_key = "edited"
processing_target = "api"

# each of these URLs was tried; the last assignment is the one that takes effect
api_url = "http://localhost:9001"  # GPU Docker server
api_url = "http://0.0.0.0:9001"  # GPU Docker server
api_url = "http://172.17.0.2:9001"  # GPU Docker server IP, found out with: docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' inference-server-yatina

maxfps = 25

# --- Force GPU ---
os.environ["ROBOFLOW_DEVICE"] = "cuda"
os.environ["ROBOFLOW_TARGET"] = "api"

# --- Ensure output directory exists ---
os.makedirs(output_dir, exist_ok=True)
print(f"Roboflow is writing files to: {os.path.abspath(output_dir)}")

# --- Debug GPU detection ---
# Run help command just to ensure the CLI can see the environment
gpu_test_cmd = ["inference", "workflows", "process-video", "--help"]
result = subprocess.run(gpu_test_cmd, check=False, capture_output=True, text=True, env=os.environ)
stdout_lower = result.stdout.lower()
stderr_lower = result.stderr.lower()
if "cuda" in stdout_lower or "cuda" in stderr_lower or "tensorrt" in stdout_lower or "tensorrt" in stderr_lower:
    print("GPU detected by inference CLI :white_check_mark:")
else:
    print(":warning: Warning: GPU may not be detected by inference CLI")
    print("STDOUT:", result.stdout)
    print("STDERR:", result.stderr)

# --- Build and run workflow command ---
command = [
    "inference", "workflows", "process-video",
    "-v", video_path,
    "-o", output_dir,
    "-pt", processing_target,
    "--workspace_name", workspace_name,
    "--workflow_id", workflow_id,
    "--api-key", api_key,
    "--api_url", api_url,
    "--max_fps", str(maxfps),
    "--save_out_video",
    "--allow_override",
    "--debug_mode",
]

print("Executing command:")
print(" ".join(command))

result = subprocess.run(command, check=False, capture_output=True, text=True, env=os.environ)
print("STDOUT:\n", result.stdout)
print("STDERR:\n", result.stderr)

if result.returncode != 0:
    print(f"Workflow failed with return code {result.returncode}")
else:
    print("Workflow completed successfully!")




Good news: I finally got inference workflows process-video working on my Linux server with the GPU.
I needed to run the process inside the container, which can be done with a shell script, provided the necessary directories are mounted into the container. Hopefully this helps other folks suffering from the same issue waste less time than I did.

#!/bin/bash
docker exec -it inference-server-yatina \
  inference workflows process-video \
  [and the rest of your options]
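
For context, here is roughly what that looks like end to end on my machine. The mount paths, container name, and workflow options are the ones from my script above; the image name is what I believe the Roboflow GPU server image is called, so double-check it against the docs; the API key is a placeholder:

#!/bin/bash
# Start the GPU inference server with the data directories mounted, so the
# paths used below also resolve inside the container.
docker run -d --name inference-server-yatina \
  --gpus all \
  -p 9001:9001 \
  -v /nunki/magellanes/uploads:/nunki/magellanes/uploads \
  -v /nunki/yatina/output:/nunki/yatina/output \
  roboflow/roboflow-inference-server-gpu:latest

# Run the workflow inside that container, where the GPU build of inference lives.
# (-v and -o here are the inference CLI's video/output flags, not docker's.)
docker exec -it inference-server-yatina \
  inference workflows process-video \
  -v /nunki/magellanes/uploads/cam4/2025/04/14/magellanes-cam4_00_20250414153636.mp4 \
  -o /nunki/yatina/output/video_output/ \
  --workspace_name yatinanet \
  --workflow_id aves-time-in-zone \
  --api-key YOUR_API_KEY \
  --max_fps 25 \
  --save_out_video \
  --allow_override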

Have fun!

Good evening @Thomas!
This is fantastic, thank you very much for sharing this to help the community at large!

We greatly appreciate your contribution here; it will certainly be a valuable resource for many other Roboflow community members. Happy building!