Inference workflows process-video ... is always falling back to CPU, not using the GPU

Hi friends, I am having trouble getting a CLI video-processing workflow to use my server's GPU. It always falls back to the CPU, which is far too slow. I spent a day trying the docs and ChatGPT, but haven't had any success. Any ideas or how-tos would be greatly welcome. Thanks and cheers, Thomas

Infrastructure:
inference version: v0.59.0
inference-sdk version: v0.59.0
inference-cli version: v0.59.0
Running on an Ubuntu Linux server, within Docker.
Started with "inference server start" as well as with a shell script; same effect either way.

I have verified that the GPU is working and can be reached through CUDA from the container.

Here is my script:

# 1_aves_del_pozo_detection_nusakan
# inference workflows process-images-directory --help
# script is verified to work, but seems not to use GPU
# updated inference on 27th of Oct. to v50
# updated /etc/docker/daemon.json on 27th of Oct. to include "default-runtime": "nvidia"
# script shall be run on nusakan server

import os
import subprocess

video_path = "/nunki/magellanes/uploads/cam4/2025/04/14/magellanes-cam4_00_20250414153636.mp4"
output_dir = "/nunki/yatina/output/video_output/"
workspace_name = "yatinanet"
workflow_id = "aves-time-in-zone"
api_key = "mykey"
processing_target = "api"
api_url = "http://inference.yatina.net:9001"  # installed on Hermann's nusakan / Docker, verified to run
aggregation_format = "jsonl"
maxfps = "25"
threads = "1"

output_path = os.path.abspath(output_dir)
print(f"Roboflow is writing files to: {output_path}")

# --- force workflow to use GPU ---
os.environ["ROBOFLOW_DEVICE"] = "cuda"
os.environ["ROBOFLOW_TARGET"] = "api"

# --- debug: check GPU before running ---
gpu_test_cmd = [
    "inference", "workflows", "process-video",
    "--help",
]

# Run help command just to ensure the CLI can see the environment
result = subprocess.run(gpu_test_cmd, check=False, capture_output=True, text=True, env=os.environ)
if "cuda" in result.stdout.lower() or "cuda" in result.stderr.lower():
    print("GPU detected by inference CLI :white_check_mark:")
else:
    print(":warning: Warning: GPU may not be detected by inference CLI")
    print("STDOUT:", result.stdout)
    print("STDERR:", result.stderr)

command = [
    "inference", "workflows", "process-video",
    "-v", video_path,
    "-o", output_dir,
    "-pt", processing_target,
    "--workspace_name", workspace_name,
    "--workflow_id", workflow_id,
    "--api-key", api_key,
    "--api_url", api_url,
    "--aggregate",
    "--output_file_type", aggregation_format,
    "--max_fps", maxfps,
    "--save_out_video",
    "--allow_override",
    "--debug_mode",
    "--threads", threads,
]

print("Executing command:")
print(" ".join(command))  # prints the full command as a string

# Run the workflow and print standard output and errors
result = subprocess.run(command, check=False, capture_output=True, text=True)
print("STDOUT:\n", result.stdout)
print("STDERR:\n", result.stderr)


Hi @Thomas,
Great question and happy to help here! To start, can you confirm that you have installed inference for GPU using pip install inference-gpu? Here is a link to our documentation that walks through this process.
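
As a quick sanity check that the GPU variant is actually what ended up installed (nothing Roboflow-specific, just pip), something like this should do; you would typically expect to see inference-gpu and onnxruntime-gpu rather than only the CPU packages:

# list the relevant packages, looking for the -gpu variants
pip show inference-gpu
pip list | grep -i -E "inference|onnxruntime"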

Next, please verify that the GPU is properly attached to your container. NVIDIA’s Container Toolkit Configuration guide walks through this configuration process.
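
If it helps, here is a rough way to confirm that wiring; the CUDA image tag and the container name below are example placeholders, so substitute your own:

# Check that Docker knows about the nvidia runtime
docker info | grep -i runtime

# Check that a container can see the GPU at all (example CUDA image tag)
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# And from inside your running inference container (name is a placeholder)
docker exec -it <your-inference-container> nvidia-smi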

Finally, please verify that CUDA is available inside your container (see CUDA verification), and the ONNX CUDAExecutionProvider is listed (guide).
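
A minimal check for that last point, assuming python is on the PATH inside your container (use python3 if not); it should print "GPU" and a provider list that includes "CUDAExecutionProvider":

# container name is a placeholder
docker exec -it <your-inference-container> python -c "import onnxruntime as ort; print(ort.get_device()); print(ort.get_available_providers())"

If CUDAExecutionProvider is missing, the CPU-only onnxruntime package is likely the one installed in that environment.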

Hi Ford, thanks a lot for offering help and guidance.
Here is my reply/status on your questions:

  1. I have done pip install inference-gpu

  2. container toolkit

  3. CUDA verification

  4. ONNX verification…that could be the issue

What do you suggest I do to fix that, please?
Thank you very much indeed
cheers
Thomas

Meanwhile I found the following: there is a difference between the commands inference workflows process-images-directory (which does use my GPU) and inference workflows process-video (which apparently does NOT use my GPU).

inference workflows process-images-directory uses my server GPU, with:

processing_target = "api"
api_url = "http://localhost:9001"  # installed on my Docker / Ubuntu Linux, verified to run with CUDA (see screenshot below)

inference workflows process-video does NOT use my server GPU; it only uses the CPU, which is inefficient for video.

For whatever reason, your CLI command inference workflows process-video does not allow me to direct the processing target to an API, which means I cannot see where it is processing/inferring the video file. I just see that only the CPU is being consumed, and that it takes quite a long time.

Observations:

  • The command inference workflows process-video does not have a --pt or --api_url option, so it sends the video analysis somewhere by default (see the quick check after this list). Perhaps you can help me direct this to "http://localhost:9001".
  • I tried using the same --pt and --api_url with inference workflows process-video as with inference workflows process-images-directory, but it still stays on the CPU, and I do not get any related errors.
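
Here is the quick check I mean: just grepping the help output of both subcommands for the API-related options (nothing fancy), which is how I concluded those options are only there for the images-directory command:

inference workflows process-images-directory --help | grep -i -E "processing_target|api_url"
inference workflows process-video --help | grep -i -E "processing_target|api_url"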

Here is the script I am using, and the screenshots below.


# script is verified to work, but seems not to use GPU
# updated inference on 27th of Oct. to v50.1
# updated /etc/docker/daemon.json on 27th of Oct. to include "default-runtime": "nvidia"
# verified that workflows process-images-directory does use GPU
# script shall be run on nusakan server

import os
import subprocess

# --- Config ---
video_path = "/nunki/magellanes/uploads/cam4/2025/04/14/magellanes-cam4_00_20250414153636.mp4"
output_dir = "/nunki/yatina/output/video_output/"
workspace_name = "yatinanet"
workflow_id = "aves-time-in-zone"
api_key = "edited"
processing_target = "api"

# each of these URLs was tried; the last assignment is the one that takes effect
api_url = "http://localhost:9001"  # GPU Docker server
api_url = "http://0.0.0.0:9001"  # GPU Docker server
api_url = "http://172.17.0.2:9001"  # GPU Docker server IP, found out with: docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' inference-server-yatina

maxfps = 25

# --- Force GPU ---
os.environ["ROBOFLOW_DEVICE"] = "cuda"
os.environ["ROBOFLOW_TARGET"] = "api"

# --- Ensure output directory exists ---
os.makedirs(output_dir, exist_ok=True)
print(f"Roboflow is writing files to: {os.path.abspath(output_dir)}")

# --- Debug GPU detection ---
# Run help command just to ensure the CLI can see the environment
gpu_test_cmd = ["inference", "workflows", "process-video", "--help"]
result = subprocess.run(gpu_test_cmd, check=False, capture_output=True, text=True, env=os.environ)
stdout_lower = result.stdout.lower()
stderr_lower = result.stderr.lower()
if "cuda" in stdout_lower or "cuda" in stderr_lower or "tensorrt" in stdout_lower or "tensorrt" in stderr_lower:
    print("GPU detected by inference CLI :white_check_mark:")
else:
    print(":warning: Warning: GPU may not be detected by inference CLI")
    print("STDOUT:", result.stdout)
    print("STDERR:", result.stderr)

# --- Build and run workflow command ---
command = [
    "inference", "workflows", "process-video",
    "-v", video_path,
    "-o", output_dir,
    "-pt", processing_target,
    "--workspace_name", workspace_name,
    "--workflow_id", workflow_id,
    "--api-key", api_key,
    "--api_url", api_url,
    "--max_fps", str(maxfps),
    "--save_out_video",
    "--allow_override",
    "--debug_mode",
]

print("Executing command:")
print(" ".join(command))

result = subprocess.run(command, check=False, capture_output=True, text=True, env=os.environ)
print("STDOUT:\n", result.stdout)
print("STDERR:\n", result.stderr)

if result.returncode != 0:
    print(f"Workflow failed with return code {result.returncode}")
else:
    print("Workflow completed successfully!")




Good news: I finally got inference workflows process-video working on my Linux server with the GPU.
I needed to run the process inside the container, which can be done with a shell script, provided the necessary directories are mounted into the container. Hopefully this helps other folks suffering from the same issue waste less time than I did.

#!/bin/bash
docker exec -it inference-server-yatina \
  inference workflows process-video \
  [and the rest of your options]
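
For context, here is roughly what that looks like end to end on my machine. The mount paths, container name, and workflow options are the ones from my script above; the image name is what I believe the Roboflow GPU server image is called, so double-check it against the docs; the API key is a placeholder:

#!/bin/bash
# Start the GPU inference server with the data directories mounted, so the
# paths used below also resolve inside the container.
docker run -d --name inference-server-yatina \
  --gpus all \
  -p 9001:9001 \
  -v /nunki/magellanes/uploads:/nunki/magellanes/uploads \
  -v /nunki/yatina/output:/nunki/yatina/output \
  roboflow/roboflow-inference-server-gpu:latest

# Run the workflow inside that container, where the GPU build of inference lives.
# (-v and -o here are the inference CLI's video/output flags, not docker's.)
docker exec -it inference-server-yatina \
  inference workflows process-video \
  -v /nunki/magellanes/uploads/cam4/2025/04/14/magellanes-cam4_00_20250414153636.mp4 \
  -o /nunki/yatina/output/video_output/ \
  --workspace_name yatinanet \
  --workflow_id aves-time-in-zone \
  --api-key YOUR_API_KEY \
  --max_fps 25 \
  --save_out_video \
  --allow_override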

Have fun!

Good evening @Thomas!
This is fantastic, thank you very much for sharing this to help the community at large!

We greatly appreciate your contribution here; it will certainly be a valuable resource for many other Roboflow community members. Happy building!