Self-hosting Roboflow

  • Project Type: Detect, Count, and Visualize - TEST
  • Operating System & Browser: Windows Server 2022 / Edge Browser Version 138.0.3351.95 (Official build) (64-bit)
  • Operating System Inference Server: Ubuntu 20.04.6 LTS / NVIDIA A100, NVIDIA-SMI 550.54.15, Driver Version: 550.54.15, CUDA Version: 12.4
  • Project Universe Link or Workspace/Project ID: NA

Ahoi community,

I recently rediscovered Roboflow for internal testing and demos.
In-house we have an ESXi environment with an NVIDIA A100 GPU.

So far I managed to follow the guides from Self-Hosted Deployment | Roboflow Docs through Install on Linux - Roboflow Inference and got everything running.

I can access the inference server in the browser at 192.168.10.14:9001.
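
For reference, I started the GPU container roughly like this (a sketch from memory; the exact flags are the ones from the Linux install guide):

# run the Roboflow inference server with GPU access, exposed on port 9001
docker run -d --gpus all -p 9001:9001 roboflow/roboflow-inference-server-gpu:latest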

I don’t know if this helps, but 192.168.10.14:9001/build throws an error.

However, when I log in to the Roboflow app and choose an inference server, I cannot connect.

My understanding is that I design the workflow in the Roboflow app, connect to my local inference server, and test everything there.

Thank you in advance for your feedback and help.
Cheers!

Hi @m4xwe11o!
I’m sorry you’re running into this issue! I am going to consult with the team to get you the best possible answer.

Thank you for your patience and understanding.

Ahoi there!
Amazing, let me know if you need more details on my setup.

My VM configuration: the Linux VM is built according to the latest NVIDIA manual, Creating Your First NVIDIA AI Enterprise System — NVIDIA AI Enterprise: Bare Metal Deployment Guide.

Cheers

This is because your browser won’t let you connect over HTTP from an HTTPS page. The easiest thing to do is to use SSH port forwarding so you can connect through a localhost tunnel.
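
For example, from the machine running the browser (a sketch; substitute your own SSH user and the VM’s address):

# forward local port 9001 to the inference server on the VM
ssh -N -L 9001:localhost:9001 user@192.168.10.14

Then point the Roboflow app at http://localhost:9001; the browser treats localhost as a secure context, so the request isn’t blocked as mixed content.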

To do it “for real” you need to put a reverse proxy (like nginx) up front that will offload SSL for you.
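
A minimal sketch of that setup (hypothetical hostname and certificate paths, assuming nginx runs on the same VM as the container; adjust for your environment):

# TLS-terminating reverse proxy in front of the inference container
sudo tee /etc/nginx/conf.d/inference.conf >/dev/null <<'EOF'
server {
    listen 443 ssl;
    server_name inference.example.internal;

    ssl_certificate     /etc/ssl/certs/inference.crt;
    ssl_certificate_key /etc/ssl/private/inference.key;

    location / {
        proxy_pass http://127.0.0.1:9001;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
    }
}
EOF
sudo nginx -t && sudo systemctl reload nginx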

We’re working on automating this. But those are the options for now.

Ahoi there,

From this point of view it makes sense that the connection from a different machine, in my case the server, does not go through.

I did set up my NGINX Proxy Manager reverse proxy to point to the server.

Unfortunately, the connection cannot be established.

Or do I need to follow the self-hosted cloud manual (Azure - Roboflow Inference) when I want to host the inference server on-premises?

Kind Regards,
Maximilian

Hi @m4xwe11o!
You’re very close! You just need to configure Let’s Encrypt SSL on your NGINX Proxy Manager!

To confirm, you are using the right guide.

Ahoi Ford,

I made it work; it turned out to be an internal firewall issue between the NGINX proxy and the Roboflow container.
Have you seen this type of error?

[ONNXRuntimeError] : 1 : FAIL : CUDA failure 102: device doesn't have valid Grid license;
GPU=0;
hostname=bf0a00e3d1a0;
file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc;
line=424;
expr=cudaSetDevice(GetDeviceId());

When I run the same workflow internally on the VM via localhost, it executes fine.
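
(As a quick check directly against the container on the VM, bypassing the proxy, a sketch like this should print 200 if the container itself is reachable:)

# sanity check straight against the container, no proxy in between
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9001/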

Also, I can see the connection attempts in the Docker logs:

INFO:     192.168.30.49:34420 - "OPTIONS / HTTP/1.1" 200 OK
INFO:     192.168.30.49:34426 - "HEAD / HTTP/1.1" 200 OK
INFO:     192.168.30.49:34442 - "OPTIONS /workflows/run HTTP/1.1" 200 OK
2025-07-28 20:12:46.369928887 [E:onnxruntime:Default, cuda_call.cc:123 CudaCall] CUDA failure 102: device doesn't have valid Grid license ; GPU=0 ; hostname=bf0a00e3d1a0 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=424 ; expr=cudaSetDevice(GetDeviceId());
{"event": "Execution of step $steps.detection encountered error.", "timestamp": "2025-07-28 20:12.46", "exception": {"type": "Fail", "message": "[ONNXRuntimeError] : 1 : FAIL : CUDA failure 102: device doesn't have valid Grid license ; GPU=0 ; hostname=bf0a00e3d1a0 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=424 ; expr=cudaSetDevice(GetDeviceId()); ", "stacktrace": [{"filename": "/app/inference/core/workflows/execution_engine/v1/executor/core.py", "lineno": 130, "function": "safe_execute_step", "code": "run_step("},
{"filename": "/app/inference/core/workflows/execution_engine/v1/executor/core.py", "lineno": 160, "function": "run_step", "code": "return run_simd_step("}, {"filename": "/app/inference/core/workflows/execution_engine/v1/executor/core.py", "lineno": 184, "function": "run_simd_step", "code": "return run_simd_step_in_batch_mode("},
{"filename": "/app/inference/core/workflows/execution_engine/v1/executor/core.py", "lineno": 224, "function": "run_simd_step_in_batch_mode", "code": "outputs = step_instance.run(**step_input.parameters)"},
{"filename": "/app/inference/core/workflows/core_steps/models/roboflow/instance_segmentation/v1.py", "lineno": 211, "function": "run", "code": "return self.run_locally("},
{"filename": "/app/inference/core/workflows/core_steps/models/roboflow/instance_segmentation/v1.py", "lineno": 281, "function": "run_locally", "code": "predictions = self._model_manager.infer_from_request_sync("},
{"filename": "/app/inference/core/managers/decorators/fixed_size_cache.py", "lineno": 158, "function": "infer_from_request_sync", "code": "return super().infer_from_request_sync(model_id, request, **kwargs)"},
{"filename": "/app/inference/core/managers/decorators/base.py", "lineno": 106, "function": "infer_from_request_sync", "code": "return self.model_manager.infer_from_request_sync(model_id, request, **kwargs)"}, {"filename": "/app/inference/core/managers/active_learning.py", "lineno": 196, "function": "infer_from_request_sync", "code": "prediction = super().infer_from_request_sync("},
{"filename": "/app/inference/core/managers/active_learning.py", "lineno": 54, "function": "infer_from_request_sync", "code": "prediction = super().infer_from_request_sync("},
{"filename": "/app/inference/core/managers/base.py", "lineno": 231, "function": "infer_from_request_sync", "code": "rtn_val = self.model_infer_sync("},
{"filename": "/app/inference/core/managers/base.py", "lineno": 294, "function": "model_infer_sync", "code": "return self._models[model_id].infer_from_request(request)"},
{"filename": "/app/inference/core/models/base.py", "lineno": 134, "function": "infer_from_request", "code": "responses = self.infer(**request.dict(), return_image_dims=False)"},
{"filename": "/app/inference/core/models/instance_segmentation_base.py", "lineno": 97, "function": "infer", "code": "return super().infer("},
{"filename": "/app/inference/core/models/roboflow.py", "lineno": 771, "function": "infer", "code": "return super().infer(image, **kwargs)"},
{"filename": "/app/inference/usage_tracking/collector.py", "lineno": 693, "function": "sync_wrapper", "code": "res = func(*args, **kwargs)"},
{"filename": "/app/inference/core/models/base.py", "lineno": 29, "function": "infer", "code": "predicted_arrays = self.predict(preproc_image, **kwargs)"},
{"filename": "/app/inference/models/yolov8/yolov8_instance_segmentation.py", "lineno": 42, "function": "predict", "code": "predictions, protos = run_session_via_iobinding("},
{"filename": "/app/inference/core/utils/onnx.py", "lineno": 36, "function": "run_session_via_iobinding", "code": "predictions = session.run(None, {input_name: input_data})"},
{"filename": "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", "lineno": 270, "function": "run", "code": "return self._sess.run(output_names, input_feed, run_options)"}]}, "filename": "core.py", "func_name": "safe_execute_step", "lineno": 142}
{"positional_args": ["StepExecutionError", "StepExecutionError(\"[ONNXRuntimeError] : 1 : FAIL : CUDA failure 102: device doesn't have valid Grid license ; GPU=0 ; hostname=bf0a00e3d1a0 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=424 ; expr=cudaSetDevice(GetDeviceId()); \")"], "event": "%s: %s", "request_id": "116752b975354d27bd951878d04753b0", "timestamp": "2025-07-28 20:12.46", "exception": {"type": "StepExecutionError", "message": "[ONNXRuntimeError] : 1 : FAIL : CUDA failure 102: device doesn't have valid Grid license ; GPU=0 ; hostname=bf0a00e3d1a0 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=424 ; expr=cudaSetDevice(GetDeviceId()); ", "stacktrace": [{"filename": "/app/inference/core/interfaces/http/http_api.py", "lineno": 283, "function": "wrapped_route", "code": "return await route(*args, **kwargs)"}, {"filename": "/app/inference/usage_tracking/collector.py", "lineno": 728, "function": "async_wrapper", "code": "res = await func(*args, **kwargs)"},
{"filename": "/app/inference/core/interfaces/http/http_api.py", "lineno": 1388, "function": "infer_from_workflow", "code": "return process_workflow_inference_request("},
{"filename": "/app/inference/core/interfaces/http/http_api.py", "lineno": 826, "function": "process_workflow_inference_request", "code": "workflow_results = execution_engine.run("},
{"filename": "/app/inference/core/workflows/execution_engine/core.py", "lineno": 73, "function": "run", "code": "return self._engine.run("},
{"filename": "/app/inference/core/workflows/execution_engine/v1/core.py", "lineno": 107, "function": "run", "code": "result = run_workflow("},
{"filename": "/app/inference/usage_tracking/collector.py", "lineno": 693, "function": "sync_wrapper", "code": "res = func(*args, **kwargs)"},
{"filename": "/app/inference/core/workflows/execution_engine/profiling/core.py", "lineno": 264, "function": "wrapper", "code": "return func(*args, **kwargs)"},
{"filename": "/app/inference/core/workflows/execution_engine/v1/executor/core.py", "lineno": 62, "function": "run_workflow", "code": "execute_steps("},
{"filename": "/app/inference/core/workflows/execution_engine/profiling/core.py", "lineno": 264, "function": "wrapper", "code": "return func(*args, **kwargs)"},
{"filename": "/app/inference/core/workflows/execution_engine/v1/executor/core.py", "lineno": 108, "function": "execute_steps", "code": "_ = run_steps_in_parallel("},
{"filename": "/app/inference/core/workflows/execution_engine/v1/executor/utils.py", "lineno": 14, "function": "run_steps_in_parallel", "code": "return list(inner_executor.map(_run, steps))"},
{"filename": "/usr/lib/python3.10/concurrent/futures/_base.py", "lineno": 621, "function": "result_iterator", "code": "yield _result_or_cancel(fs.pop())"},
{"filename": "/usr/lib/python3.10/concurrent/futures/_base.py", "lineno": 319, "function": "_result_or_cancel", "code": "return fut.result(timeout)"},
{"filename": "/usr/lib/python3.10/concurrent/futures/_base.py", "lineno": 458, "function": "result", "code": "return self.__get_result()"},
{"filename": "/usr/lib/python3.10/concurrent/futures/_base.py", "lineno": 403, "function": "__get_result", "code": "raise self._exception"},
{"filename": "/usr/lib/python3.10/concurrent/futures/thread.py", "lineno": 58, "function": "run", "code": "result = self.fn(*self.args, **self.kwargs)"}, {"filename": "/app/inference/core/workflows/execution_engine/v1/executor/utils.py", "lineno": 37, "function": "_run", "code": "return fun()"},
{"filename": "/app/inference/core/workflows/execution_engine/profiling/core.py", "lineno": 264, "function": "wrapper", "code": "return func(*args, **kwargs)"},
{"filename": "/app/inference/core/workflows/execution_engine/v1/executor/core.py", "lineno": 144, "function": "safe_execute_step", "code": "raise StepExecutionError("}]}, "filename": "http_api.py", "func_name": "wrapped_route", "lineno": 475}
INFO:     192.168.30.49:34456 - "POST /workflows/run HTTP/1.1" 500 Internal Server Error

Is this getting more and more complicated because I am using an NVIDIA A100?

BR

Ahoi me again,

I’ll check again tomorrow whether I need to renew the NVIDIA AI Enterprise license…

BR

Hi @m4xwe11o!
I haven’t seen that error before, but according to my research it indicates that you need to renew your GRID license.
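
A quick way to check the license state from inside the guest (assuming the NVIDIA vGPU / AI Enterprise guest driver, as in your setup) is something like:

# look for the "vGPU Software Licensed Product" section and its "License Status" line
nvidia-smi -q | grep -i -A 2 license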

Ahoi,

I renewed my GRID license… still unclear why it caused an issue, though ^^

I also restarted the inference container, which pulled a new roboflow/roboflow-inference-server-gpu:latest image, and now inference works again.
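
Roughly what I ran (a sketch; the container name and flags are whatever you used when first starting it):

docker pull roboflow/roboflow-inference-server-gpu:latest
docker stop inference && docker rm inference   # "inference" is a placeholder container name
docker run -d --gpus all -p 9001:9001 roboflow/roboflow-inference-server-gpu:latest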

I think I solved the issue :slight_smile:

Thanks for your input

Cheers

Glad to hear you were able to get it running!! Happy building.
