Workflow failing via API but not in sandbox

I have a workflow (doors-and-windows-sahi) set up to detect objects using YOLOv11 Object Detection (Accurate) + SAHI, which was working via the API (https://serverless.roboflow.com/glazier-software/workflows/doors-and-windows-sahi) until about a week or two ago. Since then I get a 500 response with the following body; it still works via the playground/sandbox UI ("Run Workflow").

{"message":"Error in execution: Non-zero status code returned while running FusedMatMul node. Name:'/model.10/m/m.0/attn/MatMul/MatmulTransposeFusion//MatMulScaleFusion/' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 632320000\n","error_type":"StepExecutionError","context":"workflow_execu…/m.0/attn/MatMul/MatmulTransposeFusion//MatMulScaleFusion/' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 632320000\n","blocks_errors":[{"block_id":"model","block_type":"roboflow_core/roboflow_object_detection_model@v1","block_details":null,"property_name":null,"property_details":null}]}

  • Project Type: Object Detection
  • Operating System & Browser:
  • Project Universe Link or Workspace/Project ID: doors-windows-9k-givyl/1
  • Do you grant Roboflow Support permission to access your Workspace for troubleshooting? (Yes/No): Yes

Good Morning @Kevin_Gorski!
My name is Ford, and I’m a Support Engineer at Roboflow. Happy to help here!

To help me triage further, how are you running inference on your workflow when you encounter this error? Via the in-app "Test Workflow", Serverless API, dedicated deployment, or self-hosted inference?

Additionally, are you running inference on images, videos, or live video streams?

Thank you for the additional context!

It works when running in the app via "Run Workflow" (as stated); the error occurs via the serverless API (at the URL I provided).

This is always for images.

Checking in again on Monday

Good Afternoon @Kevin_Gorski,
Unfortunately I was unable to replicate this error when running inference on your workflow via the Serverless V2 API.

To help me triage further, can you please provide a sample image for testing that consistently generates the error? Also, what resolution/size are the images that you are passing to inference? Finally, does inference fail on every image passed or only a subset?

Any additional details about your setup, including the script you’re using to call the API, would also be helpful. Thank you!

@Ford

Sample image: https://gz-documents-dev.s3.amazonaws.com/business_1/job_11/d71e5678-9212-4835-b75b-9c020715d75e_2026-02-26T22:30:44.394Z

The image sizes/resolutions vary; the example is 6000x4000 / ~2 MB.

So far it's failed for all of the images I've tried (<10), though they're all a similar size, in case that turns out to be the issue.

	// Build the Serverless workflow endpoint URL
	const url = new URL(`https://serverless.roboflow.com/${workspaceName}/workflows/${params.workflowId}`);
	const body: Record<string, any> = {
		api_key: this.apiKey,
	};

	// Pass the image by URL rather than as an inline base64 payload
	if ('imageUrl' in params) {
		body.inputs = {
			image: {
				type: 'url',
				value: params.imageUrl,
			},
		};
	}

	const response = await fetch(url, {
		body: JSON.stringify(body),
		headers: {
			'Content-Type': 'application/json',
		},
		method: 'POST',
	});

	if (response.ok) {
		return response.json();
	} else {
		throw new Error(`Roboflow request failed with status ${response.status}`, { cause: response });
	}

Good Morning @Kevin_Gorski!
To help me triage further, do you encounter this error if you resize the 6000x4000 image to 2048x1365 (<=2048 px on the longest edge) before passing the image to inference?
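In case it's useful, the pre-resize target dimensions can be computed like this (a sketch of my own; `fitWithin` is a hypothetical helper, and the actual resize would be done with whatever image library you already use):

```typescript
// Compute dimensions that fit within a maximum edge length while
// preserving aspect ratio. Images already within the limit are untouched.
function fitWithin(
	width: number,
	height: number,
	maxEdge: number,
): { width: number; height: number } {
	// Never upscale: cap the scale factor at 1
	const scale = Math.min(1, maxEdge / Math.max(width, height));
	return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

fitWithin(6000, 4000, 2048); // → { width: 2048, height: 1365 }
```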

The smaller image succeeded, but I also just retried the original version and that seems to be succeeding again. :face_with_spiral_eyes:

Trying it again, the small image is succeeding and the original is still failing:

Error in execution: Non-zero status code returned while running Conv node. Name:'/model.1/conv/Conv' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 1153433600\n","error_type":"StepExecutionError","context":"workflow_execution | step_execution","inner_error_type":"str","inner_error_message":"Error in execution: Non-zero status code returned while running Conv node. Name:'/model.1/conv/Conv' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 1153433600\n","blocks_errors":[{"block_id":"model","block_type":"roboflow_core/roboflow_object_detection_model@v1","block_details":null,"property_name":null,"property_details":null}]}
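For context, a quick back-of-the-envelope conversion of the two failed allocation sizes (my own arithmetic, not from Roboflow):

```typescript
// Convert the ONNX Runtime buffer sizes reported in the two errors from bytes to MiB.
const toMiB = (bytes: number): number => bytes / 2 ** 20;

toMiB(632320000); // ≈ 603 MiB (first error, FusedMatMul node)
toMiB(1153433600); // = 1100 MiB (second error, Conv node)
```

So the second failure is asking for nearly twice the memory of the first, which is consistent with an out-of-memory condition that depends on input size and on whatever else is sharing the machine.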

We're now running an automated test with the original image at the top of every hour, and it seems to start failing around noon Mountain time. I'm not sure whether that's because of other traffic/work being done on the same (Roboflow) infrastructure; we're still making a very low volume of calls overall.
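Since the failures look transient (time-of-day dependent), a client-side retry with exponential backoff may be a reasonable stopgap. A minimal sketch, assuming any 5xx response is worth retrying (`retryOnTransientError` is a hypothetical helper, not part of any Roboflow SDK):

```typescript
// Retry an async operation on transient errors, with exponential backoff.
async function retryOnTransientError<T>(
	fn: () => Promise<T>,
	isTransient: (err: unknown) => boolean,
	maxAttempts = 3,
	baseDelayMs = 1000,
): Promise<T> {
	for (let attempt = 1; ; attempt++) {
		try {
			return await fn();
		} catch (err) {
			// Give up on the last attempt or on non-transient errors
			if (attempt >= maxAttempts || !isTransient(err)) throw err;
			// Back off: baseDelayMs, 2x, 4x, ...
			await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
		}
	}
}
```

This would wrap the existing `fetch` call, with `isTransient` checking `response.status >= 500` on the error's `cause`.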

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.