Can I deploy two different models to my Jetson at the same time?

I need to do two stage object detection at the edge. Can I deploy two different trained models from Roboflow onto the same device (Jetson performance allowing)?


Definitely! The Roboflow Inference Server for NVIDIA Jetson will handle loading and caching the weights in memory and using the correct ones based on the endpoint you call.
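As a sketch of what that looks like from the client side — assuming the server runs on its default port 9001, and using placeholder model IDs and a base64-encoded frame (the exact endpoint format may differ for your setup):

```shell
# Two-stage detection against one local inference server.
# "stage-one-model/1" and "stage-two-model/2" are placeholder model IDs:
# substitute your own project/version slugs and API key.
base64 frame.jpg > frame.b64

# First stage: the server loads and caches these weights on first call.
curl -s -X POST "http://localhost:9001/stage-one-model/1?api_key=$ROBOFLOW_API_KEY" \
     -H "Content-Type: application/x-www-form-urlencoded" \
     --data-binary @frame.b64

# Second stage: a different endpoint routes to the other cached model.
curl -s -X POST "http://localhost:9001/stage-two-model/2?api_key=$ROBOFLOW_API_KEY" \
     -H "Content-Type: application/x-www-form-urlencoded" \
     --data-binary @frame.b64
```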

I’ve not used the Inference Server on my 4 GB Jetson Nano, as it doesn’t get the performance I’d expect from the device. I load three models: two 608x608x3 and one 608x608x1. I train a YOLOv4 Darknet model on Google Colab, then convert it to ONNX on the Nano. I then compile the ONNX model to a TensorRT engine and use the Python bindings to pass OpenCV images to it. If you avoid TensorRT, you have a lot more flexibility in terms of models, but the performance won’t be as optimized.
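For the ONNX-to-TensorRT step, one common route is the `trtexec` tool that ships with TensorRT (paths and filenames below are illustrative; the engine must be built on the Nano itself, since TensorRT engines are specific to the GPU they were compiled on):

```shell
# Compile an exported ONNX model into a serialized TensorRT engine.
# Run this on the Jetson, not the training machine.
/usr/src/tensorrt/bin/trtexec \
    --onnx=yolov4_608.onnx \
    --saveEngine=yolov4_608.engine \
    --fp16   # half precision: large speedup on Jetson with minimal accuracy loss
```

The resulting `.engine` file is what the Python bindings then deserialize and execute.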

What speeds do you see there compared to the inference server?

Just the context.execute_async(bindings=bindings, stream_handle=stream.handle) call runs in 0.006 s, or about 166 fps, and that’s inference on a single image. It could potentially run inference on 8 images concurrently in the same time when using a single model.
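To make the arithmetic explicit (a trivial sketch, just restating the numbers above):

```python
def fps_from_latency(seconds_per_frame: float) -> float:
    """Convert a per-frame latency into frames per second."""
    return 1.0 / seconds_per_frame

# Raw engine execution only: 0.006 s per call -> roughly 166 fps.
single = fps_from_latency(0.006)

# If the same 0.006 s call processed a batch of 8 images,
# effective throughput would scale by the batch size.
batched = 8 * single
```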

The bulk of my program’s time is spent preprocessing the image and applying NMS to the result, using OpenCV and Python. That brings it down to 0.07 s, or about 14 fps.
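For reference, the NMS step being described is greedy IoU suppression. A minimal pure-Python sketch (not the exact code in use — a vectorized NumPy version or cv2.dnn.NMSBoxes would typically be faster):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

Doing this per detection in a Python loop is exactly the kind of overhead that dominates once the model call itself is down to a few milliseconds.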

That makes sense; the fairer comparison is against the full time per frame (vs. just the model inference), because the server is also doing all of that other image processing and NMS work, not just calling the model. Have you tried our Jetson Docker to see what fps you get with that?