TensorRT-converted weights not working with supervision

ok, so it seems that the following are true:

  • throughput is comparable in both cases (OpenCV reading frames directly and InferencePipeline)
  • latency seems to be higher for InferencePipeline: measured from the start of the loop, the OpenCV approach adds only ~4ms, whereas InferencePipeline adds ~50ms.
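To compare the two approaches on equal terms, it may help to measure both numbers with the same helper. Below is a minimal sketch, not code from this thread: `measure_loop` and its parameters are hypothetical names, and it assumes each frame arrives with the timestamp at which it was grabbed, taken from the same clock.

```python
import time
from typing import Callable, Iterable, List, Tuple


def measure_loop(
    frames: Iterable[Tuple[float, object]],
    process: Callable[[object], object],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[List[float], float]:
    """Return (per-frame latencies in seconds, overall throughput in fps).

    `frames` yields (grabbed_at, frame) pairs, where grabbed_at was read
    from the same `clock` at the moment the frame was grabbed.
    """
    latencies: List[float] = []
    started = clock()
    for grabbed_at, frame in frames:
        process(frame)  # e.g. model inference plus postprocessing
        # Latency: time between grabbing the frame and finishing its processing.
        latencies.append(clock() - grabbed_at)
    elapsed = clock() - started
    # Throughput: processed frames divided by wall time of the whole loop.
    fps = len(latencies) / elapsed if elapsed > 0 else 0.0
    return latencies, fps
```

Throughput depends only on how many frames finish per second, while latency also includes any time a frame spends waiting in a buffer, so the two can diverge in the way described above.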

This result truly worries me. All the tests we have conducted so far indicated only negligible latency overhead from decoding, in exchange for much better stability in real-world scenarios, which we assumed to be a good trade-off.

Could we verify one more thing? We have set the video consumption mode to limit the consumer's decoding activity at the price of increased latency. Maybe that is the cause, although we expected lower overhead.
To tune the option, please set:

from inference import InferencePipeline
from inference.core.interfaces.camera.video_source import (
    BufferConsumptionStrategy,
    BufferFillingStrategy,
)

# rtsp_url and sink come from your existing script
pipeline = InferencePipeline.init_with_custom_logic(
    # pass custom model
    video_reference=rtsp_url,
    on_video_frame=sink.infer,
    on_prediction=...,
    # when the source buffer is full, discard the oldest frame
    source_buffer_filling_strategy=BufferFillingStrategy.DROP_OLDEST,
)
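For intuition about what a drop-oldest policy does, here is a small illustration. It is only a sketch under my own assumptions and not the library's implementation: `fill_buffer_drop_oldest` is a hypothetical helper that models the buffer as a bounded deque and uses integers in place of frames.

```python
from collections import deque
from typing import Iterable, List, Tuple


def fill_buffer_drop_oldest(
    frame_ids: Iterable[int], buffer_size: int
) -> Tuple[List[int], List[int]]:
    """Push frames into a bounded buffer, discarding the oldest when full."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    buffer = deque(maxlen=buffer_size)
    dropped: List[int] = []
    for frame_id in frame_ids:
        if len(buffer) == buffer.maxlen:
            # The deque evicts its left-most (oldest) item on append;
            # record it so the effect is visible.
            dropped.append(buffer[0])
        buffer.append(frame_id)
    return list(buffer), dropped


# Producer delivers 10 frames while the consumer reads nothing.
kept, dropped = fill_buffer_drop_oldest(range(10), buffer_size=3)
print(kept)     # [7, 8, 9]
print(dropped)  # [0, 1, 2, 3, 4, 5, 6]
```

With a slow consumer, only the newest frames survive, which bounds how stale a consumed frame can be at the cost of skipping frames.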

here is the output that I am getting


0: 640x640 (no detections), 60.2ms
Speed: 43.5ms preprocess, 60.2ms inference, 1077.6ms postprocess per image at shape (1, 3, 640, 640)
E2E latency inference pipeline: 4.7277s, throughput: 0.0 fps

0: 640x640 10 cars, 1 number_plate, 50.5ms
Speed: 3.0ms preprocess, 50.5ms inference, 1707.4ms postprocess per image at shape (1, 3, 640, 640)

E2E latency inference pipeline: 1.8277s, throughput: 1.122334455663683 fps
0: 640x640 9 cars, 1 number_plate, 42.4ms
Speed: 3.0ms preprocess, 42.4ms inference, 6.0ms postprocess per image at shape (1, 3, 640, 640)

E2E latency inference pipeline: 0.1453s, throughput: 1.6268980477118664 fps
0: 640x640 9 cars, 1 number_plate, 41.9ms
Speed: 3.0ms preprocess, 41.9ms inference, 3.0ms postprocess per image at shape (1, 3, 640, 640)

E2E latency inference pipeline: 0.0733s, throughput: 2.115282919086782 fps
0: 640x640 9 cars, 1 number_plate, 42.3ms
Speed: 3.0ms preprocess, 42.3ms inference, 3.0ms postprocess per image at shape (1, 3, 640, 640)

E2E latency inference pipeline: 0.1192s, throughput: 2.5799793601528753 fps
0: 640x640 9 cars, 1 number_plate, 47.5ms
Speed: 3.0ms preprocess, 47.5ms inference, 3.0ms postprocess per image at shape (1, 3, 640, 640)

E2E latency inference pipeline: 0.1224s, throughput: 3.0 fps
0: 640x640 9 cars, 1 number_plate, 44.0ms
Speed: 4.0ms preprocess, 44.0ms inference, 4.2ms postprocess per image at shape (1, 3, 640, 640)

E2E latency inference pipeline: 0.0833s, throughput: 3.4196384953491434 fps
0: 640x640 9 cars, 1 number_plate, 45.0ms
Speed: 3.0ms preprocess, 45.0ms inference, 2.0ms postprocess per image at shape (1, 3, 640, 640)

E2E latency inference pipeline: 0.1132s, throughput: 3.7914691943117504 fps
0: 640x640 9 cars, 1 number_plate, 45.5ms
Speed: 3.1ms preprocess, 45.5ms inference, 3.0ms postprocess per image at shape (1, 3, 640, 640)

E2E latency inference pipeline: 0.1376s, throughput: 4.172461752421325 fps
0: 640x640 9 cars, 1 number_plate, 44.0ms
Speed: 4.0ms preprocess, 44.0ms inference, 5.0ms postprocess per image at shape (1, 3, 640, 640)

E2E latency inference pipeline: 0.1034s, throughput: 4.506534474964618 fps
0: 640x640 9 cars, 1 number_plate, 44.1ms
Speed: 3.1ms preprocess, 44.1ms inference, 6.0ms postprocess per image at shape (1, 3, 640, 640)

E2E latency inference pipeline: 0.1315s, throughput: 4.820333041178166 fps
0: 640x640 9 cars, 1 number_plate, 46.2ms
Speed: 3.0ms preprocess, 46.2ms inference, 2.0ms postprocess per image at shape (1, 3, 640, 640)

E2E latency inference pipeline: 0.0944s, throughput: 5.154639175238657 fps
0: 640x640 9 cars, 1 number_plate, 43.1ms
Speed: 3.0ms preprocess, 43.1ms inference, 2.0ms postprocess per image at shape (1, 3, 640, 640)

E2E latency inference pipeline: 0.1143s, throughput: 5.473684210526316 fps
0: 640x640 9 cars, 1 number_plate, 46.5ms
Speed: 4.0ms preprocess, 46.5ms inference, 4.0ms postprocess per image at shape (1, 3, 640, 640)

E2E latency inference pipeline: 0.1166s, throughput: 5.742411812939782 fps
0: 640x640 9 cars, 1 number_plate, 45.5ms
Speed: 3.0ms preprocess, 45.5ms inference, 3.0ms postprocess per image at shape (1, 3, 640, 640)

E2E latency inference pipeline: 0.109s, throughput: 6.0 fps
0: 640x640 9 cars, 1 number_plate, 44.3ms
Speed: 4.0ms preprocess, 44.3ms inference, 4.5ms postprocess per image at shape (1, 3, 640, 640)

The throughput reported here keeps increasing because of implementation details of the monitor. You need to wait for the value to stabilize, as shown here: https://global.discourse-cdn.com/standard17/uploads/roboflow1/original/2X/4/4570019f52e7efd9d3ec37f9b30afe568b8d86da.png

The initial values are lower because the first ticks were slow.
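To show why a few slow first ticks keep the reported number low for a while, here is a hedged sketch of a sliding-window FPS estimator. It is not the actual monitor from the library: `ThroughputMonitor`, `simulate`, the window size, and the frame gaps are all made up for illustration.

```python
from collections import deque
from typing import List, Sequence


class ThroughputMonitor:
    """Hypothetical FPS estimator over the last `window_size` tick timestamps."""

    def __init__(self, window_size: int = 64) -> None:
        self._ticks = deque(maxlen=window_size)

    def tick(self, timestamp: float) -> None:
        self._ticks.append(timestamp)

    def fps(self) -> float:
        if len(self._ticks) < 2:
            return 0.0
        elapsed = self._ticks[-1] - self._ticks[0]
        if elapsed <= 0:
            return 0.0
        # N timestamps span N - 1 frame intervals.
        return (len(self._ticks) - 1) / elapsed


def simulate(frame_gaps: Sequence[float], window_size: int = 64) -> List[float]:
    """Feed synthetic gaps between frames (seconds) and collect FPS readings."""
    monitor = ThroughputMonitor(window_size)
    now = 0.0
    monitor.tick(now)
    readings = []
    for gap in frame_gaps:
        now += gap
        monitor.tick(now)
        readings.append(monitor.fps())
    return readings


# Two slow warm-up frames, then steady 50 ms frames (invented numbers).
readings = simulate([3.0, 1.5] + [0.05] * 200)
```

In this simulation the reading climbs slowly while the warm-up ticks are still inside the window and settles at about 20 fps once they have been pushed out.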

With these settings I am still getting the same 5-6 fps:

        source_buffer_filling_strategy=BufferFillingStrategy.DROP_OLDEST,
        on_prediction=sink.on_prediction,

What should I do? Is this an InferencePipeline issue?

If source_buffer_filling_strategy does not improve the results, let’s drop it.

My question now: in https://global.discourse-cdn.com/standard17/uploads/roboflow1/original/2X/4/4570019f52e7efd9d3ec37f9b30afe568b8d86da.png you report almost 20 fps. The latency is suspiciously high, and this is something the Roboflow team needs to investigate. Apart from that, do we have everything sorted out?

Thanks @p.peczek, you did solve my original query, but now I am stuck on the fps issue.
I created a question under the inference Discussions tab on GitHub. Can you please help me debug it, or is there an alternative approach that I can use to do the inference?