TensorRT-converted weights not working with supervision

I am working on a computer vision project, and for it I trained a custom model on top of yolov8l-seg.
I am getting 6-7 FPS when I run inference, so to increase the FPS I converted the weights to TensorRT, which produced a best.engine file.
When I run inference using best.pt it works fine and I don't get any error, but I get the following error when I use best.engine.
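
For context, I created the engine with Ultralytics' built-in exporter, roughly like this (a minimal sketch; the exact arguments may differ from what I actually ran):

from ultralytics import YOLO

# Export the trained segmentation weights to a TensorRT engine.
# imgsz/half are illustrative here, not necessarily the flags I used.
model = YOLO("best.pt")
model.export(format="engine", imgsz=640, half=True)  # writes best.engine next to best.pt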

Here is the code

model = YOLO(".\best.engine")

logging.basicConfig(filename='output.log', level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

class CustomSink:
    def __init__(self, weights_path: str, zone_configuration_path: str, classes: List[int]):
        self._model = YOLO(weights_path)
        self.classes = classes
        self.tracker = sv.ByteTrack(minimum_matching_threshold=0.5)
        self.fps_monitor = sv.FPSMonitor()
        self.polygons = load_zones_config(file_path=zone_configuration_path)
        self.timers = [ClockBasedTimer() for _ in self.polygons]
        self.zones = [
            sv.PolygonZone(
                polygon=polygon,
                triggering_anchors=(sv.Position.CENTER,),
            )
            for polygon in self.polygons
        ]

    def infer(self, video_frames: List[VideoFrame]) -> List[Any]:
        # result must be returned as list of elements representing model prediction for single frame
        # with order unchanged.
        return self._model([v.image for v in video_frames])

    def on_prediction(self, result: dict, frame: VideoFrame) -> None:
        self.fps_monitor.tick()
        fps = self.fps_monitor.fps
        # modify the following code to adjust 
        detections = sv.Detections.from_ultralytics(result)
        detections = detections[find_in_list(detections.class_id, self.classes)]
        detections = self.tracker.update_with_detections(detections)

        annotated_frame = frame.image.copy()

        annotated_frame = sv.draw_text(
            scene=annotated_frame,
            text=f"{fps:.1f}",
            text_anchor=sv.Point(40, 30),
            background_color=sv.Color.from_hex("#A351FB"),
            text_color=sv.Color.from_hex("#000000"),
        )

        for idx, zone in enumerate(self.zones):
            annotated_frame = sv.draw_polygon(
                scene=annotated_frame, polygon=zone.polygon, color=COLORS.by_idx(idx)
            )

            detections_in_zone = detections[zone.trigger(detections)]
            time_in_zone = self.timers[idx].tick(detections_in_zone)
            custom_color_lookup = np.full(detections_in_zone.class_id.shape, idx)

            annotated_frame = COLOR_ANNOTATOR.annotate(
                scene=annotated_frame,
                detections=detections_in_zone,
                custom_color_lookup=custom_color_lookup,
            )
            labels = [
                f"#{tracker_id} {int(time // 60):02d}:{int(time % 60):02d}"
                for tracker_id, time in zip(detections_in_zone.tracker_id, time_in_zone)
            ]
            annotated_frame = LABEL_ANNOTATOR.annotate(
                scene=annotated_frame,
                detections=detections_in_zone,
                labels=labels,
                custom_color_lookup=custom_color_lookup,
            )
        cv2.imshow("Processed Video", annotated_frame)
        # cv2.waitKey(1)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            cv2.destroyAllWindows()
            raise SystemExit("Program terminated by user")
    

def main(
    weight_path: str,
    rtsp_url: str,
    zone_configuration_path: str,
    model_id: str,
    confidence: float,
    iou: float,
    classes: List[int],
) -> None:
    sink = CustomSink(weights_path=weight_path, zone_configuration_path=zone_configuration_path, classes=classes)

    pipeline = InferencePipeline.init_with_custom_logic(
        # pass custom model 
        video_reference=rtsp_url,
        on_video_frame=sink.infer,
        on_prediction=sink.on_prediction,
        # confidence=confidence,
        # iou_threshold=iou,
    )

    pipeline.start()

    try:
        pipeline.join()
    except KeyboardInterrupt:
        pipeline.terminate()

I am passing the rtsp_url, the path to the weights, etc. as command-line arguments.

What is the issue?
I have an NVIDIA RTX 4070 Laptop GPU
Operating system: Windows 11

Hello there,

The issue you encountered seems to be quite similar to the one reported here: Export to TensorRT KeyError · Issue #6471 · ultralytics/ultralytics · GitHub

It seems this was the solution: Export to TensorRT KeyError · Issue #6471 · ultralytics/ultralytics · GitHub

As you are probably using an object-detection model, I suggest trying:

self._model = YOLO(weights_path, task="detect")

and starting to debug the problem by getting the YOLO model to work on its own, then moving on to wrapping it with InferencePipeline.
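
Something along these lines should be enough to confirm the engine loads and runs on its own (a minimal sketch; the test image path is just a placeholder):

from ultralytics import YOLO

# Load the exported engine with the task stated explicitly and run a single
# prediction outside InferencePipeline.
model = YOLO("best.engine", task="detect")  # use task="segment" for a segmentation model
results = model("test_frame.jpg")           # placeholder image
print(results[0].boxes)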

Yes, this is the fix. I had to pass task="segment" since I am doing segmentation. But even after converting the model to TensorRT I am unable to improve the FPS; I am getting 4-5 FPS for segmentation on my NVIDIA RTX 4070 laptop GPU.
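
For anyone landing here later, the only change needed in my sink was:

self._model = YOLO(weights_path, task="segment")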

There are a few details I need to know to help you:

  • what is the model size?
  • what is the resolution of the footage?
  • what is the source of the footage: a video file, a USB camera, or an RTSP stream? If one of the last two, I need to know its parameters (FPS of the source).

  1. I trained the model on a custom dataset, using yolov8l-seg as the base model.
  2. I trained the model at the default resolution (640).
  3. I am getting the live camera feed through an RTSP URL; the video is 1920×1080 at around 17 FPS.

When I use an object-detection model I get around 25 FPS for the same stream.
I converted to TensorRT to get better results. What changes can I make to get better FPS?

I am not sure exactly how the TensorRT conversion works; I assume it requires the image to be 640px, so once you provide 1080p footage in the script it gets downsized - but this is something you could verify.
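
One way to verify this (a sketch, assuming the standard Ultralytics Results API): print the original frame shape next to the per-stage timings; the verbose output also shows the shape actually fed to the engine.

import cv2
from ultralytics import YOLO

model = YOLO("best.engine", task="segment")

frame = cv2.imread("test_frame.jpg")  # placeholder: any 1080p frame grabbed from the stream
results = model(frame, verbose=True)  # verbose output prints the "(1, 3, 640, 640)" input shape

print("original frame shape:", results[0].orig_shape)  # e.g. (1080, 1920)
print("timings (ms):", results[0].speed)               # {'preprocess': ..., 'inference': ..., 'postprocess': ...}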

I would start by checking what the bottleneck may be in this case.

Could you first grab some video frames from the camera and run a for-loop in a Python script measuring the time the model inference takes? By doing so you would mimic what is done here:

    def infer(self, video_frames: List[VideoFrame]) -> List[Any]:
        # result must be returned as list of elements representing model prediction for single frame
        # with order unchanged.
        return self._model([v.image for v in video_frames])

just without the InferencePipeline and real-time video processing. Please take a look at the actual input resolution for the model and the GPU utilisation.
If you see that results are produced at a fast pace, we will go deeper to see whether the bottleneck is in InferencePipeline.

I ran the following code

import cv2
import time
from ultralytics import YOLO
import numpy as np

# Initialize the YOLO model
model = YOLO("C:\\Users\\mubas\\OneDrive\\Desktop\\ultralytics\\segmentation\\seg-trained-model-weights\\best.engine", task='segment')

def capture_and_infer(rtsp_url: str, num_frames: int) -> None:
    cap = cv2.VideoCapture(rtsp_url)
    if not cap.isOpened():
        print("Error: Unable to open video stream")
        return

    for i in range(num_frames):
        ret, frame = cap.read()
        if not ret:
            print("Error: Unable to read frame from video stream")
            break

        # Start measuring time
        start_time = time.time()

        # Perform inference
        results = model([frame])

        # End measuring time
        inference_time = time.time() - start_time
        print(f"Frame {i+1}: Inference time = {inference_time:.4f} seconds")

        # Process results if needed (this is just an example)
        for result in results:
            pass

    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    rtsp_url = "rtsp://admin:<password>@192.168.0.126:554/Streaming/Channels/2001"
    num_frames = 10  
    capture_and_infer(rtsp_url, num_frames)

and I am getting this output

[06/11/2024-11:09:40] [TRT] [I] Loaded engine size: 258 MiB
[06/11/2024-11:09:40] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +99, now: CPU 0, GPU 352 (MiB)
[06/11/2024-11:09:40] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading

0: 640x640 (no detections), 39.5ms
Speed: 29.4ms preprocess, 39.5ms inference, 621.8ms postprocess per image at shape (1, 3, 640, 640)
Frame 1: Inference time = 3.5867 seconds

0: 640x640 16 cars, 1 number_plate, 39.0ms
Speed: 3.0ms preprocess, 39.0ms inference, 1056.5ms postprocess per image at shape (1, 3, 640, 640)
Frame 2: Inference time = 1.1005 seconds

0: 640x640 16 cars, 1 number_plate, 40.0ms
Speed: 2.0ms preprocess, 40.0ms inference, 6.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 3: Inference time = 0.0490 seconds

0: 640x640 16 cars, 1 number_plate, 38.5ms
Speed: 3.0ms preprocess, 38.5ms inference, 2.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 4: Inference time = 0.0465 seconds

0: 640x640 16 cars, 1 number_plate, 43.0ms
Speed: 2.0ms preprocess, 43.0ms inference, 2.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 5: Inference time = 0.0490 seconds

0: 640x640 16 cars, 2 number_plates, 42.2ms
Speed: 2.0ms preprocess, 42.2ms inference, 4.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 6: Inference time = 0.0502 seconds

0: 640x640 16 cars, 1 number_plate, 42.0ms
Speed: 3.0ms preprocess, 42.0ms inference, 4.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 7: Inference time = 0.0500 seconds

0: 640x640 16 cars, 2 number_plates, 41.0ms
Speed: 2.0ms preprocess, 41.0ms inference, 4.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 8: Inference time = 0.0490 seconds

0: 640x640 16 cars, 2 number_plates, 42.0ms
Speed: 3.0ms preprocess, 42.0ms inference, 4.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 9: Inference time = 0.0530 seconds

0: 640x640 16 cars, 2 number_plates, 41.0ms
Speed: 3.0ms preprocess, 41.0ms inference, 3.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 10: Inference time = 0.0490 seconds

OK, great,

so you should be able to reach about 20 FPS running raw TRT inference against a 640x640 image, at least in terms of GPU throughput (~45-50 ms per frame works out to roughly 20 FPS).

Let's do two things.

  1. Could you please measure the whole frame-processing time:
    for i in range(num_frames):
        frame_start = time.time()
        ret, frame = cap.read()
        if not ret:
            print("Error: Unable to read frame from video stream")
            break

        # Start measuring time
        start_time = time.time()

        # Perform inference
        results = model([frame])

        # End measuring time
        inference_time = time.time() - start_time
        print(f"Frame {i+1}: Inference time = {inference_time:.4f} seconds")

        # Process results if needed (this is just an example)
        for result in results:
            # Do something with the result if needed
            pass
        total_frame_time = time.time() - frame_start
        print(f"Frame {i+1}: total time = {total_frame_time:.4f} seconds")

That would make it clear how long it takes to (a) grab and decode a frame and (b) run inference.

  2. Having done that, I would compare it to InferencePipeline with a sink:
from datetime import datetime


def debug_on_prediction(result: dict, frame: VideoFrame):
    latency = (datetime.now() - frame.frame_timestamp).total_seconds()
    print(f"E2E latency inference pipeline: {round(latency, 4)}s")


# then change the sink in your original code
pipeline = InferencePipeline.init_with_custom_logic(
        # pass custom model 
        video_reference=rtsp_url,
        on_video_frame=sink.infer,
        on_prediction=debug_on_prediction,
)

The reason I ask for that is the following:

  1. We need to see how performant your setup is at processing a frame from start to end - it may happen that your GPU can process 20 frames a second, but grabbing and decoding frames is the limiting factor.
  2. The second exercise checks the same thing using InferencePipeline as the decoding platform, but without the sink logic from the original snippet (tracking + zone post-processing).

And if you can, please run more than 10 frames; 100 would probably provide a more stable estimate.
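
Something like this would do (a sketch of the measurement loop; skipping the first few frames avoids the warm-up spike visible in your logs):

import time
import cv2
from ultralytics import YOLO

model = YOLO("best.engine", task="segment")
cap = cv2.VideoCapture("rtsp://...")  # placeholder URL

NUM_FRAMES, WARMUP = 100, 5
inference_times, total_times = [], []

for i in range(NUM_FRAMES):
    frame_start = time.time()
    ret, frame = cap.read()
    if not ret:
        break
    infer_start = time.time()
    model([frame])
    if i >= WARMUP:  # ignore warm-up frames when averaging
        inference_times.append(time.time() - infer_start)
        total_times.append(time.time() - frame_start)

cap.release()
print(f"avg inference: {sum(inference_times) / len(inference_times):.4f}s")
print(f"avg total per frame: {sum(total_times) / len(total_times):.4f}s")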

0: 640x640 (no detections), 39.0ms
Speed: 30.6ms preprocess, 39.0ms inference, 630.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 1: Inference time = 4.6094 seconds

0: 640x640 18 cars, 40.2ms
Speed: 3.0ms preprocess, 40.2ms inference, 1077.1ms postprocess per image at shape (1, 3, 640, 640)
Frame 2: Inference time = 1.1223 seconds

0: 640x640 18 cars, 39.7ms
Speed: 2.0ms preprocess, 39.7ms inference, 4.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 3: Inference time = 0.0476 seconds

0: 640x640 18 cars, 39.1ms
Speed: 2.0ms preprocess, 39.1ms inference, 3.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 4: Inference time = 0.0451 seconds

0: 640x640 18 cars, 41.9ms
Speed: 5.0ms preprocess, 41.9ms inference, 6.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 5: Inference time = 0.0549 seconds

0: 640x640 18 cars, 46.0ms
Speed: 6.0ms preprocess, 46.0ms inference, 2.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 6: Inference time = 0.0550 seconds

0: 640x640 18 cars, 41.7ms
Speed: 3.0ms preprocess, 41.7ms inference, 2.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 7: Inference time = 0.0477 seconds

0: 640x640 18 cars, 44.0ms
Speed: 2.0ms preprocess, 44.0ms inference, 4.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 8: Inference time = 0.0520 seconds

0: 640x640 18 cars, 43.7ms
Speed: 2.0ms preprocess, 43.7ms inference, 2.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 9: Inference time = 0.0488 seconds

0: 640x640 19 cars, 44.3ms
Speed: 2.0ms preprocess, 44.3ms inference, 2.0ms postprocess per image at shape (1, 3, 640, 640)
Frame 10: Inference time = 0.0493 seconds
Frame 10: total time = 0.0538 seconds

For the second:

Speed: 2.5ms preprocess, 44.7ms inference, 1143.3ms postprocess per image at shape (1, 3, 640, 640)
Speed: 7.0ms preprocess, 43.1ms inference, 6.0ms postprocess per image at shape (1, 3, 640, 640)
Speed: 5.0ms preprocess, 44.7ms inference, 7.0ms postprocess per image at shape (1, 3, 640, 640)
E2E latency inference pipeline: 0.1069s

Also, I keep getting this warning, so it is very hard to read the output:
SupervisionWarnings: __call__ is deprecated: `FPSMonitor.__call__` is deprecated and will be removed in `supervision-0.22.0`. Use `FPSMonitor.fps` instead.
and I can't stop the program unless I kill the terminal. I tried Ctrl+C and Ctrl+X and still can't stop the program.

SupervisionWarnings: __call__ is deprecated: `FPSMonitor.__call__` is deprecated and will be removed in `supervision-0.22.0`. Use `FPSMonitor.fps` instead.
SupervisionWarnings: __call__ is deprecated: `FPSMonitor.__call__` is deprecated and will be removed in `supervision-0.22.0`. Use `FPSMonitor.fps` instead.
SupervisionWarnings: __call__ is deprecated: `FPSMonitor.__call__` is deprecated and will be removed in `supervision-0.22.0`. Use `FPSMonitor.fps` instead.
0: 640x640 19 cars, 43.0ms
Speed: 3.0ms preprocess, 43.0ms inference, 63.5ms postprocess per image at shape (1, 3, 640, 640)

Got this error

Error Code 1: Cuda Runtime (out of memory)
Error Code 1: Cuda Driver (out of memory)

Ok,

as this is multi-threaded code, terminating the InferencePipeline requires:

    pipeline.start()

    try:
        pipeline.join()
    except KeyboardInterrupt:
        pipeline.terminate()

To avoid CUDA errors from a process that was not terminated, you may need to kill the processes occupying VRAM. To do that, find the Python processes using nvidia-smi and kill them; that should de-allocate the memory.

Also, in the dump you provided for the first case,
Frame {i}: total time = ... seconds appears only once, not once per frame,

and for InferencePipeline you have a few dumps of
Speed: 2.5ms preprocess, 44.7ms inference, 1143.3ms postprocess per image at shape (1, 3, 640, 640)
with only one entry of E2E latency inference pipeline: 0.1069s.
Please run it longer so that we get insight into the process behaviour after everything starts and stabilises. I would say something is wrong with a 100ms E2E latency given the model takes ~50ms, but to evaluate the root cause I need to see whether that is only a temporary state at startup or something present throughout the whole run.

This is my main function; I have the keyboard interrupt set up properly.

def main(
    weight_path: str,
    rtsp_url: str,
    zone_configuration_path: str,
    model_id: str,
    confidence: float,
    iou: float,
    classes: List[int],
) -> None:
    sink = CustomSink(weights_path=weight_path, zone_configuration_path=zone_configuration_path, classes=classes)

    pipeline = InferencePipeline.init_with_custom_logic(
        # pass custom model 
        video_reference=rtsp_url,
        on_video_frame=sink.infer,
        on_prediction=debug_on_prediction
        # on_prediction=sink.on_prediction,
        # confidence=confidence,
        # iou_threshold=iou,
    )

    pipeline.start()

    try:
        pipeline.join()
    except KeyboardInterrupt:
        pipeline.terminate()


I keep getting the FPSMonitor.__call__ deprecation warning in my terminal, which covers all the logs.

see

OK, that is fine.
To remove the warnings: export SUPERVISON_DEPRECATION_WARNING=0

I see that there is latency introduced by the presence of InferencePipeline - I'm not sure at the moment why; maybe this reveals a weakness in the implementation that can be removed. I would need to take a closer look.

Signals are not handled properly, probably due to the execution of non-Python threads under the hood; this is also something I would need to take a closer look at.

Let's do one final thing, since in InferencePipeline latency != throughput due to the threading involved.

MONITOR = sv.FPSMonitor()

def debug_on_prediction(result: dict, frame: VideoFrame, monitor: sv.FPSMonitor = MONITOR):
    monitor.tick()
    latency = (datetime.now() - frame.frame_timestamp).total_seconds()
    print(f"E2E latency inference pipeline: {round(latency, 4)}s, throughput: {monitor.fps} fps")

This is to verify the throughput without the polygon-zone post-processing.

I bet this will show the value you were reporting - around 7-8 FPS.

If that is the case, I will have suggestions for tuning the performance of the inference pipeline, but most likely we also have something to improve on our side, so that we do not introduce such a large latency.

Alright, I added the export using

import os
os.environ['SUPERVISON_DEPRECATION_WARNING'] = '0'

in case someone comes here in the future.
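
One note for future readers (an assumption on my part, not something I verified in the supervision source): setting the variable before supervision is imported seems the safest order, so the flag is already in place when the warning machinery runs:

import os
os.environ['SUPERVISON_DEPRECATION_WARNING'] = '0'  # assumption: set before importing supervision

import supervision as sv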