Trackers fail compared to frame-by-frame Keypoint Detection

How do I tune trackers for video inference?

My Keypoint Detection model reached the following metrics:

  • mAP@50: 96.6%
  • Precision: 100%
  • Recall: 90.9%
  • F1: 95.2%

And it really shows good quality keypoint detection on a bunch of frames I’ve provided.

But when it comes to video inference, the model seems to go haywire. The standard code from the documentation doesn’t include any tracking at all, so I’ve tried adding trackers:

  • OC-SORT doesn’t outperform any other tracker
  • SORTTracker also gives insufficient results
  • DetectionSmoother changed almost nothing in the way bbox coordinates are calculated
  • ByteTrack often loses track of an object and produces hundreds of IDs (only ~2 needed)
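To put a number on the ID fragmentation described in the last bullet, one option is to log the tracker IDs per frame and count how many distinct IDs appear and how long each one lives. This is only a minimal sketch: `summarize_ids` and the sample log are hypothetical, and it assumes the IDs were collected as one list per frame (for example from `detections.tracker_id` after each tracker update).

```python
from collections import Counter


def summarize_ids(ids_per_frame, short_limit=2):
    """Count distinct tracker IDs and how many frames each one was seen in.

    ids_per_frame: list of lists of tracker IDs, one inner list per frame.
    short_limit:   IDs seen in this many frames or fewer count as short-lived.
    """
    lifetimes = Counter()
    for frame_ids in ids_per_frame:
        # set() guards against the same ID being logged twice in one frame.
        lifetimes.update(set(frame_ids))
    total_ids = len(lifetimes)
    short_lived = sum(1 for n in lifetimes.values() if n <= short_limit)
    return total_ids, short_lived, lifetimes


# Hypothetical log: ID 1 is stable, the second object keeps getting new IDs.
ids_per_frame = [[1, 2], [1, 2], [1], [1, 3], [1, 3], [1, 4], [1, 5], [1, 5], [1, 5]]

total_ids, short_lived, lifetimes = summarize_ids(ids_per_frame)
print(total_ids, short_lived)  # 5 3
```

If the scene only contains about two objects, a large `total_ids` with many short-lived entries suggests that tracks are being dropped and re-created rather than that the detector is finding new objects.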

All of them often ‘lose’ backward objects (probably because of an incorrect IoU threshold).
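One way to see why an IoU threshold could affect small or distant objects more than large ones is to compute the IoU between a box and the same box shifted by a few pixels. The sketch below uses made-up box coordinates and a hypothetical gate of 0.5; it is not the matching code of any particular tracker, and how each tracker interprets its threshold parameter should be checked in its documentation.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted(box, dx):
    """Return the box moved horizontally by dx pixels."""
    x1, y1, x2, y2 = box
    return (x1 + dx, y1, x2 + dx, y2)


# Hypothetical boxes: a large object close to the camera, a small one further back.
near = (100, 100, 300, 500)  # 200 px wide
far = (400, 200, 440, 280)   # 40 px wide

# Both objects move 20 px between two consecutive frames.
iou_near = iou(near, shifted(near, 20))  # about 0.82
iou_far = iou(far, shifted(far, 20))     # about 0.33

IOU_GATE = 0.5  # hypothetical gate, for illustration only
print(iou_near >= IOU_GATE, iou_far >= IOU_GATE)  # True False
```

The same 20 px displacement leaves the large box well above the gate and the small box well below it, which would be consistent with smaller objects losing their track first.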

I’ve added hard poses and frames with missing vertices to the dataset, as the GPT assistant in the Roboflow documentation advised, but it made almost no improvement compared to frame-by-frame inference.

I’ve also tried to tune ByteTrack:

import supervision as sv

tracker = sv.ByteTrack(
    lost_track_buffer=75,
    track_activation_threshold=0.18,
    minimum_matching_threshold=0.7,
    minimum_consecutive_frames=1,
)

But the suggested parameter values seem to work poorly.

You can compare the inference quality on the image:

How should I tune these trackers or change the video inference module to get better results?

Hi @fitwist ,

Could you also share a video showing just the detections (visualized), without tracking? It could be that inference is too slow for the tracker to consistently maintain IDs across frames.
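As a rough illustration of why slow inference could hurt ID consistency: if the pipeline only processes every n-th frame, the object moves n times further between the frames the tracker sees, and the overlap between consecutive boxes shrinks. The sketch below assumes a box of constant size moving horizontally at a constant speed; the width, speed, and strides are made-up numbers.

```python
def iou_after_shift(width, shift):
    """IoU of a box with a copy of itself shifted horizontally by `shift` px.

    Both boxes have the same height, so the height cancels out and only the
    horizontal overlap matters.
    """
    overlap = max(0.0, width - abs(shift))
    return overlap / (2 * width - overlap)


BOX_WIDTH = 60          # px, hypothetical
SPEED_PX_PER_FRAME = 15  # px per source frame, hypothetical

# stride = how many source frames pass between two processed frames.
iou_by_stride = {
    stride: iou_after_shift(BOX_WIDTH, SPEED_PX_PER_FRAME * stride)
    for stride in (1, 2, 4)
}

for stride, value in iou_by_stride.items():
    print(f"stride {stride}: IoU {value:.2f}")
```

Under these assumptions the IoU falls from 0.60 at stride 1 to 0.33 at stride 2 and to 0 at stride 4, at which point an IoU-based association has nothing left to match on.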

Thanks, Erik

Of course, here they are

OC-SORT: https://youtu.be/AZ3smKoeL2Y

Frame-by-frame: https://youtu.be/CPvP3gULlEo

Hi @fitwist , could you also share the full code with detection, tracking, and visualization? The first video seems to have an issue with visualization (scaling?), but I would like to verify it locally.

Thanks, Erik

Here’s the code for tracking:

Here’s the code for visualization:

Hi @fitwist ! Alex here from the trackers team. I see in the OC-SORT tracker video that the boxes and keypoints clearly look corrupt, while in the other video the keypoints look really nice!

Could you tell me how you are using the tracker? OC-SORT isn’t responsible for the quality of the input detections, and I couldn’t find this in your code. Also, how are you detecting keypoints in that video? OC-SORT should never return a box different from the one it was given, so I think the problem might be in how you are handling the input to the tracker. A video of the input boxes might help!
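A quick way to check this locally is to compare the boxes that go into the tracker with the boxes that come out, and to re-attach keypoints by matching each tracked box back to its input box. The sketch below is plain Python with made-up data; `match_tracked_to_input` is a hypothetical helper, not part of any tracker API, and it assumes the tracker may reorder or drop detections but does not modify box coordinates.

```python
def match_tracked_to_input(input_boxes, tracked_boxes, tol=1e-3):
    """For each tracked box, return the index of the identical input box.

    Returns None for a tracked box that matches no input box within `tol`,
    which would indicate that coordinates changed somewhere in the pipeline
    (for example through rescaling).
    """
    mapping = []
    for tracked in tracked_boxes:
        found = None
        for index, box in enumerate(input_boxes):
            if all(abs(t - b) <= tol for t, b in zip(tracked, box)):
                found = index
                break
        mapping.append(found)
    return mapping


# Hypothetical detector output for one frame: boxes plus per-object keypoints.
input_boxes = [(10, 10, 50, 90), (200, 40, 260, 160)]
keypoints = [[(30, 20), (30, 60)], [(230, 55), (230, 120)]]

# Hypothetical tracker output: same boxes, different order, with track IDs.
tracked_boxes = [(200, 40, 260, 160), (10, 10, 50, 90)]
tracker_ids = [7, 3]

mapping = match_tracked_to_input(input_boxes, tracked_boxes)

# Re-attach keypoints to track IDs through the recovered indices.
keypoints_by_id = {
    track_id: keypoints[index]
    for track_id, index in zip(tracker_ids, mapping)
    if index is not None
}

# A box that was rescaled before drawing no longer matches any input box.
corrupt_mapping = match_tracked_to_input(input_boxes, [(5, 5, 25, 45)])
```

If `mapping` contains `None` for boxes taken straight from the tracker, the coordinates are being changed before or after tracking. If every box matches but the drawn keypoints are still wrong, the keypoints are probably being paired with the wrong detection index.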

We also encourage using the tracking algorithms from the trackers package instead of sv.ByteTrack, since the package makes it easy to swap trackers.

Best,

Alex