RF-DETR - Latency Results Do Not Match Paper

Omer_Taub · December 21, 2025, 11:26am

Hello,

I am benchmarking the model reported in the paper and am seeing a noticeable mismatch between the latency numbers reported in the paper and the latency I measure in practice.

Setup details:

Batch size: 1
GPU: NVIDIA RTX PRO 4500 (Blackwell)
Inference focused (no training, no data loading overhead)

Despite matching the batch size and using a modern high-end GPU, the measured latency is consistently higher than what is reported in the paper. I want to confirm:

Whether the paper’s latency numbers were measured with any specific assumptions (e.g., mixed precision, TensorRT, specific CUDA/cuDNN versions, or warmup strategy).
Whether preprocessing/postprocessing was excluded from the reported latency.
If Flash Attention, fused kernels, or other backend-specific optimizations were explicitly enabled.
Whether the reported numbers reflect end-to-end latency or pure model forward time.

Any clarification on the exact benchmarking methodology used in the paper would be very helpful, as I am trying to reproduce the results as closely as possible.

Thank you!

isaacrob · December 21, 2025, 2:15pm

The paper says the specific conditions under which latency is measured, see section 4 and the appendix on CUDA graphs. As mentioned in the paper, we use TensorRT 10.4 and CUDA 12.4 on a T4 GPU with FP16 and CUDA graphs enabled. We measure model forward time, which for a DETR is the end to end time.

As is also mentioned in the paper, we take a close look at power throttling as a source of inconsistencies in latency measurements in different papers and propose a solution. To facilitate research, we linked to an open source repo containing code to exactly reproduce our claimed latencies and accuracies.

system · January 11, 2026, 2:15pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Speed of Small Object Detection Models 🤝 Community Help	5	193	August 27, 2025
Tensorrt converted weights not working with supervision 🤝 Community Help bugs	27	469	July 3, 2024
Discrepancy between predictions, could use some guidance 🤝 Community Help rf-detr	1	43	November 25, 2025
Discrepancy between model weights and roboflow hosted inference? 🛠️ Feature Reqs	4	116	August 11, 2025
RF-DETR Feedback 🤝 Community Help	8	299	August 18, 2025

RF-DETR - Latency Results Do Not Match Paper

Related topics