RF-DETR Feedback

Hey y’all,

What is the best way to talk to Roboflow about bugs in RF-DETR? I’ve been working on a project using this model and have come across some areas for improvement. I made one post on GitHub, but I’ve noticed it has been a few weeks since Roboflow has gotten back to anyone there. What would be the best platform for discussing these issues with Roboflow? I know that y’all have Discord and this forum in addition to GitHub.

Best,
Rhys


Hi @RhysDeLoach!
This forum is the best place to discuss any issues you have or feedback you can provide on the Roboflow app.

Would love to discuss the specific issues you’re encountering with RF-DETR!

Hey @Ford,

Ok, great! All of these tips pertain to the develop branch of RF-DETR and not necessarily the official release.

Issue 1:
The functionality to perform a test run once the model finishes training has been added to the develop branch. I’m not sure if this was intentional, but in implementing this feature, a requirement for a test set was introduced, even when run_test is set to False. The error below illustrates this issue:

FileNotFoundError: [Errno 2] No such file or directory: 'dataset/test/_annotations.coco.json'

This seems to occur because dataset_test is being built and data_loader_test created regardless of the run_test flag’s value. According to the traceback, the failure happens at line 196 of main.py (see below):

dataset_test = build_dataset(image_set='test', args=args, resolution=args.resolution)

Also, if run_test is going to default to True, I believe there should be more graceful handling to inform the user that they need to either provide a test set or set run_test to False.
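To make the suggestion concrete, here is a hedged sketch of the kind of guard I have in mind. The helper name and return value are hypothetical, not RF-DETR’s actual API; the point is just to gate all test-split work behind run_test and fail with an actionable message otherwise.

```python
import os

def maybe_build_test_loader(dataset_root, run_test):
    """Hypothetical sketch: only touch the test split when run_test is True."""
    if not run_test:
        return None  # never look for dataset/test when the flag is off
    ann = os.path.join(dataset_root, "test", "_annotations.coco.json")
    if not os.path.exists(ann):
        raise FileNotFoundError(
            f"run_test=True but no test annotations were found at {ann}; "
            "provide a test split or set run_test=False."
        )
    return ann  # real code would call build_dataset(...) here
```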

Issue 2:
There are two parts to this issue. I am working with a dataset that has some class imbalance, so I have been experimenting with different techniques to account for it. One adjustment I’ve tried is using the varifocal loss function rather than the default IA-BCE loss function.

model.train(use_varifocal_loss=True, ia_bce_loss=False, ...)

The first error that I ran into when trying to use the varifocal loss function was…

RuntimeError: Index put requires the source and destination dtypes match, got BFloat16 for the destination and Float for the source.

This resulted from a dtype difference between pos_ious (Float) and cls_iou_targets (BFloat16) at line 351 of lwdetr.py. When running on CPU, the error does not occur. I assume this stems from an error in the mixed-precision handling, but I haven’t looked into it too deeply. Interestingly, on CPU cls_iou_targets is initialized as Float, while on GPU it is initialized as BFloat16; pos_ious is always Float. I was able to fix this issue by adding…

pos_ious = pos_ious.to(cls_iou_targets.dtype)

…right before line 351, but there may be a better fix, such as initializing pos_ious with src_logits.dtype, similar to how cls_iou_targets is handled.
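A minimal standalone reproduction of the cast, with made-up shapes and indices (not RF-DETR’s actual tensors): under mixed precision the index-put destination ends up BFloat16 while the source stays Float, and casting the source to the destination’s dtype resolves it.

```python
import torch

# Hypothetical stand-ins for the tensors at line 351: on GPU under
# autocast, cls_iou_targets is bfloat16 while pos_ious remains float32.
cls_iou_targets = torch.zeros(2, 4, dtype=torch.bfloat16)
pos_ious = torch.rand(3, dtype=torch.float32)
idx = (torch.tensor([0, 0, 1]), torch.tensor([1, 2, 0]))

# The one-line fix: cast the source to the destination's dtype
# before the advanced-indexing assignment (the "index put").
pos_ious = pos_ious.to(cls_iou_targets.dtype)
cls_iou_targets[idx] = pos_ious
```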

The second error that I ran into when trying to use the varifocal loss function was…

File "line 543, in sigmoid_varifocal_loss
    (1 - alpha) * (prob - targets).abs().pow(gamma) * \
                   ~~~~~^~~~~~~~~
RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 2

I did some digging and found that this error is caused by how num_classes is handled in the SetCriterion class in lwdetr.py. I have 2 classes, but because I’m using the Roboflow version of the COCO JSON format, which includes a super-category buffer class, the detection head is reinitialized with 3 classes. This num_classes value is then used to initialize SetCriterion at line 693 of lwdetr.py; however, 1 is added to it there for some reason, making it 4. Later, at line 346, cls_iou_targets is initialized using this incorrect value for its shape. cls_iou_targets is then passed as the targets argument to sigmoid_varifocal_loss (called at line 512), which causes the size mismatch between prob and targets. I fixed the issue by preventing the code from adding 1 to num_classes at line 693…

criterion = SetCriterion(args.num_classes,...)

rather than

criterion = SetCriterion(args.num_classes + 1,...)

The code ran fine after that and the loss function seems to be working correctly. However, there may be a better fix or I could be missing something since I didn’t fully understand the original intent behind adding 1 to num_classes.
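To illustrate the mismatch with made-up shapes (batch and query counts here are hypothetical, not RF-DETR’s): with 2 real classes plus the super-category buffer, the head outputs 3 logits per query, but a criterion built with num_classes + 1 allocates targets with a 4th slot, so the elementwise term in the varifocal loss cannot broadcast.

```python
import torch

batch, num_queries, head_classes = 1, 5, 3  # 2 classes + buffer class

src_logits = torch.randn(batch, num_queries, head_classes)
prob = src_logits.sigmoid()

bad_targets = torch.zeros(batch, num_queries, head_classes + 1)  # num_classes + 1
good_targets = torch.zeros(batch, num_queries, head_classes)     # num_classes

# (prob - bad_targets) raises the size-mismatch RuntimeError from the
# traceback (3 vs 4 at dim 2); with matching shapes the term computes fine.
term = (prob - good_targets).abs().pow(2.0)
```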

This was very convoluted, so if you would like, I’d be more than happy to set up a call to discuss these issues. Hope this helps.

Best,
Rhys

Hey @Ford,

Were you able to look this over?

Best,
Rhys

Hi @RhysDeLoach!
Absolutely, this is fantastic feedback!! I greatly appreciate the time you took to put this list together.

Have you had a chance to try the new RF-DETR nano, small and medium?


Hey @Ford,

Awesome, I appreciate it. I know that late Friday afternoon posts can often get drowned out by the weekend posts, so I just wanted to make sure that you got this information.

As for the new models, I have been working on a different project this week, so I haven’t gotten the chance to check them out. However, I’ll definitely train a few today and let you know how it goes!

Best,
Rhys

Hey @Ford,

I got a chance to run the new models. For my application, the performance of the medium and small models was comparable to that of the base models. As expected, the nano model was slightly worse, but it is nice to have a more lightweight option. I haven’t played with any hyperparameters on the new models yet, but that is my next step. I appreciate all the hard work that y’all are putting into RF-DETR.

Best,
Rhys

Hi @RhysDeLoach!
Glad to hear it, thank you for trying it out! Looking forward to hearing your results!
