RF-DETR Training Failed due to missing "wandb" module

jerluc · April 11, 2025, 12:40am

This is a little confusing: we’re trying to train the new RF-DETR model in the cloud, but training fails every time with an error saying that Roboflow’s Python process is missing the “wandb” module.

The image dataset is correctly resized to 640x640, so the suggested fix does not seem to be the problem.

Project Type: Object Detection
Operating System & Browser: macOS + Firefox
Project Universe Link or Workspace/Project ID: stationa/we-rooftops-det (private workspace)

peter_roboflow · April 11, 2025, 3:38pm

Hi, we’ve deployed a new version. Can you retry and let me know how it goes?

jerluc · April 11, 2025, 8:02pm

Thanks @peter_roboflow. That seems to have worked to train now!

I’m a little confused though as:

Training took only a little over an hour on a dataset of 2700 images, even though the estimate claimed it would take closer to 8 hours. It also stopped after only 18 epochs (this might just mean the model wasn’t seeing much progress and stopped, but I’m not sure).
The confusion matrix and vector analysis charts are completely empty at all confidence thresholds.
Both precision and recall show up as 100% which is highly suspect.

Any thoughts on what happened here?

(Sorry I tried to upload more screenshots, but the forum won’t let me since my account is new?)

peter_roboflow · April 11, 2025, 10:21pm

@jerluc sorry about the forum not allowing image uploads, not sure how to fix that

Training took only a little over an hour on a dataset of 2700 images, even though the estimate claimed it would take closer to 8 hours. It also stopped after only 18 epochs (this might just mean the model wasn’t seeing much progress and stopped, but I’m not sure).

Yeah, we do early stopping so that we don’t waste your money while the model isn’t improving. RF-DETR converges really fast, so training should generally take a lot less time than estimated. That’s one of the aspects of the model we are proud of, since it should save users credits.
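For readers unfamiliar with early stopping, here is a minimal sketch of a patience-based rule. It is not Roboflow’s actual implementation: the function name, the `patience` and `min_delta` parameters, and the use of validation loss as the monitored metric are all assumptions made for illustration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None.

    Hypothetical helper: stops once the monitored loss has failed to
    improve by more than `min_delta` for `patience` consecutive epochs.
    """
    best = float("inf")   # best validation loss seen so far
    stale = 0             # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None  # never triggered; training would run to the last epoch


# Made-up loss curve that plateaus after epoch 3.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.69]
stop_at = early_stopping_epoch(losses, patience=3)
```

With a rule like this, a model that converges quickly stops well before the scheduled number of epochs, which would be consistent with a run ending at epoch 18 and finishing far ahead of the estimate.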

The confusion matrix and vector analysis charts are completely empty at all confidence thresholds.

hmmm, that should not be happening, we will look into that!

Both precision and recall show up as 100% which is highly suspect.

You are right to be suspicious: those numbers aren’t being calculated yet for RF-DETR other than in model eval. That was a frontend bug I thought we had fixed. Do you mind sharing where exactly in the app you are seeing that? If you can’t upload images here, you can email me at my name at roboflow dot com.
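As a quick illustration of why 100% precision and recall next to an empty confusion matrix is a red flag, the sketch below applies the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)). It says nothing about how Roboflow’s frontend computes or displays these values; the function name and the choice to return `None` for undefined values are assumptions.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from detection counts.

    Hypothetical helper: returns None for a metric whose denominator
    is zero, because the metric is undefined in that case.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    return precision, recall


# An empty confusion matrix (no counts at all) leaves both metrics
# undefined rather than perfect.
empty = precision_recall(tp=0, fp=0, fn=0)

# A populated matrix gives ordinary values.
populated = precision_recall(tp=8, fp=2, fn=2)
```

Genuine 100% scores would require a populated matrix with true positives and zero false positives and false negatives, so perfect scores alongside empty charts point to a reporting problem rather than a perfect model.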

jerluc · April 12, 2025, 12:33am

Thanks @peter_roboflow! It makes sense then that it didn’t take very long. I was just surprised to see it so much faster than was estimated!

Please let me know if you’ve found out anything about the confusion matrix. I figured this was related to the precision and recall showing up incorrectly, but maybe not.

For context, this is a screenshot from the top of the “Visualize” page that shows the 100% numbers:

Matvei_Popov · April 14, 2025, 7:57pm

Hello @jerluc, I’m very sorry you experienced this issue. We found what was causing the problem and are currently working on a fix. We will let you know as soon as it is ready!

