Ideal hyperparameter setup for training RF-DETR for ball detection

I am trying to train an RF-DETR model to detect pickleballs using this dataset, which has approx 4.5k training images. I tried to train it using the how-to-finetune-rf-detr-on-detection-dataset notebook from the tutorial by replacing the basketball dataset with mine, but my model is performing quite poorly:
{'class': 'ball', 'map@50:95': 0.114, 'map@50': 0.391, 'precision': 0.556, 'recall': 0.47}

Please help me figure out the ideal training setup including the important hyperparams and their ideal values.

Hey @kunalshnikov. The fact that you have it working is great progress! Skimming through your dataset, I think you have some opportunities for improvement there. I’ll drop a couple of ideas below.

But a big part of this also depends on your use-case. For example, in the basketball example, the use-case appears to be identifying players and the ball from a “TV-view” of the court. That is, they do not expect to be able to find a basketball lying in the grass at a park. The biggest lift for you may be identifying your use-case and keeping only the images that match it.

So some thoughts:

  1. You will achieve better results if the images are all from the same perspective (i.e., the camera stays in one spot). Try pulling out a set of images that are all from the end of the court and train on that. Do not include shots from the side or close-ups. Mixing in close-ups, which show details of the pickleball that the other shots do not have, might be confusing the model.
  2. Finding that little yellow ball in a large image is tough. Fortunately, you have some great contrast in images between the ball and the court/players. As you pick images for your test dataset from the end of the court, discard any that have similar “bright spots” in the court area which could be mistaken for a ball. In the short term, this might help your efforts.

For example, this is a good image, with a view from the baseline and just the one ball:

This image is not a good one and I would not include this in the dataset. It has multiple balls which I would not expect in your “game-play, end-of-court” type images, and it’s not from the right camera angle.
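
If it helps, here is a rough sketch of how that kind of filtering could be automated. It assumes your annotations are in COCO format (an `images` list with `width`/`height` and an `annotations` list with `[x, y, width, height]` boxes). The function name and both thresholds are made up for illustration, and box size is only a crude proxy for "close-up", so you would still want to eyeball the result for camera angle.

```python
def filter_gameplay_images(coco, max_balls=1, max_area_frac=0.001):
    """Drop images with too many balls or with a ball that looks like a close-up.

    coco: dict loaded from a COCO-format annotation file (assumed layout).
    max_balls: hypothetical cap on boxes per image (game-play shots have one ball).
    max_area_frac: hypothetical cap on box area as a fraction of the frame area.
    """
    # Group annotations by the image they belong to.
    anns_by_image = {}
    for ann in coco["annotations"]:
        anns_by_image.setdefault(ann["image_id"], []).append(ann)

    kept_images = []
    for img in coco["images"]:
        anns = anns_by_image.get(img["id"], [])
        if len(anns) > max_balls:
            continue  # e.g. practice shots with several balls in frame
        frame_area = img["width"] * img["height"]
        # COCO boxes are [x, y, width, height]; a large box suggests a close-up.
        if any(a["bbox"][2] * a["bbox"][3] / frame_area > max_area_frac for a in anns):
            continue
        kept_images.append(img)

    kept_ids = {img["id"] for img in kept_images}
    kept_anns = [a for a in coco["annotations"] if a["image_id"] in kept_ids]
    return {**coco, "images": kept_images, "annotations": kept_anns}
```

You would load the annotation file with `json.load`, pass the dict through this function, and write the result back out. Note that images with no ball at all are kept here, since they can serve as negatives; whether that helps in your case is something to test.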


Great reply, @Automatez. Echoing the importance of having your training dataset match the perspective your model will see in production. Representative data is more important than the right hyperparameters, @kunalshnikov.

@Automatez @josephofiowa thanks a lot. I was training my first version with that dataset to test how feasible it is to detect a ball using RF-DETR / YOLO. In parallel, I am building my own training data with footage taken only from the behind-the-baseline angle, as that is the angle I am planning to use in practice. Currently it’s frames from a YouTube video, but eventually I’ll shoot my own footage as well.

However, I have not been able to find comprehensive resources online on the RF-DETR hyperparameters, such as their default values and how to adjust them in response to model results. If you could elaborate a bit more on that, it would be really appreciated.