I am trying to train an RF-DETR model to detect pickleballs using this dataset, which has approx 4.5k training images. I tried to train it using the how-to-finetune-rf-detr-on-detection-dataset notebook from the tutorial by replacing the basketball dataset with mine, but my model is performing quite poorly:
{'class': 'ball', 'map@50:95': 0.114, 'map@50': 0.391, 'precision': 0.556, 'recall': 0.47}
Please help me figure out the ideal training setup including the important hyperparams and their ideal values.
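For reference, here is roughly what the notebook's training call looks like, with the knobs most worth experimenting with pulled into one place. This is a sketch based on my reading of the tutorial; the parameter names (`epochs`, `batch_size`, `grad_accum_steps`, `lr`) and the `RFDETRBase` class are assumptions about the `rfdetr` package version you have, so check them against the notebook:

```python
# Hedged sketch of a fine-tuning setup for a small single-class dataset
# (~4.5k images). Values here are starting points, not verified optima.

# The optimizer sees an effective batch of batch_size * grad_accum_steps,
# so you can trade GPU memory for accumulation steps.
hyperparams = {
    "epochs": 30,            # small datasets often need more epochs than the notebook default
    "batch_size": 4,         # limited by GPU memory
    "grad_accum_steps": 4,   # 4 * 4 = effective batch of 16
    "lr": 1e-4,              # try halving this if loss is unstable
}

def effective_batch(hp):
    """Effective batch size as seen by the optimizer."""
    return hp["batch_size"] * hp["grad_accum_steps"]

if __name__ == "__main__":
    print("effective batch:", effective_batch(hyperparams))
    # Assumed API, mirroring the tutorial notebook:
    # from rfdetr import RFDETRBase
    # model = RFDETRBase()
    # model.train(dataset_dir="pickleball-coco", **hyperparams)
```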
Hey @kunalshnikov. The fact that you have it working at all is great progress! Skimming through your dataset, I think there are some opportunities for improvement. I'll drop a couple of ideas below.
But a big part of this also depends on your use case. For example, in the basketball example, the use case appears to be identifying players and the ball from a "TV view" of the court; that is, they do not expect to find a basketball lying in the grass at a park. The biggest lift for you may be pinning down your use case and keeping only the images that match it exactly.
So some thoughts:
- You will achieve better results if the images are all from the same perspective (i.e., the camera stays in one spot). Try pulling out a set of images that are all from the end of the court and train on those. Do not include shots from the side or close-ups. Mixing in close-ups, with all the ball detail that the other shots lack, might be confusing the model.
- Finding that little yellow ball in a large image is tough. Fortunately, you have good contrast between the ball and the court/players. As you pick end-of-court images for your dataset, discard any that have similar "bright spots" in the court area that could be mistaken for a ball. In the short term, this should help your efforts.
For example, this is a good image: a view from the baseline with just the one ball:
This image is not a good one, and I would not include it in the dataset: it has multiple balls, which I would not expect in your "game-play, end-of-court" images, and it's not from the right camera angle.
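If you want to automate some of this pruning, here is a sketch of one way to do it, assuming your dataset is in COCO JSON format (as the notebook uses). The heuristics and thresholds are my guesses: drop frames with more than one ball, and drop frames where a ball's bounding box covers a large fraction of the image (a rough proxy for "close-up"):

```python
import json

def prune_coco(coco, max_rel_area=0.01, max_balls=1):
    """Return the image ids to keep: at most `max_balls` ball annotations,
    none of which covers more than `max_rel_area` of the image area.
    `coco` is a parsed COCO-format dict (images / annotations lists)."""
    anns_by_img = {}
    for ann in coco["annotations"]:
        anns_by_img.setdefault(ann["image_id"], []).append(ann)

    keep = []
    for img in coco["images"]:
        anns = anns_by_img.get(img["id"], [])
        if len(anns) > max_balls:
            continue  # multiple balls: not a game-play frame
        img_area = img["width"] * img["height"]
        # COCO bbox is [x, y, w, h]; w * h is the box area in pixels.
        if any(a["bbox"][2] * a["bbox"][3] / img_area > max_rel_area for a in anns):
            continue  # ball too large relative to the image: likely a close-up
        keep.append(img["id"])
    return keep

# Usage against a real export (path is hypothetical):
# coco = json.load(open("train/_annotations.coco.json"))
# keep_ids = prune_coco(coco)
```

Eyeball the discarded images before deleting anything; a fixed area threshold will not be right for every camera distance.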
Great reply, @Automatez. Echoing the importance of having your training dataset match the perspective your model will see in production: representative data matters more than the right hyperparameters, @kunalshnikov.
@Automatez @josephofiowa thanks a lot. I trained that first version on that dataset to test how feasible it is to detect a ball with RF-DETR / YOLO. In parallel, I am building my own training data from footage taken only from the behind-the-baseline angle, since that is the angle I plan to use in practice. Currently it's frames from a YouTube video, but eventually I'll shoot my own footage as well.
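One tip for building a dataset from video: don't keep every frame, since consecutive frames are near-duplicates that inflate the dataset without adding information. Here is a minimal pure-Python sketch of fixed-rate frame sampling; the OpenCV lines in the comment are an assumption about how you'd wire it up, not tested code:

```python
def sample_frame_indices(total_frames, fps, samples_per_second=2.0):
    """Indices of frames to keep when sampling a video at a fixed rate.
    E.g. a 30 fps clip sampled at 2 frames/sec keeps every 15th frame."""
    step = max(1, round(fps / samples_per_second))
    return list(range(0, total_frames, step))

# Sketch of the extraction loop (assumes OpenCV is installed):
# import cv2
# cap = cv2.VideoCapture("match.mp4")
# fps = cap.get(cv2.CAP_PROP_FPS)
# total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
# wanted = set(sample_frame_indices(total, fps))
# ... read frames in a loop and save only those whose index is in `wanted`.
```

Sampling at 1-2 frames per second is usually plenty for game-play footage; you can always go back and sample denser around rallies.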
However, I have not been able to find comprehensive resources online on RF-DETR's hyperparameters, such as their default values and how to adjust them in response to training results. If you could elaborate a bit more on that, it would be really appreciated.