Detecting album covers in complex scenes

arpeggiyo · October 10, 2024, 4:30pm

I’m new to Roboflow and computer vision more generally and was hoping to get some general pointers on how to best approach a particular project…

I want to be able to detect vinyl album covers in complex scenes. Specifically, I want to return the name of the album and the artist.

I have a data set of hundreds of thousands of album covers with associated metadata.

The input to my program would be a video still that potentially contains an album cover. The album cover could be at an odd orientation or only partially in view, and the lighting could be wildly inconsistent.

Most examples I’ve seen are for object class detection, whereas I’m interested in a specific instance of the object. I think I may even need to do both. Maybe an object class detection to detect a vinyl album, then some transformation of the object to standardize it a bit, and then some sort of similarity lookup against my dataset?
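The similarity lookup step in that pipeline can be sketched as a nearest-neighbour search over image embeddings. This is only a minimal sketch under stated assumptions: the random vectors stand in for embeddings that some image model would produce for each catalogue cover and for the cropped query, and `top_k_matches`, `catalogue`, and `metadata` are hypothetical names used for illustration, not part of Roboflow or any other library.

```python
import numpy as np

def normalize(vectors):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    vectors = np.asarray(vectors, dtype=np.float64)
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)

def top_k_matches(query, index, k=5):
    """Return (row_indices, scores) of the k catalogue embeddings closest to query."""
    scores = normalize(index) @ normalize(query)
    k = min(k, len(scores))
    order = np.argsort(-scores)[:k]  # highest cosine similarity first
    return order, scores[order]

# Assumption: random stand-ins for real embeddings (one row per album cover).
rng = np.random.default_rng(0)
catalogue = rng.normal(size=(1000, 512))
metadata = [f"album-{i}" for i in range(len(catalogue))]  # placeholder album/artist records

# Simulate a degraded view of cover 42 (lighting, blur) as a noisy copy of its embedding.
query = catalogue[42] + 0.1 * rng.normal(size=512)

rows, scores = top_k_matches(query, catalogue, k=3)
best_album = metadata[rows[0]]
```

With hundreds of thousands of covers, a brute-force matrix product like this may still be workable, but an approximate nearest-neighbour index would likely be worth considering if lookups need to be fast.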

I'd appreciate any pointers on the best methods, models, etc. for getting started! I'm also curious whether this is even a feasible project, or whether it would not work well with current methods.

rfjames · October 14, 2024, 11:09pm

I have done a bit of work on vinyl indexing in the past. I outline one approach here, which uses zero-shot classification with CLIP and then sends the results to GPT for data classification and enrichment: Cataloguing my vinyl collection with computer vision | James' Coffee Blog

Another approach is to use object detection with object tracking, then send the results to a multimodal model. Object detection is ideal if there is likely to be more than one vinyl record in a given image. With object detection, you can:

Detect all vinyl records
Segment each of them from the rest of the image
Send each record independently to a multimodal model
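The middle step above can be approximated by cropping each detection out of the frame before passing it on. The following is a rough sketch, assuming the detector returns boxes as `(x_min, y_min, x_max, y_max)` pixel coordinates; real detectors differ in box format, and `crop_detections` is a hypothetical helper rather than a Roboflow function. A rectangular crop is also a simplification of true segmentation.

```python
import numpy as np

def crop_detections(image, boxes, pad=0):
    """Cut each box out of an H x W x C image, clamping to the image bounds."""
    height, width = image.shape[:2]
    crops = []
    for x_min, y_min, x_max, y_max in boxes:
        left = max(int(x_min) - pad, 0)
        top = max(int(y_min) - pad, 0)
        right = min(int(x_max) + pad, width)
        bottom = min(int(y_max) + pad, height)
        if right > left and bottom > top:  # skip boxes that fall outside the frame
            crops.append(image[top:bottom, left:right].copy())
    return crops

# Assumption: a blank frame and made-up boxes stand in for a real video still
# and real detector output.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
boxes = [(50, 60, 250, 260), (600, 400, 700, 500)]  # second box runs past the frame edge
crops = crop_detections(frame, boxes, pad=10)
```

Each crop can then be sent on its own to whichever model does the identification. A small amount of padding may help when the detector clips the edge of a cover.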

Because multimodal models are trained on huge datasets, they likely already recognize the more popular records.

You can build workflows for multimodal classification on Roboflow Workflows.

Let me know if you have any questions!

system · November 4, 2024, 11:09pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
