Detecting album covers in complex scenes

I’m new to Roboflow and computer vision more generally and was hoping to get some general pointers on how to best approach a particular project…

I want to detect vinyl album covers in complex scenes. Specifically, I want to return the name of the album and the artist.

I have a data set of hundreds of thousands of album covers with associated metadata.

The input to my program would be a video still that potentially contains an album cover. The album cover could be at an odd orientation or only partially in view, and the lighting could be wildly inconsistent.

Most examples I’ve seen are for object class detection, whereas I’m interested in identifying a specific instance of the object. I think I may need to do both: first an object class detection to find a vinyl album, then some transformation of the detected region to standardize it a bit, and then some sort of similarity lookup against my dataset?
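A minimal sketch of what the similarity-lookup step could look like, assuming every cover in the dataset has already been converted to a fixed-length embedding vector by some image model (which model is left open here). The function names, the toy 4-dimensional vectors, and the artist/album metadata are all made up for illustration.

```python
import numpy as np

def normalize(vectors):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    vectors = np.asarray(vectors, dtype=np.float64)
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)

def top_k_matches(query, index, metadata, k=3):
    """Return the k most similar covers as (score, metadata) pairs, best first."""
    scores = normalize(index) @ normalize(query)  # one cosine score per cover
    order = np.argsort(-scores)[:k]               # highest score first
    return [(float(scores[i]), metadata[i]) for i in order]

# Toy stand-ins for real embeddings (hypothetical values).
index = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0, 0.0],
])
metadata = [
    {"artist": "Artist A", "album": "Album A"},
    {"artist": "Artist B", "album": "Album B"},
    {"artist": "Artist C", "album": "Album C"},
]

# Embedding of the standardized crop taken from the video still.
query = np.array([0.9, 0.1, 0.0, 0.0])
matches = top_k_matches(query, index, metadata, k=2)
for score, info in matches:
    print(f"{score:.3f}  {info['artist']} / {info['album']}")
```

With hundreds of thousands of covers, a brute-force matrix product like this may still be workable, though an approximate nearest-neighbor index is a common alternative once lookups need to be fast.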

Appreciate any pointers on the best methods, models, etc. for getting started! I’m also curious whether this is even a feasible project, or whether it would not work well with current methods.

I have done a bit of work on vinyl indexing in the past. I outline one approach, which involves zero-shot classification with CLIP and then sending the results to GPT for data classification and enrichment, here: Cataloguing my vinyl collection with computer vision | James' Coffee Blog
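For readers unfamiliar with zero-shot classification, here is a rough sketch of the scoring step only. It assumes the image and each candidate text label have already been embedded into the same vector space by a CLIP-style model; the embeddings, labels, and the scale value of 100 are illustrative assumptions, and this is not the code from the blog post.

```python
import numpy as np

def zero_shot_probs(image_embedding, label_embeddings, scale=100.0):
    """Turn cosine similarities between one image and several labels into probabilities."""
    image = np.asarray(image_embedding, dtype=np.float64)
    labels = np.asarray(label_embeddings, dtype=np.float64)
    image = image / np.linalg.norm(image)
    labels = labels / np.linalg.norm(labels, axis=1, keepdims=True)
    logits = scale * (labels @ image)  # scaled cosine similarity per label
    logits = logits - logits.max()     # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()     # softmax over the labels

# Hypothetical labels and toy 3-dimensional embeddings.
labels = ["a vinyl record sleeve", "a book", "a poster"]
label_embeddings = [
    [1.0, 0.5, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
image_embedding = [0.8, 0.6, 0.0]

probs = zero_shot_probs(image_embedding, label_embeddings)
best_label = labels[int(np.argmax(probs))]
print(best_label)
```

The label with the highest probability is taken as the prediction; no training on the labels themselves is needed, which is what makes the approach "zero-shot".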

Another approach is to use object detection with object tracking, then send the results to a multimodal model. Object detection is ideal if there is likely to be more than one vinyl record in a given image. With object detection, you can:

  1. Detect all vinyl records
  2. Segment each of them from the rest of the image
  3. Send each record independently to a multimodal model
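Steps 1 and 2 above could be glued together roughly as follows. This is a sketch under assumptions: the detector is taken to return pixel-coordinate boxes as (x_min, y_min, x_max, y_max, confidence) tuples, which is a made-up format rather than any particular library's output, and it uses simple rectangular crops in place of true segmentation masks. Sending each crop to a multimodal model (step 3) is left out.

```python
import numpy as np

def crop_detections(image, boxes, min_confidence=0.5):
    """Cut one sub-image out of `image` for each sufficiently confident box."""
    height, width = image.shape[:2]
    crops = []
    for x_min, y_min, x_max, y_max, confidence in boxes:
        if confidence < min_confidence:
            continue  # skip weak detections
        # Clamp the box to the image so partially visible records still crop cleanly.
        x0, y0 = max(0, int(x_min)), max(0, int(y_min))
        x1, y1 = min(width, int(x_max)), min(height, int(y_max))
        if x1 > x0 and y1 > y0:
            crops.append(image[y0:y1, x0:x1].copy())
    return crops

# A blank 640x480 frame stands in for a real video still.
frame = np.zeros((480, 640, 3), dtype=np.uint8)

# Hypothetical detector output: two confident records, one weak detection.
detections = [
    (50, 60, 250, 260, 0.92),
    (400, 100, 700, 300, 0.81),  # extends past the right edge of the frame
    (10, 10, 40, 40, 0.20),
]

crops = crop_detections(frame, detections)
for crop in crops:
    print(crop.shape)
```

Each crop can then be handled independently, which keeps one record's artwork from confusing the identification of another.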

Because multimodal models are trained on huge datasets, they are likely to recognize the more popular records already.

You can build a multimodal classification pipeline with Roboflow Workflows.

Let me know if you have any questions!
