I’m new to Roboflow and computer vision more generally and was hoping to get some general pointers on how to best approach a particular project…
I want to detect vinyl album covers in complex scenes. Specifically, I want to return the album name and artist.
I have a data set of hundreds of thousands of album covers with associated metadata.
The input to my program would be a video still that potentially contains an album cover. The cover could be at an odd orientation or only partially in view, and the lighting could be wildly inconsistent.
Most examples I’ve seen are for object class detection, whereas I’m interested in identifying a specific instance of the object. I think I may even need to do both: maybe an object detection step to find a vinyl album, then some transformation of the detected region to standardize it a bit, and then some sort of similarity lookup against my dataset?
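To make the last step concrete, here is a minimal sketch of a similarity lookup. It assumes each cover has already been turned into a fixed-length embedding vector by some image model, and that step is not shown. The function names (`normalize`, `lookup`), the `min_score` threshold, and the toy 4-dimensional vectors and metadata are all hypothetical, for illustration only.

```python
import numpy as np


def normalize(vectors):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    vectors = np.asarray(vectors, dtype=np.float64)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def lookup(query, index, metadata, min_score=0.8):
    """Return (metadata, score) for the closest cover.

    If the best cosine similarity is below min_score, the metadata is None,
    which stands in for "no known album in this crop".
    """
    q = normalize(np.asarray(query)[None, :])[0]
    scores = index @ q              # cosine similarity against every cover
    best = int(np.argmax(scores))
    score = float(scores[best])
    return (metadata[best] if score >= min_score else None), score


# Toy catalog: in practice these rows would come from an embedding model
# run over the album cover dataset (assumption, not shown here).
catalog_embeddings = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])
catalog_metadata = [
    {"album": "Album A", "artist": "Artist A"},
    {"album": "Album B", "artist": "Artist B"},
    {"album": "Album C", "artist": "Artist C"},
]
index = normalize(catalog_embeddings)

# Embedding of the cropped, straightened cover from the video still.
query_embedding = np.array([0.1, 0.9, 0.05, 0.0])
match, score = lookup(query_embedding, index, catalog_metadata)
print(match, round(score, 3))
```

A brute-force matrix product like this should still be workable for a few hundred thousand covers. An approximate nearest-neighbor index would be an option if it turns out to be too slow.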
Appreciate any pointers on the best methods, models, etc. for getting started! I’m also curious whether this is even a feasible project, or whether it would not work well with current methods.