Hi there, I’m a total n00b on all things computer vision but very excited about the possibilities!
My goal is to classify still captures from a webcam pointing at a digital billboard (see image below/attached). The images will be tagged/classified by the client campaign running on the billboard. This is to help my team capture proof-of-flighting photos for audit purposes - currently they trawl through hours of footage looking for a specific client’s ad.
My hypothesis:
1. I have the original artwork of all the content that is run on the billboard. This is the training data.
2. I feed this artwork into a Roboflow model and train it with brand and campaign labels.
3. I run the webcam still images against this model.
4. Roboflow outputs the classification, e.g. {"image": "webcam01.jpg", "brand": "BP", "campaign": "winter rewards campaign"}
My goal is to create an automated workflow where new ads (e.g. brand-campaign-01.jpg) are uploaded into the model, and the webcam images (e.g. webcam01.jpg) are pulled from a Google Drive (or Dropbox), fed into the model, and then saved back to the drive under “/classified” as [brand-campaign-date.jpg].
Does this sound like a feasible hypothesis?
I’m hoping that the very specific nature of the ads uploaded and the ads displayed, as well as the fixed location of the webcam, means that there’s little variability other than atmospheric changes such as night/day lighting and screen brightness.
Thanks for reading this far, I hope this isn’t a silly idea!
Scott
First, no matter what approach you take, you’re 100% going to want a general object detector that picks up any billboard (you’ll then crop the billboards to compare against your training data). This detector should only be trained on webcam data.
Then, it’s a little more open ended. My best idea is using a model like CLIP to compare embeddings between the billboard crop and the base images.
Our workflow feature allows you to build deployment pipelines that handle all of this in one request (it also works on RTSP streams!).
Thanks Jacob, so to play it back to you at my basic level:
1. The general object detector will “narrow the search zone” to just the screen of the billboard. This detector is fed a bunch of webcam images from the various webcams I’ve got pointing at my billboards.
2. Then a CLIP model lets me feed in the billboard footage (either a video stream or still captures) along with the base images for comparison.
Some follow up questions if you don’t mind:
1. Are points 1 and 2 both contained in one workflow? Or is point 1 trained first and the end result then put into the workflow?
2. What would the output be? Does the workflow output still images that are labelled accordingly?
3. I have over 100 billboards, but they mostly look the same, so I assume I won’t need to feed the detector too many before it starts to give consistent outputs?
Hey @thescott - ah, I just realized that our CLIP block doesn’t support outputting the embeddings.
Here’s an example of what it would look like if the block did have the output you’d need (I’ve put in a feature request for you).
Basically the flow would be:
1. Use a finetuned detector (or YOLO-World, as I’ve shown in this example) to identify billboards.
2. Crop the billboards.
3. Send the crops to CLIP and get the output embeddings.
4. Then, outside of a workflow, you’d want to compare those embeddings against the embeddings of all of your existing billboards to look for near matches.
Here are our docs on using CLIP (we have a hosted endpoint to generate embeddings, just not in workflows yet). You won’t need too much data for this approach, but you’re going to need to play around with the cosine similarity threshold to call something a match.
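To make step 4 a bit more concrete, here’s a rough sketch of that comparison, assuming you already have the embedding vectors in hand (e.g. saved as .npy files after calling the hosted CLIP endpoint). The file names and the 0.8 threshold are placeholders you’d tune for your own data:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(crop_embedding, base_embeddings, threshold=0.8):
    # base_embeddings: {"brand-campaign": embedding vector, ...}
    # Returns (name, score) for the closest base image, or None if nothing
    # clears the similarity threshold.
    scores = {name: cosine_similarity(crop_embedding, emb)
              for name, emb in base_embeddings.items()}
    name, score = max(scores.items(), key=lambda kv: kv[1])
    return (name, score) if score >= threshold else None

# Example usage (embeddings would come from CLIP, not random data):
# base = {"bp-winter-rewards": np.load("bp_winter.npy"), ...}
# print(best_match(np.load("webcam01_crop.npy"), base))
```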
Thanks, this is very helpful and mostly clear. I will need to play with it to get my head around it, but I did get it to crop on the first attempt.
On your step 4 “… outside of a workflow you’d want to compare those embeddings against the embeddings of all of your existing billboards to look for near matches.”
When you say “existing billboards” do you mean the base images (i.e. the original ads)?
And by outside of a workflow, is that outside of the Roboflow ecosystem too? If so, I imagine I would need to find a place to store the embeddings. I assume Roboflow would be able to cater for the comparison of my embeddings with the base images - if not via a workflow, perhaps via an API?
I’m hoping I can set up low-code workflows where:
1. Webcam images are manually pulled from the camera every day (streaming isn’t set up yet, so this would be 10-second-increment snapshots) and saved to a Dropbox/Google Drive, and then a workflow on pipedream.com sends them to Roboflow to have embeddings generated. Roboflow has 3 pre-built actions there (classify an image, detect an object from an image, upload an image to a Roboflow project), so I assume it could upload the Google Drive webcam images to the project.
2. Base images are also uploaded to a separate Google Drive and pushed into a model on Roboflow.
3. Embeddings are then held up against the model to be classified.
Or does step 2 not require a model, since comparisons are a different thing?
Yes - by “existing billboards” I meant base images.
Unfortunately, while we have embeddings of images stored in Roboflow today, we don’t have a good way to externalize those via search. The best method would be using our Search API, but that would require uploading the base crop into Roboflow and using the image ID to do a similarity search (not ideal for a scaled deployment).
To do this today you’d need to spin up a vector database to run the similarities easily (or train a custom object detector with ~200 classes, one for each billboard, but you’d need a lot of training data for that and your original images would not be useful).
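If it helps, here’s a minimal sketch of what that vector-database option could look like, using FAISS as a stand-in (any vector store would work). The label names and the .npy embedding files are assumptions for illustration, not part of Roboflow’s API:

```python
import faiss
import numpy as np

# Stack base-image (original ad) embeddings into one matrix, one row per ad.
# labels[i] names the ad whose embedding sits in row i.
labels = ["bp-winter-rewards", "brand-campaign-01"]  # placeholder names
base = np.stack([np.load(f"{name}.npy") for name in labels]).astype("float32")

# Normalise so inner product == cosine similarity, then build a flat index.
faiss.normalize_L2(base)
index = faiss.IndexFlatIP(base.shape[1])
index.add(base)

def classify_crop(crop_embedding: np.ndarray, threshold: float = 0.8):
    # Return the best-matching ad label for a billboard crop, or None
    # if the top score doesn't clear the threshold.
    query = crop_embedding.astype("float32").reshape(1, -1)
    faiss.normalize_L2(query)
    scores, ids = index.search(query, 1)
    return labels[ids[0][0]] if scores[0][0] >= threshold else None
```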
I’m sorry we don’t have a good solve for this - seems like the kind of thing we’d love to support in the future.