Clip random moment from random video

I would like to build a program that can find any random moment in any random video and tell you what time the moment is at.

Ex. “Mario fighting Bowser” in www.youtube.com/watch?v=YAb3b3jWdtA,
or “Steph Curry shooting a 3 pointer” in www.youtube.com/watch?v=-1pIEirBdTY

Is this possible to do with Roboflow?

I built a program that does this with GPT-4 Vision by extracting one frame per second of video, asking whether each frame represents the moment, and then returning the sequence of frames most likely to represent it. It works great, but using GPT-4 Vision at that scale is very expensive!
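For reference, the last step (picking the most likely sequence of frames) can be sketched roughly as follows. This is only an illustration, not my actual code: it assumes one score per second of video between 0 and 1, and `best_window` and the sample scores are made-up names and values.

```python
def best_window(scores, window):
    """Return (start_second, mean_score) of the highest-scoring run of `window` seconds.

    `scores` holds one hypothetical relevance score per sampled second.
    """
    if not 0 < window <= len(scores):
        raise ValueError("window must be between 1 and len(scores)")
    # Sum of the first window, then slide one second at a time.
    current = sum(scores[:window])
    best_sum, best_start = current, 0
    for start in range(1, len(scores) - window + 1):
        current += scores[start + window - 1] - scores[start - 1]
        if current > best_sum:
            best_sum, best_start = current, start
    return best_start, best_sum / window


# Made-up scores for a 6-second clip, sampled at one frame per second.
scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1]
start, mean_score = best_window(scores, 3)
print(f"Best 3-second window starts at {start}s (mean score {mean_score:.2f})")
```

Because the frames are sampled once per second, the window's start index is also its timestamp in seconds.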

Yes, it is possible to build such a program using Roboflow. You can use Roboflow’s Video Inference API to run computer vision models on frames in videos at a custom FPS. This can be used to identify specific moments in a video, similar to how you’ve been using GPT-4 Vision.

However, to identify specific moments like “Mario fighting Bowser” or “Steph Curry shooting a 3 pointer”, you would need a trained model that can recognize these specific events. If you have a dataset of images labeled with these events, you can use Roboflow to train a model to recognize them.

Once you have a trained model, you can use it to analyze frames from the video. Alternatively, you can compute embedding vectors for the frames and use them with a video search template to search your video with text queries. Either approach can be used to find the time in the video where the event occurs.

Here’s a high-level overview of the steps you would need to follow:

  1. Collect a dataset of images representing the events you’re interested in (e.g., “Mario fighting Bowser”, “Steph Curry shooting a 3 pointer”).
  2. Use Roboflow to train a model on this dataset.
  3. Use the Roboflow Video Inference API to run your trained model on frames from the video.
  4. Use the output from the model to search the video for the events you’re interested in.
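As a rough sketch of step 4, per-frame confidences can be turned into time ranges once you know the FPS the frames were sampled at. The list of confidences below is made up, and `detections_to_segments` is a hypothetical helper, not part of the Roboflow API; the actual response format of the Video Inference API is not shown here.

```python
def detections_to_segments(confidences, fps, threshold=0.5):
    """Group consecutive frames at or above `threshold` into (start_s, end_s) ranges.

    `confidences` is assumed to hold one model confidence per sampled frame,
    in playback order, sampled at `fps` frames per second.
    """
    segments = []
    start = None
    for i, conf in enumerate(confidences):
        if conf >= threshold and start is None:
            start = i  # a new run of matching frames begins
        elif conf < threshold and start is not None:
            segments.append((start / fps, i / fps))
            start = None
    if start is not None:
        # The last run extends to the end of the sampled frames.
        segments.append((start / fps, len(confidences) / fps))
    return segments


# Made-up confidences for six frames sampled at 2 FPS.
segments = detections_to_segments([0.1, 0.7, 0.9, 0.2, 0.8, 0.8], fps=2)
for start_s, end_s in segments:
    print(f"Event detected from {start_s:.1f}s to {end_s:.1f}s")
```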

Please note that while this approach can help reduce costs compared to using GPT-4 Vision, there will still be costs associated with using the Roboflow Video Inference API and training a model.

For more details on how to use Roboflow with video, you can refer to this blog post. For more information on how to search video frames with Roboflow, you can refer to this guide.


Hi! I’m Roboflow’s AI-powered bot. This is an automated response generated by AI with knowledge from our docs, blog, and forum. If this solves your issue, please mark it as the solution using the checkbox icon. If this doesn’t solve your issue, please follow up with what you’ve tried and why, and the community will continue to chime in as usual.

Is it possible to create a more general-use program that can find any random moment in any random video?

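CLIP is awesome here! You create an embedding for each frame and then compare each one to the embedding for the text prompt (text and images share the same embedding space). The frames whose embeddings are most similar to the prompt's embedding are the best candidates for the moment.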
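The comparison step might look something like this minimal sketch. It assumes you already have the embeddings from a CLIP model; the tiny 3-dimensional vectors here are placeholders (real CLIP embeddings have hundreds of dimensions), and `best_frame` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_frame(frame_embeddings, text_embedding, fps):
    """Return (timestamp_seconds, similarity) of the frame closest to the prompt."""
    sims = [cosine_similarity(f, text_embedding) for f in frame_embeddings]
    idx = max(range(len(sims)), key=sims.__getitem__)
    return idx / fps, sims[idx]


# Placeholder embeddings: one per frame, sampled at 1 FPS.
frames = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
query = [0.5, 0.9, 0.0]  # placeholder embedding of the text prompt

timestamp, sim = best_frame(frames, query, fps=1)
print(f"Best match at {timestamp:.0f}s (similarity {sim:.3f})")
```

Since no event-specific training is involved, the same loop works for any prompt; only the text embedding changes.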

Here’s a related tutorial.

Thank you! What is the API cost for something like this? My team would use it on a large scale (>100,000 frames per day).

You would qualify for a significant volume discount. I’ll email you separately.
