Hi,
I am completely new to computer vision so forgive any lack of clarity in my question.
I’ve trained an object detection model and I am playing around with workflows to produce text output based on the combination of object classes detected. I am basically trying to perform dumbed-down caption creation based on stereotypical combinations of objects.
For example, if a person, wrench, jack, and spare tire were detected in a video, I want to produce text saying “A person is changing a tire.” The videos I’m interested in contain only a very limited number of object combinations, so there are only a few possible outputs. That is why I feel like this approach would work for me.
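To make the logic concrete, here is a rough sketch in plain Python of what I am after. This is not Workflows or expression block code; the rule table, the class names, and the `caption_for` function are only my own illustration of the behaviour I want.

```python
# Hypothetical rule table: each entry pairs the set of classes that must
# all be detected with the caption to output when they are.
CAPTION_RULES = [
    ({"person", "wrench", "jack", "spare tire"}, "A person is changing a tire."),
]


def caption_for(detected_classes):
    """Return the first caption whose required classes were all detected, else None."""
    detected = set(detected_classes)
    for required, caption in CAPTION_RULES:
        # Subset test: every required class must be among the detections.
        if required <= detected:
            return caption
    return None


print(caption_for(["person", "jack", "wrench", "spare tire", "car"]))
```

Extra detections (like "car" above) are ignored, and the order of the detections does not matter. Only the presence of every required class counts.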
The issue I am having is that I can output a string when a single object is detected using the expression block, but not based on a combination of objects.
Is there a way to use the expression block to produce a string output if both object 1 AND object 2 are detected?
Thanks