Does the context around a bounding box influence the model?

I’m fairly new to computer vision models and I’m trying to understand if the area around a bounding box influences the model at all.

An example: this image shows a floor plan with two doors.


Let's say I draw a bounding box around just the door symbols and label the left door as “closet door” and the top door as “exterior door”.

As a human, I could look at the bounding box for the closet door and then look around that bounding box for more clues. I would see that the word “CLOSET” is next to the door and conclude that this is a closet door.

Does the area around the bounding box influence the model’s predictions, or does it only train on the portion of the image inside the bounding box?

If the latter, how would I go about using the context around a bounding box for clues as to the object’s identification? Would I need to make the original bounding boxes larger (I suspect this would make the training data too noisy)? Would I need to run some sort of second stage model?


Hey @cedric-swivvel - thanks for posting!

First, the most popular object detection and instance segmentation models treat the pixels within the annotation as positive examples, and the pixels outside the annotation as negative examples. By labeling a box around the door, you are teaching the model that the pixels inside that annotation are “like door”, and the pixels outside it are “not like door”.

The model will not pick up context clues like the nearby “exterior” or “closet” text from your example.

Generally, you want to solve this with post-processing. Label all doors simply as “door”, and then use separate logic to determine the door type. For instance, you could run OCR to find words like “CLOSET” near a detected box, and perhaps wall detection to identify exterior doors.
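To make that concrete, here is a minimal sketch of the OCR post-processing step. It assumes you already have door boxes from your detector and word boxes from an OCR pass (e.g. Tesseract); all function names, the keyword list, and the distance threshold are hypothetical choices for illustration:

```python
# Hypothetical post-processing sketch: classify detected doors by the
# nearest OCR keyword. Boxes are (x_min, y_min, x_max, y_max) in pixels.

def box_center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def classify_door(door_box, ocr_words, max_dist=150):
    """Label a detected door by the closest OCR keyword within max_dist pixels.

    ocr_words is a list of (text, word_box) pairs from your OCR engine.
    Falls back to the generic "door" class when no keyword is close enough.
    """
    keywords = {"CLOSET": "closet door", "EXTERIOR": "exterior door"}
    cx, cy = box_center(door_box)
    best_label, best_dist = "door", max_dist
    for text, word_box in ocr_words:
        label = keywords.get(text.strip().upper())
        if label is None:
            continue
        wx, wy = box_center(word_box)
        dist = ((cx - wx) ** 2 + (cy - wy) ** 2) ** 0.5
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Example: a door box with the word "CLOSET" detected nearby.
door = (100, 100, 140, 160)
words = [("CLOSET", (150, 110, 220, 130))]
print(classify_door(door, words))  # -> "closet door"
```

A nearest-keyword heuristic like this is crude (floor-plan text can label several nearby symbols), but it keeps the detector's training data clean while still exploiting the context a human would use.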


Makes sense, thanks for the quick reply!

