I’m fairly new to computer vision models and I’m trying to understand if the area around a bounding box influences the model at all.
An example: this image shows a floor plan with two doors.
Let's say I draw a bounding box around just the door symbols and label the left door as "closet door" and the top door as "exterior door".
As a human, I could look at the bounding box for the closet door and then at the area around it for more clues. I would see the word "CLOSET" next to the door and conclude that this is a closet door.
Does the area around the bounding box influence the model’s predictions, or does it only train on the portion of the image inside the bounding box?
If the latter, how would I go about using the context around a bounding box to help identify the object? Would I need to make the original bounding boxes larger (I suspect this would make the training data too noisy)? Would I need to run some sort of second-stage model on an expanded crop?
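To make the second-stage idea concrete, here is a rough sketch (all names here are hypothetical, not from any particular library) of what I mean: take the detector's tight box, grow it about its center to pull in surrounding context like the "CLOSET" text, clamp it to the image bounds, and feed that larger crop to a separate classifier.

```python
# Hypothetical sketch: expand a detected box to include surrounding
# context before cropping it for a second-stage classifier.

def expand_box(box, img_w, img_h, scale=2.0):
    """Grow (x1, y1, x2, y2) about its center by `scale`, clamped to the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * scale / 2
    half_h = (y2 - y1) * scale / 2
    return (
        max(0, cx - half_w),
        max(0, cy - half_h),
        min(img_w, cx + half_w),
        min(img_h, cy + half_h),
    )

# A tight box around a door symbol on a 1000x800 floor-plan image...
door_box = (100, 100, 140, 180)
# ...expanded 2x so a second-stage classifier could also see the "CLOSET" label.
context_box = expand_box(door_box, img_w=1000, img_h=800, scale=2.0)
print(context_box)  # (80.0, 60.0, 160.0, 220.0)
```

The appeal of this over simply labeling bigger boxes is that the detector's training targets stay tight and unambiguous, while only the second stage sees the noisier context.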