Complete newbie question on annotating labels that are positioned relative to other labels

Hi all, I’m working on a project to evaluate using computer vision to analyse a house under construction.

We would like to identify house framing classes such as:

All of these components/classes are commonly made of wood or steel, so the only thing that distinguishes them is how they are located in relation to other components. For example, a stud is a tall vertical piece of wood, and a dwang is a horizontal piece of wood between two studs.
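To make the idea concrete, here is a rough hand-written sketch of the kind of relative rule I mean. It is not something a model has learned, and it is not tied to any particular library. It assumes each detected piece comes as an axis-aligned bounding box in image pixels; the `Box` class, the function names and the example coordinates are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical axis-aligned bounding box in image pixels."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def width(self) -> float:
        return self.x_max - self.x_min

    @property
    def height(self) -> float:
        return self.y_max - self.y_min


def is_vertical(box: Box) -> bool:
    # Assumption: a piece taller than it is wide counts as vertical.
    return box.height > box.width


def lies_between(box: Box, left: Box, right: Box) -> bool:
    # The centre of `box` must sit in the gap between the two studs,
    # and within the vertical extent that both studs share.
    cx = (box.x_min + box.x_max) / 2
    cy = (box.y_min + box.y_max) / 2
    horizontally_between = left.x_max <= cx <= right.x_min
    vertically_within = (
        max(left.y_min, right.y_min) <= cy <= min(left.y_max, right.y_max)
    )
    return horizontally_between and vertically_within


def label_pieces(boxes: list[Box]) -> list[str]:
    # Vertical pieces are treated as studs; a horizontal piece is a dwang
    # only if it lies between some pair of studs.
    studs = [b for b in boxes if is_vertical(b)]
    labels = []
    for b in boxes:
        if is_vertical(b):
            labels.append("stud")
        elif any(
            lies_between(b, s1, s2)
            for s1 in studs
            for s2 in studs
            if s1 is not s2
        ):
            labels.append("dwang")
        else:
            labels.append("unknown")
    return labels


# Made-up example: two studs, one piece between them, one stray piece.
example = [
    Box(0, 0, 10, 200),
    Box(100, 0, 110, 200),
    Box(10, 95, 100, 105),
    Box(200, 95, 290, 105),
]
example_labels = label_pieces(example)
```

In this sketch the label of the horizontal piece depends entirely on where the studs are, which is the "relativity" I am asking about.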

My question is: can this “relativity” be trained into a model?

I’m not sure if my question makes sense, but any advice would be appreciated.