I am currently working on a project in the field of electrical engineering that involves automating the process of extracting material take-off (MTO) data from single-line and multi-line diagrams (SLD/MLD). For this, I’ve developed a custom annotation strategy using Roboflow, aiming to train a model that can accurately detect and classify engineering components and symbols.
To facilitate this, I’ve created three annotation classes:
Component_Block: Used for bounding both symbols and associated text.
Text_Block: Used for bounding text-only descriptions that specify component characteristics but do not require associated symbols.
Symbol_Block: Used for bounding symbols only, with no associated text.
Using these labels, I trained a YOLO model and achieved an initial mAP@50 of about 0.3 on a dataset of 60 images. While this is a promising start, I’m looking to significantly improve performance and reduce errors.
I would greatly appreciate any guidance from the community regarding:
How well this annotation method works for this type of task,
Best practices in fine-tuning for symbol-text annotation tasks,
Augmentation strategies or preprocessing tips that worked well for similar domains,
Any tricks to improve model performance on technical diagrams or documents.
Hey! I had a somewhat similar project where I was trying to detect a specific symbol with a code inside. (When an office supply runs out on the shelf, the circled code shows up, indicating a need to reorder and which item to reorder.)
I built mine with a combination of Roboflow Workflows and a Python script.
The first step was similar to yours - annotate data, then train an object detection model to find (in my case) the circle. (First image below.) I think your Component_Block and Symbol_Block will work similarly; you’ll just need a sufficient dataset. I’d also recommend using a bunch of augmentations, but be sure to avoid any that flip the image and create backwards text.
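If you’re training with Ultralytics YOLO, you can switch the flip augmentations off right in the train call. Just a sketch - the model size, epochs, and image size below are placeholders for whatever you’re actually running:

```python
from ultralytics import YOLO

# Start from a pretrained checkpoint and fine-tune on the diagram dataset
model = YOLO("yolov8n.pt")

model.train(
    data="data.yaml",   # dataset config exported from Roboflow
    epochs=100,
    imgsz=1280,         # diagrams tend to have small symbols/text, so a larger input size can help
    fliplr=0.0,         # disable horizontal flips (they would mirror the text)
    flipud=0.0,         # disable vertical flips
    degrees=0.0,        # keep rotation off too if your diagrams are always upright
)
```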
With the circle identified, I built a Workflow to take an image and run the model against it, then used a Dynamic Crop block to get just the detections so I could read the item numbers (second image below).
From there, I ran a script that first runs the workflow, then runs OCR against the cropped detections to get each of the item numbers. (Workflows does have an OCR option, but in my free tier the only one available wasn’t robust enough for my use case, so I used a different OCR engine via Python.)
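The script itself isn’t anything fancy. Here’s a rough sketch of the same detect-crop-OCR idea, hitting the hosted model endpoint directly instead of the workflow, with pytesseract standing in for whichever OCR engine you prefer (project name, version, confidence, and the PSM setting are just placeholders):

```python
from roboflow import Roboflow
from PIL import Image
import pytesseract

rf = Roboflow(api_key="YOUR_API_KEY")
model = rf.workspace().project("your-project-name").version(1).model

image_path = "diagram.jpg"
result = model.predict(image_path, confidence=40, overlap=30).json()

img = Image.open(image_path)
for pred in result["predictions"]:
    # Roboflow returns box centers plus width/height; convert to corner coordinates
    left = int(pred["x"] - pred["width"] / 2)
    top = int(pred["y"] - pred["height"] / 2)
    crop = img.crop((left, top, left + int(pred["width"]), top + int(pred["height"])))

    # --psm 7 tells Tesseract to treat the crop as a single line of text
    text = pytesseract.image_to_string(crop, config="--psm 7").strip()
    print(pred["class"], text)
```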
For the Text_Block you mentioned, I’m wondering how well you’ll be able to grab text that is floating without any anchoring symbol/design. If it always has the same structure (two words on top, number on bottom), the model might pick up on the pattern. But if you don’t train on all the possible wordings (change-over, chg-over, change over, etc.), it might not be reliable.
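If you do end up running OCR on those Text_Blocks, one way to cope with the spelling variants is to normalize the recognized text afterwards instead of hoping the model learns every variant. A tiny sketch - the variant list is only the examples from your post:

```python
import re

# Map known spelling variants to one canonical label (extend as you find more)
CANONICAL = {
    "change-over": "changeover",
    "chg-over": "changeover",
    "change over": "changeover",
}

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace before looking up the variant
    cleaned = re.sub(r"\s+", " ", text.strip().lower())
    return CANONICAL.get(cleaned, cleaned)

print(normalize("Chg-Over"))      # -> changeover
print(normalize("change  over"))  # -> changeover
```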
Hope that helps a little. Drop any comments/questions and I’ll answer if I can!