Hi, we’d like to annotate spherical objects like walnuts and eggs to classify them by size/diameter.
I understand that images can be labelled with circular boundaries, but I’m not sure of the best way to achieve this within the Roboflow labelling interface.
While labelling images with Smart Polygon, there is a convex hull shape option.
Is this the best option for labelling spherical objects by size classes?
Based on the information provided in the extracts, Roboflow supports various annotation tools including bounding boxes, polygons, and smart polygons. However, there is no specific mention of a tool for annotating spherical objects or using a convex hull shape option within the smart polygons.
The Smart Polygon tool in Roboflow allows you to label objects by clicking the center of the object you want to label, and the tool will apply an initial label using a machine learning model to suggest a shape for your object. If the Smart Polygon label doesn’t perfectly capture the object of interest, you can refine the shape by clicking outside of the applied label to expand the selection or inside the shape to remove areas.
For spherical objects like walnuts or eggs, you might consider using the bounding box or polygon tools to annotate the images. The bounding box tool allows you to draw new bounding-box annotations, and the polygon tool allows you to draw new polygonal annotations.
However, to classify objects by size or diameter, you would likely need to implement additional logic in your training or inference pipeline to interpret the size of the bounding box or polygon as the size of the object.
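As a rough illustration of that extra logic, here is a minimal sketch of mapping a detected bounding box to a size class. The `MM_PER_PX` constant and the 31 mm cutoff are placeholder values, not anything from Roboflow; they assume a fixed camera distance so pixels convert to millimeters consistently.

```python
# Hypothetical sketch: classify a detected walnut by diameter.
# MM_PER_PX and the class cutoff are placeholder calibration values
# that assume a fixed, standardized camera distance.

MM_PER_PX = 0.12  # placeholder: millimeters per pixel at your capture distance

def classify_by_bbox(x_min, y_min, x_max, y_max):
    """Estimate diameter from a bounding box and map it to a size class."""
    # For a roughly spherical object, take the larger box side as the diameter.
    diameter_px = max(x_max - x_min, y_max - y_min)
    diameter_mm = diameter_px * MM_PER_PX
    label = "32mm" if diameter_mm >= 31.0 else "30mm"
    return label, diameter_mm

label, mm = classify_by_bbox(100, 100, 370, 360)  # 270 px wide -> ~32.4 mm
```

The same idea works for polygons by taking the max extent of the polygon’s points instead of the box sides.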
Please note that these are general suggestions based on the information provided in the extracts. For more specific advice tailored to your project, you might consider reaching out to Roboflow’s support or community forum.
Hi! I’m Roboflow’s AI powered bot. This is an automated response powered using AI with knowledge from our docs, blog, knowledge base, and previous forum responses. If you find this to be a helpful answer, please mark it as the solution using the checkbox icon below. If this doesn’t solve your issue, please follow up with what you’ve tried/why, and the community will continue to chime in as usual.
Whether to use object detection or instance segmentation depends on the project. If you need circular object detection rendered fast, maybe many times per second, I would recommend an object-detection model built on a large, well-annotated dataset with a mAP score of 79% or greater. If your project processes on submit, or if you’re in a situation where sacrificing speed for accuracy is acceptable, then instance segmentation is what you’ll want. Whenever you see defined annotations via Smart Polygon, that’s instance segmentation. It’s highly accurate, with the tradeoff that it takes more time to process.
If you want to label fast with Smart Polygon, use Model-Assisted Labeling - Roboflow Docs. That way you’ll use a pretrained model to speed up your labelling, which is far faster than doing it yourself, especially with large, accurate models.
Speed of prediction is not important in this use case, accuracy is.
Based on this, per your reasoning, using this type of bounding circle to classify sizes should be better than the bounding box. Is that incorrect?
Yes, if speed is not an important factor and you’re looking for accuracy of shape, this should be an instance segmentation project! Also, I am curious as to why you have two classes for the same type of object, did you mean to do that?
Thanks for asking. The two classes represent walnuts of 2 different sizes (32mm and 30mm). Our goal is to train a model to identify different sizes of the walnuts.
These walnuts have been calibrated manually with a measuring device in order to label the dataset.
Does this approach not make sense?
I think it does, and making the model shouldn’t be too hard. However, since you’re working with measurements you bring in a new factor: scale. That means you’re likely going to want to standardize the process! Making a model that can tell the difference between objects is easy enough, but now you’re saying: “these are both walnuts, which one is 32 mm and which one is 30 mm?” If the images aren’t standardized, the model is going to have trouble determining that. Say one image is taken top-down from 22 inches away and another from 19 inches away; the same walnut now measures differently in each. If you aren’t already, you’re probably going to want a standardized way of capturing training and inference images at a set distance from the walnuts, say top-down 20 inches away at high resolution, so the model can accurately determine everything it needs.
That’s right @Alejandro_Bermudez.
The vertical distance between the image capture device and tray is kept constant with this setup and flash on.
The tray is placed in 3 positions:
Center (cross hairs at the center of tray)
Up (cross hairs 2 rows up)
Down (cross hairs 2 rows down)
Of course, the exact distance from the camera to the different walnut units (especially outer rows) varies a bit in the 3 positions.
We are capturing images in the 3 positions to view the walnuts in the outer rows more vertically. Are the 3 tray positions suboptimal in terms of size classification?
Would you recommend sticking to just the centered cross hair position?
Many thanks for your guidance.
@kedar that’s great! It looks like this is already well thought out. Also, I’m curious why you went with 3 slightly different positions — is there a strategy to that?
Oh well, it is still in test so let’s see if it helps.
The images are captured at 3072x4096 (12.58 MP), portrait orientation, 4:3 aspect ratio.
The tray is moved up and down from center so the units in the outer rows can be viewed without overlap for larger sizes of walnuts. The center view plus 2 positions displaced vertically from center are sufficient to view all units without overlap.
What challenges do you see with this approach?
The only challenge I see with this approach is deviation from the normalization of the data. If you can fit all the walnuts into one picture without distortion, why not just do that? Without moving the board you don’t introduce any more variables than necessary. Unless I’m understanding wrong, you wrote:
Does this mean with center view, you don’t get all the walnuts in one frame?
@happypentester Thanks for joining in to help.
This is an image of walnuts (with cross hairs centered) of the shorter 30 mm diameter, calibrated manually with a calibrator, as seen above the tray.
For larger sizes of walnuts, say 36mm, 38mm the walnuts touch each other.
Is your recommendation to take multiple images with different positions of the walnuts, all with cross hairs centered, to keep focal lengths consistent?
It might be better to instead make a general walnut class focusing on precision, then take the return from the instance segmentation, create a pixel-to-millimeter conversion rate, and measure via the masks your instance segmentation model returns. Thoughts?
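A minimal sketch of that measuring step, assuming a binary mask from the segmentation model and an `mm_per_px` rate calibrated from an object of known size in the frame (the mask below is synthetic, just for illustration):

```python
import numpy as np

# Hedged sketch: derive a diameter from an instance-segmentation mask.
# mm_per_px would come from a calibration object of known size in the
# same frame; here it is an assumed value and the mask is synthetic.

def equivalent_diameter_mm(mask, mm_per_px):
    """Diameter (mm) of a circle with the same area as the binary mask."""
    area_px = float(mask.sum())
    diameter_px = 2.0 * np.sqrt(area_px / np.pi)
    return diameter_px * mm_per_px

# Synthetic circular "walnut" mask of radius 100 px on a 256x256 grid:
yy, xx = np.mgrid[:256, :256]
mask = (xx - 128) ** 2 + (yy - 128) ** 2 <= 100 ** 2

d = equivalent_diameter_mm(mask, mm_per_px=0.16)  # ~32.0 mm
```

Using the area-equivalent diameter averages out small mask irregularities, which tends to be more stable than measuring a single max width.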
Hey @happypentester and @Alejandro_Bermudez, this is where we are at the moment after some testing with the walnuts in tray and without tray on the table.
The walnuts-in-tray approach does not work well, especially when the walnuts are 34+ mm in diameter. Hence, we’re going with walnuts on the table, as the max width of the walnut is visibly different for the different sizes: a 2-4 mm difference in walnut width at a 45 cm camera distance.
See the example of 28 mm (left) and 32 mm (right) walnuts in this image.
As pre-processing for better edge detection, we’re trying grayscale and histogram equalization.
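For reference, the grayscale + histogram-equalization step can be sketched in plain NumPy so the mapping is explicit. In practice a library call (e.g. OpenCV’s `cvtColor` + `equalizeHist`) does the same thing; the test image below is synthetic.

```python
import numpy as np

# Sketch of the grayscale + histogram-equalization preprocessing step.
# A real pipeline would likely use OpenCV; this shows the math.

def to_grayscale(rgb):
    """Luminance-weighted grayscale conversion (ITU-R BT.601 weights)."""
    return np.round(rgb @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)

def equalize_hist(gray):
    """Spread intensities over 0-255 so low-contrast edges sharpen."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Map the cumulative distribution onto the full intensity range.
    lut = np.clip(
        np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255), 0, 255
    ).astype(np.uint8)
    return lut[gray]

# A synthetic low-contrast image: values clustered in 100..139
gray = (np.arange(64).reshape(8, 8) % 40 + 100).astype(np.uint8)
eq = equalize_hist(gray)  # now spans the full 0..255 range
```

Stretching the intensity range this way makes the walnut/background boundary steeper, which should help edge-sensitive features.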
The first model, trained on about 1500 samples of the 28 and 32 sizes, is a bit biased towards 28s, but does detect larger sizes as 32 well.
Does continuing this approach with a few thousand more samples of different sizes of walnuts seem futile to you?
What kind of pre-processing will help the model learn to identify shape differences better?
Thanks for your patience with my response.
So yes, for this job I think greyscaling your images would help. As for other pre-processing: most models take square images, the most common size being 640x640, and according to Lennychat, Roboflow’s documentation AI: “Altering an image to be a square calls for either stretching its dimensions to fit to be a square or keeping its aspect ratio constant and filling in newly created ‘dead space’ with new pixels.”
So in summary, I suggest:
- Greyscale images
- Taking square pictures and training on square pictures
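Since you’re measuring widths, the “fill dead space” option matters more than stretching: letterboxing keeps the aspect ratio so walnut widths in pixels aren’t distorted. A rough sketch under that assumption (the 640 target and gray fill value 114 are common conventions, not requirements; a real pipeline would use a proper resize from OpenCV or PIL):

```python
import numpy as np

# Sketch of letterboxing an image into a square canvas: keep the
# aspect ratio and pad with "dead space" pixels instead of stretching,
# so object widths in pixels are not distorted.

def letterbox(img, size=640, fill=114):
    """Pad an HxW(xC) image into a size x size canvas, aspect preserved."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour downscale via index sampling (illustrative only;
    # use cv2.resize or PIL for real interpolation).
    ys = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    canvas = np.full((size, size) + img.shape[2:], fill, dtype=img.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas

# A 3072x4096 portrait capture lands in a 640x640 canvas with bars:
out = letterbox(np.zeros((3072, 4096, 3), dtype=np.uint8))
```

Note the scale factor changes your pixel-to-millimeter rate, so any calibration should be done on (or corrected for) the letterboxed image.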
I also want to suggest instance segmentation and measuring on the returned predictions, converted to a standardized distance measurement. That way you won’t need size classes at all and can always get a very precise estimation of each walnut.