Isolated object searching

Hi,

I would like to find objects that are isolated from other similar objects. For instance, consider bananas in these two images.

[Image: separate]
[Image: bunch]

I would like to find bananas in environments where they are not bunched up, as in the first image. If I make my bounding boxes as in this image (where the box is not tightly fitted to the banana), will YOLOv5 find bananas only in the first image and not in the second image?

Thanks.

Hi Skipper,

Your model will learn the shape and color of the banana, so it may still identify individual bananas within the “bunched up” bananas.

One thing that may work is if you label groups of bananas as “bunched up bananas”, and single bananas as “banana,” for example. You will want to ensure you are completely consistent with this labeling convention on all images, and label any and all “bunched up” bananas or single bananas within the images.
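As a rough illustration of the two-class convention above, here is a minimal sketch that writes one label line per box in the YOLO text format that YOLOv5 reads (class id, then box center and size, all normalized to the image dimensions). The class names, their ids, and the helper name `to_yolo_line` are assumptions made for this example, not something from your dataset.

```python
# Hypothetical class ids; the names and their order are assumptions for this sketch.
CLASS_IDS = {"banana": 0, "bunched_bananas": 1}


def to_yolo_line(class_name, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-coordinate box into one YOLO-format label line."""
    class_id = CLASS_IDS[class_name]
    # YOLO labels store the box center and size, normalized to [0, 1].
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    # One single banana and one bunch in a 640x480 image (made-up coordinates).
    print(to_yolo_line("banana", 100, 200, 300, 400, 640, 480))
    print(to_yolo_line("bunched_bananas", 320, 0, 640, 240, 640, 480))
```

Every single banana and every bunch in an image would get its own line, which is what keeps the convention consistent across the dataset.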

Mohamed,

Your thought is excellent. Thank you. Would you have any thought on how oversized the bounding box should be?

Skipper

Hi Skipper,

I do not recommend creating oversized bounding boxes. You want to create bounding boxes as tightly around the subject you are labeling as possible.

Here is our labeling guide on Object Detection: Labeling Guide: Object Detection - Roboflow

Mohamed,

I am slightly confused by your suggestion, given the 3rd image I showed. That image had a banana with an oversized bounding box. My thought was that if there were no other bananas between the centered banana and the edge of the bounding box, that would represent an “isolated/single” banana. If other banana parts were included in the surrounding region, one would have a “bunched banana”.

Having two classes as you suggested sounds good, but tightly fitted bounding boxes for both types do not seem to provide a good discriminator between the two cases. Your thoughts?

What typically happens when the bounding box is oversized?

Thanks.

When the bounding box is oversized, the model also learns features from the extra background pixels, because the pixels within the bounding box are the ones you are telling the model to focus on.

You want your model to differentiate a single banana from a group of bananas, so label the groups of bananas as one class and single bananas as another (label them all the way you want your model to distinguish them).

Tight bounding boxes are the way to go in object detection. The labeling guide I linked above also details this. Your “discriminator” is what is within the bounding box: multiple bananas vs. one banana.

I’m also not sure that this will even work properly based on the sample images you provided above. This is a difficult problem to solve. Your model is likely to produce a lot of false positives, since both classes consist of bananas.

Mohamed,

Good food for thought. Yes, if the bounding box is larger than the absolute minimum, the algorithm will be learning about those “extra” pixels, which may or may not be good or useful. It appears that there are always extra pixels with rectangular bounding boxes. These extra pixels can only be eliminated by having a bounding “contour” (one that follows the perimeter of the object).
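To put a number on those “extra” pixels, here is a minimal sketch that takes a binary object mask (nested lists, nonzero meaning object), finds the tightest rectangle around it, and reports what fraction of that rectangle is not object. The mask representation and the function names are assumptions for illustration only.

```python
def tight_box(mask):
    """Return (x_min, y_min, x_max, y_max), inclusive, around the nonzero pixels."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask contains no object pixels")
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return cols[0], rows[0], cols[-1], rows[-1]


def extra_pixel_fraction(mask):
    """Fraction of pixels inside the tight box that do not belong to the object."""
    x_min, y_min, x_max, y_max = tight_box(mask)
    box_area = (x_max - x_min + 1) * (y_max - y_min + 1)
    object_area = sum(1 for row in mask for value in row if value)
    return 1 - object_area / box_area


if __name__ == "__main__":
    # A small diagonal object: even the tightest rectangle includes background.
    mask = [
        [0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    print(tight_box(mask), extra_pixel_fraction(mask))
```

For an elongated or diagonal object the fraction is well above zero even with a perfectly tight box, which is the point about rectangles versus contours.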

My object(s) of interest are not bananas, but I found pictures of bananas that illustrated my concern. I could have chosen bicycles being ridden or parked in a bike stand. It is those “extra” pixels that might differentiate where the bike is. If those pixels represent pavement, then the bike might be being ridden. If those pixels represent parts of other bikes or parts of a bike stand, then the bike might not be being ridden.

If there are too many “extra pixels”, then that feature could be dominant. An image with no bike but with pavement may be mistaken for one with a single bike being ridden.

I have reviewed Roboflow’s posted labeling tips. They are useful, but I did not feel they addressed my concerns completely. I have a limited number of images with objects of interest (~4000), typically one object per image. One has to be careful not to have too many classes with a limited number of images. It appears that I will have to do a study with 500 or so images to get a better appreciation of what I should call an “object” and what its bounding box size should be for my use case.
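For that study, one possible approach (a sketch under my own assumptions, not an established recipe) is to label each object tightly once and then generate padded copies of the box programmatically, so the same images can be trained with several box sizes. The padding fractions and the function names below are hypothetical.

```python
def pad_box(box, pad_frac, img_w, img_h):
    """Grow a (x_min, y_min, x_max, y_max) pixel box by pad_frac of its
    width/height on each side, clamped to the image boundaries."""
    x_min, y_min, x_max, y_max = box
    pad_x = (x_max - x_min) * pad_frac
    pad_y = (y_max - y_min) * pad_frac
    return (
        max(0.0, x_min - pad_x),
        max(0.0, y_min - pad_y),
        min(float(img_w), x_max + pad_x),
        min(float(img_h), y_max + pad_y),
    )


# Hypothetical padding levels to compare; 0.0 keeps the tight box.
PAD_FRACTIONS = (0.0, 0.1, 0.25)


def padded_variants(box, img_w, img_h, pad_fractions=PAD_FRACTIONS):
    """Map each padding fraction to the corresponding padded box."""
    return {frac: pad_box(box, frac, img_w, img_h) for frac in pad_fractions}


if __name__ == "__main__":
    for frac, padded in padded_variants((100, 100, 200, 300), 640, 480).items():
        print(frac, padded)
```

Training one model per padding level on the same 500 images and comparing validation results would show whether the extra context helps or hurts for this use case.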

Thanks.