We’ve trained a YOLOv8n model for a single class (Cone) at image size 1920 and converted it to a fully quantized TFLite model to run on a Coral Edge TPU. When running the TFLite model using the TensorFlow Python library, the output is an array of dimensions 1x5x75600. How do we interpret these results as a collection of bounding boxes? And how do we set a custom confidence threshold?
Hi @bababooey1234 !
The process of taking the output of a model and getting something meaningful (like a set of bounding boxes) is sometimes referred to as “decoding” the model output. For YOLOv8, you have a couple options:

You can upload your weights to Roboflow, then use the hosted endpoint to run your model. We take care of the decode for you and give you some easy-to-parse JSON. Here are the docs on that: Upload Weights - Roboflow

You can tackle the decode on your own. The output of the model is an array of candidate detections with dimensions (batches, 4 + num_classes, num_candidate_detections). Each candidate detection is made up of (xc, yc, w, h, class_1_conf, class_2_conf, ..., class_N_conf), where xc, yc is the center point of the candidate detection, w, h are its width and height, and class_N_conf is the confidence that the box belongs to the Nth class. In your case, you’ll only have a single confidence per candidate, which is why the second dimension is 5. This confidence can be treated as the detection confidence when you apply a confidence threshold. To get a set of meaningful bounding boxes, you’ll need to run all of your candidate detections through Non-Maximum Suppression (NMS), which deduplicates overlapping candidate detections in favor of the most confident one.
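The decode described above can be sketched in plain NumPy. This is a minimal sketch, not the Roboflow or Ultralytics implementation: the function names (xywh_to_xyxy, nms, decode) and the default thresholds are assumptions made for illustration, the NMS is class-agnostic, and the output is assumed to be already dequantized to float. Whether the box values are in pixels or normalized to 0-1 depends on how the model was exported, so check that on your own output.

```python
import numpy as np

def xywh_to_xyxy(boxes):
    """Convert rows of (xc, yc, w, h) to (x1, y1, x2, y2)."""
    xc, yc, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return np.stack([xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2], axis=1)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy class-agnostic NMS; returns kept indices, most confident first."""
    order = np.argsort(scores)[::-1]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Intersection of the current best box with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter + 1e-9)
        # Drop every remaining box that overlaps the kept one too much.
        order = rest[iou <= iou_threshold]
    return np.array(keep, dtype=int)

def decode(output, conf_threshold=0.25, iou_threshold=0.5):
    """output: float array of shape (1, 4 + num_classes, num_candidates)."""
    preds = output[0].T                       # (num_candidates, 4 + num_classes)
    class_scores = preds[:, 4:]
    class_ids = class_scores.argmax(axis=1)
    confidences = class_scores.max(axis=1)
    selected = confidences >= conf_threshold  # the custom confidence threshold
    boxes = xywh_to_xyxy(preds[selected, :4])
    confidences, class_ids = confidences[selected], class_ids[selected]
    keep = nms(boxes, confidences, iou_threshold)
    return boxes[keep], confidences[keep], class_ids[keep]
```

With a 1x5x75600 output, decode(output, conf_threshold=0.4) would return the surviving boxes as (x1, y1, x2, y2), their confidences, and class ids that are all 0 for a single-class model.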
Hope that helps!
Hi, I need to do the same thing, but I don’t know the correct Python code to load and run the TFLite model. Would it be possible to share your code? Thanks.
Hello, there is a code sample on the TensorFlow website:
Another example if the model doesn't have SignatureDefs defined.
import numpy as np
import tensorflow as tf
# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
You just have to load input data that matches the shape and dtype your model expects.
You can check it here: https://www.tensorflow.org/lite/guide/inference?hl=en#load_and_run_a_model_in_python
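For a fully quantized model, matching the dtype also means applying the quantization parameters that the interpreter reports. Below is a minimal sketch, assuming the entries returned by get_input_details() and get_output_details() expose 'dtype', 'shape' and 'quantization' as a (scale, zero_point) pair, and assuming the model expects an image scaled to the 0-1 range before quantization. The helper names prepare_input and dequantize_output are hypothetical.

```python
import numpy as np

def prepare_input(image, input_detail):
    """Quantize a float image (assumed scaled to [0, 1]) for the model input."""
    dtype = input_detail["dtype"]
    scale, zero_point = input_detail["quantization"]
    if np.issubdtype(dtype, np.integer) and scale != 0:
        # Quantized input: real_value = (q - zero_point) * scale, solved for q.
        info = np.iinfo(dtype)
        image = np.clip(np.round(image / scale + zero_point), info.min, info.max)
    # Reshape raises if the image does not match the expected input shape.
    return image.astype(dtype).reshape(tuple(input_detail["shape"]))

def dequantize_output(raw, output_detail):
    """Map a raw output tensor back to floats."""
    scale, zero_point = output_detail["quantization"]
    if scale == 0:  # float model: no quantization to undo
        return raw.astype(np.float32)
    return (raw.astype(np.float32) - zero_point) * scale
```

In the sample above, this would replace the random input with prepare_input(image, input_details[0]) and wrap the result of get_tensor() in dequantize_output(output_data, output_details[0]).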
Hi,
Thanks for the explanation; it works for bbox predictions.
But how about segmentation models?
I have trained a “YOLOv8s-seg” model and converted it to TFLite,
and the outputs look like this:
[1, 160, 160, 32] from output_details[1]['index'], which are supposed to be the mask protos
and
[1, 40, 8400] from output_details[0]['index'], which are supposed to be the coordinates of detected objects, class labels, and confidence scores.
I have only 4 classes in my dataset,
and the model works perfectly fine before conversion.
But after conversion, I am lost on how to make sense of these dimensions.
You’re on the right track! With YOLOv8 instance segmentation, the detection output has dimensions [num_batch, 4 + num_classes + num_masks, num_candidate_detections], so each prediction is one column of the [1,40,8400] output and holds 4 + 4 + 32 = 40 values. For YOLOv8, num_masks is 32, which matches the last dimension of the mask protos output. For each detection you’ll want to do some matrix multiplication to combine its 32 mask coefficients with the mask protos to compute the mask. See this code in the ultralytics repo: ultralytics/ops.py at 30fc4b537ff1d9b115bc1558884f6bc2696a282c · ultralytics/ultralytics · GitHub
Note, you’ll also need to ensure your NMS function can handle the extra mask dimensions on your predictions array.
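The matrix multiplication can be sketched as follows. This is a minimal sketch under a few assumptions, not the ultralytics code: the protos are taken in the channels-last layout of the TFLite output ([160, 160, 32] after dropping the batch dimension), the function name compute_masks and the 0.5 threshold are chosen for illustration, and cropping each mask to its box and upsampling it to the image size are left out.

```python
import numpy as np

def compute_masks(protos, coeffs, threshold=0.5):
    """
    protos: (mask_h, mask_w, num_masks) float array, e.g. (160, 160, 32)
    coeffs: (n, num_masks) mask coefficients of the n detections kept by NMS
    Returns a boolean array of shape (n, mask_h, mask_w).
    """
    mask_h, mask_w, num_masks = protos.shape
    flat = protos.reshape(-1, num_masks)    # (mask_h * mask_w, num_masks)
    logits = coeffs @ flat.T                # (n, mask_h * mask_w)
    probs = 1.0 / (1.0 + np.exp(-logits))   # sigmoid maps logits into (0, 1)
    return (probs > threshold).reshape(-1, mask_h, mask_w)
```

The protos themselves are unbounded floats rather than binary masks; a per-detection mask only appears after the linear combination, the sigmoid, and the threshold.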
@Paul Thank you for your prompt response. I now have a better understanding and I know I have to do some matrix multiplications, and I was going to use the same function which you shared. But in order to apply NMS, I need to know the coordinates of the bboxes, their confidence scores, and their class labels.
If i look at some of the values of matrix with dimension: [1,40,8400]
print((output_data[0][0][0:4]))
print((output_data[0][0][4:6]))
print((output_data[0][5][0:4]))
print((output_data[0][5][4:6]))
print((output_data[0][10][0:4]))
print((output_data[0][10][4:6]))
the output values are something like this:
[ 3.5684 9.4033 16.021 20.085]
[ 24.542 31.756]
[ 2.1164e-06 1.3497e-06 1.1561e-06 1.3197e-06]
[ 1.2169e-06 1.3582e-06]
[ 0.55169 0.28594 0.067442 0.23666]
[ 0.19009 0.08357]
and for each of the 40 rows I have 8400 values,
and 160x160x32 does not fit into this calculation, unless I am supposed to drop some indexes and keep others.
I hope my question makes sense and I am not making some naive mistake.
Thank you for your patience and time.
And I am also not sure if the values in 1x160x160x32 are mask values or something else. As far as I am aware, they should be binary mask values, but they are floats, and not between 0 and 1.
PS: for the 1x40x8400 matrix the values stay below 640, which is the dimension of the input image, so they may be the coordinates of the bbox (but in which order: xyxy, xywh, or something else? And what are the labels for those boxes, if they are indeed boxes?). But what about indexes 4, 5, 6, 7, where the values are less than zero?
Hi @Paul, thank you for your prompt and helpful response. It makes more sense now,
but I am still not able to understand the structure of the output. I am aware I need to do some resizing for the matrix multiplications etc., but before that I need NMS, and for NMS I need to know which value belongs to what.
E.g. from 1x40x8400, I have 40 rows and 8400 candidate labels, but in what order are these labels? Is it the same as for bbox (which I doubt, as the values do not look like that)?
e.g:
print((output_data[0][0][0:4]))
print((output_data[0][0][4:6]))
print((output_data[0][5][0:4]))
print((output_data[0][5][4:6]))
print((output_data[0][10][0:4]))
print((output_data[0][10][4:6]))
will show the values something like
[ 3.5684 9.4033 16.021 20.085]
[ 24.542 31.756]
[ 2.1164e-06 1.3497e-06 1.1561e-06 1.3197e-06]
[ 1.2169e-06 1.3582e-06]
[ 0.55169 0.28594 0.067442 0.23666]
[ 0.19009 0.08357]
So for indexes 4, 5, 6, 7 the values are less than one, and all the other values on all indexes are less than 637, which makes sense, as these may be coordinates in the input image, which is 640x640. But in which order, and where is the class label for these values? And what is the confidence?
if I don’t have this information, I am not sure how to proceed from here.
Thank you for your patience and time.
And this is what I’m seeing for the function definition linked by Paul:
def process_mask_native(protos, masks_in, bboxes, shape):
    """
    It takes the output of the mask head, and crops it after upsampling to the bounding boxes.

    Args:
        protos (torch.Tensor): [mask_dim, mask_h, mask_w]
        masks_in (torch.Tensor): [n, mask_dim], n is number of masks after nms
        bboxes (torch.Tensor): [n, 4], n is number of masks after nms
        shape (tuple): the size of the input image (h,w)

    Returns:
        masks (torch.Tensor): The returned masks with dimensions [h, w, n]
    """
And this may help for NMS: How to code Non-Maximum Suppression (NMS) in plain NumPy
It does sound like that, i agree. But if I look at the values they don’t add up.
I am sorry but I couldn’t get any documentation or help to reorganize tflite output.
num_masks = 32
num_classes = 4
num_predictions = 8400
output2 = np.reshape(output2, (num_predictions, 4 + num_classes + num_masks)) # reshape to [num_predictions, 4 + num_classes + num_masks]
boxes = output2[:, :4]
scores = output2[:, 4:5]
classes = output2[:, 5:5+num_classes]
masks = output2[:, 5+num_classes:]
print("BOX: ",boxes[:1])
print("SCORE: ",scores[:1])
print("CLASS: ",classes[:1])
This should be good, right, if I am ordering things properly?
I should then have boxes with their scores and class labels?
But the values in the labels and scores are just not making sense. Do I need to apply some other formatting that I am missing?
BOX: [[ 3.5684 9.4033 16.021 20.085]]
SCORE: [[ 24.542]]
CLASS: [[ 31.756 33.959 37.623 39.702]]
A score of 24.542? And class labels above 30? If the class labels were somehow in the 0-1 range, I would apply argmax, but here they just seem like coordinates of the bbox, all of the numbers in some order.
It seems like you may have a flipped dimension. If the output you are seeing is out.shape = [1,40,8400], then the first candidate detection would be out[0,:,0], which has shape [40]. The 40 elements are [xc, yc, w, h, c1, c2, c3, c4, m1, m2, ..., m32]. Note that there is no separate objectness score: the four class confidences come directly after the box. So, you could reuse the NMS that was working for you previously by passing out[:,:8,:]. If you do this, you need to keep track of which indices made it through NMS so you can match up the 32-element mask vectors. Alternatively, you can update your NMS function to handle the larger input vectors and essentially ignore the extra 32 elements (but keep them around so that you can compute the masks for each prediction after NMS).
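The layout above can be unpacked with a transpose rather than a reshape. A reshape of the [1,40,8400] array to (8400, 40) keeps the values in memory order and therefore mixes values from different candidates into one row, which is why the earlier scores looked like coordinates. The following is a minimal sketch: the function name split_seg_output and the 0.25 default threshold are assumptions made for illustration, and the output is assumed to be a float array.

```python
import numpy as np

def split_seg_output(out, num_classes=4, num_masks=32, conf_threshold=0.25):
    """out: float array of shape (1, 4 + num_classes + num_masks, num_candidates)."""
    preds = out[0].T  # one row per candidate: (num_candidates, 40)
    if preds.shape[1] != 4 + num_classes + num_masks:
        raise ValueError(f"unexpected prediction length {preds.shape[1]}")
    boxes = preds[:, :4]                        # xc, yc, w, h
    class_scores = preds[:, 4:4 + num_classes]  # c1..c4, no objectness column
    coeffs = preds[:, 4 + num_classes:]         # m1..m32
    class_ids = class_scores.argmax(axis=1)
    confidences = class_scores.max(axis=1)
    # Indices into the original candidates, so everything stays aligned.
    keep = np.flatnonzero(confidences >= conf_threshold)
    return boxes[keep], confidences[keep], class_ids[keep], coeffs[keep], keep
```

If the NMS routine returns the indices of the boxes it keeps, indexing the returned coeffs with those same indices gives the mask vectors that belong to the surviving detections.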