Interpreting YOLOv8 -> TFLite output

We’ve trained a YOLOv8n model for a single class (Cone) at image size 1920 and converted it to a fully quantized TFLite model to run on a Coral Edge TPU. When running the TFLite model with the TensorFlow Python library, the output is an array of dimensions 1x5x75600. How do we interpret these results into a collection of bounding boxes? And how do we set a custom confidence threshold?

Hi @bababooey1234 !

The process of taking the output of a model and getting something meaningful (like a set of bounding boxes) is sometimes referred to as “decoding” the model output. For YOLOv8, you have a couple options:

  1. You can upload your weights to Roboflow then use the hosted endpoint to run your model. We take care of the decode for you and give you some easy-to-parse json. Here’s the docs on that: Upload Weights - Roboflow

  2. You can tackle the decode on your own. The output of the model is an array of candidate detections with dimensions (batches, 4 + num_classes, num_candidate_detections). Each candidate detection is made up of (xc, yc, w, h, class_1_conf, class_2_conf, ..., class_N_conf), where (xc, yc) is the center point of the candidate detection and (w, h) are its width and height. class_N_conf is the confidence that the box belongs to the Nth class. In your case you’ll only have a single confidence, which can be treated as the detection confidence when you apply a confidence threshold. To get a set of meaningful bounding boxes, you’ll then need to run all of your candidate detections through Non-Maximum Suppression (NMS), which deduplicates overlapping candidate detections in favor of the most confident one. A minimal sketch is below.
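
For your 1x5x75600 output, the decode might look roughly like this in NumPy. This is a sketch, not Ultralytics’ exact post-processing; it assumes the output has already been dequantized to float, and the 0.25 threshold is just an example you can tune:

import numpy as np

def decode_detections(output_data, conf_threshold=0.25):
    # output_data: (1, 5, num_candidates), each column is (xc, yc, w, h, conf).
    preds = output_data[0].T               # -> (num_candidates, 5)
    boxes_xywh = preds[:, :4]
    scores = preds[:, 4]

    # Apply the confidence threshold.
    keep = scores > conf_threshold
    boxes_xywh, scores = boxes_xywh[keep], scores[keep]

    # Convert (xc, yc, w, h) to (x1, y1, x2, y2) corners for NMS.
    boxes_xyxy = np.empty_like(boxes_xywh)
    boxes_xyxy[:, 0] = boxes_xywh[:, 0] - boxes_xywh[:, 2] / 2
    boxes_xyxy[:, 1] = boxes_xywh[:, 1] - boxes_xywh[:, 3] / 2
    boxes_xyxy[:, 2] = boxes_xywh[:, 0] + boxes_xywh[:, 2] / 2
    boxes_xyxy[:, 3] = boxes_xywh[:, 1] + boxes_xywh[:, 3] / 2
    return boxes_xyxy, scores              # run these through NMS next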

Hope that helps!

Hi, I need to use the same procedure, but I don’t know the correct Python code to load and run the TFLite model. Is it possible to share your code? Thanks.

@bababooey1234

Hello, on the TensorFlow website there is a code sample:

This is the example for when the model doesn’t have SignatureDefs defined.

import numpy as np
import tensorflow as tf

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)


You just have to load the input data according to your model’s expected shape and dtype.

You can check it here: https://www.tensorflow.org/lite/guide/inference?hl=en#load_and_run_a_model_in_python
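
One caveat for a fully quantized model like the one in the original question: the input and output tensors are likely int8/uint8 rather than float32, so you also need to apply the quantization parameters. A rough sketch building on the snippet above (the image path and the [0, 1] normalization are assumptions about your preprocessing):

import numpy as np
from PIL import Image

# Load and resize the image to the model's expected input size.
height, width = input_details[0]['shape'][1:3]
img = Image.open("image.jpg").resize((width, height))
input_data = np.asarray(img, dtype=np.float32) / 255.0   # normalize to [0, 1]
input_data = input_data[np.newaxis, ...]                 # add batch dimension

# If the model is quantized, convert floats to the integer input dtype.
if input_details[0]['dtype'] in (np.int8, np.uint8):
    scale, zero_point = input_details[0]['quantization']
    input_data = (input_data / scale + zero_point).astype(input_details[0]['dtype'])

interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()

# Dequantize the output back to floats before decoding.
output_data = interpreter.get_tensor(output_details[0]['index'])
if output_details[0]['dtype'] in (np.int8, np.uint8):
    scale, zero_point = output_details[0]['quantization']
    output_data = (output_data.astype(np.float32) - zero_point) * scale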

Hi,
Thanks for the explanation; it works for bbox predictions.
But how about segmentation models?
I have trained a YOLOv8s-seg model and converted it to TFLite, and the outputs are:
[1, 160, 160, 32] from output_details[1]['index'], which is supposed to be the mask protos,
and
[1, 40, 8400] from output_details[0]['index'], which is supposed to be the coordinates of detected objects, class labels, and confidence scores.

I have only 4 classes in my dataset, and the model works perfectly fine before conversion. But after conversion, I am lost on how to make sense of these dimensions.

You’re on the right track! With YOLOv8 instance segmentation, the prediction output has dimensions [num_batch, 4 + num_classes + num_masks, num_candidate_detections]. For YOLOv8, num_masks is 32, which matches the last dimension of the mask protos output (and with your 4 classes, 4 + 4 + 32 = 40). For each detection you’ll want to do some matrix multiplication to combine the mask coefficients with the mask protos to compute the mask. See this code in the ultralytics repo: ultralytics/ops.py at 30fc4b537ff1d9b115bc1558884f6bc2696a282c · ultralytics/ultralytics · GitHub

Note, you’ll also need to ensure your NMS function can handle the extra mask dimensions on your predictions array.
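
If it helps, the core of that function boils down to something like this in NumPy. A sketch under the assumption that the protos come in as (160, 160, 32) after dropping the batch dimension, and that mask_coeffs holds the (n, 32) coefficient vectors for the detections that survived NMS:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def masks_from_protos(protos, mask_coeffs):
    # protos: (mask_h, mask_w, num_masks), e.g. (160, 160, 32)
    # mask_coeffs: (n, num_masks) coefficients for n detections after NMS
    mask_h, mask_w, num_masks = protos.shape
    protos_flat = protos.reshape(-1, num_masks)    # (mask_h*mask_w, 32)
    masks = sigmoid(protos_flat @ mask_coeffs.T)   # (mask_h*mask_w, n)
    masks = masks.T.reshape(-1, mask_h, mask_w)    # (n, mask_h, mask_w)
    # Binarize; upsample to the image size and crop to each box afterwards.
    return masks > 0.5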


@Paul Thank you for your prompt response. I now have a better understanding, and I know I have to do some matrix multiplications. I was going to use the same function you shared, but in order to apply NMS, I need to know the coordinates of the bboxes, their confidence scores, and their class labels.
If I look at some of the values of the matrix with dimension [1, 40, 8400]:

print((output_data[0][0][0:4]))
print((output_data[0][0][4:6]))
print((output_data[0][5][0:4]))
print((output_data[0][5][4:6]))
print((output_data[0][10][0:4]))
print((output_data[0][10][4:6]))

the output values are something like this:

[     3.5684      9.4033      16.021      20.085]
[     24.542      31.756]
[ 2.1164e-06  1.3497e-06  1.1561e-06  1.3197e-06]
[ 1.2169e-06  1.3582e-06]
[   -0.55169    -0.28594    0.067442     0.23666]
[    0.19009     0.08357]

So for each of the 40 rows I have 8400 values, and 160x160x32 does not fit into this calculation, unless I am supposed to drop some indexes and keep others. I hope my question makes sense and I am not making some naive mistake. Thank you for your patience and time.
I am also not sure whether the values in 1x160x160x32 are mask values or something else. As far as I am aware, they should be binary mask values, but they are floats and not between 0 and 1.

PS: for the 1x40x8400 matrix the values do range below 640, which is the dimension of the input image, so they may be the coordinates of the bboxes (but in which order: is it xyxy, xywh, or something else, and what are the labels for those boxes if they are indeed boxes?). But what about indexes 4, 5, 6, 7, where values are less than zero?

Hi @Paul, thank you for your prompt and helpful response. It makes more sense now, but I am still not able to understand the structure of the output. I am aware I need to do some resizing for the matrix multiplications etc., but before that I need NMS, and for NMS I need to know which value belongs to what.
E.g. from 1x40x8400 I have 40 rows and 8400 candidates, but these values are in what order? Is it the same as the bbox layout (which I doubt, as the values do not look like that)?
E.g.:

print((output_data[0][0][0:4]))
print((output_data[0][0][4:6]))
print((output_data[0][5][0:4]))
print((output_data[0][5][4:6]))
print((output_data[0][10][0:4]))
print((output_data[0][10][4:6]))

will show values something like this:

[     3.5684      9.4033      16.021      20.085]
[     24.542      31.756]
[ 2.1164e-06  1.3497e-06  1.1561e-06  1.3197e-06]
[ 1.2169e-06  1.3582e-06]
[   -0.55169    -0.28594    0.067442     0.23666]
[    0.19009     0.08357]

So for indexes 4, 5, 6, 7 the values are less than one, while the rest of the values on all indexes are less than 637, which makes sense if these are coordinates in the 640x640 input image. But in which order, and where is the class label for these values? And what is the confidence?
Without this information, I am not sure how to proceed from here.
Thank you for your patience and time.

Hi @rsadiq, it sounds like the ordering/formatting is the one Paul described above: (xc, yc, w, h), then the 4 class confidences, then the 32 mask coefficients.

And this is what I’m seeing for the function definition linked by Paul:

def process_mask_native(protos, masks_in, bboxes, shape):
    """
    It takes the output of the mask head, and crops it after upsampling to the bounding boxes.
    Args:
      protos (torch.Tensor): [mask_dim, mask_h, mask_w]
      masks_in (torch.Tensor): [n, mask_dim], n is number of masks after nms
      bboxes (torch.Tensor): [n, 4], n is number of masks after nms
      shape (tuple): the size of the input image (h,w)
    Returns:
      masks (torch.Tensor): The returned masks with dimensions [h, w, n]
    """

And this may help for NMS: How to code Non-Maximum Suppression (NMS) in plain NumPy
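
In case that link goes stale, a bare-bones greedy NMS in plain NumPy looks something like this (a sketch, assuming boxes in xyxy format; the 0.45 IoU threshold is just an example):

import numpy as np

def nms(boxes, scores, iou_threshold=0.45):
    # boxes: (n, 4) in (x1, y1, x2, y2); scores: (n,). Returns kept indices.
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top-scoring box against the rest.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Drop all boxes that overlap the kept box too much.
        order = order[1:][iou <= iou_threshold]
    return np.array(keep)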

It does sound like that, I agree. But if I look at the values, they don’t add up.
I am sorry, but I couldn’t find any documentation or help on reorganizing the TFLite output.

num_masks = 32
num_classes = 4
num_predictions = 8400

output2 = np.reshape(output2, (num_predictions, 4 + num_classes + num_masks))  # reshape to [num_predictions, 4 + num_classes + num_masks]
boxes = output2[:, :4]
scores = output2[:, 4:5]
classes = output2[:, 5:5+num_classes]
masks = output2[:, 5+num_classes:]

print("BOX: ",boxes[:1])
print("SCORE: ",scores[:1])
print("CLASS: ",classes[:1])

This should be right if I am ordering properly:
I should have boxes with their scores and class labels?
But the values in the labels and scores just don’t make sense. Do I need to apply any other formatting that I am missing?

BOX:  [[     3.5684      9.4033      16.021      20.085]]
SCORE:  [[     24.542]]
CLASS:  [[     31.756      33.959      37.623      39.702]]

A score of 24.542? And class labels above 30? If the class labels were somehow in the 0-1 range, I would apply argmax, but here they just look like more bbox coordinates, all the numbers in some order.

It seems like you may have a flipped dimension. If the output you are seeing is out.shape = [1, 40, 8400], then the first candidate detection is out[0, :, 0], which has shape (40,); reshaping to (8400, 40) scrambles the values, so you want a transpose instead. The 40 elements are [xc, yc, w, h, c1, c2, c3, c4, m1, m2, ..., m32]. So, you could reuse the NMS that was working for you previously by passing out[:, :8, :]. If you do this, you need to keep track of which indices made it through NMS so you can match up the 32-element mask vectors. Alternatively, you can update your NMS function to handle the larger input vectors and essentially ignore the extra 32 elements (but keep them around so that you can compute the masks for each prediction after NMS).
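
Concretely, a sketch of that decode, using the output_data array from your earlier prints (the 0.25 threshold is just an example):

import numpy as np

num_classes = 4

preds = output_data[0].T                   # (40, 8400) -> (8400, 40): transpose, don't reshape
boxes_xywh = preds[:, :4]                  # (xc, yc, w, h)
class_confs = preds[:, 4:4 + num_classes]  # one confidence per class
mask_coeffs = preds[:, 4 + num_classes:]   # 32 mask coefficients per candidate

scores = class_confs.max(axis=1)           # detection confidence = best class confidence
class_ids = class_confs.argmax(axis=1)

keep = scores > 0.25                       # confidence threshold
boxes_xywh, scores, class_ids, mask_coeffs = (
    boxes_xywh[keep], scores[keep], class_ids[keep], mask_coeffs[keep])
# ...then convert xywh -> xyxy, run NMS (tracking the surviving indices),
# and combine the surviving mask_coeffs with the protos as discussed above.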


@Paul thanks man.
It’s more than helpful. Appreciate it.