Interpreting YOLOv8 -> TFLite output

Ciao @Paul ! Nice to meet you!

In the picture you attached, is the box the green part near the top right of the image? And is most of the green just the label for that small box?
Yes, exactly.

I’ve finally obtained the right coordinates.
The issue was that I was not transposing results.
Basically, the shape of my output tensor is [1, 6, 3456]:

  • 1 is the batch size;
  • 6 is the per-detection data I need to process: [0] => xc, [1] => yc, [2] => width, [3] => height. I suppose that's why this dimension is 4 + classes, correct? In my case the remaining two were the class scores.
  • 3456 is the number of candidate detections that carry the previous fields.

By reading the data properly, the bounding boxes are now fine, and so are the labels :slight_smile:
Honestly, your initial hints were essential.
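
For anyone who lands on this thread later, here is a minimal, untested sketch of the decoding described above, assuming a [1, 6, 3456] float32 output with two classes; output_data is a hypothetical name for the raw tensor fetched from the interpreter, and the threshold is a placeholder:

import numpy as np

# output_data: raw model output with shape [1, 6, 3456]
preds = np.squeeze(output_data).T      # transpose -> (3456, 6): one row per candidate

boxes = preds[:, :4]                   # xc, yc, width, height
scores = preds[:, 4:]                  # one column per class (two here)

class_ids = scores.argmax(axis=1)      # best class per candidate
confidences = scores.max(axis=1)

keep = confidences > 0.5               # placeholder confidence threshold
for (xc, yc, w, h), cls, conf in zip(boxes[keep], class_ids[keep], confidences[keep]):
    xmin, ymin = xc - w / 2, yc - h / 2
    xmax, ymax = xc + w / 2, yc + h / 2
    # scale to pixel coordinates if the outputs are normalized,
    # and run non-max suppression before drawing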

Ahh good catch! I see what you mean. I misread your point #2. Glad you got it working! This is another common gotcha, especially when upgrading to YOLOv8 from YOLOv5.


Hi, I have trained a yolov8m-seg model and converted it to TFLite. I want to run it on a mobile device. I read this thread and it partially helped me understand the output of the model, but I'm still not able to draw bounding boxes or extract masks from the output.

Below are the input and output details:

Input details:

[{'name': 'inputs_0', 'index': 0, 'shape': array([  1, 640, 640,   3], dtype=int32), 'shape_signature': array([  1, 640, 640,   3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

Output details:

[{'name': 'Identity', 'index': 583, 'shape': array([   1,   37, 8400], dtype=int32), 'shape_signature': array([   1,   37, 8400], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}, {'name': 'Identity_1', 'index': 436, 'shape': array([  1, 160, 160,  32], dtype=int32), 'shape_signature': array([  1, 160, 160,  32], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

The following code is meant to draw bounding boxes on the image.


# Required imports (cv2_imshow assumes a Colab environment; use cv2.imshow otherwise)
import cv2
import numpy as np
import tensorflow as tf
from google.colab.patches import cv2_imshow

# Load the TFLite model
interpreter = tf.lite.Interpreter(model_path='best_float32.tflite')
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

image = cv2.imread("21.png")
image = cv2.resize(image, (640,640))
cv2_imshow(image)

image = np.array(image).astype('float32') / 255.0

# Set input tensor
input_shape = input_details[0]['shape']
input_data = np.array(image.reshape(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

# Run inference
interpreter.invoke()

# Get the output tensor values
output_data1 = interpreter.get_tensor(output_details[0]['index'])
output_data2 = interpreter.get_tensor(output_details[1]['index'])  

output_tensor = output_data1.reshape(8400,37)  # Reshape the tensor based on the output shape

H = image.shape[0]
W = image.shape[1]
for i in range(len(output_tensor)): 
    xPos,yPos,w,h=output_tensor[i][0:4]
    scores=output_tensor[i][4:5]
    if scores>0 and scores<1:
        xmin = int(max(1, xPos - w / 2)) 
        ymin = int(max(1, yPos - h / 2)) 
        xmax = int(min(H, xPos + w / 2))
        ymax = int(min(W,yPos + h / 2)) 
        cv2.rectangle(image, (xmin,ymin), (xmax,ymax), (0, 255, 0), 2)
cv2_imshow(image)
 

Output image: [attached screenshot]

Hi @Paul, I hope this message finds you well. I'm reaching out regarding a post I made a few days ago that unfortunately hasn't received any replies yet. I'm still in need of help and would greatly appreciate it if someone could take a look and provide some insights or suggestions.

Hey @Noaman_Anwaar! The previous posts on this thread were specific to object detection. For instance segmentation, the output is a bit different since the model is also predicting masks for each detected object. Here's the function in Ultralytics where the output is decoded for yolov8 instance segmentation: ultralytics/ultralytics/yolo/v8/segment/predict.py at 137552996a0aaeb90fa0c1d6da40f012f8801a00 · ultralytics/ultralytics · GitHub

TLDR: There are two outputs from the model; we'll call them preds and protos. The preds have a shape something like [1, 4 + num_classes + num_masks, 8400]. The protos have shape [1, 160, 160, num_masks]. The detection predictions are processed as described earlier in this thread; you just need to chop off the last num_masks elements of the prediction array so that it has shape [1, 4 + num_classes, 8400]. Then, for each prediction, there is some matrix multiplication to be done between the mask coefficients (the last num_masks elements of the prediction array) and the protos output (a single tensor that applies to all predictions). This results in one mask per prediction; see the sketch below.
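
Here's a rough, untested sketch of that decoding in NumPy, assuming the 37-channel output posted above corresponds to a single class (37 = 4 box values + 1 class score + 32 mask coefficients); output_data1 and output_data2 refer to the two tensors fetched in the earlier code, and the thresholds are placeholders:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# preds: [1, 37, 8400] -> (8400, 37); protos: [1, 160, 160, 32] -> (160, 160, 32)
preds = np.squeeze(output_data1).T
protos = np.squeeze(output_data2)

boxes = preds[:, :4]          # xc, yc, w, h for each candidate
scores = preds[:, 4]          # single-class confidence
coeffs = preds[:, 5:]         # (8400, 32) mask coefficients

keep = scores > 0.5           # placeholder threshold; apply NMS on boxes[keep] as well

# Matrix multiply each kept candidate's coefficients against the shared prototypes:
# (n_kept, 32) @ (32, 160*160) -> (n_kept, 160*160)
masks = sigmoid(coeffs[keep] @ protos.reshape(-1, 32).T)
masks = masks.reshape(-1, 160, 160) > 0.5   # binarize; then upsample to 640x640 and crop to each box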

I hope this helps! I definitely encourage you to dig into the link I posted above to better understand this decoding process!

Hello @kubermario. Are you able to share your code? I am having the same issue with my yolov8 model that I converted to TFLite, and I can't find any proper resources on how to process the output correctly.

Ciao Skillnoob,
Unfortunately I cannot share that portion of the code, because I no longer work for that company and was obliged to delete past projects. If you share yours, maybe we can put some hints together and find the right, final solution to the issue you are experiencing.

Hey @kubermario,
Below you can find my current code for processing the output, which I got from here. The way TensorFlow does it in that example does not seem to work with yolov8 models. My model has an output shape of [1, 5, 3780].

output_data = interpreter.get_tensor(output_details[0]['index'])

results = np.squeeze(output_data)

# from the TFLite label_image example: True when the model takes float input
floating_model = input_details[0]['dtype'] == np.float32

top_k = results.argsort()[-5:][::-1]

for i in top_k:
    if floating_model:
        print('{:08.6f}: {}'.format(float(results[i]), "ball"))
    else:
        print('{:08.6f}: {}'.format(float(results[i] / 255.0), "ball"))
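
For what it's worth, applying the transpose approach from earlier in this thread to a [1, 5, 3780] single-class output (4 box values + 1 score) would look roughly like the untested sketch below; the threshold is a placeholder:

import numpy as np

preds = np.squeeze(output_data).T      # [1, 5, 3780] -> (3780, 5): one row per candidate

for xc, yc, w, h, score in preds:
    if score > 0.5:                    # placeholder confidence threshold
        print('{:08.6f}: {}'.format(float(score), "ball"))
        # convert (xc, yc, w, h) to corner coordinates and apply NMS before drawing boxes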