Hello,
I have trained YOLOv8m on a custom dataset with 5 classes and obtained quite good results. Afterwards, I converted the model to TFLite and wrote my own detection script. The preprocessing is implemented as follows:
def preprocess(img):
    # Letterbox
    img = letterbox(img, (640, 640))
    # BGR to RGB
    img = img[:, :, ::-1]
    img = img / 255.
    # Expand dims for batch
    img = np.expand_dims(img, 0)
    return img

def letterbox(img, target_size):
    ih, iw = img.shape[:2]
    w, h = target_size
    scale = min(w / iw, h / ih)
    nw = int(iw * scale)
    nh = int(ih * scale)
    image_resized = cv2.resize(img, (nw, nh))
    image_padded = np.full(shape=[h, w, 3], fill_value=128.0)
    dw = int((w - nw) / 2)
    dh = int((h - nh) / 2)
    image_padded[dh:nh + dh, dw:nw + dw, :] = image_resized
    return image_padded
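For reference, the letterbox above discards the scale and padding it applied, which are needed later to map predicted boxes back onto the original image. The following is a minimal sketch of that bookkeeping, using the same arithmetic as `letterbox`. The helper names `letterbox_params` and `boxes_to_original` are hypothetical and not part of my script, and the sketch assumes the boxes are already in pixel units of the 640x640 input.

```python
import numpy as np


def letterbox_params(image_shape, target_size=(640, 640)):
    """Return (scale, dw, dh) for a centered letterbox, as in letterbox()."""
    ih, iw = image_shape[:2]
    w, h = target_size
    scale = min(w / iw, h / ih)
    nw, nh = int(iw * scale), int(ih * scale)
    dw, dh = (w - nw) // 2, (h - nh) // 2
    return scale, dw, dh


def boxes_to_original(boxes_xyxy, scale, dw, dh):
    """Map (x1, y1, x2, y2) boxes from letterboxed pixels to source pixels."""
    boxes = np.asarray(boxes_xyxy, dtype=np.float32).copy()
    boxes[:, [0, 2]] = (boxes[:, [0, 2]] - dw) / scale  # undo x padding, then scale
    boxes[:, [1, 3]] = (boxes[:, [1, 3]] - dh) / scale  # undo y padding, then scale
    return boxes


# Example: a 1280x720 frame letterboxed into 640x640
scale, dw, dh = letterbox_params((720, 1280, 3))
restored = boxes_to_original([[0, 140, 640, 500]], scale, dw, dh)
```

In this example the frame is scaled by 0.5 and padded by 140 pixels at the top and bottom, so the box covering the whole resized frame maps back to the full 1280x720 image.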
The main function loads the model, reads an image, preprocesses it, feeds it to the model, and runs inference:
interpreter = tf.lite.Interpreter(model_path="yolov8_last_float32.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
# Load image
img = cv2.imread("WIN_20230719_13_10_23_Pro.jpg")
# Preprocess
img_input = preprocess(img)
# Set input tensor
img_input = img_input.astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], img_input)
# Run inference
interpreter.invoke()
output_details = interpreter.get_output_details()
# Get predictions
output = interpreter.get_tensor(output_details[0]['index'])
output = output[0].transpose()
boxes = output[:, :4]
scores = output[:, 4:]
format_boxes = np.zeros(shape=boxes.shape)
format_boxes[:, 0] = boxes[:, 0] - (boxes[:, 2] / 2)
format_boxes[:, 1] = boxes[:, 1] - (boxes[:, 3] / 2)
format_boxes[:, 2] = boxes[:, 0] + (boxes[:, 2] / 2)
format_boxes[:, 3] = boxes[:, 1] + (boxes[:, 3] / 2)
format_boxes = format_boxes.astype(np.int32)
nms_boxes, scores = non_max_suppression(format_boxes, scores, 0.5)
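One check that may be worth running before looking at the scores is whether the array passed to `set_tensor` matches the shape and dtype the model reports. Below is a minimal sketch of such a check. The helper `check_input` is hypothetical; it only relies on the `shape` and `dtype` keys found in each entry returned by `get_input_details()`, and the example uses a hand-written dictionary rather than a real model.

```python
import numpy as np


def check_input(detail, array):
    """Compare an array against one entry of interpreter.get_input_details()."""
    problems = []
    expected_shape = tuple(int(d) for d in detail["shape"])
    if tuple(array.shape) != expected_shape:
        problems.append(f"shape {tuple(array.shape)} != expected {expected_shape}")
    expected_dtype = np.dtype(detail["dtype"])
    if np.dtype(array.dtype) != expected_dtype:
        problems.append(f"dtype {array.dtype} != expected {expected_dtype}")
    return problems


# Stand-in for input_details[0]; the values here are illustrative only
fake_detail = {"shape": np.array([1, 640, 640, 3]), "dtype": np.float32}
ok = check_input(fake_detail, np.zeros((1, 640, 640, 3), dtype=np.float32))
bad = check_input(fake_detail, np.zeros((1, 3, 640, 640), dtype=np.float64))
```

In the script above, the equivalent call would be `check_input(input_details[0], img_input)`; an empty list means the shape and dtype agree.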
I also convert the output boxes from the (center x, center y, width, height) format to the (top-left, bottom-right) corner format, and then apply non-maximum suppression.
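Since the `non_max_suppression` helper is not shown, here is a minimal NumPy sketch of the post-processing described above: box conversion, best-class selection, a confidence threshold, and greedy class-agnostic NMS. It is a stand-in under stated assumptions, not the helper actually used in my script. The function names and the 0.25 confidence threshold are hypothetical, and the boxes are assumed to be in pixel units. If the exported model emits coordinates normalized to [0, 1] (worth checking by printing `boxes.max()`), they would need scaling by the input size before any cast to integers.

```python
import numpy as np


def xywh_to_xyxy(boxes):
    """Convert (cx, cy, w, h) rows to (x1, y1, x2, y2) rows."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    half = boxes[:, 2:4] / 2
    return np.concatenate([boxes[:, :2] - half, boxes[:, :2] + half], axis=1)


def nms(boxes_xyxy, scores, iou_threshold=0.5):
    """Greedy class-agnostic NMS; returns kept indices, best score first."""
    x1, y1, x2, y2 = boxes_xyxy.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        # Intersection of the current best box with all remaining boxes
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter + 1e-9)
        order = rest[iou <= iou_threshold]
    return np.array(keep, dtype=np.int64)


def filter_predictions(boxes_xywh, class_scores, conf_threshold=0.25, iou_threshold=0.5):
    """Pick the best class per row, drop weak rows, then run NMS."""
    class_scores = np.asarray(class_scores, dtype=np.float32)
    class_ids = class_scores.argmax(axis=1)
    confidences = class_scores.max(axis=1)
    mask = confidences >= conf_threshold
    boxes = xywh_to_xyxy(np.asarray(boxes_xywh, dtype=np.float32)[mask])
    confidences, class_ids = confidences[mask], class_ids[mask]
    keep = nms(boxes, confidences, iou_threshold)
    return boxes[keep], confidences[keep], class_ids[keep]


# Two heavily overlapping boxes and one separate box, with 2 classes
demo_boxes = [[100, 100, 50, 50], [102, 100, 50, 50], [300, 300, 40, 40]]
demo_scores = [[0.9, 0.1], [0.8, 0.05], [0.2, 0.7]]
kept_boxes, kept_conf, kept_ids = filter_predictions(demo_boxes, demo_scores)
```

In the demo, the second box overlaps the first with an IoU above 0.5 and is suppressed, leaving two detections. With the arrays from my script, the call would be `filter_predictions(boxes, scores)` on the float boxes, before any integer cast.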
The problem is the score values are very low:
These are the values for each prediction after applying NMS.
Any ideas what I am doing wrong?