Hello,
so I am currently fine-tuning GPT-4o with vision. Regarding this how-to blog (How to Fine-Tune GPT-4o for Object Detection), it is not clear why this approach is used. Specifically, I do not understand the multiplication by 1024. Unless I missed it, it is explained neither in the video nor in the blog. My assumption would be that the images are internally resized to 1024x1024? Are there any sources on this?

Also, regarding the JSONL: wouldn't we have to describe the steps we are taking (normalization followed by multiplying by 1024) in order for the model to learn them, or is it able to deduce these steps on its own? This is quite unclear to me, and I am fine-tuning for my thesis. The Roboflow app is very helpful; however, I need a scientific basis for why we take these steps.
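For reference, this is the transform as I currently understand it from the blog — a minimal sketch under my own assumption that boxes start as absolute pixel coordinates and are rescaled as if the image were 1024x1024 (the function name and box layout here are mine, not from the blog):

```python
def to_1024_space(box, img_w, img_h):
    """Normalize an absolute-pixel box (x_min, y_min, x_max, y_max)
    to [0, 1] relative to the image size, then scale by 1024 —
    i.e. express the box as if the image were 1024x1024."""
    x_min, y_min, x_max, y_max = box
    return (
        round(x_min / img_w * 1024),
        round(y_min / img_h * 1024),
        round(x_max / img_w * 1024),
        round(y_max / img_h * 1024),
    )

# Example: a centered box in a 1920x1080 image
print(to_1024_space((960, 540, 1440, 810), 1920, 1080))
# → (512, 512, 768, 768)
```

Is this interpretation correct, and if so, is the 1024x1024 target documented anywhere officially?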