Hi, I was wondering if it would be possible to increase the max image upload size to 30 MB, so I could use uncompressed PNGs (26 MB) for training my model (project id: det-fly-jafoc-rubjv). The photos are 6244x4168 px, single channel (monochrome). Also, does Roboflow support training YOLO models on single-channel images without converting them to 3-channel ones? And roughly how many credits would training on 5,000 of these images use? I'm currently developing this drone detection system as a side project at work, but I'm a bit hesitant to ask my boss to buy the $400 subscription, as the project is still in its early stages with only 3 people working on it. In the future, though, we would need to label and train on about 200k images, as that's what the competition is using.
Hi @Honza_Ferenc!
Thank you for your question! I just consulted with our team; it would be helpful to know the size of your objects of interest so we can recommend the best possible solution.
Yes, of course. The targets are small drones, typically ranging from 500x160 px down to 20x5 px. I thought about using SAHI, but I'm not too familiar with it. I can imagine the inference would be faster than one large model, since inference time doesn't scale linearly with resolution; maybe it's similar to using a batch size higher than 1, so faster? But I might be wrong. We're experimenting on the AGX Orin: 1920x1920 inference runs at 30 ms (INT8) using YOLO12s, and we only need a few frames per second, so there is some headroom for higher resolution. Right now, though, we're running inference on 3-channel RGB images duplicated in real time from our monochrome cameras' single-channel stream. I measured it, and just this operation takes 35 ms running on the CPU, so we want to use the original monochrome image stream instead. YOLO supports it natively since version 8.3.146 (Release v8.3.146 - `ultralytics 8.3.146` New COCO8-Grayscale dataset (#20827) · ultralytics/ultralytics · GitHub)
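For anyone reading along, the per-frame channel duplication described above (the step that costs ~35 ms on CPU and that single-channel input would eliminate) can be sketched in NumPy roughly like this; the exact conversion in the real pipeline may differ:

```python
import numpy as np

def gray_to_rgb(frame: np.ndarray) -> np.ndarray:
    """Duplicate a single-channel (H, W) frame into a 3-channel (H, W, 3) image.

    This per-frame copy is the CPU cost the post wants to avoid by feeding
    the network the monochrome stream directly.
    """
    return np.repeat(frame[:, :, np.newaxis], 3, axis=2)

# Dummy monochrome frame at the camera's native resolution:
frame = np.zeros((4168, 6244), dtype=np.uint8)
rgb = gray_to_rgb(frame)
print(rgb.shape)  # (4168, 6244, 3)
```

Note that every channel is an identical copy, so the extra two channels carry no information; they only triple the memory traffic per frame.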
Hi @Honza_Ferenc!
This is fantastic! Apologies for the delayed response. SAHI is definitely the best option for detecting small drones and maximizing detection accuracy at long range or at the edges of the frame. As for latency, SAHI won't always be faster than one large model due to the tiling overhead, but it is far more accurate on small objects in high-resolution images.
I agree, using the original monochrome image stream is your best bet.
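To get a feel for the tiling overhead mentioned above, here is a rough back-of-the-envelope sketch in plain Python. The slice size (1024 px) and overlap ratio (20%) are hypothetical illustration values, and SAHI's actual slicing logic may round differently:

```python
import math

def tile_count(width: int, height: int, slice_size: int = 1024,
               overlap_ratio: float = 0.2) -> int:
    """Estimate how many slices a SAHI-style tiler produces for one frame.

    Hypothetical parameters: 1024 px square slices with 20% overlap.
    """
    step = int(slice_size * (1 - overlap_ratio))  # stride between tile origins
    tiles_x = math.ceil(max(width - slice_size, 0) / step) + 1
    tiles_y = math.ceil(max(height - slice_size, 0) / step) + 1
    return tiles_x * tiles_y

print(tile_count(6244, 4168))  # 40 tiles per full-resolution frame
```

Under these assumptions a single 6244x4168 frame becomes 40 forward passes (plus postprocessing to merge overlapping detections), which is why sliced inference is not automatically faster than one large-resolution model.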
Today I tried running the model trained at 1920x1080 px on 6224x4168 px input, and it seems to be performing OK-ish, with inference running at 90 ms on the AGX Orin. The speed surprised me; it might be worth training the model at native resolution to examine the accuracy in case SAHI turns out to be slow. I have to do more testing, and tomorrow I will try SAHI. Anyway, do you have any details from the team on whether Roboflow supports true single-channel monochrome images? I worry they might get converted to 3-channel ones. Also, what about the maximum file size: could it be raised to 30 MB, or is the 20 MB limit a hard software limitation? Attached is a timelapse where the system tracks a small drone from 40-800 m, running at 1920x1920. The tracker needs improvements, but the system will have a second, more zoomed-in camera, so it will have 4x the amount of pixels and autofocus. https://drive.google.com/file/d/17nzFPFgjfWwuwwVRGPqh_MLTxC9jWE-b/view?usp=drivesdk Thanks, Jan
Hi @Honza_Ferenc!
This is fantastic! Would love to hear how your testing of SAHI went.
Unfortunately, the max file size cannot be made any larger than 20 MB. If you are using a custom model, you can convert to single channel in the data loader, but none of the models we use support single-channel input.
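For anyone following along, the data-loader conversion mentioned above might look something like this. This is a minimal NumPy sketch, not Roboflow's or Ultralytics' actual loader code: the function name is hypothetical, and the luminance weights are the standard Rec. 601 ones (for frames made by duplicating a monochrome channel, any weights summing to 1 recover the original):

```python
import numpy as np

def to_single_channel(image: np.ndarray) -> np.ndarray:
    """Collapse an (H, W, 3) RGB image to an (H, W, 1) luminance image.

    For frames produced by duplicating a monochrome channel, all three
    channels are identical, so this simply recovers the original signal.
    """
    if image.ndim == 2:  # already single channel, just add the channel axis
        return image[:, :, np.newaxis]
    # Rec. 601 luma weights (hypothetical choice for this sketch)
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = image.astype(np.float32) @ weights
    return gray.astype(image.dtype)[:, :, np.newaxis]
```

A transform like this would run once per sample inside the loader, keeping the model input single channel end to end instead of paying the duplication cost at inference time.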