Error when training YOLOv7 on my own dataset

Sean_TAY · October 23, 2022, 7:39am

Please share the following so we may better assist you:

Screenshot of your error
File "train.py", line 616, in <module>
    train(hyp, opt, device, tb_writer)
File "train.py", line 372, in train
    scaler.scale(loss).backward()
File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA out of memory. Tried to allocate 4.28 GiB (GPU 0; 14.76 GiB total capacity; 4.28 GiB already allocated; 4.28 GiB free; 9.15 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I am still trying to debug this. Could anyone help me solve this issue? Strangely, when I reduce the batch size from 16 to 12, training runs without the error.
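For reference, here is a minimal sketch of what the hint in the error message seems to suggest. It is not a confirmed fix. The helper names are hypothetical, and the value 128 is only an assumed starting point to experiment with. The sketch first computes how much memory PyTorch has reserved but not allocated (using the figures from the message above), then sets PYTORCH_CUDA_ALLOC_CONF, which has to happen before the first CUDA allocation.

```python
import os

# Figures copied from the error message above, in GiB.
RESERVED_GIB = 9.15
ALLOCATED_GIB = 4.28


def cached_but_unallocated_gib(reserved: float, allocated: float) -> float:
    """Memory held in PyTorch's cache but not currently backing any tensor."""
    return round(reserved - allocated, 2)


def alloc_conf(max_split_size_mb: int) -> str:
    """Build a PYTORCH_CUDA_ALLOC_CONF value (hypothetical helper)."""
    if max_split_size_mb <= 0:
        raise ValueError("max_split_size_mb must be positive")
    return f"max_split_size_mb:{max_split_size_mb}"


# Assumption: 128 is just a value to try, not a recommendation.
# Set this before torch touches the GPU, e.g. at the very top of train.py.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = alloc_conf(128)

gap = cached_but_unallocated_gib(RESERVED_GIB, ALLOCATED_GIB)
print(f"reserved but unallocated: {gap} GiB")
print(f"PYTORCH_CUDA_ALLOC_CONF={os.environ['PYTORCH_CUDA_ALLOC_CONF']}")
```

The same variable can also be exported in the shell before launching train.py. If the error persists, keeping the smaller batch size of 12 that already works is probably the simpler route.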
