Hello, we are testing your platform and got the following error during training:
CUDA out of memory. Tried to allocate 7.96 GiB (GPU 0; 21.99 GiB total capacity; 12.54 GiB already allocated; 4.94 GiB free; 16.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
  File "/app/run_and_catch_error.py", line 11, in <module>
    runpy._run_module_as_main(args.module)
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/app/yolov8_object_detection_config.py", line 10, in <module>
    main()
  File "/app/yolov8_object_detection_config.py", line 6, in main
    trainer.monitored_train()
  File "/app/src/abstract_monitored_trainer.py", line 34, in monitored_train
    raise self.exc
  File "/app/src/abstract_monitored_trainer.py", line 40, in monitor_train
    self.train()
  File "/app/src/yolov8/base.py", line 286, in train
    self.model.train(
  File "/usr/local/lib/python3.8/dist-packages/ultralytics/yolo/engine/model.py", line 373, in train
    self.trainer.train()
  File "/usr/local/lib/python3.8/dist-packages/ultralytics/yolo/engine/trainer.py", line 192, in train
    self._do_train(world_size)
  File "/usr/local/lib/python3.8/dist-packages/ultralytics/yolo/engine/trainer.py", line 332, in _do_train
    self.loss, self.loss_items = self.model(batch)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/ultralytics/nn/tasks.py", line 44, in forward
    return self.loss(x, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/ultralytics/nn/tasks.py", line 215, in loss
    return self.criterion(preds, batch)
  File "/usr/local/lib/python3.8/dist-packages/ultralytics/yolo/utils/loss.py", line 179, in __call__
    _, target_bboxes, target_scores, fg_mask, _ = self.assigner(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/ultralytics/yolo/utils/tal.py", line 112, in forward
    mask_pos, align_metric, overlaps = self.get_pos_mask(pd_scores, pd_bboxes, gt_labels, gt_bboxes, anc_points,
  File "/usr/local/lib/python3.8/dist-packages/ultralytics/yolo/utils/tal.py", line 131, in get_pos_mask
    mask_in_gts = select_candidates_in_gts(anc_points, gt_bboxes)
  File "/usr/local/lib/python3.8/dist-packages/ultralytics/yolo/utils/tal.py", line 24, in select_candidates_in_gts
    bbox_deltas = torch.cat((xy_centers[None] - lt, rb - xy_centers[None]), dim=2).view(bs, n_boxes, n_anchors, -1)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.96 GiB (GPU 0; 21.99 GiB total capacity; 12.54 GiB already allocated; 4.94 GiB free; 16.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
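
For context, the failing line builds bbox_deltas and reshapes it with .view(bs, n_boxes, n_anchors, -1). Assuming the last dimension is 4 (two xy offsets concatenated) and the elements are float32, the size of that one tensor can be estimated with the rough sketch below. The function names and the example values (batch size 16, 8400 anchors as a 640x640 input with strides 8/16/32 would give) are hypothetical and not taken from our run.

```python
# Rough size estimate for the tensor built in select_candidates_in_gts.
# Assumption: bbox_deltas has shape (bs, n_boxes, n_anchors, 4), as the
# .view(bs, n_boxes, n_anchors, -1) call in the traceback suggests.

GIB = 1024 ** 3


def bbox_deltas_gib(bs, n_boxes, n_anchors, bytes_per_elem=4):
    """Approximate size in GiB of a (bs, n_boxes, n_anchors, 4) tensor."""
    return bs * n_boxes * n_anchors * 4 * bytes_per_elem / GIB


def max_boxes_for_budget(budget_gib, bs, n_anchors, bytes_per_elem=4):
    """Largest n_boxes whose bbox_deltas tensor fits within budget_gib."""
    bytes_per_box = bs * n_anchors * 4 * bytes_per_elem
    return int(budget_gib * GIB // bytes_per_box)


if __name__ == "__main__":
    # Hypothetical values: batch size 16, 8400 anchors, float32 elements.
    for n_boxes in (100, 1000, 4000):
        size = bbox_deltas_gib(16, n_boxes, 8400)
        print(f"n_boxes={n_boxes}: {size:.2f} GiB")

    # How many boxes per image would fit in the 4.94 GiB reported as free.
    print(max_boxes_for_budget(4.94, 16, 8400))
```

Under these assumptions the tensor grows linearly with batch size, with the number of ground-truth boxes per image, and with the number of anchors, so a 7.96 GiB request would point to a large batch, images with very many labeled boxes, or both.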