Training Custom Yolov4 detector fails due to CUDA error

Jonas_Mack · June 10, 2022, 5:07pm

Project type: Object Detection
The operating system & browser I am using: Windows 11, Firefox 101.0.1 (64 Bit)

I am following the Colab tutorial called “Mobile Object Detection”: Google Colab

At Step 14

!./darknet detector train data/obj.data cfg/custom-yolov4-tiny-detector.cfg yolov4-tiny.conv.29 -dont_show -map

it produces the following error:

CUDA status Error: file: ./src/blas_kernels.cu : () : line: 841 : build time: Jun 10 2022 - 16:10:16
CUDA Error: no kernel image is available for execution on the device
CUDA Error: no kernel image is available for execution on the device: File exists
darknet: ./src/utils.c:325: error: Assertion `0' failed.

I will attach the full output of step 14 at the very bottom of this post.

My only idea is that it might be caused by a wrong configuration in step 7:

#install environment from the Makefile. Changes to mitigate CUDA error.
%cd darknet/
!sed -i 's/OPENCV=0/OPENCV=1/g' Makefile
!sed -i 's/GPU=0/GPU=1/g' Makefile
!sed -i 's/CUDNN=0/CUDNN=1/g' Makefile
!sed -i "s/ARCH= -gencode arch=compute_75,code=sm_75/ARCH= -gencode arch=compute_${compute_capability},code=sm_${compute_capability}/g" Makefile
!make

I am using a Tesla T4 GPU, for which compute_75 and sm_75 should be used. I set this in the code, as you can see. However, the build output still says

nvcc -gencode arch=compute_60,code=sm_60

which is why I think this might be the issue. I do not know why it still says compute_60 and sm_60.
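
A mismatch like this can also be checked mechanically. The following is a minimal sketch in Python, assuming the build output and darknet's startup output are available as text. The function names and regular expressions are illustrative and not part of darknet or the tutorial notebook. It also assumes, based on the log in this post, that darknet reports compute capability 7.5 as 750.

```python
import re
from typing import Optional, Set

def archs_from_build_log(build_log: str) -> Set[int]:
    """Collect the sm_XX values that nvcc was invoked with."""
    return {int(m) for m in re.findall(r"code=sm_(\d+)", build_log)}

def capability_from_run_log(run_log: str) -> Optional[int]:
    """Read 'compute_capability = 750' and return 75 (assumed scaling)."""
    m = re.search(r"compute_capability\s*=\s*(\d+)", run_log)
    return int(m.group(1)) // 10 if m else None

def arch_mismatch(build_log: str, run_log: str) -> bool:
    """True if the device capability is not among the compiled sm_XX targets."""
    cap = capability_from_run_log(run_log)
    return cap is not None and cap not in archs_from_build_log(build_log)

# Lines taken from the output quoted in this post.
build_log = "nvcc -gencode arch=compute_60,code=sm_60"
run_log = "compute_capability = 750, cudnn_half = 0"
print(arch_mismatch(build_log, run_log))  # True
```

This is only a heuristic: it compares the compiled sm_XX targets with the device's reported capability and does not account for builds that also embed PTX.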

Any help would be very much appreciated!

full output of step 14:

CUDA-version: 11010 (11020), cuDNN: 7.6.5, GPU count: 1
OpenCV version: 3.2.0
Prepare additional network for mAP calculation…
compute_capability = 750, cudnn_half = 0
net.optimized_memory = 0
mini_batch = 1, batch = 16, time_steps = 1, train = 0
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 2 416 x 416 x 3 → 208 x 208 x 32 0.075 BF
1 conv 64 3 x 3/ 2 208 x 208 x 32 → 104 x 104 x 64 0.399 BF
2 conv 64 3 x 3/ 1 104 x 104 x 64 → 104 x 104 x 64 0.797 BF
3 route 2 1/2 → 104 x 104 x 32
4 conv 32 3 x 3/ 1 104 x 104 x 32 → 104 x 104 x 32 0.199 BF
5 conv 32 3 x 3/ 1 104 x 104 x 32 → 104 x 104 x 32 0.199 BF
6 route 5 4 → 104 x 104 x 64
7 conv 64 1 x 1/ 1 104 x 104 x 64 → 104 x 104 x 64 0.089 BF
8 route 2 7 → 104 x 104 x 128
9 max 2x 2/ 2 104 x 104 x 128 → 52 x 52 x 128 0.001 BF
10 conv 128 3 x 3/ 1 52 x 52 x 128 → 52 x 52 x 128 0.797 BF
11 route 10 1/2 → 52 x 52 x 64
12 conv 64 3 x 3/ 1 52 x 52 x 64 → 52 x 52 x 64 0.199 BF
13 conv 64 3 x 3/ 1 52 x 52 x 64 → 52 x 52 x 64 0.199 BF
14 route 13 12 → 52 x 52 x 128
15 conv 128 1 x 1/ 1 52 x 52 x 128 → 52 x 52 x 128 0.089 BF
16 route 10 15 → 52 x 52 x 256
17 max 2x 2/ 2 52 x 52 x 256 → 26 x 26 x 256 0.001 BF
18 conv 256 3 x 3/ 1 26 x 26 x 256 → 26 x 26 x 256 0.797 BF
19 route 18 1/2 → 26 x 26 x 128
20 conv 128 3 x 3/ 1 26 x 26 x 128 → 26 x 26 x 128 0.199 BF
21 conv 128 3 x 3/ 1 26 x 26 x 128 → 26 x 26 x 128 0.199 BF
22 route 21 20 → 26 x 26 x 256
23 conv 256 1 x 1/ 1 26 x 26 x 256 → 26 x 26 x 256 0.089 BF
24 route 18 23 → 26 x 26 x 512
25 max 2x 2/ 2 26 x 26 x 512 → 13 x 13 x 512 0.000 BF
26 conv 512 3 x 3/ 1 13 x 13 x 512 → 13 x 13 x 512 0.797 BF
27 conv 256 1 x 1/ 1 13 x 13 x 512 → 13 x 13 x 256 0.044 BF
28 conv 512 3 x 3/ 1 13 x 13 x 256 → 13 x 13 x 512 0.399 BF
29 conv 45 1 x 1/ 1 13 x 13 x 512 → 13 x 13 x 45 0.008 BF
30 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
31 route 27 → 13 x 13 x 256
32 conv 128 1 x 1/ 1 13 x 13 x 256 → 13 x 13 x 128 0.011 BF
33 upsample 2x 13 x 13 x 128 → 26 x 26 x 128
34 route 33 23 → 26 x 26 x 384
35 conv 256 3 x 3/ 1 26 x 26 x 384 → 26 x 26 x 256 1.196 BF
36 conv 45 1 x 1/ 1 26 x 26 x 256 → 26 x 26 x 45 0.016 BF
37 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Total BFLOPS 6.801
avg_outputs = 300864
Allocate additional workspace_size = 26.22 MB
custom-yolov4-tiny-detector
compute_capability = 750, cudnn_half = 0
net.optimized_memory = 0
mini_batch = 4, batch = 64, time_steps = 1, train = 1
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 2 416 x 416 x 3 → 208 x 208 x 32 0.075 BF
1 conv 64 3 x 3/ 2 208 x 208 x 32 → 104 x 104 x 64 0.399 BF
2 conv 64 3 x 3/ 1 104 x 104 x 64 → 104 x 104 x 64 0.797 BF
3 route 2 1/2 → 104 x 104 x 32
4 conv 32 3 x 3/ 1 104 x 104 x 32 → 104 x 104 x 32 0.199 BF
5 conv 32 3 x 3/ 1 104 x 104 x 32 → 104 x 104 x 32 0.199 BF
6 route 5 4 → 104 x 104 x 64
7 conv 64 1 x 1/ 1 104 x 104 x 64 → 104 x 104 x 64 0.089 BF
8 route 2 7 → 104 x 104 x 128
9 max 2x 2/ 2 104 x 104 x 128 → 52 x 52 x 128 0.001 BF
10 conv 128 3 x 3/ 1 52 x 52 x 128 → 52 x 52 x 128 0.797 BF
11 route 10 1/2 → 52 x 52 x 64
12 conv 64 3 x 3/ 1 52 x 52 x 64 → 52 x 52 x 64 0.199 BF
13 conv 64 3 x 3/ 1 52 x 52 x 64 → 52 x 52 x 64 0.199 BF
14 route 13 12 → 52 x 52 x 128
15 conv 128 1 x 1/ 1 52 x 52 x 128 → 52 x 52 x 128 0.089 BF
16 route 10 15 → 52 x 52 x 256
17 max 2x 2/ 2 52 x 52 x 256 → 26 x 26 x 256 0.001 BF
18 conv 256 3 x 3/ 1 26 x 26 x 256 → 26 x 26 x 256 0.797 BF
19 route 18 1/2 → 26 x 26 x 128
20 conv 128 3 x 3/ 1 26 x 26 x 128 → 26 x 26 x 128 0.199 BF
21 conv 128 3 x 3/ 1 26 x 26 x 128 → 26 x 26 x 128 0.199 BF
22 route 21 20 → 26 x 26 x 256
23 conv 256 1 x 1/ 1 26 x 26 x 256 → 26 x 26 x 256 0.089 BF
24 route 18 23 → 26 x 26 x 512
25 max 2x 2/ 2 26 x 26 x 512 → 13 x 13 x 512 0.000 BF
26 conv 512 3 x 3/ 1 13 x 13 x 512 → 13 x 13 x 512 0.797 BF
27 conv 256 1 x 1/ 1 13 x 13 x 512 → 13 x 13 x 256 0.044 BF
28 conv 512 3 x 3/ 1 13 x 13 x 256 → 13 x 13 x 512 0.399 BF
29 conv 45 1 x 1/ 1 13 x 13 x 512 → 13 x 13 x 45 0.008 BF
30 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
31 route 27 → 13 x 13 x 256
32 conv 128 1 x 1/ 1 13 x 13 x 256 → 13 x 13 x 128 0.011 BF
33 upsample 2x 13 x 13 x 128 → 26 x 26 x 128
34 route 33 23 → 26 x 26 x 384
35 conv 256 3 x 3/ 1 26 x 26 x 384 → 26 x 26 x 256 1.196 BF
36 conv 45 1 x 1/ 1 26 x 26 x 256 → 26 x 26 x 45 0.016 BF
37 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Total BFLOPS 6.801
avg_outputs = 300864
Allocate additional workspace_size = 26.22 MB
Loading weights from yolov4-tiny.conv.29…
seen 64, trained: 0 K-images (0 Kilo-batches_64)
Done! Loaded 29 layers from weights-file
Learning Rate: 0.00261, Momentum: 0.9, Decay: 0.0005
Create 6 permanent cpu-threads
Loaded: 0.523372 seconds
CUDA status Error: file: ./src/blas_kernels.cu : () : line: 841 : build time: Jun 10 2022 - 16:10:16

CUDA Error: no kernel image is available for execution on the device
CUDA Error: no kernel image is available for execution on the device: File exists
darknet: ./src/utils.c:325: error: Assertion `0' failed.

Mohamed · June 10, 2022, 5:17pm

Hi, I believe the error may be that this line was updated:

Can you try running the notebook with that cell's code as it was originally written? The original command replaces the Makefile's default expression (ARCH= -gencode arch=compute_60,code=sm_60) with this one: ARCH= -gencode arch=compute_${compute_capability},code=sm_${compute_capability}. That substitution is what gives you the “75” values rather than “60.” If the search expression is changed to compute_75, sed finds nothing to replace in the Makefile, so the default compute_60 stays in place.

Original cell:
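
The screenshot of the original cell is not reproduced in this text. Going by the description above, the difference between the edited and the original sed command can be sketched in Python as follows. The helper name set_arch and the sample Makefile content are illustrative assumptions. The point is only that a substitution whose search pattern does not occur in the file changes nothing and raises no error.

```python
def set_arch(makefile_text: str, search_cap: str, new_cap: str):
    """Mimic `sed s/OLD/NEW/g` on the ARCH line; return (text, replacements)."""
    old = f"ARCH= -gencode arch=compute_{search_cap},code=sm_{search_cap}"
    new = f"ARCH= -gencode arch=compute_{new_cap},code=sm_{new_cap}"
    count = makefile_text.count(old)
    return makefile_text.replace(old, new), count

# Assumed Makefile excerpt with the default value named in the reply above.
makefile = "GPU=1\nARCH= -gencode arch=compute_60,code=sm_60\n"

# Edited cell: searches for compute_75, which is not in the Makefile.
text, n = set_arch(makefile, "75", "75")
print(n, "compute_60" in text)  # 0 True

# Original cell: searches for the default compute_60 and replaces it.
text, n = set_arch(makefile, "60", "75")
print(n, "compute_75" in text)  # 1 True
```

In the first call nothing matches, so the Makefile keeps compute_60 and the build targets the wrong architecture. In the second call the default line is found and rewritten.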

Jonas_Mack · June 11, 2022, 8:13am

That solved it, thank you. I guess I misunderstood the instructions for that line.
