- Project type: Object Detection
- The operating system & browser I am using: Windows 11, Firefox 101.0.1 (64-bit)
I am following the Colab tutorial “Mobile Object Detection”: Google Colab
At step 14, running
!./darknet detector train data/obj.data cfg/custom-yolov4-tiny-detector.cfg yolov4-tiny.conv.29 -dont_show -map
produces the following error:
CUDA status Error: file: ./src/blas_kernels.cu : () : line: 841 : build time: Jun 10 2022 - 16:10:16
CUDA Error: no kernel image is available for execution on the device
CUDA Error: no kernel image is available for execution on the device: File exists
darknet: ./src/utils.c:325: error: Assertion `0' failed.
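For context on the error itself: CUDA reports “no kernel image is available for execution on the device” when the binary contains no GPU code that the device can use. The sketch below encodes a simplified version of the compatibility rules from the CUDA documentation: SASS built for `sm_XY` runs on devices with the same major version X and a minor version of at least Y, while PTX built for `compute_XY` can be JIT-compiled by the driver for any device with compute capability X.Y or higher. The helpers `parse_gencode` and `can_run_on` are hypothetical, and architecture-specific variants such as `sm_90a` are ignored.

```python
import re

# Matches "-gencode arch=compute_60,code=sm_60" as well as
# "-gencode arch=compute_75,code=[sm_75,compute_75]".
GENCODE_RE = re.compile(r"-gencode\s+arch=compute_(\d+),code=\[?([\w,]+)\]?")


def parse_gencode(flags: str):
    """Return (sass, ptx): sets of architectures such as 75 for sm_75 / compute_75."""
    sass, ptx = set(), set()
    for _arch, codes in GENCODE_RE.findall(flags):
        for code in codes.split(","):
            kind, _, num = code.partition("_")
            if kind == "sm":
                sass.add(int(num))
            elif kind == "compute":
                ptx.add(int(num))
    return sass, ptx


def can_run_on(flags: str, cc: int) -> bool:
    """Simplified check: can code built with these nvcc flags run on a device
    with compute capability cc (written as an integer, e.g. 75 for 7.5)?"""
    sass, ptx = parse_gencode(flags)
    major, minor = divmod(cc, 10)
    # SASS (sm_XY) runs on the same major version with an equal or higher minor version.
    if any(s // 10 == major and s % 10 <= minor for s in sass):
        return True
    # Embedded PTX (compute_XY) can be JIT-compiled for any device with cc >= XY.
    return any(p <= cc for p in ptx)
```

For example, with the nvcc flags quoted later in this post, `can_run_on("-gencode arch=compute_60,code=sm_60", 75)` returns `False` for a Tesla T4 (compute capability 7.5), which is consistent with the error above, whereas `can_run_on("-gencode arch=compute_75,code=[sm_75,compute_75]", 75)` returns `True`.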
The full output of step 14 is at the bottom of this post.
My only guess is that the error is caused by a wrong configuration in step 7:
#install environment from the Makefile. Changes to mitigate CUDA error.
%cd darknet/
!sed -i 's/OPENCV=0/OPENCV=1/g' Makefile
!sed -i 's/GPU=0/GPU=1/g' Makefile
!sed -i 's/CUDNN=0/CUDNN=1/g' Makefile
!sed -i "s/ARCH= -gencode arch=compute_75,code=sm_75/ARCH= -gencode arch=compute_${compute_capability},code=sm_${compute_capability}/g" Makefile
!make
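A `sed` command of the form `s/PATTERN/REPLACEMENT/g` only changes text where PATTERN actually occurs in the file. The sketch below mimics the last `sed` call of step 7 with Python's `re.sub` on a hypothetical Makefile excerpt whose ARCH line reads `compute_60,code=sm_60`; that excerpt is an assumption inferred from the nvcc line in the build output, not the real darknet Makefile, and `compute_capability` is assumed to be `75`.

```python
import re

# Hypothetical Makefile excerpt (assumption: inferred from the nvcc flags in the build output).
makefile = (
    "GPU=1\n"
    "ARCH= -gencode arch=compute_60,code=sm_60\n"
)

compute_capability = 75  # assumed value for a Tesla T4
replacement = f"ARCH= -gencode arch=compute_{compute_capability},code=sm_{compute_capability}"

# Same search pattern as in step 7: it looks for a compute_75 line that is not in the file.
patched = re.sub(r"ARCH= -gencode arch=compute_75,code=sm_75", replacement, makefile)
print(patched == makefile)  # True: nothing matched, so nvcc still gets compute_60/sm_60

# Searching for the text that is actually present does perform the substitution.
fixed = re.sub(r"ARCH= -gencode arch=compute_60,code=sm_60", replacement, makefile)
print(fixed.splitlines()[1])  # ARCH= -gencode arch=compute_75,code=sm_75
```

Running `!grep -n "ARCH=" Makefile` before and after the `sed` calls would show whether the pattern matched anything in the real file.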
I am using a Tesla T4 GPU, which has compute capability 7.5, so compute_75 and sm_75 should be used. I set this in the code above; however, the build output still shows
nvcc -gencode arch=compute_60,code=sm_60
which is why I suspect this is the issue. I do not know why it still uses compute_60 and sm_60.
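Because the exact text of the ARCH line can differ between darknet versions, a `sed` pattern tied to one specific value is fragile. A sketch of an alternative, assuming the Makefile assigns ARCH on an uncommented line starting with `ARCH=` (possibly continued with trailing backslashes): rewrite the first such assignment, whatever its current value, to target the given compute capability. The helpers `set_arch` and `patch_makefile` are hypothetical.

```python
import re
from pathlib import Path

# "ARCH=" at the start of a line (so commented "# ARCH=" lines are skipped),
# followed by any continuation lines that end in a backslash.
ARCH_RE = re.compile(r"^ARCH=(?:[^\n]*\\\n)*[^\n]*", re.MULTILINE)


def set_arch(makefile: str, cc: int) -> str:
    """Replace the first ARCH= assignment with a single -gencode flag for cc (e.g. 75)."""
    new_line = f"ARCH= -gencode arch=compute_{cc},code=[sm_{cc},compute_{cc}]"
    # A function replacement avoids re.sub interpreting backslashes in new_line.
    return ARCH_RE.sub(lambda _m: new_line, makefile, count=1)


def patch_makefile(path: str = "Makefile", cc: int = 75) -> None:
    """Rewrite the ARCH line of the Makefile at `path` in place."""
    p = Path(path)
    p.write_text(set_arch(p.read_text(), cc))
```

In the notebook this would be called from the `darknet/` directory before `!make`. If `make` has already been run once, running `!make clean` first is a simple way to make sure no object files built for `sm_60` are reused.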
Any help would be very much appreciated!
Full output of step 14:
CUDA-version: 11010 (11020), cuDNN: 7.6.5, GPU count: 1
OpenCV version: 3.2.0
Prepare additional network for mAP calculation…
compute_capability = 750, cudnn_half = 0
net.optimized_memory = 0
mini_batch = 1, batch = 16, time_steps = 1, train = 0
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 2 416 x 416 x 3 → 208 x 208 x 32 0.075 BF
1 conv 64 3 x 3/ 2 208 x 208 x 32 → 104 x 104 x 64 0.399 BF
2 conv 64 3 x 3/ 1 104 x 104 x 64 → 104 x 104 x 64 0.797 BF
3 route 2 1/2 → 104 x 104 x 32
4 conv 32 3 x 3/ 1 104 x 104 x 32 → 104 x 104 x 32 0.199 BF
5 conv 32 3 x 3/ 1 104 x 104 x 32 → 104 x 104 x 32 0.199 BF
6 route 5 4 → 104 x 104 x 64
7 conv 64 1 x 1/ 1 104 x 104 x 64 → 104 x 104 x 64 0.089 BF
8 route 2 7 → 104 x 104 x 128
9 max 2x 2/ 2 104 x 104 x 128 → 52 x 52 x 128 0.001 BF
10 conv 128 3 x 3/ 1 52 x 52 x 128 → 52 x 52 x 128 0.797 BF
11 route 10 1/2 → 52 x 52 x 64
12 conv 64 3 x 3/ 1 52 x 52 x 64 → 52 x 52 x 64 0.199 BF
13 conv 64 3 x 3/ 1 52 x 52 x 64 → 52 x 52 x 64 0.199 BF
14 route 13 12 → 52 x 52 x 128
15 conv 128 1 x 1/ 1 52 x 52 x 128 → 52 x 52 x 128 0.089 BF
16 route 10 15 → 52 x 52 x 256
17 max 2x 2/ 2 52 x 52 x 256 → 26 x 26 x 256 0.001 BF
18 conv 256 3 x 3/ 1 26 x 26 x 256 → 26 x 26 x 256 0.797 BF
19 route 18 1/2 → 26 x 26 x 128
20 conv 128 3 x 3/ 1 26 x 26 x 128 → 26 x 26 x 128 0.199 BF
21 conv 128 3 x 3/ 1 26 x 26 x 128 → 26 x 26 x 128 0.199 BF
22 route 21 20 → 26 x 26 x 256
23 conv 256 1 x 1/ 1 26 x 26 x 256 → 26 x 26 x 256 0.089 BF
24 route 18 23 → 26 x 26 x 512
25 max 2x 2/ 2 26 x 26 x 512 → 13 x 13 x 512 0.000 BF
26 conv 512 3 x 3/ 1 13 x 13 x 512 → 13 x 13 x 512 0.797 BF
27 conv 256 1 x 1/ 1 13 x 13 x 512 → 13 x 13 x 256 0.044 BF
28 conv 512 3 x 3/ 1 13 x 13 x 256 → 13 x 13 x 512 0.399 BF
29 conv 45 1 x 1/ 1 13 x 13 x 512 → 13 x 13 x 45 0.008 BF
30 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
31 route 27 → 13 x 13 x 256
32 conv 128 1 x 1/ 1 13 x 13 x 256 → 13 x 13 x 128 0.011 BF
33 upsample 2x 13 x 13 x 128 → 26 x 26 x 128
34 route 33 23 → 26 x 26 x 384
35 conv 256 3 x 3/ 1 26 x 26 x 384 → 26 x 26 x 256 1.196 BF
36 conv 45 1 x 1/ 1 26 x 26 x 256 → 26 x 26 x 45 0.016 BF
37 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Total BFLOPS 6.801
avg_outputs = 300864
Allocate additional workspace_size = 26.22 MB
custom-yolov4-tiny-detector
compute_capability = 750, cudnn_half = 0
net.optimized_memory = 0
mini_batch = 4, batch = 64, time_steps = 1, train = 1
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 2 416 x 416 x 3 → 208 x 208 x 32 0.075 BF
1 conv 64 3 x 3/ 2 208 x 208 x 32 → 104 x 104 x 64 0.399 BF
2 conv 64 3 x 3/ 1 104 x 104 x 64 → 104 x 104 x 64 0.797 BF
3 route 2 1/2 → 104 x 104 x 32
4 conv 32 3 x 3/ 1 104 x 104 x 32 → 104 x 104 x 32 0.199 BF
5 conv 32 3 x 3/ 1 104 x 104 x 32 → 104 x 104 x 32 0.199 BF
6 route 5 4 → 104 x 104 x 64
7 conv 64 1 x 1/ 1 104 x 104 x 64 → 104 x 104 x 64 0.089 BF
8 route 2 7 → 104 x 104 x 128
9 max 2x 2/ 2 104 x 104 x 128 → 52 x 52 x 128 0.001 BF
10 conv 128 3 x 3/ 1 52 x 52 x 128 → 52 x 52 x 128 0.797 BF
11 route 10 1/2 → 52 x 52 x 64
12 conv 64 3 x 3/ 1 52 x 52 x 64 → 52 x 52 x 64 0.199 BF
13 conv 64 3 x 3/ 1 52 x 52 x 64 → 52 x 52 x 64 0.199 BF
14 route 13 12 → 52 x 52 x 128
15 conv 128 1 x 1/ 1 52 x 52 x 128 → 52 x 52 x 128 0.089 BF
16 route 10 15 → 52 x 52 x 256
17 max 2x 2/ 2 52 x 52 x 256 → 26 x 26 x 256 0.001 BF
18 conv 256 3 x 3/ 1 26 x 26 x 256 → 26 x 26 x 256 0.797 BF
19 route 18 1/2 → 26 x 26 x 128
20 conv 128 3 x 3/ 1 26 x 26 x 128 → 26 x 26 x 128 0.199 BF
21 conv 128 3 x 3/ 1 26 x 26 x 128 → 26 x 26 x 128 0.199 BF
22 route 21 20 → 26 x 26 x 256
23 conv 256 1 x 1/ 1 26 x 26 x 256 → 26 x 26 x 256 0.089 BF
24 route 18 23 → 26 x 26 x 512
25 max 2x 2/ 2 26 x 26 x 512 → 13 x 13 x 512 0.000 BF
26 conv 512 3 x 3/ 1 13 x 13 x 512 → 13 x 13 x 512 0.797 BF
27 conv 256 1 x 1/ 1 13 x 13 x 512 → 13 x 13 x 256 0.044 BF
28 conv 512 3 x 3/ 1 13 x 13 x 256 → 13 x 13 x 512 0.399 BF
29 conv 45 1 x 1/ 1 13 x 13 x 512 → 13 x 13 x 45 0.008 BF
30 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
31 route 27 → 13 x 13 x 256
32 conv 128 1 x 1/ 1 13 x 13 x 256 → 13 x 13 x 128 0.011 BF
33 upsample 2x 13 x 13 x 128 → 26 x 26 x 128
34 route 33 23 → 26 x 26 x 384
35 conv 256 3 x 3/ 1 26 x 26 x 384 → 26 x 26 x 256 1.196 BF
36 conv 45 1 x 1/ 1 26 x 26 x 256 → 26 x 26 x 45 0.016 BF
37 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Total BFLOPS 6.801
avg_outputs = 300864
Allocate additional workspace_size = 26.22 MB
Loading weights from yolov4-tiny.conv.29…
seen 64, trained: 0 K-images (0 Kilo-batches_64)
Done! Loaded 29 layers from weights-file
Learning Rate: 0.00261, Momentum: 0.9, Decay: 0.0005
Create 6 permanent cpu-threads
Loaded: 0.523372 seconds
CUDA status Error: file: ./src/blas_kernels.cu : () : line: 841 : build time: Jun 10 2022 - 16:10:16
CUDA Error: no kernel image is available for execution on the device
CUDA Error: no kernel image is available for execution on the device: File exists
darknet: ./src/utils.c:325: error: Assertion `0' failed.