YOLOv4 Performance Analysis (Part 1)
I. Contents
- Experimental tests
  1) Test setup
  2) Test
  3) Train
II. Analysis
1. Experimental Tests
Experimental method
YOLOv4 training (train) run (Darknet should be compiled with OpenCV):
./darknet detector train cfg/coco.data cfg/yolov4.cfg data/yolov4.conv.137
YOLOv4 testing (test) run (YOLOv4 - save result video file res.avi):
darknet.exe detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights test.mp4 -out_filename res.avi
Timing the YOLOv4 main() entry points:
duration_run_detector: 0
duration_main_test_resize: 0
duration_main_visualize: 0
duration_main_partial: 0
duration_main_oneoff: 0
duration_main_operations: 0
duration_main_rescale_net: 0
duration_main_normalize_net: 0
duration_main_statistics_net: 0
duration_main_reset_normalize_net: 0
duration_main_run_rgbr_net: 0
duration_main_run_nightmare: 0
duration_main_run_captcha: 0
duration_main_speed: 0
duration_main_composite_3d: 0
duration_main_run_writing: 0
duration_main_run_dice: 0
duration_main_run_compare: 0
duration_main_run_tag: 0
duration_main_run_art: 0
duration_main_run_classifier: 0
duration_main_predict_classifier: 0
duration_main_run_coco: 0
duration_main_run_vid_rnn: 0
duration_main_run_char_rnn: 0
duration_main_run_go: 0
duration_main_run_cifar: 0
duration_main_test_detector: 0
// The entry below is the umbrella interface for Train, Test, and Validate.
duration_main_run_detector: 27023955
duration_main_run_super: 0
duration_main_run_voxel: 0
duration_main_run_yolo: 0
duration_main_average: 0
duration_main_denormalize_net: 0
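The duration_* counters are the author's own instrumentation; presumably each one accumulates the microseconds spent inside one entry point, using darknet's get_time_point() (which returns a timestamp in microseconds). A minimal sketch of that pattern, with a hypothetical counter name:

// Hypothetical instrumentation pattern: accumulate microseconds around a call.
static double duration_main_run_detector = 0; // one counter per entry point

double t0 = get_time_point();  // darknet utils: timestamp in microseconds
run_detector(argc, argv);      // the umbrella entry for train/test/valid
duration_main_run_detector += get_time_point() - t0;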
if (0 == strcmp(argv[2], "test")) test_detector(datacfg, cfg, weights, filename, thresh, hier_thresh, dont_show, ext_output, save_labels, outfile, letter_box, benchmark_layers); // test_detector() entry for testing.
else if (0 == strcmp(argv[2], "train")) train_detector(datacfg, cfg, weights, gpus, ngpus, clear, dont_show, calc_map, mjpeg_port, show_imgs, benchmark_layers, chart_path); // train_detector() entry for training.
else if (0 == strcmp(argv[2], "valid")) validate_detector(datacfg, cfg, weights, outfile); // validate_detector() entry for validation.
I. Test
duration_run_detector_find_arg: 3
duration_run_detector_test_detector: 0
duration_run_detector_demo_detector: 27023955
duration_run_detector_train_detector: 0
duration_run_detector_calc_anchors: 0
duration_run_detector_draw_object: 0
duration_run_detector_validate_detector: 0
duration_run_detector_validate_detector_recall: 0
duration_run_detector_validate_map: 0
if (0 == strcmp(argv[2], "demo")) {
    list *options = read_data_cfg(datacfg);
    int classes = option_find_int(options, "classes", 20);
    char *name_list = option_find_str(options, "names", "data/names.list");
    char **names = get_labels(name_list);
    if (filename)
        if (strlen(filename) > 0)
            if (filename[strlen(filename) - 1] == 0x0d) filename[strlen(filename) - 1] = 0;
    demo(cfg, weights, thresh, hier_thresh, cam_index, filename, names, classes, avgframes, frame_skip, prefix, out_filename, mjpeg_port, dontdraw_bbox, json_port, dont_show, ext_output, letter_box, time_limit_sec, http_post_host, benchmark, benchmark_layers);
    free_list_contents_kvp(options);
    free_list(options);
}
Demo Detector
duration_parse_network_cfg_custom: 442932/27023955 = 1.64%
duration_demo_load_weights: 497513/27023955 = 1.84%
duration_fuse_conv_batchnorm: 393218/27023955 = 1.46%
duration_calculate_binary_weights: 591245/27023955 = 2.19%
duration_get_capture_video_stream: 610033/27023955 = 2.26%
duration_get_capture_webcam:
duration_custom_create_thread: 220031/27023955 = 0.8%
duration_thread_sync: 315469/27023955 = 1.17%
duration_create_window_cv: 1663027/27023955 = 6.15%
duration_get_stream_fps_cpp_cv: 1335095/27023955 = 4.94%
duration_create_video_writer: 2016790/27023955 = 7.46%
duration_get_time_point: 1803257/27023955 = 6.67%
duration_this_thread_yield: 2208903/27023955 = 8.17%
duration_custom_atomic_store_int: 478896/27023955 = 1.77%
duration_diounms_sort: 448094/27023955 = 1.66%
duration_set_track_id: 610708/27023955 = 2.26%
duration_send_json: 2365887/27023955 = 8.75%
duration_send_http_post_request: 1082366/27023955 = 4.01%
duration_draw_detections_cv_v3: 3092754/27023955 = 11.41%
duration_save_cv_jpg: 2890907/27023955 = 10.70%
duration_send_mjpg: 2988041/27023955 = 11.57%
duration_write_frame_cv: 2605713/27023955 = 9.64%
duration_release_image_mat: 523714/27023955 = 1.94%
duration_delay_time: 505567/27023955 = 1.87%
duration_free_all_thread: 587132/27023955 = 2.17%
Demo:
net = parse_network_cfg_custom(cfgfile, 1, 1); // set batch=1
load_weights(&net, weightfile);
fuse_conv_batchnorm(net);
calculate_binary_weights(net);
if(filename){
    printf("video file: %s\n", filename);
    cap = get_capture_video_stream(filename);
}
else {
    printf("Webcam index: %d\n", cam_index);
    cap = get_capture_webcam(cam_index);
}
custom_create_thread(&fetch_thread, 0, fetch_in_thread, 0);
fetch_in_thread_sync(0); // fetch_in_thread(0);
fetch_in_thread_sync(0); // fetch_in_thread(0);
detect_in_thread_sync(0); // detect_in_thread(0);
create_window_cv("Demo", full_screen, 1352, 1013);
if (out_filename && !flag_exit)
{
    int src_fps = 25;
    src_fps = get_stream_fps_cpp_cv(cap);
    output_video_writer = create_video_writer(out_filename, 'D', 'I', 'V', 'X', src_fps, get_width_mat(det_img), get_height_mat(det_img), 1);
    //'H', '2', '6', '4'
    //'D', 'I', 'V', 'X'
    //'M', 'J', 'P', 'G'
    //'M', 'P', '4', 'V'
    //'M', 'P', '4', '2'
    //'X', 'V', 'I', 'D'
    //'W', 'M', 'V', '2'
}
this_thread_yield();
if (!benchmark) custom_atomic_store_int(&run_fetch_in_thread, 1);
custom_atomic_store_int(&run_detect_in_thread, 1);
if (nms) {
    if (l.nms_kind == DEFAULT_NMS) do_nms_sort(local_dets, local_nboxes, l.classes, nms);
    else diounms_sort(local_dets, local_nboxes, l.classes, nms, l.nms_kind, l.beta_nms);
}
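diounms_sort() applies greedy NMS with a DIoU-style criterion. A simplified sketch of the idea (not darknet's exact implementation; box_iou() exists in darknet's box.c, while box_diou_penalty() is a hypothetical helper standing in for the normalized center-distance term):

// Greedy DIoU-NMS sketch: a candidate j is suppressed by a higher-scoring box
// of the same class when DIoU = IoU - d^2/c^2 exceeds the threshold
// (d: center distance, c: diagonal of the smallest enclosing box).
void diou_nms_sketch(detection *dets, int total, int classes, float thresh)
{
    int i, j, k;
    for (k = 0; k < classes; ++k) {
        for (i = 0; i < total; ++i) {
            if (dets[i].prob[k] == 0) continue;
            for (j = 0; j < total; ++j) {
                if (j == i || dets[j].prob[k] > dets[i].prob[k]) continue;
                float diou = box_iou(dets[i].bbox, dets[j].bbox)
                           - box_diou_penalty(dets[i].bbox, dets[j].bbox);
                if (diou > thresh) dets[j].prob[k] = 0; // suppress the weaker box
            }
        }
    }
}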
if (l.embedding_size) set_track_id(local_dets, local_nboxes, demo_thresh, l.sim_thresh, l.track_ciou_norm, l.track_history_size, l.dets_for_track, l.dets_for_show);
if (demo_json_port > 0) {
    int timeout = 400000;
    send_json(local_dets, local_nboxes, l.classes, demo_names, frame_id, demo_json_port, timeout);
}
show_image_mat(show_img, "Demo");
wait_key_cv(1);
send_http_post_request(http_post_host, http_post_port, filename, local_dets, nboxes, classes, names, frame_id, ext_output, timeout);
draw_detections_cv_v3(show_img, local_dets, local_nboxes, demo_thresh, demo_names, demo_alphabet, demo_classes, demo_ext_output);
free_detections(local_dets, local_nboxes);
if(show_img) save_cv_jpg(show_img, buff);
// if you run it with param -mjpeg_port 8090 then open the URL in your web browser: http://localhost:8090
if (mjpeg_port > 0 && show_img) {
    int port = mjpeg_port;
    int timeout = 400000;
    int jpeg_quality = 40; // 1 - 100
    send_mjpeg(show_img, port, timeout, jpeg_quality);
}
// save video file
if (output_video_writer && show_img) {
    write_frame_cv(output_video_writer, show_img);
    printf("\n cvWriteFrame \n");
}
while (custom_atomic_load_int(&run_detect_in_thread)) {
    if(avg_fps > 180) this_thread_yield();
    else this_thread_sleep_for(thread_wait_ms); // custom_join(detect_thread, 0);
}
if (!benchmark) {
    while (custom_atomic_load_int(&run_fetch_in_thread)) {
        if(avg_fps > 180) this_thread_yield();
        else this_thread_sleep_for(thread_wait_ms); // custom_join(fetch_thread, 0);
    }
    free_image(det_s);
}
if (time_limit_sec > 0 && (get_time_point() - start_time_lim)/1000000 > time_limit_sec) {
    printf(" start_time_lim = %f, get_time_point() = %f, time spent = %f \n", start_time_lim, get_time_point(), get_time_point() - start_time_lim);
    break;
}
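Note how the loop coordinates the fetch and detect threads with atomic flags instead of joins: the main loop raises run_*_in_thread to release one unit of work and then spins (yield or sleep) until the worker clears it. Roughly, in simplified form:

// Main-loop side: release one unit of work, then wait for completion.
custom_atomic_store_int(&run_fetch_in_thread, 1);   // hand a frame-fetch to the worker
while (custom_atomic_load_int(&run_fetch_in_thread)) {
    if (avg_fps > 180) this_thread_yield();          // high frame rate: spin politely
    else this_thread_sleep_for(thread_wait_ms);      // otherwise sleep a little
}

// Worker side: wait for the flag, do the work, clear the flag.
while (!custom_atomic_load_int(&run_fetch_in_thread)) this_thread_yield();
fetch_in_thread(0);
custom_atomic_store_int(&run_fetch_in_thread, 0);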
II. Train
1) if (0 == strcmp(argv[2], "train")) train_detector(datacfg, cfg, weights, gpus, ngpus, clear, dont_show, calc_map, mjpeg_port, show_imgs, benchmark_layers, chart_path);
2) train_detector(): the data-loading entry point.
pthread_t load_thread = load_data(args); // Create and start the first loading thread; args holds the model's training parameters.
1) load_data(): load_threads() distributes the loading across threads.
pthread_t load_data(load_args args)
/* Calls load_threads(). */
if(pthread_create(&thread, 0, load_threads, ptr)) error("Thread creation failed"); // arg 1: pointer to the thread identifier; arg 2: thread attributes; arg 3: address of the thread routine; arg 4: argument passed to that routine.
2) run_thread_loop() is spawned once per worker thread.
if (pthread_create(&threads[i], 0, run_thread_loop, ptr)) error("Thread creation failed"); // One run_thread_loop() per loader thread.
3) Inside run_thread_loop(): each worker fetches its arguments and calls load_thread().
void *run_thread_loop(void *ptr)
pthread_mutex_lock(&mtx_load_data);
load_args *args_local = (load_args *)xcalloc(1, sizeof(load_args));
*args_local = args_swap[i]; // i is the thread ID; load_threads() previously set args_swap[i] = args.
pthread_mutex_unlock(&mtx_load_data);
load_thread(args_local); // Call load_thread().
custom_atomic_store_int(&run_load_data[i], 0);
4) In load_thread(): the type flag selects the lowest-level loading task, load_data_detection().
if (a.type == DETECTION_DATA){ // Detection data; train_detector() sets args.type = DETECTION_DATA.
*a.d = load_data_detection(a.n, a.paths, a.m, a.w, a.h, a.c, a.num_boxes, a.classes, a.flip, a.gaussian_noise, a.blur, a.mixup, a.jitter, a.resize, a.hue, a.saturation, a.exposure, a.mini_batch, a.track, a.augment_speed, a.letter_box, a.show_imgs);
5) "darknet/src/data.c" -- load_data_detection() comes in two versions depending on whether OpenCV is enabled. In the OpenCV version:
Basic data processing: crop, flip, HSV augmentation, blur, and gaussian noise. (Note: when a.type == DETECTION_DATA no angle parameter is passed in, so there is no rotation augmentation.)
if (track) random_paths = get_sequential_paths(paths, n, m, mini_batch, augment_speed); // Object tracking.
else random_paths = get_random_paths(paths, n, m); // Randomly pick the paths of n images.
src = load_image_mat_cv(filename, flag); // Entry to load_image_mat_cv() in image_opencv.cpp; reads the image with OpenCV.
/* Scale the original image. */
float img_ar = (float)ow / (float)oh; // Aspect ratio of the loaded image.
float net_ar = (float)w / (float)h;   // Aspect ratio required by the network input.
float result_ar = img_ar / net_ar;    // Their ratio decides how the letter_box scaling is applied.
// swidth - should be increased
/* Apply the letter_box transform. */
/* truth receives the label information of all images. Because the originals are augmented, translation jitter necessarily shifts every object's bounding-box labels, so the boxes must be corrected according to the augmentation applied; the trailing parameters serve that correction. */
// Entry to image_data_augmentation() in image_opencv.cpp: data augmentation.
image ai = image_data_augmentation(src, w, h, pleft, ptop, swidth, sheight, flip, dhue, dsat, dexp, gaussian_noise, blur, boxes, truth);
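The result_ar decision earlier in load_data_detection() (letter_box branch) adjusts the crop parameters that are then passed to image_data_augmentation(); a reconstructed sketch, using the variable names above:

if (result_ar > 1) {                 // image wider than the network input
    float oh_tmp = ow / net_ar;      // target crop height
    float delta_h = (oh_tmp - oh) / 2;
    ptop = ptop - delta_h;           // grow the crop vertically (pad top/bottom)
    pbot = pbot - delta_h;
} else {                             // image taller: swidth should be increased
    float ow_tmp = oh * net_ar;      // target crop width
    float delta_w = (ow_tmp - ow) / 2;
    pleft = pleft - delta_w;         // grow the crop horizontally (pad left/right)
    pright = pright - delta_w;
}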
6) image_data_augmentation()
cv::Mat img = *(cv::Mat *)mat; // Read the image data.
// crop
// flip: the config file has no flip parameter, but the code still uses it.
// HSV augmentation
// gaussian_noise
// Mat -> image
7) Advanced data processing:
Mainly mosaic augmentation.
…
if (use_mixup == 0) { // No mixup.
    d.X.vals[i] = ai.data;
    memcpy(d.y.vals[i], truth, 5 * boxes * sizeof(float)); // C library call: copy 5 * boxes * sizeof(float) bytes from truth into d.y.vals[i].
}
else if (use_mixup == 1) { // mixup.
    if (i_mixup == 0) { // First sequence.
        d.X.vals[i] = ai.data;
        memcpy(d.y.vals[i], truth, 5 * boxes * sizeof(float)); // Labels of the n images -> d.y.vals; when i_mixup=1 they serve as the previous sequence's labels.
    }
    else if (i_mixup == 1) { // Second sequence; d.X.vals already holds the n augmented images of the previous sequence.
        image old_img = make_empty_image(w, h, c);
        old_img.data = d.X.vals[i]; // One of the previous sequence's n old_img.
        blend_images_cv(ai, 0.5, old_img, 0.5); // Entry to blend_images_cv() in image_opencv.cpp: linearly blend the matching images of the old and new sequences; ai is a single image in the innermost i_mixup/i loop.
        blend_truth(d.y.vals[i], boxes, truth); // Merge the previous sequence's d.y.vals[i] with this sequence's truth.
        free_image(old_img); // Release the old image data.
        d.X.vals[i] = ai.data; // Keep the n images of this sequence.
    }
}
else if (use_mixup == 3) { // Mosaic augmentation.
    if (i_mixup == 0) { // First sequence: initialize.
        image tmp_img = make_image(w, h, c);
        d.X.vals[i] = tmp_img.data;
    }
    if (flip) { // Flip.
        int tmp = pleft;
        pleft = pright;
        pright = tmp;
    }
    const int left_shift = min_val_cmp(cut_x[i], max_val_cmp(0, (-pleft*w / ow))); // Entry to min_val_cmp()/max_val_cmp() in utils.h: take the smaller (min) or larger (max) value.
    const int top_shift = min_val_cmp(cut_y[i], max_val_cmp(0, (-ptop*h / oh)));   // When ptop < 0, take the smaller of cut_y[i] and -ptop*h / oh; otherwise 0.
    const int right_shift = min_val_cmp((w - cut_x[i]), max_val_cmp(0, (-pright*w / ow)));
    const int bot_shift = min_val_cmp(h - cut_y[i], max_val_cmp(0, (-pbot*h / oh)));
    int k, x, y;
    for (k = 0; k < c; ++k) { // Channels.
        for (y = 0; y < h; ++y) { // Rows.
            int j = y*w + k*w*h; // Row-stacked destination index j within image i.
            if (i_mixup == 0 && y < cut_y[i]) { // Bottom-right block; over i_mixup = 0..3, d.X.vals[i] is never cleared, so the four regions are pasted cumulatively.
                int j_src = (w - cut_x[i] - right_shift) + (y + h - cut_y[i] - bot_shift)*w + k*w*h;
                memcpy(&d.X.vals[i][j + 0], &ai.data[j_src], cut_x[i] * sizeof(float)); // Copy cut_x[i]*sizeof(float) bytes from ai.data[j_src] to &d.X.vals[i][j + 0].
            }
            if (i_mixup == 1 && y < cut_y[i]) { // Bottom-left block.
                int j_src = left_shift + (y + h - cut_y[i] - bot_shift)*w + k*w*h;
                memcpy(&d.X.vals[i][j + cut_x[i]], &ai.data[j_src], (w - cut_x[i]) * sizeof(float));
            }
            if (i_mixup == 2 && y >= cut_y[i]) { // Top-right block.
                int j_src = (w - cut_x[i] - right_shift) + (top_shift + y - cut_y[i])*w + k*w*h;
                memcpy(&d.X.vals[i][j + 0], &ai.data[j_src], cut_x[i] * sizeof(float));
            }
            if (i_mixup == 3 && y >= cut_y[i]) { // Top-left block.
                int j_src = left_shift + (top_shift + y - cut_y[i])*w + k*w*h;
                memcpy(&d.X.vals[i][j + cut_x[i]], &ai.data[j_src], (w - cut_x[i]) * sizeof(float));
            }
        }
    }
    blend_truth_mosaic(d.y.vals[i], boxes, truth, w, h, cut_x[i], cut_y[i], i_mixup, left_shift, right_shift, top_shift, bot_shift); // Adjust the labels by the corresponding shifts.
    free_image(ai);
    ai.data = d.X.vals[i];
}
…
8) Overall architecture
The overall architecture matches YOLOv3 (credit to Zhihu user @江大白); the innovations are:
Input --> Mosaic augmentation, cmBN, SAT (self-adversarial training);
Backbone --> CSPDarknet53, Mish activation, DropBlock;
Neck --> SPP, FPN+PAN;
Prediction --> GIOU_Loss, DIOU_nms.
The network configuration file (.cfg) defines the model architecture and is passed on the command line at training time. It starts with a [net] section holding the parameters directly related to training:
[net]
# Testing           # For testing, set batch and subdivisions to 1, otherwise errors may occur.
#batch=1            # A larger batch reduces training oscillation and the appearance of NaN losses.
#subdivisions=1     # Must be a multiple of 8; raise to 32 or 64 if GPU memory is tight.
# Training
batch=64            # Load 64 images at once; after the forward pass, accumulate and average their loss, then do a single backward pass to update the weights.
subdivisions=16     # A batch is split into 16 forward passes, i.e. 4 images per pass.
width=608           # Network input width.
height=608          # Network input height.
channels=3          # Network input channels.
momentum=0.949      # Momentum term of SGD with momentum; partially keeps the previous update direction.
decay=0.0005        # Weight-decay regularization against overfitting.
angle=0             # Augmentation: rotate images to generate more training samples.
saturation = 1.5    # Augmentation: vary saturation.
exposure = 1.5      # Augmentation: vary exposure.
hue=.1              # Augmentation: vary hue.
learning_rate=0.001 # Learning rate.
burn_in=1000        # Below burn_in iterations the learning rate is ramped one way; above it, the policy takes over.
max_batches = 500500 # Total training iterations (one batch each); usually classes*2000, increased for small datasets or training from scratch.
policy=steps        # Learning-rate schedule.
steps=400000,450000 # Iterations at which the rate changes; typically 0.8~0.9 of max_batches.
scales=.1,.1        # At steps(1) the learning rate decays 10x, and at steps(2) it decays another 10x on top of that.
#cutmix=1           # CutMix augmentation: cut out a region and fill it with pixels from another training image instead of zeros; classification targets are mixed proportionally.
mosaic=1            # Mosaic augmentation: four images are randomly scaled, cropped, and tiled together, as analyzed in the code above.
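As a quick illustration of how burn_in, steps, and scales combine, here is a small sketch (my reading of darknet's get_current_rate(); the power-4 ramp is the default of net.power, an assumption worth checking against your version):

// Sketch of policy=steps with burn-in, using the values from the [net] section above.
float current_rate(int batch_num)
{
    float lr = 0.001f;                                  // learning_rate
    if (batch_num < 1000)                               // burn_in: power ramp
        return lr * powf((float)batch_num / 1000, 4);
    if (batch_num >= 400000) lr *= 0.1f;                // steps(1), scales(1)
    if (batch_num >= 450000) lr *= 0.1f;                // steps(2), scales(2)
    return lr;
}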
The remaining sections ([convolutional], [route], [shortcut], [maxpool], [upsample], [yolo]) hold the parameters of the individual layer types. In YOLOv4, the [net] section is followed by a stack of CBM and CSP blocks, beginning with two CBM layers. The CBM structure:
[convolutional]
batch_normalize=1 # Whether to apply BN.
filters=32        # Number of kernels, i.e. this layer's output channels.
size=3            # Kernel size.
stride=1          # Convolution stride.
pad=1             # Edge padding.
activation=mish   # Activation; YOLOv4 uses mish only in the backbone and keeps leaky ReLU in the later stages.
One innovation is the Mish activation; compare its curve with leaky ReLU in the figure:
Mish does not truncate negative inputs completely; it lets small negative gradients flow, preserving information. Being smooth, it also lets information propagate deeper into the network and improves gradient descent, which helps accuracy and generalization.
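For reference, Mish is x * tanh(softplus(x)) with softplus(x) = ln(1 + e^x). A minimal sketch (darknet's kernels additionally use a thresholded softplus to keep expf() from overflowing):

#include <math.h>

// Mish activation, as described above.
static float mish(float x)  { return x * tanhf(log1pf(expf(x))); }

// Leaky ReLU, for comparison with the curve discussion.
static float leaky(float x) { return x > 0 ? x : 0.1f * x; }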
After the two CBM layers comes CSP1, structured as:
CSP1 = CBM + 1 residual unit + CBM -> Concat (with CBM); see the overview diagram.
[convolutional] # CBM layer; connects directly to the route layer 7 layers below, forming the lower branch of CSPX in the diagram.
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[route] # Take the output from 2 layers back, i.e. the start of the CSP block, building the first CSP branch shown in the figure.
layers = -2
[convolutional] # CBM layer.
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
Residual Block
[convolutional] # CBM layer.
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=mish
[convolutional] # CBM layer.
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=mish
[shortcut] # Add the output from 3 layers back; the residual block ends here.
from=-3
activation=linear
[convolutional] # CBM layer.
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[route] # Concat the previous CBM layer with the output of the 7th layer back (a CBM).
layers = -1,-7
The CBM and CSPX blocks that follow repeat this structure, except that CSPX holds X residual units, as the figure shows:
The CSP module splits the base layer's feature map into two parts and then merges them through a skip connection, cutting computation while preserving accuracy.
Note that the backbone branches twice to connect with the Neck; this is explained below.
IV. Neck & Prediction
The second half of the .cfg file configures the Neck and the YOLO prediction heads; the key lines are annotated:
CBL*3
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky # Mish is no longer used from here on.
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
SPP: multi-scale fusion via max pooling
[maxpool] # 5*5
stride=1
size=5
[route]
layers=-2
[maxpool] # 9*9
stride=1
size=9
[route]
layers=-4
[maxpool] # 13*13
stride=1
size=13
[route] # Concat
layers=-1,-3,-5,-6
End SPP
CBL*3
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky # Still leaky, not Mish.
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
CBL
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
Upsample
[upsample]
stride=2
[route]
layers = 85 # Take the output of the backbone's CBM+CSP8+CBM block; 85 counts layers after [net], indexed from 0.
[convolutional] # Add a CBL branch.
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[route] # Concat
layers = -1, -3
CBL*5
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
CBL
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
Upsample
[upsample]
stride=2
[route]
layers = 54 # Take the output of the backbone's CBM*2+CSP1+CBM*2+CSP2+CBM*2+CSP8+CBM chain; 54 counts layers after [net], indexed from 0.
CBL
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[route] # Concat
layers = -1, -3
CBL*5
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
Prediction
CBL
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
conv
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear
[yolo] # 76*76*255, for the smallest anchor boxes.
mask = 0,1,2 # Which of the anchors this head uses. These are the COCO defaults; detector calc_anchors can k-means anchors from your own samples, but the mask indices must then match each anchor's size (the first yolo layer takes the small anchors, the second the medium, the third the large, e.g. whether a box exceeds 60*60 or 30*30), and the preceding conv layer's filters must change accordingly.
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=80 # Number of object classes the network must recognize.
num=9      # Total number of anchors.
jitter=.3  # Jitter noise to suppress overfitting.
ignore_thresh = .7
truth_thresh = 1
scale_x_y = 1.2
iou_thresh=0.213
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou # CIoU loss: accounts for overlap area, center distance, and aspect ratio of the regressed box.
nms_kind=greedynms
beta_nms=0.6
max_delta=5
[route]
layers = -4 # Take the output of the first Neck branch.
Build the second branch
CBL
[convolutional]
batch_normalize=1
size=3
stride=2
pad=1
filters=256
activation=leaky
[route] # Concat
layers = -1, -16
CBL*5
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
CBL
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
conv
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear
[yolo] # 38*38*255, for the medium anchor boxes.
mask = 3,4,5
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
scale_x_y = 1.1
iou_thresh=0.213
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
nms_kind=greedynms
beta_nms=0.6
max_delta=5
[route] # Take the output of the second Neck branch.
layers = -4
Build the third branch
CBL
[convolutional]
batch_normalize=1
size=3
stride=2
pad=1
filters=512
activation=leaky
[route] # Concat
layers = -1, -37
CBL*5
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
CBL
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
conv
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear
[yolo] # 19*19*255, for the largest anchor boxes.
mask = 6,7,8
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
scale_x_y = 1.05
iou_thresh=0.213
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
nms_kind=greedynms
beta_nms=0.6
max_delta=5
The first innovation here is the Spatial Pyramid Pooling (SPP) module:
In the code it is the combination of max pool and route layers above: three max-poolings of different kernel sizes process the preceding conv layer's feature maps, and the results are concatenated with the original maps, four scales in total. Compared with a single max-pooling, this widens the range of features extracted and cleanly separates features of different scales.
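A quick dimension check of the SPP block (a sketch; the sizes follow the cfg above):

// With stride=1 and darknet's default padding of size/2, each maxpool keeps
// the 19x19 spatial size, so the final route concatenates channels only:
int c_in  = 512;  // output channels of the preceding CBL*3
int c_spp = c_in /*5x5*/ + c_in /*9x9*/ + c_in /*13x13*/ + c_in /*input*/; // = 2048
// The CBL*3 that follows squeezes 19*19*2048 back down to 19*19*512.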
The second innovation is adding a PAN structure on top of FPN:
The original PANet fuses with an element-wise add; YOLOv4 uses a channel-expanding Concat instead, as in the figure below:
Feature maps from different backbone downsampling stages are concatenated with the upsampled outputs of matching scale to form the FPN structure, followed by two bottom-up PAN stages.
Downsample 1: among the first 10 blocks only 3 CBMs have stride 2, so the input shrinks to 608/2/2/2 = 76; the last CBM has filters=256, so block 10 outputs a 76*76*256 feature map;
Downsample 2: continuing through the backbone, block 13 (a CBM) outputs 38*38*512;
Downsample 3: block 23 (a CBL) outputs 19*19*512;
Upsample 1: downsample 3 -> CBL + upsample = 38*38*256;
Concat 1: [upsample 1] Concat [downsample 2 + CBL] = [38*38*256] Concat [38*38*512 + (256,1)] = 38*38*512 (here (256,1) denotes a conv with 256 filters of kernel size 1);
Upsample 2: Concat 1 -> CBL*5 + CBL + upsample = 76*76*128;
Concat 2: [upsample 2] Concat [downsample 1 + CBL] = [76*76*128] Concat [76*76*256 + (128,1)] = 76*76*256;
Concat 3 (PAN1): [Concat 2 -> CBL*5 + CBL] Concat [Concat 1 + CBL*5] = [76*76*256 + (128,1) + (256,2)] Concat [38*38*512 + (256,1)] = [38*38*256] Concat [38*38*256] = 38*38*512;
Concat 4 (PAN2): [Concat 3 -> CBL*5 + CBL] Concat [downsample 3] = [38*38*512 + (256,1) + (512,2)] Concat [19*19*512] = 19*19*1024;
Prediction 1: Concat 2 -> CBL*5 + CBL + conv = 76*76*256 + (128,1) + (256,1) + (filters,1) = 76*76*filters, where filters = (class_num + 5)*3; the figure assumes COCO with 80 classes, hence 255;
Prediction 2: PAN1 -> CBL*5 + CBL + conv = 38*38*512 + (256,1) + (512,1) + (filters,1) = 38*38*filters, with the same filters formula, 255 for COCO;
Prediction 3: PAN2 -> CBL*5 + CBL + conv = 19*19*1024 + (512,1) + (1024,1) + (filters,1) = 19*19*filters, with the same filters formula, 255 for COCO.
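A worked instance of the head-width formula (sketch): each grid cell predicts 3 boxes, and each box carries 4 coordinates + 1 objectness + class_num class scores.

int class_num = 80;                  // COCO
int filters   = (class_num + 5) * 3; // = 255, the conv before each [yolo]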
V. Network Construction
All the layers above, from backbone to prediction, are stored by the source in the network struct. The flow is as follows:
In "darknet/src/detector.c" -- train_detector():
…
network net_map;
if (calc_map) { // Compute mAP.
    ......
    net_map = parse_network_cfg_custom(cfgfile, 1, 1); // Entry to parse_network_cfg_custom() in parser.c: build the network from the cfg and parameters, with batch = 1.
    net_map.benchmark_layers = benchmark_layers;
    const int net_classes = net_map.layers[net_map.n - 1].classes;
    int k; // free memory of unnecessary arrays
    for (k = 0; k < net_map.n - 1; ++k) free_layer_custom(net_map.layers[k], 1);
    ......
}
srand(time(0));
char *base = basecfg(cfgfile); // Entry to basecfg() in utils.c: parse cfg/yolo-obj.cfg (the model's configuration) and print it.
printf("%s\n", base);
float avg_loss = -1;
network* nets = (network*)xcalloc(ngpus, sizeof(network)); // Allocate the network structs that hold the parameters.
srand(time(0));
int seed = rand();
int k;
for (k = 0; k < ngpus; ++k) {
    srand(seed);
#ifdef GPU
    cuda_set_device(gpus[k]);
#endif
    nets[k] = parse_network_cfg(cfgfile); // parse_network_cfg_custom(cfgfile, 0, 0); one cfg load per GPU.
    nets[k].benchmark_layers = benchmark_layers;
    if (weightfile) {
        load_weights(&nets[k], weightfile); // load_weights() in parser.c: read the weight file.
    }
    if (clear) { // Reset the counters?
        *nets[k].seen = 0;
        *nets[k].cur_iteration = 0;
    }
    nets[k].learning_rate *= ngpus;
}
srand(time(0));
network net = nets[0]; // Hand the parameters over to net.
......
/* Prepare the loading arguments. */
load_args args = { 0 };
args.w = net.w;
args.h = net.h;
args.c = net.c;
args.paths = paths;
args.n = imgs;
args.m = plist->size;
args.classes = classes;
args.flip = net.flip;
args.jitter = l.jitter;
args.resize = l.resize;
args.num_boxes = l.max_boxes;
net.num_boxes = args.num_boxes;
net.train_images_num = train_images_num;
args.d = &buffer;
args.type = DETECTION_DATA;
args.threads = 64; // 16 or 64
…
In "darknet/src/parser.c" -- parse_network_cfg_custom():
network parse_network_cfg_custom(char *filename, int batch, int time_steps)
{
    list *sections = read_cfg(filename); // Read the cfg file into a linked list.
    node *n = sections->front; // n is the head node of sections.
    if(!n) error("Config file has no sections");
    network net = make_network(sections->size - 1); // Entry to make_network() in network.c: allocate memory for the pointer members of every layer after net. The first section, [net], holds configuration that is not itself a layer, so the layer count is sections->size - 1.
    net.gpu_index = gpu_index;
    size_params params;
    if (batch > 0) params.train = 0; // allocates memory for Detection only
    else params.train = 1;           // allocates memory for Detection & Training
    section *s = (section *)n->val; // The head node's val is the first section.
    list *options = s->options;
    if(!is_network(s)) error("First section must be [net] or [network]");
    parse_net_options(options, &net); // Initialize the global network parameters, including but not limited to those in [net].
#ifdef GPU
    printf("net.optimized_memory = %d \n", net.optimized_memory);
    if (net.optimized_memory >= 2 && params.train) {
        pre_allocate_pinned_memory((size_t)1024 * 1024 * 1024 * 8); // pre-allocate 8 GB CPU-RAM for pinned memory
    }
#endif  // GPU
    ......
    while(n){ // Initialize every layer's parameters.
        params.index = count;
        fprintf(stderr, "%4d ", count);
        s = (section *)n->val;
        options = s->options;
        layer l = { (LAYER_TYPE)0 };
        LAYER_TYPE lt = string_to_layer_type(s->type);
        if(lt == CONVOLUTIONAL){ // Convolutional layer: parse_convolutional() runs make_convolutional_layer().
            l = parse_convolutional(options, params);
        }else if(lt == LOCAL){
            l = parse_local(options, params);
        }else if(lt == ACTIVE){
            l = parse_activation(options, params);
        }else if(lt == RNN){
            l = parse_rnn(options, params);
        }else if(lt == GRU){
            l = parse_gru(options, params);
        }else if(lt == LSTM){
            l = parse_lstm(options, params);
        }else if (lt == CONV_LSTM) {
            l = parse_conv_lstm(options, params);
        }else if(lt == CRNN){
            l = parse_crnn(options, params);
        }else if(lt == CONNECTED){
            l = parse_connected(options, params);
        }else if(lt == CROP){
            l = parse_crop(options, params);
        }else if(lt == COST){
            l = parse_cost(options, params);
            l.keep_delta_gpu = 1;
        }else if(lt == REGION){
            l = parse_region(options, params);
            l.keep_delta_gpu = 1;
        }else if (lt == YOLO) { // The yolo_layer introduced by yolov3/v4: parse_yolo() runs make_yolo_layer().
            l = parse_yolo(options, params);
            l.keep_delta_gpu = 1;
        }else if (lt == GAUSSIAN_YOLO) {
            l = parse_gaussian_yolo(options, params);
            l.keep_delta_gpu = 1;
        }else if(lt == DETECTION){
            l = parse_detection(options, params);
        }else if(lt == SOFTMAX){
            l = parse_softmax(options, params);
            net.hierarchy = l.softmax_tree;
            l.keep_delta_gpu = 1;
        }else if(lt == NORMALIZATION){
            l = parse_normalization(options, params);
        }else if(lt == BATCHNORM){
            l = parse_batchnorm(options, params);
        }else if(lt == MAXPOOL){
            l = parse_maxpool(options, params);
        }else if (lt == LOCAL_AVGPOOL) {
            l = parse_local_avgpool(options, params);
        }else if(lt == REORG){
            l = parse_reorg(options, params);
        }else if (lt == REORG_OLD) {
            l = parse_reorg_old(options, params);
        }else if(lt == AVGPOOL){
            l = parse_avgpool(options, params);
        }else if(lt == ROUTE){
            l = parse_route(options, params);
            int k;
            for (k = 0; k < l.n; ++k) {
                net.layers[l.input_layers[k]].use_bin_output = 0;
                net.layers[l.input_layers[k]].keep_delta_gpu = 1;
            }
        }else if (lt == UPSAMPLE) {
            l = parse_upsample(options, params, net);
        }else if(lt == SHORTCUT){
            l = parse_shortcut(options, params, net);
            net.layers[count - 1].use_bin_output = 0;
            net.layers[l.index].use_bin_output = 0;
            net.layers[l.index].keep_delta_gpu = 1;
        }else if (lt == SCALE_CHANNELS) {
            l = parse_scale_channels(options, params, net);
            net.layers[count - 1].use_bin_output = 0;
            net.layers[l.index].use_bin_output = 0;
            net.layers[l.index].keep_delta_gpu = 1;
        }else if (lt == SAM) {
            l = parse_sam(options, params, net);
            net.layers[count - 1].use_bin_output = 0;
            net.layers[l.index].use_bin_output = 0;
            net.layers[l.index].keep_delta_gpu = 1;
        }else if(lt == DROPOUT){
            l = parse_dropout(options, params);
            l.output = net.layers[count-1].output;
            l.delta = net.layers[count-1].delta;
#ifdef GPU
            l.output_gpu = net.layers[count-1].output_gpu;
            l.delta_gpu = net.layers[count-1].delta_gpu;
            l.keep_delta_gpu = 1;
#endif
        }else if (lt == EMPTY) {
            layer empty_layer = {(LAYER_TYPE)0};
            empty_layer.out_w = params.w;
            empty_layer.out_h = params.h;
            empty_layer.out_c = params.c;
            l = empty_layer;
            l.output = net.layers[count - 1].output;
            l.delta = net.layers[count - 1].delta;
#ifdef GPU
            l.output_gpu = net.layers[count - 1].output_gpu;
            l.delta_gpu = net.layers[count - 1].delta_gpu;
#endif
        }else{
            fprintf(stderr, "Type not recognized: %s\n", s->type);
        }
        ......
        net.layers[count] = l; // Each parse function returns a filled layer l; all of them are appended to the network struct's layers array.
        if (l.workspace_size > workspace_size) workspace_size = l.workspace_size; // workspace_size is the network's scratch space: the largest working-space requirement among all layers, since at any moment only one layer runs forward or backward on the GPU/CPU.
        if (l.inputs > max_inputs) max_inputs = l.inputs;
        if (l.outputs > max_outputs) max_outputs = l.outputs;
        free_section(s);
        n = n->next; // Advance to the next node; the while loop ends when it is empty.
        ++count;
        if(n){ // Align the input/output shapes between consecutive layers.
            if (l.antialiasing) {
                params.h = l.input_layer->out_h;
                params.w = l.input_layer->out_w;
                params.c = l.input_layer->out_c;
                params.inputs = l.input_layer->outputs;
            }else {
                params.h = l.out_h;
                params.w = l.out_w;
                params.c = l.out_c;
                params.inputs = l.outputs;
            }
        }
        if (l.bflops > 0) bflops += l.bflops;
        if (l.w > 1 && l.h > 1) {
            avg_outputs += l.outputs;
            avg_counter++;
        }
    }
    free_list(sections);
    ......
    return net; // Return the parsed network; this variable lives through the whole of training.
}
Taking the convolutional layer and the yolo layer as examples, here is how layers are created. make_convolutional_layer() in convolutional_layer.c:
convolutional_layer make_convolutional_layer(int batch, int steps, int h, int w, int c, int n, int groups, int size, int stride_x, int stride_y, int dilation, int padding, ACTIVATION activation, int batch_normalize, int binary, int xnor, int adam, int use_bin_output, int index, int antialiasing, convolutional_layer *share_layer, int assisted_excitation, int deform, int train)
{
    int total_batch = batch*steps;
    int i;
    convolutional_layer l = { (LAYER_TYPE)0 }; // convolutional_layer is just layer.
    l.type = CONVOLUTIONAL; // Layer type: convolutional.
    l.train = train;
    /* Adjust the input and output dimensions. */
    if (xnor) groups = 1; // disable groups for XNOR-net
    if (groups < 1) groups = 1; // groups partitions the input/output channels; the default 1 puts all channels in one group. Setting groups equal to the input channels (with output channels equal to input channels) yields a depthwise separable convolution.
    const int blur_stride_x = stride_x;
    const int blur_stride_y = stride_y;
    l.antialiasing = antialiasing;
    if (antialiasing) {
        stride_x = stride_y = l.stride = l.stride_x = l.stride_y = 1; // use stride=1 in host-layer
    }
    l.deform = deform;
    l.assisted_excitation = assisted_excitation;
    l.share_layer = share_layer;
    l.index = index;
    l.h = h; // Input height.
    l.w = w; // Input width.
    l.c = c; // Input channels.
    l.groups = groups;
    l.n = n; // Number of convolution kernels (filters).
    l.binary = binary;
    l.xnor = xnor;
    l.use_bin_output = use_bin_output;
    l.batch = batch; // Training batch size.
    l.steps = steps;
    l.stride = stride_x; // Stride.
    l.stride_x = stride_x;
    l.stride_y = stride_y;
    l.dilation = dilation;
    l.size = size; // Kernel size.
    l.pad = padding; // Border padding width.
    l.batch_normalize = batch_normalize; // Whether to apply BN.
    l.learning_rate_scale = 1;
    /* Weight array size: c/groups*n*size*size. */
    l.nweights = (c / groups) * n * size * size; // groups defaults to 1; c appears because each kernel spans all input channels of its group.
    if (l.share_layer) {
        if (l.size != l.share_layer->size || l.nweights != l.share_layer->nweights || l.c != l.share_layer->c || l.n != l.share_layer->n) {
            printf(" Layer size, nweights, channels or filters don't match for the share_layer");
            getchar();
        }
        l.weights = l.share_layer->weights;
        l.weight_updates = l.share_layer->weight_updates;
        l.biases = l.share_layer->biases;
        l.bias_updates = l.share_layer->bias_updates;
    }else {
        l.weights = (float*)xcalloc(l.nweights, sizeof(float));
        l.biases = (float*)xcalloc(n, sizeof(float));
        if (train) {
            l.weight_updates = (float*)xcalloc(l.nweights, sizeof(float));
            l.bias_updates = (float*)xcalloc(n, sizeof(float));
        }
    }
    // float scale = 1./sqrt(size*size*c);
    float scale = sqrt(2./(size*size*c/groups)); // Initial weight scale.
    if (l.activation == NORM_CHAN || l.activation == NORM_CHAN_SOFTMAX || l.activation == NORM_CHAN_SOFTMAX_MAXVAL) {
        for (i = 0; i < l.nweights; ++i) l.weights[i] = 1; // rand_normal();
    }else {
        for (i = 0; i < l.nweights; ++i) l.weights[i] = scale*rand_uniform(-1, 1); // rand_normal();
    }
    /* Compute the output dimensions. */
    int out_h = convolutional_out_height(l);
    int out_w = convolutional_out_width(l);
    l.out_h = out_h; // Output height.
    l.out_w = out_w; // Output width.
    l.out_c = n; // Output channels, equal to the number of kernels.
    l.outputs = l.out_h * l.out_w * l.out_c; // Output size for one batch item.
    l.inputs = l.w * l.h * l.c; // Input size for one batch item.
    l.activation = activation;
    l.output = (float*)xcalloc(total_batch*l.outputs, sizeof(float)); // Output array.
#ifndef GPU
    if (train) l.delta = (float*)xcalloc(total_batch*l.outputs, sizeof(float)); // Scratch array for the update deltas.
#endif  // not GPU
    /* The three key function pointers: forward pass, backpropagation, and update. */
    l.forward = forward_convolutional_layer;
    l.backward = backward_convolutional_layer;
    l.update = update_convolutional_layer; // Fixes the update strategy.
    if(binary){
        l.binary_weights = (float*)xcalloc(l.nweights, sizeof(float));
        l.cweights = (char*)xcalloc(l.nweights, sizeof(char));
        l.scales = (float*)xcalloc(n, sizeof(float));
    }
    if(xnor){
        l.binary_weights = (float*)xcalloc(l.nweights, sizeof(float));
        l.binary_input = (float*)xcalloc(l.inputs * l.batch, sizeof(float));
        int align = 32; // 8;
        int src_align = l.out_h*l.out_w;
        l.bit_align = src_align + (align - src_align % align);
        l.mean_arr = (float*)xcalloc(l.n, sizeof(float));
        const size_t new_c = l.c / 32;
        size_t in_re_packed_input_size = new_c * l.w * l.h + 1;
        l.bin_re_packed_input = (uint32_t*)xcalloc(in_re_packed_input_size, sizeof(uint32_t));
        l.lda_align = 256; // AVX2
        int k = l.size*l.size*l.c;
        size_t k_aligned = k + (l.lda_align - k%l.lda_align);
        size_t t_bit_input_size = k_aligned * l.bit_align / 8;
        l.t_bit_input = (char*)xcalloc(t_bit_input_size, sizeof(char));
    }
    /* Batch Normalization variables. */
    if(batch_normalize){
        if (l.share_layer) {
            l.scales = l.share_layer->scales;
            l.scale_updates = l.share_layer->scale_updates;
            l.mean = l.share_layer->mean;
            l.variance = l.share_layer->variance;
            l.mean_delta = l.share_layer->mean_delta;
            l.variance_delta = l.share_layer->variance_delta;
            l.rolling_mean = l.share_layer->rolling_mean;
            l.rolling_variance = l.share_layer->rolling_variance;
        }else {
            l.scales = (float*)xcalloc(n, sizeof(float));
            for (i = 0; i < n; ++i) {
                l.scales[i] = 1;
            }
            if (train) {
                l.scale_updates = (float*)xcalloc(n, sizeof(float));
                l.mean = (float*)xcalloc(n, sizeof(float));
                l.variance = (float*)xcalloc(n, sizeof(float));
                l.mean_delta = (float*)xcalloc(n, sizeof(float));
                l.variance_delta = (float*)xcalloc(n, sizeof(float));
            }
            l.rolling_mean = (float*)xcalloc(n, sizeof(float));
            l.rolling_variance = (float*)xcalloc(n, sizeof(float));
        }
    }
    ......
    return l;
}
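For completeness, the output size used above follows standard convolution arithmetic; as I recall from convolutional_layer.c it is computed as:

int convolutional_out_height(convolutional_layer l)
{
    return (l.h + 2*l.pad - l.size) / l.stride_y + 1; // e.g. (608 + 2*1 - 3)/2 + 1 = 304
}
int convolutional_out_width(convolutional_layer l)
{
    return (l.w + 2*l.pad - l.size) / l.stride_x + 1;
}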
make_yolo_layer() in yolo_layer.c:
layer make_yolo_layer(int batch, int w, int h, int n, int total, int *mask, int classes, int max_boxes)
{
    int i;
    layer l = { (LAYER_TYPE)0 };
    l.type = YOLO; // Layer type.
    l.n = n; // How many b-boxes one cell predicts.
    l.total = total; // Number of anchors: 9.
    l.batch = batch; // Images per batch.
    l.h = h; // Input height.
    l.w = w; // Input width.
    l.c = n*(classes + 4 + 1);
    l.out_w = l.w; // Output width.
    l.out_h = l.h; // Output height.
    l.out_c = l.c; // Output channels.
    l.classes = classes; // Number of object classes.
    l.cost = (float*)xcalloc(1, sizeof(float)); // Total loss of the yolo layer.
    l.biases = (float*)xcalloc(total * 2, sizeof(float)); // Holds the [w,h] of the b-box anchors.
    if(mask) l.mask = mask; // A mask was passed in.
    else{
        l.mask = (int*)xcalloc(n, sizeof(int));
        for(i = 0; i < n; ++i){
            l.mask[i] = i;
        }
    }
    l.bias_updates = (float*)xcalloc(n * 2, sizeof(float)); // Update values for the anchor [w,h].
    l.outputs = h*w*n*(classes + 4 + 1); // Output elements per image through the yolo layer (grid cells * boxes per cell * parameters per box).
    l.inputs = l.outputs; // Input elements per image (for a yolo_layer, input and output element counts are equal).
    l.max_boxes = max_boxes; // At most max_boxes ground-truth boxes per image; this limit is hard-coded.
    l.truths = l.max_boxes*(4 + 1); // 4 localization parameters + 1 class label per box, an upper bound on the actual GT count.
    l.delta = (float*)xcalloc(batch * l.outputs, sizeof(float)); // Error terms of the yolo layer, covering the whole batch.
    l.output = (float*)xcalloc(batch * l.outputs, sizeof(float)); // All outputs of the yolo layer, covering the whole batch.
    /* Initialize the anchor-box [w,h]; parse_yolo() in parser.c later loads the anchor sizes from the cfg. */
    for(i = 0; i < total*2; ++i){
        l.biases[i] = .5;
    }
    /* Forward and backward functions. */
    l.forward = forward_yolo_layer;
    l.backward = backward_yolo_layer;
#ifdef GPU
    l.forward_gpu = forward_yolo_layer_gpu;
    l.backward_gpu = backward_yolo_layer_gpu;
    l.output_gpu = cuda_make_array(l.output, batch*l.outputs);
    l.output_avg_gpu = cuda_make_array(l.output, batch*l.outputs);
    l.delta_gpu = cuda_make_array(l.delta, batch*l.outputs);
    free(l.output);
    if (cudaSuccess == cudaHostAlloc(&l.output, batch*l.outputs*sizeof(float), cudaHostRegisterMapped)) l.output_pinned = 1;
    else {
        cudaGetLastError(); // reset CUDA-error
        l.output = (float*)xcalloc(batch * l.outputs, sizeof(float));
    }
    free(l.delta);
    if (cudaSuccess == cudaHostAlloc(&l.delta, batch*l.outputs*sizeof(float), cudaHostRegisterMapped)) l.delta_pinned = 1;
    else {
        cudaGetLastError(); // reset CUDA-error
        l.delta = (float*)xcalloc(batch * l.outputs, sizeof(float));
    }
#endif
    fprintf(stderr, "yolo\n");
    srand(time(0));
    return l;
}
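A worked example of the sizes above, for the first prediction head (76x76 grid, mask 0,1,2):

int w = 76, h = 76, n = 3, classes = 80;
int outputs = h * w * n * (classes + 4 + 1); // 76*76*3*85 = 1,472,880 elements per image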
It is worth highlighting the list data structure defined in "darknet/src/list.h":
typedef struct node{
    void *val;
    struct node *next;
    struct node *prev;
} node;
typedef struct list{
    int size;    // Number of nodes in the list.
    node *front; // Head node of the list.
    node *back;  // Tail node of the list.
} list; // A list variable holds all the network parameters: many section nodes, each of which carries a small list of that layer's parameters.
And the section structure defined in "darknet/src/parser.c":
typedef struct{
    char *type;    // The section's type: the network type and parameters of one layer. In the .cfg file, a line starting with '[' opens a section.
    list *options; // The section's parameter list.
}section;
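For reference, list_insert() (used by read_cfg() below) appends to the tail roughly as follows (a sketch from memory of list.c, not a verbatim copy):

void list_insert(list *l, void *val)
{
    node *newnode = (node*)xmalloc(sizeof(node));
    newnode->val = val;
    newnode->next = 0;
    if (!l->back) {              // empty list: the new node is also the head
        l->front = newnode;
        newnode->prev = 0;
    } else {                     // otherwise link it after the current tail
        l->back->next = newnode;
        newnode->prev = l->back;
    }
    l->back = newnode;           // the new node becomes the new tail
    ++l->size;
}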
"darknet/src/parser.c" -- read_cfg() reads the .cfg file and returns it as the list variable sections:
/* Read the network configuration from the .cfg file: each layer's parameters go into one section struct (one node of sections), and all of them are inserted into the list struct sections, which is returned. */
/* param: filename - C string, path of the network configuration file. */
/* return: pointer to a list struct holding the parameters of every network layer read from the configuration file. */
list *read_cfg(char *filename)
{
    FILE *file = fopen(filename, "r");
    if(file == 0) file_error(filename);
    /* A section is one segment of the cfg file, i.e. one layer of the network; it stores that layer's parameters and its type. */
    char *line;
    int nu = 0; // Current line counter.
    list *sections = make_list(); // sections holds all the layer parameters.
    section *current = 0; // The layer currently being read.
    while((line=fgetl(file)) != 0){
        ++ nu;
        strip(line); // Remove whitespace from the line just read.
        switch(line[0]){
            /* A line starting with '[' opens a new section whose content is the layer type, e.g. [net], [maxpool], [convolutional]... */
            case '[':
                current = (section*)xmalloc(sizeof(section)); // A new section was read: current.
                list_insert(sections, current); // Entry to list_insert() in list.c: store the new section.
                current->options = make_list();
                current->type = line;
                break;
            case '\0': // Blank line.
            case '#':  // Comment.
            case ';':  // Comment.
                free(line); // For these three cases just release the memory.
                break;
            /* Everything else is actual network data, read by read_option(); a return of 0 means the line is malformed and an error is reported. */
            default:
                if(!read_option(line, current->options)){ // Store the parsed parameter in current->options; the data held in the options nodes are kvp key-value pairs.
                    fprintf(stderr, "Config file error line %d, could parse: %s\n", nu, line);
                    free(line);
                }
                break;
        }
    }
    fclose(file);
    return sections;
}
In summary, parsing moves the network parameters from the linked list into the network struct, where they serve the subsequent weight updates during training.