
A 100,000-yuan speech recognition challenge is under way! A CTC model baseline to help you climb the leaderboard

Published: 2024/10/8


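With the spread of the internet and smart hardware, smart speakers and voice assistants have become part of everyday life. Speech recognition in the home has become a key technology that companies and research institutions are racing to master.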

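The “BAAI MagicSpeechNet Home-Scenario Speech Dataset Challenge” (December 2019 to March 2020) is now under way, with total prize money of 100,000 yuan. It was jointly launched by the Beijing Academy of Artificial Intelligence (BAAI), Magic Data and biendata. Participants use the provided audio and transcripts of two-person conversations, recorded in real home environments, to train and optimize a speech recognition (ASR) model. See the links below for the competition and the data, or click “Read the original”. All interested readers are welcome to take part.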

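To help participants get familiar with the task, biendata invited participant zc to give an in-depth analysis of data processing, model selection and directions for improvement. zc's main research interests are machine learning, speech translation and speech recognition. The goal is to get the ball rolling, give participants who have hit a plateau some fresh ideas, and jointly explore workable solutions for real-world ASR applications.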

Competition page:

https://www.biendata.com/competition/magicdata/

Baseline:

https://biendata.com/models/category/4264/L_notebook/

Baseline overview

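zc's model is built on CTC (Connectionist Temporal Classification), an algorithm widely used in speech recognition. A CNN extracts features from the mel spectrogram. An RNN followed by a DNN then predicts the label for each unit of the sequence; this baseline uses a BiLSTM, while for streaming ASR a GRU or an LC-LSTM would be worth considering. The model is optimized with the CTC loss. Its estimated score is 0.45. zc believes that strategies such as data augmentation, LM fusion and larger models can further improve performance.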

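Note: the data processing and model training parts of this baseline draw on the article “Tian Zhengkun, PhD student at the Institute of Automation, shares an end-to-end baseline”.

https://www.biendata.com/models/category/4162/L_notebook/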

Baseline details

1. Task overview

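The task is conversational speech recognition in everyday home environments. The dataset is the BAAI MagicSpeechNet home-scenario Chinese speech dataset, whose material comes from dozens of two-person conversations recorded in real environments. Each conversation was recorded on several platforms and has been fully transcribed and annotated. Participants train and optimize models on the provided data to improve conversational speech recognition in the home.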

2. Data processing

2.1 Audio processing

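Split the long recordings into single utterances and organize the metadata: a wav.scp list for every set, plus a transcript file for the training and development sets.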

import os
import json

data_rootdir = './Magicdata'  # root directory of the extracted data
audiodir = os.path.join(data_rootdir, 'audio')
trans_dir = os.path.join(data_rootdir, 'transcription')


# Cut [start_time, end_time] (in seconds) out of src_wav with sox
def segment_wav(src_wav, tgt_wav, start_time, end_time):
    span = end_time - start_time
    cmds = 'sox %s %s trim %f %f' % (src_wav, tgt_wav, start_time, span)
    os.system(cmds)


# Convert an "hh:mm:ss.xxx" timestamp to seconds
def time2sec(t):
    h, m, s = t.strip().split(":")
    return float(h) * 3600 + float(m) * 60 + float(s)


# Read a transcription json file: strip newlines and spaces and drop
# trailing commas (',}') so that json.loads accepts it
def load_json(json_file):
    with open(json_file, 'r', encoding="utf-8") as f:
        lines = f.readlines()
    json_str = ''.join(lines).replace('\n', '').replace(' ', '').replace(',}', '}')
    return json.loads(json_str)


# Training and development sets
for name in ['train', 'dev']:
    save_dir = os.path.join('./data', name, 'wav')
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)
    seg_wav_list = []
    sub_audio_dir = os.path.join(audiodir, name)
    for wav in os.listdir(sub_audio_dir):
        if wav[0] == '.':
            continue  # skip hidden files
        if name == 'dev':
            parts = wav.split('_')
            jf = '_'.join(parts[:-1]) + '.json'
            suffix = parts[-1]
        else:
            jf = wav[:-4] + '.json'
        utt_list = load_json(os.path.join(trans_dir, name, jf))
        for i in range(len(utt_list)):
            utt_info = utt_list[i]
            session_id = utt_info['session_id']
            if name == 'dev':
                tgt_id = session_id + '_' + str(i) + '_' + suffix
            else:
                tgt_id = session_id + '_' + str(i) + '.wav'
            # cut out the utterance
            start_time = time2sec(utt_info['start_time']['original'])
            end_time = time2sec(utt_info['end_time']['original'])
            src_wav = os.path.join(sub_audio_dir, wav)
            tgt_wav = os.path.join(save_dir, tgt_id)
            segment_wav(src_wav, tgt_wav, start_time, end_time)
            seg_wav_list.append((tgt_id, tgt_wav, utt_info['words']))
    with open(os.path.join('./data', name, 'wav.scp'), 'w') as ww:
        with open(os.path.join('./data', name, 'transcrpts.txt'), 'w', encoding='utf-8') as tw:
            for uttid, wavdir, text in seg_wav_list:
                ww.write(uttid + ' ' + wavdir + '\n')
                tw.write(uttid + ' ' + text + '\n')
    print('prepare %s dataset done!' % name)

# Test set
save_dir = os.path.join('./data', 'test', 'wav')
if not os.path.exists(save_dir):
    os.makedirs(save_dir)
seg_wav_list = []
sub_audio_dir = os.path.join(audiodir, 'test')
for wav in os.listdir(sub_audio_dir):
    if wav[0] == '.' or 'IOS' not in wav:
        continue  # skip hidden files and non-iOS recordings
    jf = '_'.join(wav.split('_')[:-1]) + '.json'
    utt_list = load_json(os.path.join(trans_dir, 'test_no_ref_noise', jf))
    for i in range(len(utt_list)):
        utt_info = utt_list[i]
        session_id = utt_info['session_id']
        uttid = utt_info['uttid']
        if 'words' in utt_info:
            continue  # skip utterances that already have a transcript
        # cut out the utterance
        start_time = time2sec(utt_info['start_time'])
        end_time = time2sec(utt_info['end_time'])
        tgt_id = uttid + '.wav'
        src_wav = os.path.join(sub_audio_dir, wav)
        tgt_wav = os.path.join(save_dir, tgt_id)
        segment_wav(src_wav, tgt_wav, start_time, end_time)
        seg_wav_list.append((uttid, tgt_wav))
with open(os.path.join('./data', 'test', 'wav.scp'), 'w') as ww:
    for uttid, wavdir in seg_wav_list:
        ww.write(uttid + ' ' + wavdir + '\n')
print('prepare test dataset done!')
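Segmentation comes down to two steps. First, convert "hh:mm:ss" timestamps to seconds. Second, call sox's trim effect, which takes a start position and a duration. The sketch below assumes the timestamp format that time2sec above parses. `build_trim_cmd` is a hypothetical helper that returns the sox command as an argument list suitable for `subprocess.run`. This avoids the shell-quoting problems that os.system has with paths containing spaces. The helper only builds the command; it does not run sox.

```python
from typing import List


def time2sec(t: str) -> float:
    """Convert an 'hh:mm:ss(.fff)' timestamp to seconds."""
    h, m, s = t.strip().split(":")
    return float(h) * 3600 + float(m) * 60 + float(s)


def build_trim_cmd(src_wav: str, tgt_wav: str, start: str, end: str) -> List[str]:
    """Hypothetical helper: sox command copying [start, end) of src_wav to tgt_wav.

    sox's `trim` takes a start position and a duration, so the duration is
    end - start, exactly as segment_wav computes it.
    """
    start_s, end_s = time2sec(start), time2sec(end)
    if end_s <= start_s:
        raise ValueError("end time must be after start time")
    return ["sox", src_wav, tgt_wav, "trim", "%f" % start_s, "%f" % (end_s - start_s)]
```

Usage would look like `subprocess.run(build_trim_cmd(src, tgt, "00:00:01.0", "00:00:03.5"), check=True)`. With `check=True`, a failing sox call raises an exception instead of passing silently.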

2.2 Text processing

Normalize the text: convert uppercase letters to lowercase, and filter out punctuation and meaningless sentences.
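The baseline does not include code for this step. Below is a minimal sketch of one possible normalization. It rests on several explicit assumptions:

- Punctuation is any character in a Unicode "P" category.
- Event or noise tags are written in square brackets, such as "[LAUGH]". This is a hypothetical guess about the transcript markup.
- Each Chinese character becomes its own token, while runs of Latin letters and digits stay whole.
- A sentence that is empty after cleaning counts as meaningless.

The output, "uttid tok1 tok2 ...", matches the space-separated format that the vocabulary and dataset code in 3.2 read from ./data/<name>/text.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical: event/noise tags are assumed to look like "[LAUGH]".
TAG_RE = re.compile(r"\[[^\]]*\]")
# Latin letter/digit runs stay together; every other non-space char is one token.
TOKEN_RE = re.compile(r"[a-z0-9]+|\S")


def strip_punctuation(text: str) -> str:
    """Remove characters whose Unicode category is punctuation (Pc, Pd, Ps, Pe, Po, ...)."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))


def normalize_text(text: str) -> Optional[str]:
    """Drop tags, lowercase, remove punctuation, tokenize; None means 'discard'."""
    text = TAG_RE.sub(" ", text)
    text = strip_punctuation(text.lower())
    tokens = TOKEN_RE.findall(text)
    return " ".join(tokens) if tokens else None


def normalize_line(line: str) -> Optional[str]:
    """Turn 'uttid raw text' into 'uttid tok1 tok2 ...', or None to drop the line."""
    uttid, _, text = line.strip().partition(" ")
    norm = normalize_text(text)
    return None if norm is None else uttid + " " + norm
```

To produce the file the later code expects, run normalize_line over every line of the transcript file from 2.1 and write the non-None results to ./data/<name>/text.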

3. Building the system

3.1 Experimental environment

The experiments run on Linux and need the following hardware and software:

At least one GPU
python >= 3.6
pytorch >= 1.2.0
torchaudio >= 0.3.0

import torch
from torch import nn
import torch.nn.functional as F
import math

3.2 Data processing and loading

3.2.1 Vocabulary generation

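Build the vocabulary from the training and development transcripts (the code below reads both). Then add the start token <BOS>, the end token <EOS>, the padding token <PAD> and the unknown-unit token <UNK>.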

import os

# Count unit (character) frequencies over the train and dev transcripts
vocab_dict = {}
for name in ['train', 'dev']:
    with open(os.path.join('./data', name, 'text'), 'r', encoding='utf-8') as fr:
        for line in fr:
            chars = line.strip().split()[1:]  # drop the utterance id
            for c in chars:
                if c in vocab_dict:
                    vocab_dict[c] += 1
                else:
                    vocab_dict[c] = 1

# Special tokens take ids 0-3; the other units are numbered by descending frequency
vocab_list = sorted(vocab_dict.items(), key=lambda x: x[1], reverse=True)
vocab = {'<PAD>': 0, '<BOS>': 1, '<EOS>': 2, '<UNK>': 3}
for i in range(len(vocab_list)):
    c = vocab_list[i][0]
    vocab[c] = i + 4

print('There are %d units in Vocabulary!' % len(vocab))
with open(os.path.join('./data', 'vocab'), 'w', encoding='utf-8') as fw:
    for c, idx in vocab.items():
        fw.write(c + ' ' + str(idx) + '\n')
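The loop above can be restated as a pure function whose ID layout is easy to check. Here `build_vocab` is a hypothetical name, and `collections.Counter` replaces the manual counting. Special tokens occupy IDs 0 to 3, and the remaining units follow in descending frequency. Because Python's sort is stable, units with equal counts keep their first-seen order. The CTC model in 3.3 outputs num_class + 1 scores, so the one output index not used by any vocabulary entry is len(vocab). That index is the natural position for the CTC blank.

```python
from collections import Counter
from typing import Dict, Iterable

SPECIALS = ["<PAD>", "<BOS>", "<EOS>", "<UNK>"]  # ids 0, 1, 2, 3


def build_vocab(lines: Iterable[str]) -> Dict[str, int]:
    """lines: 'uttid tok1 tok2 ...' strings, as in ./data/<name>/text."""
    counts = Counter()
    for line in lines:
        counts.update(line.strip().split()[1:])  # skip the utterance id
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    # Most frequent first; sorted() is stable, so ties keep first-seen order.
    for tok, _ in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if tok not in vocab:  # never overwrite a special token's id
            vocab[tok] = len(vocab)
    return vocab
```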

3.2.2 Feature extraction and data loading

Source side: extract 80-dimensional log mel filterbank (fbank) features from the audio as the model input
Target side: encode the transcripts with the extracted vocabulary

import os
import torch
import numpy as np
import torchaudio as ta
from torch.utils.data import Dataset, DataLoader

PAD = 0
BOS = 1
EOS = 2
UNK = 3


class AudioDataset(Dataset):
    def __init__(self, wav_list, text_list=None, unit2idx=None, num_mel_bins=80):
        self.num_mel_bins = num_mel_bins
        self.unit2idx = unit2idx
        self.file_list = []
        for wavscpfile in wav_list:
            with open(wavscpfile, 'r', encoding='utf-8') as wr:
                for line in wr:
                    uttid, path = line.strip().split()
                    self.file_list.append([uttid, path])

        if text_list is not None:
            self.targets_dict = {}
            for textfile in text_list:
                with open(textfile, 'r', encoding='utf-8') as tr:
                    for line in tr:
                        parts = line.strip().split()
                        uttid = parts[0]
                        # map every unit to its id; unknown units map to <UNK>
                        label = [self.unit2idx.get(c, self.unit2idx['<UNK>']) for c in parts[1:]]
                        self.targets_dict[uttid] = label
            self.file_list = self.filter(self.file_list)  # drop utterances without a transcript
            assert len(self.file_list) == len(self.targets_dict)
        else:
            self.targets_dict = None

        self.lengths = len(self.file_list)

    def __getitem__(self, index):
        uttid, path = self.file_list[index]
        # torchaudio.load_wav exists in the torchaudio versions listed in 3.1;
        # it was deprecated and later removed in newer releases
        wavform, _ = ta.load_wav(path)
        # log mel filterbank (fbank) features, [num_frames, num_mel_bins]
        feature = ta.compliance.kaldi.fbank(wavform, num_mel_bins=self.num_mel_bins)
        # per-utterance mean/variance normalization
        mean = torch.mean(feature)
        std = torch.std(feature)
        feature = (feature - mean) / std

        if self.targets_dict is not None:
            targets = self.targets_dict[uttid]
            return uttid, feature, targets
        else:
            return uttid, feature

    def filter(self, feat_list):
        new_list = []
        for (uttid, path) in feat_list:
            if uttid not in self.targets_dict:
                continue
            new_list.append([uttid, path])
        return new_list

    def __len__(self):
        return self.lengths

    @property
    def idx2char(self):
        return {i: c for (c, i) in self.unit2idx.items()}


# Collate function: pad all features in a batch to the same number of frames,
# and wrap every transcript in <BOS>/<EOS> before padding it with <PAD>
def collate_fn(batch):
    uttids = [data[0] for data in batch]
    features_length = [data[1].shape[0] for data in batch]
    max_feat_length = max(features_length)
    padded_features = []

    has_targets = len(batch[0]) == 3
    if has_targets:
        targets_length = [len(data[2]) for data in batch]
        max_text_length = max(targets_length)
        padded_targets = []

    for parts in batch:
        feat = parts[1]
        feat_len = feat.shape[0]
        padded_features.append(np.pad(feat, ((0, max_feat_length - feat_len), (0, 0)),
                                      mode='constant', constant_values=0.0))
        if has_targets:
            target = parts[2]
            text_len = len(target)
            padded_targets.append([BOS] + target + [EOS] + [PAD] * (max_text_length - text_len))

    features = torch.from_numpy(np.stack(padded_features)).float()  # [bs, max_len, num_mel_bins]
    if has_targets:
        return uttids, features, torch.LongTensor(padded_targets)
    else:
        return uttids, features
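The pure-Python sketch below shows what collate_fn does to a batch of transcripts, and how the training code in 3.4 then counts their lengths as the number of non-<PAD> entries. The helper names are hypothetical. Note that <BOS> and <EOS> are part of the counted length, so in this baseline the CTC model is also trained to emit them.

```python
from typing import List

PAD, BOS, EOS = 0, 1, 2  # same ids as in the dataset code


def pad_targets(batch: List[List[int]]) -> List[List[int]]:
    """Wrap each id sequence in <BOS>/<EOS>, then right-pad with <PAD>."""
    max_len = max(len(t) for t in batch)
    return [[BOS] + t + [EOS] + [PAD] * (max_len - len(t)) for t in batch]


def target_lengths(padded: List[List[int]]) -> List[int]:
    """Count the non-<PAD> entries of every row, like get_target_len in 3.4."""
    return [sum(1 for tok in row if tok != PAD) for row in padded]
```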

3.3 Model architecture

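The architecture is based on CTC (Connectionist Temporal Classification), an algorithm widely used in speech recognition:

A CNN extracts features from the mel spectrogram (fbank features);
An RNN (a BiLSTM here; for streaming ASR, consider a GRU or an LC-LSTM instead) followed by a fully connected (DNN) layer predicts the label for each unit (frame) of the sequence;
The model is optimized with the CTC loss.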
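The tensor shapes in the model code follow from the output-size formula of an unpadded convolution: out = floor((in - kernel) / stride) + 1. The sketch below applies this formula to the baseline's two convolutions, whose kernels are 11x3 and 3x3 with stride 2 on both axes. It shows where the RNN input size of 16 x 19 = 304 comes from and how many output frames the CTC loss sees. The helper names are illustrative.

```python
def conv_out(n: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded, undilated convolution along one axis."""
    return (n - kernel) // stride + 1


def frontend_freq_bins(num_mel_bins: int = 80) -> int:
    # frequency axis: kernel 3 in both convolutions
    return conv_out(conv_out(num_mel_bins, 3, 2), 3, 2)


def frontend_frames(num_frames: int) -> int:
    # time axis: kernel 11 in the first convolution, 3 in the second
    return conv_out(conv_out(num_frames, 11, 2), 3, 2)
```

Kaldi-style fbank uses a 10 ms frame shift by default, so 1000 frames correspond to roughly 10 s of audio. The front end reduces them to 247 output steps. A transcript, including its <BOS>/<EOS>, has to fit into that budget. Each pair of consecutive repeated labels costs one extra step, because CTC must separate repeats with a blank.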

from collections import OrderedDict  # used by CTC_Model below


class BatchRNN(nn.Module):
    """One (optionally bidirectional) RNN layer with optional batch norm on its input."""
    def __init__(self, input_size, hidden_size, rnn_type=nn.LSTM,
                 bidirectional=False, batch_norm=True, dropout=0.1):
        super(BatchRNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.bidirectional = bidirectional
        self.batch_norm = SequenceWise(nn.BatchNorm1d(input_size)) if batch_norm else None
        # note: for a single-layer RNN, PyTorch ignores `dropout` (and warns about it)
        self.rnn = rnn_type(input_size=input_size, hidden_size=hidden_size,
                            bidirectional=bidirectional, dropout=dropout, bias=False)

    def forward(self, x):
        if self.batch_norm is not None:
            x = self.batch_norm(x)
        self.rnn.flatten_parameters()  # compact the weights before the RNN call
        x, _ = self.rnn(x)
        return x


class SequenceWise(nn.Module):
    """Apply a module frame by frame to a [seq_len, bs, features] tensor or a PackedSequence."""
    def __init__(self, module):
        super(SequenceWise, self).__init__()
        self.module = module

    def forward(self, x):
        if isinstance(x, nn.utils.rnn.PackedSequence):
            x, batch_size_len = x.data, x.batch_sizes  # x.data: [sum(x_len), num_features]
            x = self.module(x)
            x = nn.utils.rnn.PackedSequence(x, batch_size_len)
        else:
            t, n = x.size(0), x.size(1)
            x = x.view(t * n, -1)  # [seq_len * bs, num_features]
            x = self.module(x)
            x = x.view(t, n, -1)
        return x


class InferenceBatchLogSoftmax(nn.Module):
    def forward(self, x):
        # x: [seq_len, bs, num_class]. Normalize over the class axis explicitly:
        # without `dim`, F.log_softmax on a 3-D tensor falls back to dim=0 (time).
        return F.log_softmax(x, dim=-1)


class CTC_Model(nn.Module):
    def __init__(self, rnn_input_size=80, rnn_hidden_size=256, rnn_layers=4,
                 rnn_type=nn.LSTM, bidirectional=True,
                 batch_norm=True, num_class=3864, drop_out=0.1):
        super(CTC_Model, self).__init__()
        self.rnn_input_size = rnn_input_size
        self.rnn_hidden_size = rnn_hidden_size
        self.rnn_layers = rnn_layers
        self.rnn_type = rnn_type
        self.num_class = num_class
        self.num_directions = 2 if bidirectional else 1
        self._drop_out = drop_out
        self.name = 'CTC_Model'

        # Convolutional front end; its strides set the downsampling factor and can be tuned
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(11, 3), stride=(2, 2)),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=(3, 3), stride=(2, 2)),
            nn.BatchNorm2d(16),
            nn.ReLU(),
        )
        # frequency axis after two convolutions with kernel 3, stride 2: 80 -> 39 -> 19
        rnn_input_size = (rnn_input_size - 3) // 2 + 1
        rnn_input_size = (rnn_input_size - 3) // 2 + 1
        rnn_input_size *= 16  # 16 channels x 19 bins = 304

        rnns = []
        rnn = BatchRNN(input_size=rnn_input_size, hidden_size=rnn_hidden_size,
                       rnn_type=rnn_type, bidirectional=bidirectional,
                       batch_norm=False)
        rnns.append(('0', rnn))
        for i in range(rnn_layers - 1):
            rnn = BatchRNN(input_size=self.num_directions * rnn_hidden_size,
                           hidden_size=rnn_hidden_size, rnn_type=rnn_type,
                           bidirectional=bidirectional, dropout=drop_out, batch_norm=batch_norm)
            rnns.append(('%d' % (i + 1), rnn))
        self.rnns = nn.Sequential(OrderedDict(rnns))

        # one output per vocabulary unit plus one extra unit at index num_class,
        # the natural position for the CTC blank
        if batch_norm:
            fc = nn.Sequential(nn.BatchNorm1d(self.num_directions * rnn_hidden_size),
                               nn.Linear(self.num_directions * rnn_hidden_size, num_class + 1, bias=False))
        else:
            fc = nn.Linear(self.num_directions * rnn_hidden_size, num_class + 1, bias=False)
        self.fc = SequenceWise(fc)
        self.inference_log_softmax = InferenceBatchLogSoftmax()

    def forward(self, x):
        x = torch.unsqueeze(x, dim=1)                         # [bs, 1, seq_len, 80]
        x = self.conv(x)                                      # [bs, 16, T' ~ seq_len/4, 19]
        x = x.transpose(2, 3).contiguous()                    # [bs, 16, 19, T']
        sizes = x.size()
        x = x.view(sizes[0], sizes[1] * sizes[2], sizes[3])   # [bs, 304, T']
        x = x.transpose(1, 2).transpose(0, 1).contiguous()    # [T', bs, 304]
        x = self.rnns(x)                                      # [T', bs, 512]
        x = self.fc(x)                                        # [T', bs, num_class + 1]
        x = self.inference_log_softmax(x)                     # [T', bs, num_class + 1]
        return x
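The baseline stops at training. The model produces log-probabilities of shape [T', bs, num_class + 1], and the simplest way to turn them into text is greedy (best-path) CTC decoding. Greedy decoding takes the arg-max class for each frame, merges consecutive repeats, and then drops blanks. The minimal pure-Python sketch below works on one utterance's arg-max IDs. It assumes the blank is the extra last class (ID num_class, i.e. the vocabulary size). It also assumes that <PAD>, <BOS> and <EOS> (IDs 0, 1 and 2) should be removed from the output. Beam search with a language model, in the spirit of the "LM Fusion" direction below, would replace this step.

```python
from typing import Dict, List, Sequence

SPECIAL_IDS = {0, 1, 2}  # <PAD>, <BOS>, <EOS>; <UNK> (3) is kept


def ctc_greedy_decode(frame_ids: Sequence[int], blank: int) -> List[int]:
    """Collapse repeated frame labels, then remove blanks."""
    out: List[int] = []
    prev = None
    for idx in frame_ids:
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx  # a blank between two equal labels keeps both
    return out


def ids_to_text(ids: Sequence[int], idx2char: Dict[int, str]) -> str:
    """Map ids back to units, skipping special tokens; units are joined without spaces."""
    return "".join(idx2char[i] for i in ids if i not in SPECIAL_IDS)
```

Given the model output `out`, `out.argmax(dim=-1)[:, b].tolist()` gives the frame IDs of utterance b, which should be truncated to that utterance's valid length from get_input_len. Joining units without spaces suits Chinese characters; word-level Latin units would need spaces.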

3.4 Training and saving the model

# Number of valid (non-padded) frames per utterance after the conv front end
def get_input_len(inputs):
    # inputs: [bs, seq_len, 80]; padded frames are all zeros
    x = torch.sum(inputs.abs(), dim=2)   # [bs, seq_len]
    x = x.ne(0)                          # True for real frames, on the same device as inputs
    x = torch.sum(x, dim=1)              # real frames per utterance
    # same arithmetic as the two convolutions on the time axis (kernel 11, then 3; stride 2)
    x = ((x - 11) // 2 + 1 - 3) // 2 + 1
    return x


# Number of non-<PAD> tokens per target (including <BOS> and <EOS>)
def get_target_len(targets):
    # targets: [bs, text_len]; <PAD> is id 0
    x = targets.ne(0)
    x = torch.sum(x, dim=1)
    return x


total_epochs = 60       # number of training epochs
batch_size = 16         # batch size
warmup_steps = 12000    # warm-up steps
lr_factor = 1.0         # learning-rate scale factor
accu_grads_steps = 8    # gradient-accumulation steps

num_mel_bins = 80
rnn_hidden_size = 256
rnn_layers = 4
rnn_type = nn.LSTM
bidirectional = True
batch_norm = True
drop_out = 0.1

# Load the vocabulary
unit2idx = {}
with open('./data/vocab', 'r', encoding='utf-8') as fr:
    for line in fr:
        unit, idx = line.strip().split()
        unit2idx[unit] = int(idx)
vocab_size = len(unit2idx)  # output vocabulary size
print("[info] vocab size is", vocab_size)

# Model definition
model = CTC_Model(rnn_input_size=num_mel_bins, rnn_hidden_size=rnn_hidden_size, rnn_layers=rnn_layers,
                  rnn_type=rnn_type, bidirectional=bidirectional, batch_norm=batch_norm,
                  num_class=vocab_size, drop_out=drop_out)
if torch.cuda.is_available():
    model.cuda()  # move the model to the GPU

train_wav_list = ['./data/train/wav.scp', './data/dev/wav.scp']
train_text_list = ['./data/train/text', './data/dev/text']
dataset = AudioDataset(train_wav_list, train_text_list, unit2idx=unit2idx, num_mel_bins=num_mel_bins)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                         shuffle=True, num_workers=2, pin_memory=False,
                                         collate_fn=collate_fn)


# Optimizer and learning-rate schedule (linear warm-up, then inverse-square-root decay)
def get_learning_rate(step):
    return lr_factor * rnn_hidden_size ** (-0.5) * min(step ** (-0.5), step * warmup_steps ** (-1.5))


lr = get_learning_rate(step=1)
optimizer = torch.optim.Adam(model.parameters(), lr=lr, betas=(0.9, 0.98), eps=1e-9)

if not os.path.exists('./model'):
    os.makedirs('./model')

global_step = 1
step_loss = 0
# The model emits vocab_size + 1 classes; the extra last index (vocab_size) is the blank.
# zero_infinity=True zeroes the loss of utterances too short for their transcript.
ctc_loss = nn.CTCLoss(blank=vocab_size, reduction='mean', zero_infinity=True)

print('Begin to Train...')
for epoch in range(total_epochs):
    print('*****  epoch: %d *****' % epoch)
    for step, (_, inputs, targets) in enumerate(dataloader):
        # move the batch to the GPU
        if torch.cuda.is_available():
            inputs = inputs.cuda()    # [bs, seq_len, 80]
            targets = targets.cuda()  # [bs, txt_len]
        input_sizes = get_input_len(inputs)
        target_sizes = get_target_len(targets)
        out = model(inputs)
        loss = ctc_loss(out, targets, input_sizes, target_sizes)
        loss /= batch_size
        loss.backward()
        step_loss += loss.item()

        if (step + 1) % accu_grads_steps == 0:
            # gradient clipping
            grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 5.0)
            # apply the accumulated gradients first, then reset them;
            # zeroing before step() would throw the gradients away
            optimizer.step()
            optimizer.zero_grad()
            if global_step % 10 == 0:
                print('-Training-Epoch-%d, Global Step:%d, lr:%.8f, Loss:%.5f' %
                      (epoch, global_step, lr, step_loss / accu_grads_steps))
            global_step += 1
            step_loss = 0

            # learning-rate update
            lr = get_learning_rate(global_step)
            for param_group in optimizer.param_groups:
                param_group['lr'] = lr

    # save a checkpoint after every epoch
    checkpoint = model.state_dict()
    torch.save(checkpoint, os.path.join('./model', 'model.epoch.%d.pt' % epoch))

print('Done!')
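get_learning_rate implements the warm-up / inverse-square-root schedule known from the Transformer ("Noam") recipe. The rate grows linearly for warmup_steps updates and then decays as step^-0.5; the two branches meet at step == warmup_steps. The standalone sketch below uses the baseline's values (rnn_hidden_size = 256 as d_model, warmup_steps = 12000, lr_factor = 1.0), which makes the peak easy to check. In the training loop, step counts optimizer updates, so it advances once every accu_grads_steps = 8 batches.

```python
def noam_lr(step: int, d_model: int = 256, warmup_steps: int = 12000, factor: float = 1.0) -> float:
    """Same formula as get_learning_rate; `step` starts at 1."""
    return factor * d_model ** (-0.5) * min(step ** (-0.5), step * warmup_steps ** (-1.5))
```

This gives roughly 2.85e-4 at step 6000, a peak of about 5.7e-4 at step 12000, and about 2.85e-4 again at step 48000.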

4. Directions for improvement

Data Augmentation
LM Fusion
Larger Model
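Of the directions listed, data augmentation is the easiest to add to this pipeline. One widely used option, not part of the baseline, is SpecAugment-style masking: fill a few random bands of mel channels and a few random spans of frames in each training feature matrix. The minimal sketch below works on a plain list-of-lists matrix indexed [frames][mel_bins] and uses a seeded random.Random for reproducibility. The mask counts and maximum widths are illustrative assumptions, not tuned values.

In the baseline, the masking would go on `feature` in AudioDataset.__getitem__, for training data only. After per-utterance normalization, a fill value of 0 equals the utterance mean. However, get_input_len in 3.4 treats all-zero frames as padding. Time masks should therefore either use a non-zero fill, or the lengths should be computed from the unpadded feature sizes.

```python
import random
from typing import List

Matrix = List[List[float]]  # [num_frames][num_mel_bins]


def spec_augment(feat: Matrix, rng: random.Random,
                 num_freq_masks: int = 2, max_freq_width: int = 10,
                 num_time_masks: int = 2, max_time_width: int = 40,
                 fill: float = 0.0) -> Matrix:
    """Return a copy of `feat` with random frequency bands and time spans set to `fill`."""
    out = [row[:] for row in feat]  # never modify the caller's features
    num_frames = len(out)
    num_bins = len(out[0]) if num_frames else 0

    # frequency masks: the same band of mel channels in every frame
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, num_bins))
        start = rng.randint(0, num_bins - width)
        for row in out:
            for b in range(start, start + width):
                row[b] = fill

    # time masks: whole consecutive frames
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, num_frames))
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            out[t] = [fill] * num_bins
    return out
```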

Schedule

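The competition has two stages, a preliminary round and a final round. The preliminary round opened on December 23, 2019, when biendata released the training, development and test sets and opened preliminary-round submissions. Registration and team formation for the preliminary round close on March 20, 2020. The number of daily submissions is limited, so interested participants are encouraged to join early to get more submissions for validating and tuning their models.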

How to participate

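Click the “Read the original” link or scan the QR code in the image below to go straight to the competition page. Then register on the site and download the data to take part.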

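biendata is a well-known international big-data competition platform, open to students, researchers, companies and freelancers worldwide. We hope everyone interested in artificial intelligence will excel in the platform's many competitions, and that the exchange of ideas and techniques will spark innovation and breakthroughs.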

A friendly reminder: because the data has to be downloaded, we strongly recommend registering from a PC.

Competition data

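The “BAAI MagicSpeechNet Home-Scenario Chinese Speech Dataset” is high-quality home-environment speech data, a kind of data that is currently scarce in the industry. It contains hundreds of hours of two-person conversations recorded in real home environments. Each conversation was recorded on several platforms and has been fully transcribed and annotated.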

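The data is split into training, development and test sets. The test set consists of the audio to be recognized, and each recording comes as three files: one captured on an Android device, one on an iOS device and one on a voice recorder. To make segmentation easier, the competition provides JSON files that mark the start and end time of each utterance. Participants must recognize the conversations in the audio and submit the text for each corresponding uttid in the JSON files.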

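Compared with similar multi-channel speech recognition competitions in China and abroad, this competition's data has the following advantages in volume, scenarios and acoustic characteristics.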

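1. A large amount of conversational data. Speech recognition competitions in China mostly use read speech, whereas this competition uses genuine conversations recorded in fully real settings, with speakers talking freely about chosen topics in a relaxed, unscripted way. Compared with international competitions based on conversational data, it still has a large advantage in data volume. In addition, a realistic amount of overlapping speech reflects more faithfully how hard speech recognition is in everyday home settings.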

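2. Real and diverse scenes. The dataset was collected in 3 real homes, with speakers talking freely about chosen topics in a relaxed, unscripted way. The different recording environments add diversity to the data and also make the competition harder.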

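3. Close-talk and multi-platform far-field recordings combined. Each conversation has 5 synchronized recording channels: 3 far-field and 2 close-talk. The far-field channels were recorded on several models of Android phones, on iPhones and on voice recorders, capturing the characteristics of multi-platform recordings. The close-talk data was recorded with high-fidelity microphones kept 10 cm from the speaker's mouth.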

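4. Rich and balanced acoustic characteristics. The speakers come from different regions of mainland China and speak Mandarin with some degree of regional accent. They also span different age groups, with a balanced gender ratio.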

BAAI Algorithm Competition

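The BAAI Artificial Intelligence Algorithm Competition officially launched in September 2019. It is hosted by the Beijing Academy of Artificial Intelligence and co-organized by Tsinghua University, Peking University, the Institute of Computing Technology of the Chinese Academy of Sciences, Megvii, Zhihu, Bosch, Magic Data, the National Astronomical Observatories, XtalPi and others. Total prize money exceeds 1 million yuan. The competition uses world-leading research datasets and algorithm challenges as a platform to select and nurture innovative AI talent.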

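BAAI vice president Liu Jiang said: “We want to support truly landmark breakthroughs in AI without sticking to convention. Even an undergraduate who is genuinely promising will definitely have our support.” AI competitions are an important way to discover promising young researchers.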

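The BAAI AI Algorithm Competition has two main goals. The first is to advance basic research by releasing datasets and holding data competitions, in particular by bringing computer scientists into fundamental research in other disciplines. The second is to identify and train talent in related fields through the competitions. All 10 datasets of the BAAI competition have been released, and 5 competitions (500,000 yuan in prizes) are still running.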

Competitions still in progress

BAAI Small-Molecule Compound Property Prediction Challenge

https://www.biendata.com/competition/molecule/

BAAI Cup Astronomical Data Algorithm Challenge

https://www.biendata.com/competition/astrodata2019/

BAAI-INSPEC Industrial Big Data Quality Prediction Challenge

https://www.biendata.com/competition/bosch/

BAAI-MagicSpeechNet Home-Scenario Chinese Speech Dataset Challenge

https://www.biendata.com/competition/magicdata/

BAAI High-Energy Collision Particle Classification Challenge

https://www.biendata.com/competition/jet/


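About PaperWeekly

PaperWeekly is an academic platform that recommends, interprets, discusses and reports on frontier AI research papers. If you research or work in AI, click “交流群” (discussion group) in the official account's menu, and an assistant will add you to a PaperWeekly discussion group.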
