dcase_util Tutorial

dcase_util is a Python-based collection of utilities for acoustic scene recognition. Its documentation is in English; it is summarized and translated here for easy reference.

The dcase_util documentation describes a collection of utilities created for Detection and Classification of Acoustic Scenes and Events (DCASE). These utilities were originally created for the DCASE challenge baseline systems (2016 and 2017) and are bundled into a standalone library to allow their reuse in other research projects.

The main goal of these utilities is to streamline research code, making it more readable and easier to maintain. Most of the implemented utilities are related to audio datasets: handling metadata and other structured data in various forms, and providing a standardized API for audio datasets from different sources.

The concepts included in dcase_util:

- Dataset: a collection of audio signals and the reference annotations associated with them.
- Container: classes that store data and, based on the stored data type, provide meaningful and clean access to it.
- Metadata: data annotations.
- Repository: a container that stores multiple other data containers.
- Encoder: classes for converting data from one type to another.
- Processor: classes that provide a unified API for processing data.
- ProcessingChain: connects multiple data processors into a chain, allowing complex data processing pipelines to be built.

Once a Python environment is set up, install dcase_util with the following pip command:

```bash
pip install dcase_util
```

The sections below walk through the usage of each unit.
1. Container

The library provides data containers to simplify the workflow. These containers inherit from standard Python containers (object, list, and dict) so that they can be used together with other tools and libraries. The purpose of these containers is to wrap the data with useful methods for accessing and manipulating it, as well as for loading and storing it.

1.1 Basic usage

- Four ways to load content from a file:

```python
# 1
dict_container = dcase_util.containers.DictContainer(filename='test.yaml')
dict_container.load()

# 2
dict_container = dcase_util.containers.DictContainer()
dict_container.load(filename='test.yaml')

# 3
dict_container = dcase_util.containers.DictContainer(filename='test.yaml').load()

# 4
dict_container = dcase_util.containers.DictContainer().load(filename='test.yaml')
```

- Save content to a file:

```python
dict_container.save(filename='test.yaml')
```

- Show container content by printing it to the console:

```python
dict_container.show()
```

- Show container content by printing it through the standard logging system:

```python
dict_container.log()
```

If the logging system has not been initialized before the call, it is initialized with dcase_util.utils.setup_logging using default parameters.
1.2 Dictionaries

dcase_util.containers.DictContainer is designed to make working with nested dictionaries somewhat easier than with the standard dict. It allows fields in nested dictionaries to be accessed through a so-called dotted path, or through a list of path parts.

- Initialize the container with a dictionary:

```python
dict_container = dcase_util.containers.DictContainer(
    {
        'test': {
            'field1': 1,
            'field2': 2,
        },
        'test2': 100,
        'test3': {
            'field1': {
                'fieldA': 1
            },
            'field2': {
                'fieldA': 1
            },
            'field3': {
                'fieldA': 1
            },
        }
    }
)
```

- Initialize the container and load its content from a file:

```python
dict_container = dcase_util.containers.DictContainer().load(filename='test.yaml')
```

- Access fields through a dotted path:

```python
# Field exists
value = dict_container.get_path('test.field1')

# Using a wild card
values = dict_container.get_path('test3.*')

# Non-existing field with a default value
value = dict_container.get_path('test.fieldA', 'default_value')
```

- Access fields through a list of path parts:

```python
# Field exists
value = dict_container.get_path(['test', 'field1'])

# Non-existing field with a default value
value = dict_container.get_path(['test', 'fieldA'], 'default_value')
```

- Set a field through a dotted path:

```python
dict_container.set_path('test.field2', 200)
```

- Get dotted paths to all leaf nodes in the nested dictionary:

```python
dict_container.get_leaf_path_list()
```

- Get dotted paths to all leaf nodes whose field name starts with 'field':

```python
dict_container.get_leaf_path_list(target_field_startswith='field')
```

- Save the container to a .yaml file:

```python
dict_container.save(filename='test.yaml')
```

- Load container content from a .yaml file:

```python
dict_container.load(filename='test.yaml')
```
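The dotted-path access used above is easy to picture in plain Python. The following sketch (a hypothetical helper, not part of dcase_util) shows the idea of walking a nested dict along a dotted path or a list of path parts, with a default value for missing fields:

```python
def get_path(data, path, default=None):
    # Walk a nested dict along a dotted path such as 'test.field1',
    # or along a list of path parts such as ['test', 'field1'].
    # Returns `default` when any part of the path is missing.
    parts = path.split('.') if isinstance(path, str) else list(path)
    current = data
    for part in parts:
        if isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return default
    return current

nested = {'test': {'field1': 1, 'field2': 2}, 'test2': 100}
print(get_path(nested, 'test.field1'))                   # 1
print(get_path(nested, ['test', 'field2']))              # 2
print(get_path(nested, 'test.fieldA', 'default_value'))  # default_value
```

The wild-card form ('test3.*') would additionally collect all values on the matching level; the sketch omits that for brevity.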
1.3 List of dictionaries

dcase_util.containers.ListDictContainer is a list for storing dcase_util.containers.DictContainer objects.

- Initialize the container with a list of dictionaries:

```python
listdict_container = dcase_util.containers.ListDictContainer(
    [
        {'field1': 1, 'field2': 2},
        {'field1': 10, 'field2': 20},
        {'field1': 100, 'field2': 200},
    ]
)
```

- Access items in the list based on a key and value:

```python
print(listdict_container.search(key='field1', value=10))
# DictContainer
#   field1 : 10
#   field2 : 20
```

- Get the values of a specific field from all dictionaries:

```python
print(listdict_container.get_field(field_name='field2'))
# [2, 20, 200]
```
1.4 Data containers

- A few different data container types are available:
- dcase_util.containers.DataArrayContainer, a data container for array data; internally the data is stored in a numpy.array.
- dcase_util.containers.DataMatrix2DContainer, a data container for two-dimensional data matrices; internally the data is stored in a 2-D numpy.ndarray.
- dcase_util.containers.DataMatrix3DContainer, a data container for three-dimensional data matrices; internally the data is stored in a 3-D numpy.ndarray.
- dcase_util.containers.BinaryMatrixContainer, a data container for two-dimensional binary data matrices; internally the data is stored in a 2-D numpy.ndarray.

- Initialize the container with a random 10x100 matrix, and set the time resolution to 20 ms:

```python
import numpy

data_container = dcase_util.containers.DataMatrix2DContainer(
    data=numpy.random.rand(10, 100),
    time_resolution=0.02
)
```

When storing, for example, acoustic features, the time resolution corresponds to the frame hop length used in feature extraction.
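The relation between time and frame indices at a given time resolution is a simple division; a rough sketch in plain Python (illustrative only, not the dcase_util API):

```python
time_resolution = 0.02  # seconds per frame (20 ms hop)

def seconds_to_frame(seconds, time_resolution=time_resolution):
    # Index of the frame containing the given time instant.
    return int(seconds / time_resolution)

print(seconds_to_frame(0.5))  # 25
print(seconds_to_frame(1.0))  # 50

# A 100-frame matrix at 20 ms resolution therefore spans 2.00 seconds:
print(100 * time_resolution)  # 2.0
```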
- Access the data matrix directly:

```python
print(data_container.data.shape)
# (10, 100)
```

- Show container information:

```python
data_container.show()
# DataMatrix2DContainer :: Class
#   Data
#     data             : matrix (10,100)
#     Dimensions
#       time_axis      : 1
#       data_axis      : 0
#     Timing information
#       time_resolution : 0.02 sec
#   Meta
#     stats            : Calculated
#     metadata         : -
#     processing_chain : -
#   Duration
#     Frames           : 100
#     Seconds          : 2.00 sec
```

The container has a focus mechanism to flexibly capture only a part of the data matrix. The focus can be set based on time (in seconds, if the time resolution is defined) or based on frame IDs.

- Use focus to get the data between 0.5 and 1.0 seconds:

```python
print(data_container.set_focus(start_seconds=0.5, stop_seconds=1.0).get_focused().shape)
# (10, 25)
```

- Use focus to get the data between frames 10 and 50:

```python
print(data_container.set_focus(start=10, stop=50).get_focused().shape)
# (10, 40)
```

- Reset the focus and access the full data matrix:

```python
data_container.reset_focus()
print(data_container.get_focused().shape)
# (10, 100)
```

- Access frames 1, 2, 10, and 30:

```python
data_container.get_frames(frame_ids=[1, 2, 10, 30])
```

- Access frames 1-5, keeping only the first value of each frame vector:

```python
data_container.get_frames(frame_ids=[1, 2, 3, 4, 5], vector_ids=[0])
```

- Transpose the matrix:

```python
transposed_data = data_container.T
print(transposed_data.shape)
# (100, 10)
```

- Plot the data:

```python
data_container.plot()
```

dcase_util.containers.BinaryMatrixContainer serves the same purpose as DataMatrix2DContainer, but for binary content.
1.5 Repositories

dcase_util.containers.DataRepository and dcase_util.containers.FeatureRepository are containers that can be used to store multiple other data containers. A repository stores the data with two levels of information: label and stream. The label is the higher-level key and the stream is the lower-level one.

A repository can be used, for example, to store multiple different acoustic features all related to the same audio signal. The stream IDs can be used to store features extracted from different audio channels. The features can later be accessed using the extractor label and the stream ID.

- Initialize the container with data:

```python
data_repository = dcase_util.containers.DataRepository(
    data={
        'label1': {
            'stream0': {
                'data': 100
            },
            'stream1': {
                'data': 200
            }
        },
        'label2': {
            'stream0': {
                'data': 300
            },
            'stream1': {
                'data': 400
            }
        }
    }
)
```

- Show container information:

```python
data_repository.show()
# DataRepository :: Class
#   Repository info
#     Item class        : DataMatrix2DContainer
#     Item count        : 2
#     Labels            : ['label1', 'label2']
#   Content
#     [label1][stream1] : {'data': 200}
#     [label1][stream0] : {'data': 100}
#     [label2][stream1] : {'data': 400}
#     [label2][stream0] : {'data': 300}
```

- Access data in the repository:

```python
data_repository.get_container(label='label1', stream_id='stream1')
# {'data': 200}
```

- Set data:

```python
data_repository.set_container(label='label3', stream_id='stream0', container={'data': 500})
data_repository.show()
# DataRepository :: Class
#   Repository info
#     Item class        : DataMatrix2DContainer
#     Item count        : 3
#     Labels            : ['label1', 'label2', 'label3']
#   Content
#     [label1][stream1] : {'data': 200}
#     [label1][stream0] : {'data': 100}
#     [label2][stream1] : {'data': 400}
#     [label2][stream0] : {'data': 300}
#     [label3][stream0] : {'data': 500}
```
2. Audio

dcase_util.containers.AudioContainer is a data container for multi-channel audio. It reads several formats (WAV, FLAC, M4A, WEBM) and writes WAV and FLAC files. Downloading audio content directly from YouTube is also supported.

2.1 Creating a container

- Create a two-channel audio container:

```python
import numpy

audio_container = dcase_util.containers.AudioContainer(fs=44100)
t = numpy.linspace(0, 2, 2 * audio_container.fs, endpoint=False)
x1 = numpy.sin(220 * 2 * numpy.pi * t)
x2 = numpy.sin(440 * 2 * numpy.pi * t)
audio_container.data = numpy.vstack([x1, x2])
```

- Show container information:

```python
audio_container.show()
# AudioContainer :: Class
#   Sampling rate : 44100
#   Channels      : 2
#   Duration
#     Seconds      : 2.00 sec
#     Milliseconds : 2000.00 ms
#     Samples      : 88200 samples
```
2.2 Loading and saving

- Load:

```python
audio_container = dcase_util.containers.AudioContainer().load(
    filename=dcase_util.utils.Example.audio_filename()
)
```

- Show container information:

```python
audio_container.show()
# AudioContainer :: Class
#   Filename      : acoustic_scene.flac
#   Synced        : Yes
#   Sampling rate : 44100
#   Channels      : 2
#   Duration
#     Seconds      : 10.00 sec
#     Milliseconds : 10000.02 ms
#     Samples      : 441001 samples
```

- Load content from YouTube:

```python
audio_container = dcase_util.containers.AudioContainer().load_from_youtube(
    query_id='2ceUOv8A3FE',
    start=1,
    stop=5
)
```
2.3 Focus segment

The container has a focus mechanism to flexibly capture only a part of the audio data while still keeping the full audio signal intact. The focus can be set based on time (in seconds, if the time resolution is defined) or based on sample IDs. The focus can target an individual channel or a mixdown (mono) of the channels. The audio container content can be replaced with the focus segment by freezing it.
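A channel mixdown is commonly just the per-sample average of the channels. A minimal numpy sketch of the idea (illustrative only, not the dcase_util implementation):

```python
import numpy

# Two-channel signal with shape (channels, samples)
stereo = numpy.vstack([
    numpy.sin(numpy.linspace(0, 10, 1000)),
    numpy.cos(numpy.linspace(0, 10, 1000)),
])

# Mixdown: average the channels into a single mono signal
mono = numpy.mean(stereo, axis=0)

print(stereo.shape)  # (2, 1000)
print(mono.shape)    # (1000,)
```

This also explains the shapes in the examples below: a stereo focus segment has shape (2, samples), while a mixdown of the same segment has shape (samples,).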
- Use focus to get a segment between 0.5 and 1.0 seconds:

```python
print(audio_container.set_focus(start_seconds=0.5, stop_seconds=1.0).get_focused().shape)
# (2, 22050)
```

- Use focus to get a 2-second segment starting at 5 seconds:

```python
print(audio_container.set_focus(start_seconds=5, duration_seconds=2.0).get_focused().shape)
# (2, 88200)
```

- Use focus to get a 2-second segment starting at 5 seconds, with the two stereo channels mixed down to one:

```python
print(audio_container.set_focus(start_seconds=5, duration_seconds=2.0, channel='mixdown').get_focused().shape)
# (88200,)
```

- Use focus to get a 2-second segment starting at 5 seconds, left channel of the two stereo channels:

```python
print(audio_container.set_focus(start_seconds=5, duration_seconds=2.0, channel='left').get_focused().shape)
# (88200,)
```

- Use focus to get a 2-second segment starting at 5 seconds, second audio channel (indexing starts from 0):

```python
print(audio_container.set_focus(start_seconds=5, duration_seconds=2.0, channel=1).get_focused().shape)
# (88200,)
```

- Use focus to get the data between samples 44100 and 88200:

```python
print(audio_container.set_focus(start=44100, stop=88200).get_focused().shape)
# (2, 44100)
```

- Reset the focus and access the full data matrix:

```python
audio_container.reset_focus()
print(audio_container.get_focused().shape)
# (2, 441001)
```

- Use focus to get a 2-second segment starting at 5 seconds, and freeze it:

```python
audio_container.set_focus(start_seconds=5, duration_seconds=2.0).freeze()
print(audio_container.shape)
# (2, 88200)
```
2.4 Processing

- Normalize the audio:

```python
audio_container.normalize()
```

- Resample the audio to a target sampling rate:

```python
audio_container.resample(target_fs=16000)
```

2.5 Visualization

- Plot the waveform:

```python
audio_container.plot_wave()
```

- Plot the spectrogram:

```python
audio_container.plot_spec()
```
3. Acoustic features

The library provides basic acoustic feature extractors: dcase_util.features.MelExtractor, dcase_util.features.MfccStaticExtractor, dcase_util.features.MfccDeltaExtractor, dcase_util.features.MfccAccelerationExtractor, dcase_util.features.ZeroCrossingRateExtractor, dcase_util.features.RMSEnergyExtractor, and dcase_util.features.SpectralCentroidExtractor.

3.1 Feature extraction

- Extract mel-band energies for an audio signal (using default parameters):

```python
# Get audio in a container, mixdown of a stereo signal
audio_container = dcase_util.containers.AudioContainer().load(
    filename=dcase_util.utils.Example.audio_filename()
).mixdown()

# Create extractor
mel_extractor = dcase_util.features.MelExtractor()

# Extract features
mels = mel_extractor.extract(audio_container)
```

- Extract mel-band energies for a specific audio segment:

```python
# Get audio in a container, mixdown of a stereo signal
audio_container = dcase_util.containers.AudioContainer().load(
    filename=dcase_util.utils.Example.audio_filename()
).mixdown()

# Set focus
audio_container.set_focus(start_seconds=1.0, stop_seconds=4.0)

# Create extractor
mel_extractor = dcase_util.features.MelExtractor()

# Extract features
mels = mel_extractor.extract(audio_container.get_focused())

# Plot
dcase_util.containers.DataMatrix2DContainer(
    data=mels,
    time_resolution=mel_extractor.hop_length_seconds
).plot()
```

- Extract features directly from a numpy matrix:

```python
import numpy

# Create an audio signal
t = numpy.linspace(0, 2, 2 * 44100, endpoint=False)
x1 = numpy.sin(220 * 2 * numpy.pi * t)

# Create extractor
mel_extractor = dcase_util.features.MelExtractor()

# Extract features
mels = mel_extractor.extract(x1)

# Plot
dcase_util.containers.DataMatrix2DContainer(
    data=mels,
    time_resolution=mel_extractor.hop_length_seconds
).plot()
```

All audio extractors provided in the library operate on mono audio.

3.2 Visualization

- Plot the extracted features:

```python
# Get audio in a container, mixdown of a stereo signal
audio_container = dcase_util.containers.AudioContainer().load(
    filename=dcase_util.utils.Example.audio_filename()
).mixdown()

# Create extractor
mel_extractor = dcase_util.features.MelExtractor()

# Extract features
mels = mel_extractor.extract(audio_container)

# Plotting
dcase_util.containers.DataMatrix2DContainer(
    data=mels,
    time_resolution=mel_extractor.hop_length_seconds
).plot()
```
4. Data processing

4.1 Data manipulation

Several utilities are available for manipulating data:

- dcase_util.data.Normalizer, calculates normalization factors and normalizes data.
- dcase_util.data.RepositoryNormalizer, normalizes a data repository at once.
- dcase_util.data.Aggregator, aggregates data within a sliding processing window.
- dcase_util.data.Sequencer, splits data matrices into sequences.
- dcase_util.data.Stacker, stacks a data matrix based on a given vector recipe.
- dcase_util.data.Selector, selects segments of data based on events with onsets and offsets.
- dcase_util.data.Masker, masks segments of data based on events with onsets and offsets.

4.1.1 Normalization

The dcase_util.data.Normalizer class can be used to calculate the normalization factors (mean and standard deviation) of the data without reading all of the data at once. Intermediate statistics are accumulated while the data is read in small portions.
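The idea of accumulating statistics without holding all the data in memory can be sketched with running sums (a simplified stand-in for what a normalizer does internally; the real implementation may differ):

```python
import numpy

class RunningStats:
    # Accumulate per-feature sum and squared sum so that mean and
    # standard deviation can be finalized without storing all data.
    def __init__(self):
        self.n = 0
        self.s = None
        self.s2 = None

    def accumulate(self, data):
        # data: matrix with shape (features, frames)
        if self.s is None:
            self.s = numpy.zeros(data.shape[0])
            self.s2 = numpy.zeros(data.shape[0])
        self.n += data.shape[1]
        self.s += data.sum(axis=1)
        self.s2 += (data ** 2).sum(axis=1)

    def finalize(self):
        mean = self.s / self.n
        std = numpy.sqrt(self.s2 / self.n - mean ** 2)
        return mean, std

stats = RunningStats()
chunk1 = numpy.random.rand(40, 100)  # "file 1"
chunk2 = numpy.random.rand(40, 200)  # "file 2"
stats.accumulate(chunk1)
stats.accumulate(chunk2)
mean, std = stats.finalize()

# Same result as computing over the concatenated data at once
full = numpy.hstack([chunk1, chunk2])
print(numpy.allclose(mean, full.mean(axis=1)))  # True
```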
- Calculate normalization factors file by file:

```python
data = dcase_util.utils.Example.feature_container()

# Initialize normalizer
normalizer = dcase_util.data.Normalizer()

# Accumulate -- feed data per file in
normalizer.accumulate(data=data)

# After accumulation calculate normalization factors (mean + std)
normalizer.finalize()

# Save
normalizer.save(filename='norm_factors.cpickle')

# Load
normalizer = dcase_util.data.Normalizer().load(filename='norm_factors.cpickle')
```

- Using a with statement:

```python
data = dcase_util.utils.Example.feature_container()

# Accumulate
with dcase_util.data.Normalizer() as normalizer:
    normalizer.accumulate(data=data)

# Save
normalizer.save(filename='norm_factors.cpickle')
```

- Initialize the normalizer with pre-calculated values:

```python
data = dcase_util.utils.Example.feature_container()
normalizer = dcase_util.data.Normalizer(
    **data.stats
)
```

- Normalize the data:

```python
data = dcase_util.utils.Example.feature_container()
normalizer = dcase_util.data.Normalizer().load(filename='norm_factors.cpickle')
normalizer.normalize(data)
```

4.1.2 Aggregation

The aggregator class (dcase_util.data.Aggregator) can be used to process a data matrix in a sliding processing window. This processing stage can be used, for example, to collapse the data within a certain window length by calculating its mean and standard deviation, or to flatten the matrix into a single vector.

Supported processing methods: flatten, mean, std, cov, kurtosis, and skew. These processing methods can be combined.
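The mean/std aggregation in a sliding window can be sketched with numpy as follows (a simplified version: windows that would run past the end are dropped here, whereas dcase_util keeps the frame count constant, so the shapes below differ from the library output):

```python
import numpy

def aggregate_mean_std(data, win_length_frames=10, hop_length_frames=1):
    # data: (features, frames). For each window position, stack the
    # windowed mean and std into one column -> (2 * features, positions).
    columns = []
    for start in range(0, data.shape[1] - win_length_frames + 1, hop_length_frames):
        window = data[:, start:start + win_length_frames]
        columns.append(numpy.concatenate([window.mean(axis=1), window.std(axis=1)]))
    return numpy.array(columns).T

data = numpy.random.rand(40, 501)
aggregated = aggregate_mean_std(data)
print(aggregated.shape)  # (80, 492)
```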
- Calculate mean and standard deviation in a 10-frame window, with a 1-frame hop:

```python
data = dcase_util.utils.Example.feature_container()
print(data.shape)
# (40, 501)

data_aggregator = dcase_util.data.Aggregator(
    recipe=['mean', 'std'],
    win_length_frames=10,
    hop_length_frames=1,
)

data_aggregator.aggregate(data)

print(data.shape)
# (80, 501)
```

- Flatten the data matrix in a 10-frame window into single vectors, with a 1-frame hop:

```python
data = dcase_util.utils.Example.feature_container()
print(data.shape)
# (40, 501)

data_aggregator = dcase_util.data.Aggregator(
    recipe=['flatten'],
    win_length_frames=10,
    hop_length_frames=1,
)

data_aggregator.aggregate(data)

print(data.shape)
# (400, 501)
```

4.1.3 Sequencing

The Sequencer class (dcase_util.data.Sequencer) processes data matrices into sequences (images). Sequences can overlap, and the sequencing grid can be shifted between calls.
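Splitting a feature matrix into fixed-length, possibly overlapping sequences can be sketched with numpy (simplified; frames that do not fill a complete sequence are dropped here):

```python
import numpy

def sequence(data, frames=10, hop_length_frames=10):
    # data: (features, time). Collect windows of `frames` columns into
    # a 3-D array with shape (features, frames, sequences).
    sequences = []
    for start in range(0, data.shape[1] - frames + 1, hop_length_frames):
        sequences.append(data[:, start:start + frames])
    return numpy.stack(sequences, axis=2)

data = numpy.random.rand(40, 501)
sequenced = sequence(data)
print(sequenced.shape)  # (40, 10, 50)
```

With a hop equal to the sequence length the sequences do not overlap; a smaller hop produces overlapping sequences.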
- Create sequences:

```python
data = dcase_util.utils.Example.feature_container()
print(data.shape)
# (40, 501)

data_sequencer = dcase_util.data.Sequencer(
    frames=10,
    hop_length_frames=10
)

sequenced_data = data_sequencer.sequence(data)
print(sequenced_data.shape)
# (40, 10, 50)

sequenced_data.show()
# DataMatrix3DContainer :: Class
#   Data
#     data               : matrix (40,10,50)
#     Dimensions
#       time_axis        : 1
#     Timing information
#       time_resolution  : None
#     Meta
#       stats            : Calculated
#       metadata         : -
#       processing_chain : -
#     Duration
#       Frames           : 10
#     Data
#       Dimensions
#         time_axis      : 1
#         data_axis      : 0
#         sequence_axis  : 2
```

4.1.4 Stacking

The Stacker class (dcase_util.data.Stacker) stacks data stored in a data repository according to a recipe. This class can be used, for example, to create a new feature matrix containing data extracted with multiple feature extractors. With a recipe, you can select a full matrix, only part of the data vectors using start and end indices, or individual data rows.

Example:

```python
# Load data repository
repo = dcase_util.utils.Example.feature_repository()

# Show labels in the repository
print(repo.labels)

# Select full matrix from 'mel' with default stream (0) (40 mel bands).
data = dcase_util.data.Stacker(recipe='mel').stack(repo)
print(data.shape)
# (40, 501)

# Select full matrix from 'mel' and define stream 0 (40 mel bands).
data = dcase_util.data.Stacker(recipe='mel=0').stack(repo)
print(data.shape)
# (40, 501)

# Select full matrix from 'mel' and 'mfcc' with default stream (0) (40 mel bands + 20 mfccs).
data = dcase_util.data.Stacker(recipe='mel;mfcc').stack(repo)
print(data.shape)
# (60, 501)

# Select data from 'mfcc' matrix with default stream (0), and omit first coefficient (19 mfccs).
data = dcase_util.data.Stacker(recipe='mfcc=1-19').stack(repo)
print(data.shape)
# (19, 501)

# Select data from 'mfcc' matrix with default stream (0), select coefficients 1,5,7 (3 mfccs).
data = dcase_util.data.Stacker(recipe='mfcc=1,5,7').stack(repo)
print(data.shape)
# (3, 501)
```
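The recipe strings above follow a small grammar: items separated by ';', with an optional '=' selector containing a range ('1-19') or a comma list ('1,5,7'). A hypothetical, simplified parser sketch (not the dcase_util implementation; the real grammar also covers stream selection, which is omitted here):

```python
def parse_recipe(recipe):
    # Parse e.g. 'mel;mfcc=1-19' into [(label, row_indices_or_None), ...]
    items = []
    for part in recipe.split(';'):
        if '=' in part:
            label, selector = part.split('=')
            if '-' in selector:
                # Range selector, e.g. '1-19' -> rows 1..19 inclusive
                start, stop = selector.split('-')
                rows = list(range(int(start), int(stop) + 1))
            else:
                # Comma list selector, e.g. '1,5,7'
                rows = [int(value) for value in selector.split(',')]
            items.append((label, rows))
        else:
            # No selector -> full matrix
            items.append((part, None))
    return items

print(parse_recipe('mel;mfcc=1,5,7'))
# [('mel', None), ('mfcc', [1, 5, 7])]
```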
4.2 Data encoding

Data encoders can be used to convert reference metadata into binary matrices.

- One-hot

The OneHotEncoder class (dcase_util.data.OneHotEncoder) can be used to create a binary matrix in which a single class is active throughout the signal. This encoder is suitable for multi-class single-label classification applications.

Example:

```python
# Initialize encoder
onehot_encoder = dcase_util.data.OneHotEncoder(
    label_list=['class A', 'class B', 'class C'],
    time_resolution=0.02
)

# Encode
binary_matrix = onehot_encoder.encode(
    label='class B',
    length_seconds=10.0
)

# Visualize
binary_matrix.plot()
```
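The matrix the encoder produces can be pictured directly with numpy: one row per label, with ones over every frame for the active class (a simplified sketch of the assumed behavior, not the dcase_util implementation):

```python
import numpy

label_list = ['class A', 'class B', 'class C']
time_resolution = 0.02
length_seconds = 10.0

# One row per label, one column per frame
frames = int(numpy.ceil(length_seconds / time_resolution))
binary_matrix = numpy.zeros((len(label_list), frames))

# Mark 'class B' active over the whole signal
binary_matrix[label_list.index('class B'), :] = 1

print(binary_matrix.shape)  # (3, 500)
```

A many-hot encoding differs only in setting several rows to one instead of a single row.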
- Many-hot

The ManyHotEncoder class (dcase_util.data.ManyHotEncoder) can be used to create a binary matrix in which multiple classes are active throughout the signal. This encoder is suitable for multi-class multi-label classification applications such as audio tagging.

Example:

```python
# Initialize encoder
manyhot_encoder = dcase_util.data.ManyHotEncoder(
    label_list=['class A', 'class B', 'class C'],
    time_resolution=0.02
)

# Encode
binary_matrix = manyhot_encoder.encode(
    label_list=['class A', 'class B'],
    length_seconds=10.0
)

# Visualize
binary_matrix.plot()
```

- Event roll

The EventRollEncoder class (dcase_util.data.EventRollEncoder) can be used to create a binary matrix in which multiple events are active within specified time segments. This encoder is suitable for event detection applications.

Example:

```python
# Metadata
meta = dcase_util.containers.MetaDataContainer([
    {
        'filename': 'test1.wav',
        'event_label': 'cat',
        'onset': 1.0,
        'offset': 3.0
    },
    {
        'filename': 'test1.wav',
        'event_label': 'dog',
        'onset': 2.0,
        'offset': 6.0
    },
    {
        'filename': 'test1.wav',
        'event_label': 'speech',
        'onset': 5.0,
        'offset': 8.0
    },
])

# Initialize encoder
event_roll_encoder = dcase_util.data.EventRollEncoder(
    label_list=meta.unique_event_labels,
    time_resolution=0.02
)

# Encode
event_roll = event_roll_encoder.encode(
    metadata_container=meta,
    length_seconds=10.0
)

# Visualize
event_roll.plot()
```
5. Metadata

The library provides the container dcase_util.containers.MetaDataContainer for handling metadata from most DCASE-related application areas: acoustic scene classification, event detection, and audio tagging.

In principle, the metadata is a list of dictionaries (meta items), and it can be used like a normal list.

5.1 Creating a container

- Initialize the metadata container with a list of acoustic scenes:
```python
meta_container_scenes = dcase_util.containers.MetaDataContainer(
    [
        {
            'filename': 'file1.wav',
            'scene_label': 'office'
        },
        {
            'filename': 'file2.wav',
            'scene_label': 'street'
        },
        {
            'filename': 'file3.wav',
            'scene_label': 'home'
        }
    ]
)
```

- Initialize the metadata container with a list of sound events:

```python
meta_container_events = dcase_util.containers.MetaDataContainer(
    [
        {
            'filename': 'file1.wav',
            'event_label': 'speech',
            'onset': 10.0,
            'offset': 15.0,
        },
        {
            'filename': 'file1.wav',
            'event_label': 'footsteps',
            'onset': 23.0,
            'offset': 26.0,
        },
        {
            'filename': 'file2.wav',
            'event_label': 'speech',
            'onset': 2.0,
            'offset': 5.0,
        }
    ]
)
```

- Initialize the metadata container with tags:

```python
meta_container_tags = dcase_util.containers.MetaDataContainer(
    [
        {
            'filename': 'file1.wav',
            'tags': ['cat', 'dog']
        },
        {
            'filename': 'file2.wav',
            'tags': ['dog']
        },
        {
            'filename': 'file3.wav',
            'tags': ['dog', 'horse']
        }
    ]
)
```
- Show content:

```python
meta_container_scenes.show()
# MetaDataContainer :: Class
#   Items : 3
#   Unique
#     Files        : 3
#     Scene labels : 3
#     Event labels : 0
#     Tags         : 0
#
#   Scene statistics
#     Scene label         Count
#     ------------------- -----
#     home                    1
#     office                  1
#     street                  1

meta_container_events.show()
# MetaDataContainer :: Class
#   Items : 3
#   Unique
#     Files        : 2
#     Scene labels : 0
#     Event labels : 2
#     Tags         : 0
#
#   Event statistics
#     Event label         Count  Tot. Length  Avg. Length
#     ------------------- -----  -----------  -----------
#     footsteps               1         3.00         3.00
#     speech                  2         8.00         4.00

meta_container_tags.show()
# MetaDataContainer :: Class
#   Items : 3
#   Unique
#     Files        : 3
#     Scene labels : 0
#     Event labels : 0
#     Tags         : 3
#
#   Tag statistics
#     Tag                 Count
#     ------------------- -----
#     cat                     1
#     dog                     3
#     horse                   1
```

- Show content, including each metadata item:

```python
meta_container_scenes.show_all()
# MetaDataContainer :: Class
#   Items : 3
#   Unique
#     Files        : 3
#     Scene labels : 3
#     Event labels : 0
#     Tags         : 0
#
#   Meta data
#     Source     Onset  Offset  Scene   Event  Tags  Identifier
#     ---------  -----  ------  ------  -----  ----  ----------
#     file1.wav  -      -       office  -      -     -
#     file2.wav  -      -       street  -      -     -
#     file3.wav  -      -       home    -      -     -
#
#   Scene statistics
#     Scene label         Count
#     ------------------- -----
#     home                    1
#     office                  1
#     street                  1

meta_container_events.show_all()
# MetaDataContainer :: Class
#   Items : 3
#   Unique
#     Files        : 2
#     Scene labels : 0
#     Event labels : 2
#     Tags         : 0
#
#   Meta data
#     Source     Onset  Offset  Scene  Event      Tags  Identifier
#     ---------  -----  ------  -----  ---------  ----  ----------
#     file1.wav  10.00  15.00   -      speech     -     -
#     file1.wav  23.00  26.00   -      footsteps  -     -
#     file2.wav  2.00   5.00    -      speech     -     -
#
#   Event statistics
#     Event label         Count  Tot. Length  Avg. Length
#     ------------------- -----  -----------  -----------
#     footsteps               1         3.00         3.00
#     speech                  2         8.00         4.00
```

5.2 Loading and saving

- Save the metadata to a file:

```python
meta_container_events.save(filename='events.txt')
```

- Load metadata from an annotation file:

```python
meta_container_events = dcase_util.containers.MetaDataContainer().load(
    filename='events.txt'
)
```
5.3 Accessing data

- Get the audio files mentioned in the metadata and their count:

```python
print(meta_container_events.unique_files)
# ['file1.wav', 'file2.wav']

print(meta_container_events.file_count)
# 2
```

- Get the unique scene labels and their count:

```python
print(meta_container_scenes.unique_scene_labels)
# ['home', 'office', 'street']

print(meta_container_scenes.scene_label_count)
# 3
```

- Get the unique event labels used in the metadata and their count:

```python
print(meta_container_events.unique_event_labels)
# ['footsteps', 'speech']

print(meta_container_events.event_label_count)
# 2
```

- Get the unique tags used in the metadata and their count:

```python
print(meta_container_tags.unique_tags)
# ['cat', 'dog', 'horse']

print(meta_container_tags.tag_count)
# 3
```
5.4 Filtering

- Filter the metadata based on filename:

```python
filtered = meta_container_events.filter(filename='file1.wav')
filtered.show_all()
# MetaDataContainer :: Class
#   Items : 2
#   Unique
#     Files        : 1
#     Scene labels : 0
#     Event labels : 2
#     Tags         : 0
#
#   Meta data
#     Source     Onset  Offset  Scene  Event      Tags  Identifier
#     ---------  -----  ------  -----  ---------  ----  ----------
#     file1.wav  10.00  15.00   -      speech     -     -
#     file1.wav  23.00  26.00   -      footsteps  -     -
#
#   Event statistics
#     Event label         Count  Tot. Length  Avg. Length
#     ------------------- -----  -----------  -----------
#     footsteps               1         3.00         3.00
#     speech                  1         5.00         5.00
```

- Filter the metadata based on event label:

```python
filtered = meta_container_events.filter(event_label='speech')
filtered.show_all()
# MetaDataContainer :: Class
#   Items : 2
#   Unique
#     Files        : 2
#     Scene labels : 0
#     Event labels : 1
#     Tags         : 0
#
#   Meta data
#     Source     Onset  Offset  Scene  Event   Tags  Identifier
#     ---------  -----  ------  -----  ------  ----  ----------
#     file1.wav  10.00  15.00   -      speech  -     -
#     file2.wav  2.00   5.00    -      speech  -     -
#
#   Event statistics
#     Event label         Count  Tot. Length  Avg. Length
#     ------------------- -----  -----------  -----------
#     speech                  2         8.00         4.00
```

- Filter the metadata based on filename and event label:

```python
filtered = meta_container_events.filter(filename='file1.wav', event_label='speech')
filtered.show_all()
# MetaDataContainer :: Class
#   Items : 1
#   Unique
#     Files        : 1
#     Scene labels : 0
#     Event labels : 1
#     Tags         : 0
#
#   Meta data
#     Source     Onset  Offset  Scene  Event   Tags  Identifier
#     ---------  -----  ------  -----  ------  ----  ----------
#     file1.wav  10.00  15.00   -      speech  -     -
#
#   Event statistics
#     Event label         Count  Tot. Length  Avg. Length
#     ------------------- -----  -----------  -----------
#     speech                  1         5.00         5.00
```

- Filter based on a time segment, making the segment start the new zero time:

```python
filtered = meta_container_events.filter_time_segment(filename='file1.wav', start=12, stop=24)
filtered.show_all()
# MetaDataContainer :: Class
#   Items : 2
#   Unique
#     Files        : 1
#     Scene labels : 0
#     Event labels : 2
#     Tags         : 0
#
#   Meta data
#     Source     Onset  Offset  Scene  Event      Tags  Identifier
#     ---------  -----  ------  -----  ---------  ----  ----------
#     file1.wav  0.00   3.00    -      speech     -     -
#     file1.wav  11.00  12.00   -      footsteps  -     -
#
#   Event statistics
#     Event label         Count  Tot. Length  Avg. Length
#     ------------------- -----  -----------  -----------
#     footsteps               1         1.00         1.00
#     speech                  1         3.00         3.00
```
5.5 Processing

- Add a time offset to the onsets and offsets set in the metadata items:

```python
meta_container_events.add_time(time=10)
meta_container_events.show_all()
# MetaDataContainer :: Class
#   Items : 3
#   Unique
#     Files        : 2
#     Scene labels : 0
#     Event labels : 2
#     Tags         : 0
#
#   Meta data
#     Source     Onset  Offset  Scene  Event      Tags  Identifier
#     ---------  -----  ------  -----  ---------  ----  ----------
#     file1.wav  20.00  25.00   -      speech     -     -
#     file1.wav  33.00  36.00   -      footsteps  -     -
#     file2.wav  12.00  15.00   -      speech     -     -
#
#   Event statistics
#     Event label         Count  Tot. Length  Avg. Length
#     ------------------- -----  -----------  -----------
#     footsteps               1         3.00         3.00
#     speech                  2         8.00         4.00
```

- Remove very short events, and merge events (with the same event label) that have a small gap between them:

```python
meta_container_events = dcase_util.containers.MetaDataContainer(
    [
        {
            'filename': 'file1.wav',
            'event_label': 'speech',
            'onset': 1.0,
            'offset': 2.0,
        },
        {
            'filename': 'file1.wav',
            'event_label': 'speech',
            'onset': 2.05,
            'offset': 2.5,
        },
        {
            'filename': 'file1.wav',
            'event_label': 'speech',
            'onset': 5.1,
            'offset': 5.15,
        },
    ]
)

processed = meta_container_events.process_events(minimum_event_length=0.2, minimum_event_gap=0.1)
processed.show_all()
# MetaDataContainer :: Class
#   Items : 1
#   Unique
#     Files        : 1
#     Scene labels : 0
#     Event labels : 1
#     Tags         : 0
#
#   Meta data
#     Source     Onset  Offset  Scene  Event   Tags  Identifier
#     ---------  -----  ------  -----  ------  ----  ----------
#     file1.wav  1.00   2.50    -      speech  -     -
#
#   Event statistics
#     Event label         Count  Tot. Length  Avg. Length
#     ------------------- -----  -----------  -----------
#     speech                  1         1.50         1.50
```
5.6 Event roll

- Convert an event list into an event roll (a binary matrix with event activity):

```python
meta_container_events = dcase_util.containers.MetaDataContainer(
    [
        {
            'filename': 'file1.wav',
            'event_label': 'speech',
            'onset': 1.0,
            'offset': 2.0,
        },
        {
            'filename': 'file1.wav',
            'event_label': 'speech',
            'onset': 2.05,
            'offset': 2.5,
        },
        {
            'filename': 'file1.wav',
            'event_label': 'speech',
            'onset': 5.1,
            'offset': 5.15,
        },
        {
            'filename': 'file1.wav',
            'event_label': 'footsteps',
            'onset': 3.1,
            'offset': 4.15,
        },
        {
            'filename': 'file1.wav',
            'event_label': 'dog',
            'onset': 2.6,
            'offset': 3.6,
        },
    ]
)

event_roll = meta_container_events.to_event_roll()

# Plot
event_roll.plot()
```
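The event roll itself is easy to picture: one row per event label, with ones in the frames between each event's onset and offset. A numpy sketch of the idea (simplified; the frame rounding and label ordering are assumptions, not the dcase_util implementation):

```python
import numpy

events = [
    {'event_label': 'speech', 'onset': 1.0, 'offset': 2.0},
    {'event_label': 'footsteps', 'onset': 3.1, 'offset': 4.15},
]
label_list = sorted({event['event_label'] for event in events})  # ['footsteps', 'speech']
time_resolution = 0.02
length_seconds = 5.0

frames = int(numpy.ceil(length_seconds / time_resolution))
event_roll = numpy.zeros((len(label_list), frames))

for event in events:
    row = label_list.index(event['event_label'])
    onset_frame = int(numpy.floor(event['onset'] / time_resolution))
    offset_frame = int(numpy.ceil(event['offset'] / time_resolution))
    event_roll[row, onset_frame:offset_frame] = 1

print(event_roll.shape)  # (2, 250)
```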
6. Datasets

The dataset classes provide a unified interface to many differently organized audio datasets. A dataset is downloaded, extracted, and prepared for use the first time it is used.

Four types of datasets are provided:

- Acoustic scene datasets, classes inherited from dcase_util.datasets.AcousticSceneDataset.
- Sound event datasets, classes inherited from dcase_util.datasets.SoundEventDataset.
- Sound event datasets with synthetic data creation, classes inherited from dcase_util.datasets.SyntheticSoundEventDataset.
- Audio tagging datasets, classes inherited from dcase_util.datasets.AudioTaggingDataset.

- Get a list of all available datasets:

```python
print(dcase_util.datasets.dataset_list())
# Dataset list
# Class Name                                | Group | Remote | Local  | Audio | Scenes | Events | Tags
# ----------------------------------------- | ----- | ------ | ------ | ----- | ------ | ------ | ----
# DCASE2013_Scenes_DevelopmentSet           | scene | 344 MB | 849 MB | 100   | 10     |        |
# TUTAcousticScenes_2016_DevelopmentSet     | scene | 7 GB   | 23 GB  | 1170  | 15     |        |
# TUTAcousticScenes_2016_EvaluationSet      | scene | 2 GB   | 5 GB   | 390   | 15     |        |
# TUTAcousticScenes_2017_DevelopmentSet     | scene | 9 GB   | 21 GB  | 4680  | 15     |        |
# TUTAcousticScenes_2017_EvaluationSet      | scene | 3 GB   | 7 GB   |       |        |        |
# DCASE2017_Task4tagging_DevelopmentSet     | event | 5 MB   | 24 GB  | 56700 | 1      | 17     |
# DCASE2017_Task4tagging_EvaluationSet      | event | 823 MB | 1 GB   |       |        |        |
# TUTRareSoundEvents_2017_DevelopmentSet    | event | 7 GB   | 28 GB  |       |        | 3      |
# TUTRareSoundEvents_2017_EvaluationSet     | event | 4 GB   | 4 GB   |       |        | 3      |
# TUTSoundEvents_2016_DevelopmentSet        | event | 967 MB | 2 GB   | 954   | 2      | 17     |
# TUTSoundEvents_2016_EvaluationSet         | event | 449 MB | 989 MB | 511   | 2      | 17     |
# TUTSoundEvents_2017_DevelopmentSet        | event | 1 GB   | 2 GB   | 659   | 1      | 6      |
# TUTSoundEvents_2017_EvaluationSet         | event | 370 MB | 798 MB |       |        |        |
# TUT_SED_Synthetic_2016                    | event | 4 GB   | 5 GB   |       |        |        |
# CHiMEHome_DomesticAudioTag_DevelopmentSet | tag   | 3 GB   | 9 GB   | 1946  | 1      |        | 7
```

6.1 Initializing a dataset

To download, extract, and prepare a dataset (in this case, the dataset is placed in a temporary directory, and only the metadata-related files are downloaded):

```python
import tempfile

db = dcase_util.datasets.TUTAcousticScenes_2016_DevelopmentSet(
    data_path=tempfile.gettempdir(),
    included_content_types=['meta']
)
db.initialize()
db.show()
# DictContainer :: Class
#   audio_source           : Field recording
#   audio_type             : Natural
#   authors                : Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen
#   licence                : free non-commercial
#   microphone_model       : Soundman OKM II Klassik/studio A3 electret microphone
#   recording_device_model : Roland Edirol R-09
#   title                  : TUT Acoustic Scenes 2016, development dataset
#   url                    : https://zenodo.org/record/45739
#
# MetaDataContainer :: Class
#   Filename : /tmp/TUT-acoustic-scenes-2016-development/meta.txt
#   Items    : 1170
#   Unique
#     Files        : 1170
#     Scene labels : 15
#     Event labels : 0
#     Tags         : 0
#
#   Scene statistics
#     Scene label         Count
#     ------------------- -----
#     beach                  78
#     bus                    78
#     cafe/restaurant        78
#     car                    78
#     city_center            78
#     forest_path            78
#     grocery_store          78
#     home                   78
#     library                78
#     metro_station          78
#     office                 78
#     park                   78
#     residential_area       78
#     train                  78
#     tram                   78
```
6.2 Cross-validation setup

Usually, datasets are provided with a cross-validation setup.

- Get the training material for fold 1:

```python
training_material = db.train(fold=1)
training_material.show()
# MetaDataContainer :: Class
#   Filename : /tmp/TUT-acoustic-scenes-2016-development/evaluation_setup/fold1_train.txt
#   Items    : 880
#   Unique
#     Files        : 880
#     Scene labels : 15
#     Event labels : 0
#     Tags         : 0
#
#   Scene statistics
#     Scene label         Count
#     ------------------- -----
#     beach                  59
#     bus                    59
#     cafe/restaurant        60
#     car                    58
#     city_center            60
#     forest_path            57
#     grocery_store          59
#     home                   56
#     library                57
#     metro_station          59
#     office                 59
#     park                   58
#     residential_area       59
#     train                  60
#     tram                   60
```

- Get the testing material for fold 1 (material without reference data):

```python
testing_material = db.test(fold=1)
testing_material.show()
# MetaDataContainer :: Class
#   Filename : /tmp/TUT-acoustic-scenes-2016-development/evaluation_setup/fold1_test.txt
#   Items    : 290
#   Unique
#     Files        : 290
#     Scene labels : 0
#     Event labels : 0
#     Tags         : 0
```

- Get the testing material for fold 1 with full reference data:

```python
eval_material = db.eval(fold=1)
eval_material.show()
# MetaDataContainer :: Class
#   Filename : /tmp/TUT-acoustic-scenes-2016-development/evaluation_setup/fold1_evaluate.txt
#   Items    : 290
#   Unique
#     Files        : 290
#     Scene labels : 15
#     Event labels : 0
#     Tags         : 0
#
#   Scene statistics
#     Scene label         Count
#     ------------------- -----
#     beach                  19
#     bus                    19
#     cafe/restaurant        18
#     car                    20
#     city_center            18
#     forest_path            21
#     grocery_store          19
#     home                   22
#     library                21
#     metro_station          19
#     office                 19
#     park                   20
#     residential_area       19
#     train                  18
#     tram                   18
```

- To get all of the data, set fold to None:

```python
all_material = db.train(fold=None)
all_material.show()
# MetaDataContainer :: Class
#   Filename : /tmp/TUT-acoustic-scenes-2016-development/meta.txt
#   Items    : 1170
#   Unique
#     Files        : 1170
#     Scene labels : 15
#     Event labels : 0
#     Tags         : 0
#
#   Scene statistics
#     Scene label         Count
#     ------------------- -----
#     beach                  78
#     bus                    78
#     cafe/restaurant        78
#     car                    78
#     city_center            78
#     forest_path            78
#     grocery_store          78
#     home                   78
#     library                78
#     metro_station          78
#     office                 78
#     park                   78
#     residential_area       78
#     train                  78
#     tram                   78
```

- Iterate over all folds:

```python
for fold in db.folds:
    train_material = db.train(fold=fold)
```
大多數數據集不提供驗證集拆分。但是,數據集類提供了幾種從訓練集中生成驗證集的方法,既保持數據的統計分布,又確保來自同一來源的數據不會同時出現在訓練集和驗證集中。
- 為fold1生成平衡的驗證集(平衡的含義是:來自相同地點的錄音會被分配到同一個集合):
| 1 2 3 4 5 | training_files, validation_files = db.validation_split( fold=1, split_type='balanced', validation_amount=0.3 ) |
- 為fold1生成隨機驗證集(不平衡):
| 1 2 3 4 5 | training_files, validation_files = db.validation_split( fold=1, split_type='random', validation_amount=0.3 ) |
- 獲取數據集提供的驗證集(示例中使用的數據集不提供它,這會引發錯誤。):
| 1 2 3 4 | training_files, validation_files = db.validation_split( fold=1, split_type='dataset' ) |
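驗證集劃分的關鍵在於按來源(如錄音地點identifier)分組。下面是一個不依賴dcase_util的純Python示意(文件名與地點均為虛構的假設數據),說明"balanced"劃分為何能保證同一地點的錄音不會同時落入兩個集合:

```python
import random

def balanced_validation_split(items, validation_amount=0.3, seed=0):
    """按來源identifier劃分訓練/驗證集的示意:
    同一地點的所有文件只會進入其中一個集合。"""
    # items: (filename, identifier) 元組列表
    by_id = {}
    for filename, identifier in items:
        by_id.setdefault(identifier, []).append(filename)

    identifiers = sorted(by_id)
    random.Random(seed).shuffle(identifiers)

    target = validation_amount * len(items)
    validation_files, training_files = [], []
    count = 0
    for identifier in identifiers:
        files = by_id[identifier]
        if count < target:
            # 整個地點的文件一起進入驗證集
            validation_files.extend(files)
            count += len(files)
        else:
            training_files.extend(files)
    return training_files, validation_files

# 虛構的示例數據:3個地點,每個地點2個文件
items = [('a1.wav', 'loc1'), ('a2.wav', 'loc1'),
         ('b1.wav', 'loc2'), ('b2.wav', 'loc2'),
         ('c1.wav', 'loc3'), ('c2.wav', 'loc3')]
train, val = balanced_validation_split(items, validation_amount=0.3)
```

由於按地點整組分配,驗證集的實際比例只會近似於 validation_amount,dcase_util 內部的做法也是類似的近似平衡。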
6.3 元數據(Meta data)
- 獲取與該文件關聯的元數據:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | items = db.file_meta(filename='audio/b086_150_180.wav') print(items) # MetaDataContainer :: Class # Items : 1 # Unique # Files : 1 # Scene labels : 1 # Event labels : 0 # Tags : 0 # # Meta data # Source Onset Offset Scene Event Tags Identifier # -------------------- ------ ------ --------------- --------------- --------------- ----- # audio/b086_150_180.. - - grocery_store - - - # # Scene statistics # Scene label Count # -------------------- ------ # grocery_store 1 |
7. 處理鏈(Processing chain)
除了基本的實用程序外,該庫還提供了將各種數據處理類鏈接在一起的機制,從而可以更輕鬆地構建複雜的數據處理管道。數據處理由鏈中包含的一系列處理器(processor列表)完成。
所有處理器類都繼承自dcase_util.processors.ProcessorMixin和相應的實用工具類。例如,dcase_util.processors.AudioReadingProcessor同時繼承自dcase_util.processors.ProcessorMixin和dcase_util.containers.AudioContainer。
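處理器與處理鏈的思想可以用下面這個不依賴dcase_util的最小示意來說明(其中的處理器類與數據均為假設的示例,並非dcase_util的真實API):

```python
class Processor:
    """處理器基類示意:每個處理器實現 process() 並返回處理後的數據。"""
    def process(self, data):
        raise NotImplementedError

class OffsetProcessor(Processor):
    def __init__(self, offset):
        self.offset = offset
    def process(self, data):
        return [x + self.offset for x in data]

class ScaleProcessor(Processor):
    def __init__(self, factor):
        self.factor = factor
    def process(self, data):
        return [x * self.factor for x in data]

class ProcessingChain(list):
    """將多個處理器串聯:前一個處理器的輸出作為後一個的輸入。"""
    def process(self, data):
        for processor in self:
            data = processor.process(data)
        return data

chain = ProcessingChain([OffsetProcessor(offset=1), ScaleProcessor(factor=2)])
result = chain.process([0, 1, 2])   # 每個元素先 +1 再 *2
```

dcase_util 在此基礎上還為每個處理器聲明了輸入/輸出類型(AUDIO、DATA_REPOSITORY等),以便在構建鏈時檢查相鄰處理器是否兼容。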
- 音頻相關處理器:
dcase_util.processors.AudioReadingProcessor,音頻讀取,支持多聲道音頻。
dcase_util.processors.MonoAudioReadingProcessor,在多聲道音頻通道的情況下,音頻讀取將混合到單個通道中。
- 數據處理處理器:
dcase_util.processors.AggregationProcessor,通過滑動處理窗口聚合數據。
dcase_util.processors.SequencingProcessor,將數據矩陣分割成序列。
dcase_util.processors.NormalizationProcessor,歸一化數據矩陣。
dcase_util.processors.RepositoryNormalizationProcessor,歸一化存儲在倉庫(repository)中的數據。
dcase_util.processors.StackingProcessor,根據倉庫中存儲的數據堆疊出新的特徵矩陣。
- 數據編碼處理器:
dcase_util.processors.OneHotEncodingProcessor,one-hot編碼(分類)。
dcase_util.processors.ManyHotEncodingProcessor,many-hot編碼(標記)。
dcase_util.processors.EventRollEncodingProcessor,事件滾動編碼(檢測)。
- 特徵提取處理器:
dcase_util.processors.RepositoryFeatureExtractorProcessor,同時提取多種特徵類型,並將它們存儲到單個倉庫中。支持多聲道音頻輸入。
dcase_util.processors.MelExtractorProcessor,提取梅爾帶能量。僅支持單聲道音頻。
dcase_util.processors.MfccStaticExtractorProcessor,提取靜態MFCC。僅支持單聲道音頻。
dcase_util.processors.MfccDeltaExtractorProcessor,提取delta MFCC。僅支持單聲道音頻。
dcase_util.processors.MfccAccelerationExtractorProcessor,提取加速MFCC。僅支持單聲道音頻。
dcase_util.processors.ZeroCrossingRateExtractorProcessor,提取過零率。僅支持單聲道音頻。
dcase_util.processors.RMSEnergyExtractorProcessor,提取RMS能量。僅支持單聲道音頻。
dcase_util.processors.SpectralCentroidExtractorProcessor,提取頻譜質心。僅支持單聲道音頻。
- 元數據處理器:
dcase_util.processors.MetadataReadingProcessor,從文件中讀取元數據。
7.1 特征提取和處理
- 為音頻文件提取梅爾帶能量:
```python
# Define processing chain
chain = dcase_util.processors.ProcessingChain([
    {
        'processor_name': 'dcase_util.processors.MonoAudioReadingProcessor',
        'init_parameters': {
            'fs': 44100
        }
    },
    {
        'processor_name': 'dcase_util.processors.MelExtractorProcessor',
        'init_parameters': {}
    }
])

# Run the processing chain
data = chain.process(filename=dcase_util.utils.Example().audio_filename())
data.show()
# FeatureContainer :: Class
#   Data
#     data                            : matrix (40,501)
#   Dimensions
#     time_axis                       : 1
#     data_axis                       : 0
#   Timing information
#     time_resolution                 : 0.02 sec
#   Meta
#     stats                           : Calculated
#     metadata                        : -
#     processing_chain                : ProcessingChain :: Class
#       [0]
#         DictContainer :: Class
#           init_parameters
#             _focus_channel          : None
#             _focus_start            : 0
#             _focus_stop             : None
#             channel_axis            : 0
#             data_synced_with_file   : False
#             filename                : None
#             fs                      : 44100
#             mono                    : True
#             time_axis               : 1
#           input_type                : NONE
#           output_type               : AUDIO
#           process_parameters
#             filename                : dcase_util/utils/example_data/acoustic_scene.flac
#             focus_channel           : None
#             focus_duration_samples  : (None,)
#             focus_duration_sec      : None
#             focus_start_samples     : None
#             focus_start_sec         : None
#             focus_stop_samples      : None
#             focus_stop_sec          : None
#           processor_name            : dcase_util.processors.MonoAudioReadingProcessor
#
#       [1]
#         DictContainer :: Class
#           init_parameters
#             eps                     : 2.22044604925e-16
#             fmax                    : None
#             fmin                    : 0
#             fs                      : 44100
#             hop_length_samples      : 882
#             hop_length_seconds      : 0.02
#             htk                     : False
#             log                     : True
#             n_mels                  : 40
#             normalize_mel_bands     : False
#             win_length_samples      : 1764
#             win_length_seconds      : 0.04
#           input_type                : AUDIO
#           output_type               : DATA_CONTAINER
#           process_parameters
#             filename                : dcase_util/utils/example_data/acoustic_scene.flac
#           processor_name            : dcase_util.processors.MelExtractorProcessor
#
#   Duration
#     Frames                          : 501
#     Seconds                         : 10.02 sec
```
- 關注音頻的某些部分:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | # Define processing chain chain = dcase_util.processors.ProcessingChain([ { 'processor_name': 'dcase_util.processors.MonoAudioReadingProcessor', 'init_parameters': { 'fs': 44100 } }, { 'processor_name': 'dcase_util.processors.MelExtractorProcessor', 'init_parameters': {} } ]) # Run the processing chain data = chain.process( filename=dcase_util.utils.Example().audio_filename(), focus_start_seconds=1.0, duration_seconds=2.0 ) print(data.shape) # (40, 101) # Run the processing chain data = chain.process( filename=dcase_util.utils.Example().audio_filename(), focus_start_samples=44100, focus_stop_samples=44100*2 ) print(data.shape) # (40, 51) |
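上面輸出中的幀數(101和51)可以由幀移0.02秒推算出來。下面做一下這個算術驗證("+1"對應包含起始端點幀,這是根據上面的輸出反推的假設,並非dcase_util文檔的正式定義):

```python
def frame_count(duration_seconds, hop_length_seconds=0.02):
    # 幀數 ≈ 持續時間 / 幀移,再加上起始端點幀
    return int(round(duration_seconds / hop_length_seconds)) + 1

# focus_start_seconds=1.0, duration_seconds=2.0 的情形
frames_a = frame_count(2.0)
# focus 44100..88200 樣本,fs=44100,即 1 秒的情形
frames_b = frame_count((44100 * 2 - 44100) / 44100)
```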
- 為音頻文件提取幾個不同的聲學特征,并形成數據矩陣:
```python
# Define processing chain
chain = dcase_util.processors.ProcessingChain([
    {
        'processor_name': 'dcase_util.processors.MonoAudioReadingProcessor',
        'init_parameters': {
            'fs': 44100
        }
    },
    {
        'processor_name': 'dcase_util.processors.RepositoryFeatureExtractorProcessor',
        'init_parameters': {
            'parameters': {
                'mel': {},
                'mfcc': {}
            }
        }
    },
    {
        'processor_name': 'dcase_util.processors.StackingProcessor',
        'init_parameters': {
            'recipe': 'mel;mfcc=1-19'
        }
    }
])

# Run the processing chain
data = chain.process(filename=dcase_util.utils.Example().audio_filename())
data.show()
# FeatureContainer :: Class
#   Data
#     data                            : matrix (59,501)
#   Dimensions
#     time_axis                       : 1
#     data_axis                       : 0
#   Timing information
#     time_resolution                 : 0.02 sec
#   Meta
#     stats                           : Calculated
#     metadata                        : -
#     processing_chain                : ProcessingChain :: Class
#       [0]
#         DictContainer :: Class
#           init_parameters
#             _focus_channel          : None
#             _focus_start            : 0
#             _focus_stop             : None
#             channel_axis            : 0
#             data_synced_with_file   : False
#             filename                : None
#             fs                      : 44100
#             mono                    : True
#             time_axis               : 1
#           input_type                : NONE
#           output_type               : AUDIO
#           process_parameters
#             filename                : dcase_util/utils/example_data/acoustic_scene.flac
#             focus_channel           : None
#             focus_duration_samples  : (None,)
#             focus_duration_sec      : None
#             focus_start_samples     : None
#             focus_start_sec         : None
#             focus_stop_samples      : None
#             focus_stop_sec          : None
#           processor_name            : dcase_util.processors.MonoAudioReadingProcessor
#
#       [1]
#         DictContainer :: Class
#           init_parameters
#             parameters
#               mel
#               mfcc
#           input_type                : AUDIO
#           output_type               : DATA_REPOSITORY
#           process_parameters
#           processor_name            : dcase_util.processors.RepositoryFeatureExtractorProcessor
#
#       [2]
#         DictContainer :: Class
#           init_parameters
#             hop                     : 1
#             recipe                  : list (2)
#               [0]
#                 label               : mel
#               [1]
#                 label               : mfcc
#                 vector-index
#                   full              : False
#                   selection         : False
#                   start             : 1
#                   stop              : 20
#                   stream            : 0
#           input_type                : DATA_REPOSITORY
#           output_type               : DATA_CONTAINER
#           process_parameters
#           processor_name            : dcase_util.processors.StackingProcessor
#
#   Duration
#     Frames                          : 501
#     Seconds                         : 10.02 sec

data.plot()
```
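配方 'mel;mfcc=1-19' 的堆疊效果可以用numpy直接示意:取全部40個梅爾帶,再取MFCC的第1..19維(跳過第0維),得到59維特徵。下面的特徵數據為隨機生成,僅用於演示形狀:

```python
import numpy as np

rng = np.random.default_rng(0)
frames = 501

# 模擬倉庫中的兩種特徵(數值為隨機數,僅作示意)
repository = {
    'mel': rng.standard_normal((40, frames)),    # 40 個梅爾帶
    'mfcc': rng.standard_normal((20, frames)),   # 20 個MFCC係數
}

# recipe 'mel;mfcc=1-19':全部mel + mfcc的第1..19維(跳過第0維)
stacked = np.vstack([
    repository['mel'],
    repository['mfcc'][1:20, :],
])
print(stacked.shape)   # (59, 501)
```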
- 為音頻文件提取幾個不同的聲學特征,對它們進行歸一化處理,形成數據矩陣,沿時間軸聚合(上下文窗口化),以及將數據拆分成序列:
```python
import numpy

# Normalization factors
mel_mean = numpy.array([
    -3.26094211, -4.20447522, -4.57860912, -5.11036974, -5.33019526,
    -5.48390484, -5.50473626, -5.54014946, -5.28249358, -5.12090705,
    -5.21508926, -5.3824216, -5.37758142, -5.38829567, -5.4912112,
    -5.55352419, -5.72801733, -6.02412347, -6.41367833, -6.64073975,
    -6.80493457, -6.8717373, -6.88140949, -6.91464104, -7.00929399,
    -7.13497673, -7.36417664, -7.73457445, -8.25007518, -8.79878143,
    -9.22709866, -9.28843908, -9.57054527, -9.82846299, -9.85425306,
    -9.90253041, -9.85194976, -9.62786338, -9.38480217, -9.18478766
])
mel_std = numpy.array([
    0.3450398, 0.47330394, 0.53112192, 0.57607313, 0.66710664,
    0.70052532, 0.79045046, 0.81864229, 0.79422025, 0.76691708,
    0.64798516, 0.59340713, 0.57756029, 0.64032687, 0.70226395,
    0.75670044, 0.80861907, 0.79305124, 0.7289238, 0.75346821,
    0.77785602, 0.7350573, 0.75137917, 0.77171676, 0.80314121,
    0.78965339, 0.79256442, 0.82524546, 0.84596991, 0.76430333,
    0.69690919, 0.69591269, 0.54718615, 0.5277196, 0.61271734,
    0.54482468, 0.42716334, 0.25561558, 0.08991936, 0.06402002
])
mfcc_mean = numpy.array([
    -1.89603847e+02, 4.88930395e+01, -8.37911555e+00, 2.58522036e+00,
    4.51964497e+00, -3.87312873e-01, 8.97250541e+00, 1.61597737e+00,
    1.74111135e+00, 2.50223131e+00, 3.03385048e+00, 1.34561742e-01,
    1.04119803e+00, -2.57486399e-01, 7.58245525e-01, 1.11375319e+00,
    5.45536494e-01, 7.62699140e-01, 9.34355519e-01, 1.57158221e-01
])
mfcc_std = numpy.array([
    15.94006483, 2.39593761, 4.78748908, 2.39555341, 2.66573364,
    1.75496556, 2.75005027, 1.5436589, 1.81070379, 1.39476785,
    1.22560606, 1.25575051, 1.34613239, 1.46778281, 1.19398649,
    1.1590474, 1.1309816, 1.12975486, 0.95503429, 1.01747647
])

# Define processing chain
chain = dcase_util.processors.ProcessingChain([
    {
        'processor_name': 'dcase_util.processors.MonoAudioReadingProcessor',
        'init_parameters': {
            'fs': 44100
        }
    },
    {
        'processor_name': 'dcase_util.processors.RepositoryFeatureExtractorProcessor',
        'init_parameters': {
            'parameters': {
                'mel': {},
                'mfcc': {}
            }
        }
    },
    {
        'processor_name': 'dcase_util.processors.RepositoryNormalizationProcessor',
        'init_parameters': {
            'parameters': {
                'mel': {
                    'mean': mel_mean,
                    'std': mel_std
                },
                'mfcc': {
                    'mean': mfcc_mean,
                    'std': mfcc_std
                }
            }
        }
    },
    {
        'processor_name': 'dcase_util.processors.StackingProcessor',
        'init_parameters': {
            'recipe': 'mel;mfcc=1-19'
        }
    },
    {
        'processor_name': 'dcase_util.processors.AggregationProcessor',
        'init_parameters': {
            'recipe': ['flatten'],
            'win_length_frames': 5,
            'hop_length_frames': 1,
        }
    },
    {
        'processor_name': 'dcase_util.processors.SequencingProcessor',
        'init_parameters': {
            'frames': 20,
            'hop_length_frames': 20,
            'padding': True
        }
    },
])

data = chain.process(filename=dcase_util.utils.Example().audio_filename())
data.show()
# DataMatrix3DContainer :: Class
#   Data
#     data                            : matrix (295,20,26)
#   Dimensions
#     time_axis                       : 1
#     data_axis                       : 0
#     sequence_axis                   : 2
#   Timing information
#     time_resolution                 : None
#   Meta
#     stats                           : Calculated
#     metadata                        : -
#     processing_chain                : ProcessingChain :: Class
#       [0]
#         DictContainer :: Class
#           init_parameters
#             _focus_channel          : None
#             _focus_start            : 0
#             _focus_stop             : None
#             channel_axis            : 0
#             data_synced_with_file   : False
#             filename                : None
#             fs                      : 44100
#             mono                    : True
#             time_axis               : 1
#           input_type                : NONE
#           output_type               : AUDIO
#           process_parameters
#             filename                : dcase_util/utils/example_data/acoustic_scene.flac
#             focus_channel           : None
#             focus_duration_samples  : (None,)
#             focus_duration_sec      : None
#             focus_start_samples     : None
#             focus_start_sec         : None
#             focus_stop_samples      : None
#             focus_stop_sec          : None
#           processor_name            : dcase_util.processors.MonoAudioReadingProcessor
#
#       [1]
#         DictContainer :: Class
#           init_parameters
#             parameters
#               mel
#               mfcc
#           input_type                : AUDIO
#           output_type               : DATA_REPOSITORY
#           process_parameters
#           processor_name            : dcase_util.processors.RepositoryFeatureExtractorProcessor
#
#       [2]
#         DictContainer :: Class
#           init_parameters
#             parameters
#               mel
#                 mean                : matrix (40,1)
#                 std                 : matrix (40,1)
#               mfcc
#                 mean                : matrix (20,1)
#                 std                 : matrix (20,1)
#           input_type                : DATA_REPOSITORY
#           output_type               : DATA_REPOSITORY
#           process_parameters
#           processor_name            : dcase_util.processors.RepositoryNormalizationProcessor
#
#       [3]
#         DictContainer :: Class
#           init_parameters
#             hop                     : 1
#             recipe                  : list (2)
#               [0]
#                 label               : mel
#               [1]
#                 label               : mfcc
#                 vector-index
#                   full              : False
#                   selection         : False
#                   start             : 1
#                   stop              : 20
#                   stream            : 0
#           input_type                : DATA_REPOSITORY
#           output_type               : DATA_CONTAINER
#           process_parameters
#           processor_name            : dcase_util.processors.StackingProcessor
#
#       [4]
#         DictContainer :: Class
#           init_parameters
#             hop_length_frames       : 1
#             recipe                  : list (1)
#               [0]                   : flatten
#             win_length_frames       : 5
#           input_type                : DATA_CONTAINER
#           output_type               : DATA_CONTAINER
#           process_parameters
#           processor_name            : dcase_util.processors.AggregationProcessor
#
#       [5]
#         DictContainer :: Class
#           init_parameters
#             frames                  : 20
#             hop_length_frames       : 20
#             padding                 : True
#             shift                   : 0
#             shift_border            : roll
#             shift_max               : None
#             shift_step              : 0
#           input_type                : DATA_CONTAINER
#           output_type               : DATA_CONTAINER
#           process_parameters
#           processor_name            : dcase_util.processors.SequencingProcessor
#
#   Duration
#     Frames                          : 20
```
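聚合與序列切分的基本思路可以用如下numpy示意(簡化版:不做dcase_util那樣的邊界填充與末段填充,因此幀數和序列數與上面的輸出略有差異):

```python
import numpy as np

def aggregate_flatten(data, win_length_frames=5, hop_length_frames=1):
    """滑動窗口 'flatten' 聚合示意:把每個窗口內的幀展平為一個長向量。
    (dcase_util 的 AggregationProcessor 還會填充邊界以保持幀數不變。)"""
    n_dims, n_frames = data.shape
    windows = [data[:, s:s + win_length_frames].flatten(order='F')
               for s in range(0, n_frames - win_length_frames + 1, hop_length_frames)]
    return np.stack(windows, axis=1)            # (n_dims*win, 窗口數)

def make_sequences(data, frames=20, hop_length_frames=20):
    """把特徵矩陣按固定長度切成序列,丟棄不足一個序列的尾部。
    (padding=True 時 dcase_util 會填充最後一段而非丟棄。)"""
    n_dims, n_frames = data.shape
    seqs = [data[:, s:s + frames]
            for s in range(0, n_frames - frames + 1, hop_length_frames)]
    return np.stack(seqs, axis=-1)              # (data, time, sequence)

features = np.random.default_rng(0).standard_normal((59, 501))
aggregated = aggregate_flatten(features)        # (295, 497)
sequences = make_sequences(aggregated)          # (295, 20, 24)
```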
7.2 元數據處理(Meta data processing)
- 獲得事件卷(event roll):
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | import tempfile tmp = tempfile.NamedTemporaryFile('r+', suffix='.txt', delete=False) dcase_util.utils.Example.event_metadata_container().save(filename=tmp.name) # Define processing chain chain = dcase_util.processors.ProcessingChain([ { 'processor_name': 'dcase_util.processors.MetadataReadingProcessor', 'init_parameters': {} }, { 'processor_name': 'dcase_util.processors.EventRollEncodingProcessor', 'init_parameters': { 'label_list': dcase_util.utils.Example.event_metadata_container().unique_event_labels, 'time_resolution': 0.02, } } ]) # Do the processing data = chain.process( filename=tmp.name, focus_filename='test1.wav' ) # Plot data data.plot() |
- 獲取聚焦片段(focus segment)的事件卷:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | import tempfile tmp = tempfile.NamedTemporaryFile('r+', suffix='.txt', delete=False) dcase_util.utils.Example.event_metadata_container().save(filename=tmp.name) # Define processing chain chain = dcase_util.processors.ProcessingChain([ { 'processor_name': 'dcase_util.processors.MetadataReadingProcessor', 'init_parameters': {} }, { 'processor_name': 'dcase_util.processors.EventRollEncodingProcessor', 'init_parameters': { 'label_list': dcase_util.utils.Example.event_metadata_container().unique_event_labels, 'time_resolution': 0.02, } } ]) # Do the processing data = chain.process( filename=tmp.name, focus_filename='test1.wav', focus_start_seconds=2.0, focus_stop_seconds=6.5, ) # Plot data data.plot() |
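事件卷(event roll)編碼本質上是把 (onset, offset, label) 標注柵格化為0/1激活矩陣。下面是一個不依賴dcase_util的示意實現(事件數據為虛構):

```python
import numpy as np

def event_roll(events, label_list, time_resolution=0.02, duration=None):
    """事件卷編碼示意:events 為 (onset, offset, label) 列表,
    輸出形狀為 (標籤數, 幀數) 的 0/1 激活矩陣。"""
    if duration is None:
        duration = max(offset for _, offset, _ in events)
    n_frames = int(np.ceil(duration / time_resolution))
    roll = np.zeros((len(label_list), n_frames))
    for onset, offset, label in events:
        start = int(np.floor(onset / time_resolution))
        stop = int(np.ceil(offset / time_resolution))
        roll[label_list.index(label), start:stop] = 1
    return roll

# 虛構的事件標注:speech 0.0-1.0 秒,car 0.5-2.0 秒
events = [(0.0, 1.0, 'speech'), (0.5, 2.0, 'car')]
roll = event_roll(events, label_list=['car', 'speech'])
print(roll.shape)   # (2, 100)
```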
應用的例子
聲場景分類器
本教程演示如何使用dcase_util中的實用程序構建簡單的聲場景分類器。 聲場景分類器應用程序通常包含以下階段:
- 數據集初始化階段,確保數據集已下載、解壓並準備就緒。
- 特徵提取階段,為開發數據集中的所有音頻文件提取聲學特徵,並存儲到磁盤以便稍後訪問。
- 特徵標準化階段,遍歷各交叉驗證fold的訓練材料,計算聲學特徵的均值和標準差,用於稍後對特徵數據進行歸一化。
- 學習階段,遍歷各交叉驗證fold的訓練材料,並學習聲學模型。
- 測試階段,遍歷各交叉驗證fold的測試材料,並為每個樣本估計場景類別。
- 評估階段,將系統輸出與參考標注(ground truth)進行比較評估。
本例使用為DCASE2013發布的聲場景數據集(10個場景類),以靜態MFCC作為特徵,GMM作為分類器。例子只展示了最少的代碼;實際的開發系統通常需要更好的參數化,以便讓系統開發更容易。
完整的代碼示例見 examples/asc_gmm_simple.py。
- 數據集初始化
本示例使用針對DCASE2013發布的聲場景數據集,dcase_util中提供了處理該數據集的數據集類:dcase_util.datasets.DCASE2013_Scenes_DevelopmentSet。
數據集需要先下載,解壓縮到磁盤,并準備好使用:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 | import os import dcase_util # Setup logging dcase_util.utils.setup_logging() log = dcase_util.ui.FancyLogger() log.title('Acoustic Scene Classification Example / GMM') # Create dataset object and set dataset to be stored under 'data' directory. db = dcase_util.datasets.DCASE2013_Scenes_DevelopmentSet( data_path='data' ) # Initialize dataset (download, extract and prepare it). db.initialize() # Show dataset information db.show() # DictContainer :: Class # audio_source : Field recording # audio_type : Natural # authors : D. Giannoulis, E. Benetos, D. Stowell, and M. D. Plumbley # microphone_model : Soundman OKM II Klassik/studio A3 electret microphone # recording_device_model : Unknown # title : IEEE AASP CASA Challenge - Public Dataset for Scene Classification Task # url : https://archive.org/details/dcase2013_scene_classification # # MetaDataContainer :: Class # Filename : data/DCASE2013-acoustic-scenes-development/meta.txt # Items : 100 # Unique # Files : 100 # Scene labels : 10 # Event labels : 0 # Tags : 0 # # Scene statistics # Scene label Count # -------------------- ------ # bus 10 # busystreet 10 # office 10 # openairmarket 10 # park 10 # quietstreet 10 # restaurant 10 # supermarket 10 # tube 10 # tubestation 10 |
- 特征提取
通常,為所有音頻文件一次性提取特徵並將其存儲在磁盤上,比每次需要聲學特徵時重新提取更高效。下面的示例演示了如何執行此操作:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 | log.section_header('Feature Extraction') # Prepare feature extractor extractor = dcase_util.features.MfccStaticExtractor( fs=44100, win_length_seconds=0.04, hop_length_seconds=0.02, n_mfcc=14 ) # Define feature storage path feature_storage_path = os.path.join('system_data', 'features') # Make sure path exists dcase_util.utils.Path().create(feature_storage_path) # Loop over all audio files in the dataset and extract features for them. for audio_filename in db.audio_files: # Show some progress log.line(os.path.split(audio_filename)[1], indent=2) # Get filename for feature data from audio filename feature_filename = os.path.join( feature_storage_path, os.path.split(audio_filename)[1].replace('.wav', '.cpickle') ) # Load audio data audio = dcase_util.containers.AudioContainer().load( filename=audio_filename, mono=True, fs=extractor.fs ) # Extract features and store them into FeatureContainer, and save it to the disk features = dcase_util.containers.FeatureContainer( filename=feature_filename, data=extractor.extract(audio.data), time_resolution=extractor.hop_length_seconds ).save() log.foot() |
- 特征標準化(Feature normalization)
在這個階段,遍歷每個交叉驗證fold的訓練材料,並計算聲學特徵的均值和標準差。這些歸一化因子用於在學習和測試階段使用特徵數據之前對其進行歸一化。
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 | log.section_header('Feature Normalization') # Define normalization data storage path normalization_storage_path = os.path.join('system_data', 'normalization') # Make sure path exists dcase_util.utils.Path().create(normalization_storage_path) # Loop over all cross-validation folds and calculate mean and std for the training data for fold in db.folds(): # Show some progress log.line('Fold {fold:d}'.format(fold=fold), indent=2) # Get filename for the normalization factors fold_stats_filename = os.path.join( normalization_storage_path, 'norm_fold_{fold:d}.cpickle'.format(fold=fold) ) # Normalizer normalizer = dcase_util.data.Normalizer(filename=fold_stats_filename) # Loop through all training data for item in db.train(fold=fold): # Get feature filename feature_filename = os.path.join( feature_storage_path, os.path.split(item.filename)[1].replace('.wav', '.cpickle') ) # Load feature matrix features = dcase_util.containers.FeatureContainer().load( filename=feature_filename ) # Accumulate statistics normalizer.accumulate(features.data) # Finalize and save normalizer.finalize().save() log.foot() |
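Normalizer 的 accumulate/finalize 模式等價於維護逐維的樣本數、和與平方和。下面是一個簡化示意(並非dcase_util的真實實現;這裡計算的是總體標準差):

```python
import numpy as np

class Normalizer:
    """Normalizer 的簡化示意:accumulate() 逐文件累加統計量,
    finalize() 計算均值和標準差,normalize() 應用歸一化。"""
    def __init__(self):
        self.n = 0
        self.s1 = None   # 按維度的和
        self.s2 = None   # 按維度的平方和

    def accumulate(self, data):
        # data: (特徵維度, 幀數)
        if self.s1 is None:
            self.s1 = np.zeros(data.shape[0])
            self.s2 = np.zeros(data.shape[0])
        self.n += data.shape[1]
        self.s1 += data.sum(axis=1)
        self.s2 += (data ** 2).sum(axis=1)
        return self

    def finalize(self):
        self.mean = self.s1 / self.n
        self.std = np.sqrt(self.s2 / self.n - self.mean ** 2)
        return self

    def normalize(self, data):
        return (data - self.mean[:, None]) / self.std[:, None]

rng = np.random.default_rng(0)
normalizer = Normalizer()
for _ in range(3):   # 模擬逐文件累加:3個文件,各100幀,均值約2、標準差約5
    normalizer.accumulate(rng.standard_normal((4, 100)) * 5 + 2)
normalizer.finalize()
normalized = normalizer.normalize(rng.standard_normal((4, 10)) * 5 + 2)
```

這種模式的好處是不必同時把所有訓練數據載入內存,逐文件累加即可得到全體統計量。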
- 模型學習
在這個階段,遍歷每個交叉驗證fold的訓練材料,學習並存儲聲學模型。
```python
log.section_header('Learning')

# Imports
from sklearn.mixture import GaussianMixture
import numpy

# Define model data storage path
model_storage_path = os.path.join('system_data', 'model')

# Make sure path exists
dcase_util.utils.Path().create(model_storage_path)

# Loop over all cross-validation folds and learn acoustic models
for fold in db.folds():
    # Show some progress
    log.line('Fold {fold:d}'.format(fold=fold), indent=2)

    # Get model filename
    fold_model_filename = os.path.join(
        model_storage_path,
        'model_fold_{fold:d}.cpickle'.format(fold=fold)
    )

    # Get filename for the normalizer
    fold_stats_filename = os.path.join(
        normalization_storage_path,
        'norm_fold_{fold:d}.cpickle'.format(fold=fold)
    )

    # Normalizer
    normalizer = dcase_util.data.Normalizer().load(filename=fold_stats_filename)

    # Collect class wise training data
    class_wise_data = {}
    for scene_label in db.scene_labels():
        class_wise_data[scene_label] = []

        # Loop through all training items from specific scene class
        for item in db.train(fold=fold).filter(scene_label=scene_label):
            # Get feature filename
            feature_filename = os.path.join(
                feature_storage_path,
                os.path.split(item.filename)[1].replace('.wav', '.cpickle')
            )

            # Load all features.
            features = dcase_util.containers.FeatureContainer().load(
                filename=feature_filename
            )

            # Normalize features.
            normalizer.normalize(features)

            # Store feature data.
            class_wise_data[scene_label].append(features.data)

    # Initialize model container.
    model = dcase_util.containers.DictContainer(filename=fold_model_filename)

    # Loop though all scene classes and train acoustic model for each
    for scene_label in db.scene_labels():
        # Show some progress
        log.line('[{scene_label}]'.format(scene_label=scene_label), indent=4)

        # Train acoustic model
        model[scene_label] = GaussianMixture(
            n_components=8
        ).fit(
            numpy.hstack(class_wise_data[scene_label]).T
        )

    # Save model to the disk
    model.save()

log.foot()
```
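按場景類訓練類條件模型的思路可以這樣示意。這裡用單個對角高斯代替sklearn的8分量GaussianMixture(僅為不依賴sklearn的簡化示意,思路相同;數據為隨機生成):

```python
import numpy as np

rng = np.random.default_rng(0)

class DiagonalGaussian:
    """單個對角高斯的類條件模型示意(GMM 的退化情形)。"""
    def fit(self, X):                 # X: (樣本數, 特徵維度)
        self.mean = X.mean(axis=0)
        self.var = X.var(axis=0) + 1e-6
        return self

    def score_samples(self, X):       # 每個樣本的對數似然
        return -0.5 * np.sum(
            np.log(2 * np.pi * self.var) + (X - self.mean) ** 2 / self.var,
            axis=1)

# 構造兩個類的虛構訓練數據並訓練類條件模型
class_wise_data = {
    'office': rng.standard_normal((200, 3)) + 0.0,
    'bus': rng.standard_normal((200, 3)) + 4.0,
}
model = {label: DiagonalGaussian().fit(data)
         for label, data in class_wise_data.items()}
```

測試時,對每一幀分別用各類模型打分,再沿時間軸累加對數似然,取分數最高的類別作為估計結果(見下一節)。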
- 測試
在這個階段,遍歷每個交叉驗證fold的測試材料,並為每個測試樣本估計場景類別。
```python
log.section_header('Testing')

# Define results data storage path
results_storage_path = os.path.join('system_data', 'results')

# Make sure path exists
dcase_util.utils.Path().create(results_storage_path)

# Loop over all cross-validation folds and test
for fold in db.folds():
    # Show some progress
    log.line('Fold {fold:d}'.format(fold=fold), indent=2)

    # Get model filename
    fold_model_filename = os.path.join(
        model_storage_path,
        'model_fold_{fold:d}.cpickle'.format(fold=fold)
    )

    # Load model
    model = dcase_util.containers.DictContainer().load(
        filename=fold_model_filename
    )

    # Get filename for the normalizer
    fold_stats_filename = os.path.join(
        normalization_storage_path,
        'norm_fold_{fold:d}.cpickle'.format(fold=fold)
    )

    # Normalizer
    normalizer = dcase_util.data.Normalizer().load(filename=fold_stats_filename)

    # Get results filename
    fold_results_filename = os.path.join(
        results_storage_path,
        'res_fold_{fold:d}.txt'.format(fold=fold)
    )

    # Initialize results container
    res = dcase_util.containers.MetaDataContainer(filename=fold_results_filename)

    # Loop through all test files from the current cross-validation fold
    for item in db.test(fold=fold):
        # Get feature filename
        feature_filename = os.path.join(
            feature_storage_path,
            os.path.split(item.filename)[1].replace('.wav', '.cpickle')
        )

        # Load all features.
        features = dcase_util.containers.FeatureContainer().load(
            filename=feature_filename
        )

        # Normalize features.
        normalizer.normalize(features)

        # Initialize log likelihoods matrix
        logls = numpy.ones((db.scene_label_count(), features.frames)) * -numpy.inf

        # Loop through all scene classes and get likelihood for each per frame
        for scene_label_id, scene_label in enumerate(db.scene_labels()):
            logls[scene_label_id] = model[scene_label].score_samples(features.data.T)

        # Accumulate log likelihoods
        accumulated_logls = dcase_util.data.ProbabilityEncoder().collapse_probabilities(
            probabilities=logls,
            operator='sum'
        )

        # Estimate scene label based on max likelihood.
        estimated_scene_label = dcase_util.data.ProbabilityEncoder(
            label_list=db.scene_labels()
        ).max_selection(
            probabilities=accumulated_logls
        )

        # Store result into results container
        res.append(
            {
                'filename': item.filename,
                'scene_label': estimated_scene_label
            }
        )

    # Save results container
    res.save()

log.foot()
```
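逐幀對數似然的累加與最大似然選擇(對應上面的 collapse_probabilities 與 max_selection)可以用如下示意說明(數值為虛構):

```python
import numpy as np

def collapse_probabilities(logls, operator='sum'):
    """把逐幀對數似然沿時間軸折疊為每類一個分數的示意。"""
    if operator == 'sum':
        return logls.sum(axis=1)
    raise ValueError(operator)

def max_selection(accumulated, label_list):
    """選擇累計對數似然最大的類別標籤。"""
    return label_list[int(np.argmax(accumulated))]

label_list = ['office', 'bus', 'park']
# 3個類 x 5幀 的逐幀對數似然(虛構數值)
logls = np.array([
    [-10.0, -12.0, -11.0, -10.5, -11.5],
    [-3.0,  -2.5,  -3.5,  -2.0,  -3.0],
    [-6.0,  -7.0,  -6.5,  -6.0,  -7.5],
])
accumulated = collapse_probabilities(logls)
estimated = max_selection(accumulated, label_list)
print(estimated)   # bus
```

對數似然求和等價於各幀似然相乘(獨立假設下整個片段的聯合似然),因此按和取最大就是對整個樣本做最大似然分類。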
- 評估
在這個階段,根據數據集提供的參考標注(ground truth)對系統輸出進行評估。
```python
log.section_header('Evaluation')

# Imports
import sed_eval

all_res = []
overall = []
class_wise_results = numpy.zeros((len(db.folds()), len(db.scene_labels())))
for fold in db.folds():
    # Get results filename
    fold_results_filename = os.path.join(
        results_storage_path,
        'res_fold_{fold:d}.txt'.format(fold=fold)
    )

    # Get reference scenes
    reference_scene_list = db.eval(fold=fold)
    for item_id, item in enumerate(reference_scene_list):
        # Modify data for sed_eval
        reference_scene_list[item_id]['file'] = item.filename

    # Load estimated scenes
    estimated_scene_list = dcase_util.containers.MetaDataContainer().load(
        filename=fold_results_filename
    )
    for item_id, item in enumerate(estimated_scene_list):
        # Modify data for sed_eval
        estimated_scene_list[item_id]['file'] = item.filename

    # Initialize evaluator
    evaluator = sed_eval.scene.SceneClassificationMetrics(scene_labels=db.scene_labels())

    # Evaluate estimated against reference.
    evaluator.evaluate(
        reference_scene_list=reference_scene_list,
        estimated_scene_list=estimated_scene_list
    )

    # Get results
    results = dcase_util.containers.DictContainer(evaluator.results())

    # Store fold-wise results
    all_res.append(results)
    overall.append(results.get_path('overall.accuracy')*100)

    # Get scene class-wise results
    class_wise_accuracy = []
    for scene_label_id, scene_label in enumerate(db.scene_labels()):
        class_wise_accuracy.append(results.get_path(['class_wise', scene_label, 'accuracy', 'accuracy']))
        class_wise_results[fold-1, scene_label_id] = results.get_path(['class_wise', scene_label, 'accuracy', 'accuracy'])

# Form results table
cell_data = class_wise_results
scene_mean_accuracy = numpy.mean(cell_data, axis=0).reshape((1, -1))
cell_data = numpy.vstack((cell_data, scene_mean_accuracy))
fold_mean_accuracy = numpy.mean(cell_data, axis=1).reshape((-1, 1))
cell_data = numpy.hstack((cell_data, fold_mean_accuracy))

scene_list = db.scene_labels()
scene_list.extend(['Average'])
cell_data = [scene_list] + (cell_data*100).tolist()

column_headers = ['Scene']
for fold in db.folds():
    column_headers.append('Fold {fold:d}'.format(fold=fold))
column_headers.append('Average')

log.table(
    cell_data=cell_data,
    column_headers=column_headers,
    column_separators=[0, 5],
    row_separators=[10],
    indent=2
)
log.foot()
```
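sed_eval 計算的總體與類別準確率,其核心只是按文件對齊後的計數。下面是一個不依賴sed_eval的簡化示意(文件名與標籤為虛構):

```python
def class_wise_accuracy(reference, estimated, scene_labels):
    """場景分類準確率的簡化示意:
    reference/estimated 為 {filename: scene_label} 字典。"""
    correct = {label: 0 for label in scene_labels}
    total = {label: 0 for label in scene_labels}
    for filename, ref_label in reference.items():
        total[ref_label] += 1
        if estimated.get(filename) == ref_label:
            correct[ref_label] += 1
    # 每個類的準確率,以及所有文件上的總體準確率
    per_class = {label: correct[label] / total[label] if total[label] else 0.0
                 for label in scene_labels}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_class

reference = {'a.wav': 'bus', 'b.wav': 'bus', 'c.wav': 'park', 'd.wav': 'park'}
estimated = {'a.wav': 'bus', 'b.wav': 'park', 'c.wav': 'park', 'd.wav': 'park'}
overall, per_class = class_wise_accuracy(reference, estimated, ['bus', 'park'])
print(overall)   # 0.75
```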
- Results:
| Scene | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Average |
| --- | --- | --- | --- | --- | --- | --- |
| bus | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| busystreet | 100.00 | 33.33 | 33.33 | 100.00 | 66.67 | 66.67 |
| office | 66.67 | 100.00 | 100.00 | 66.67 | 100.00 | 86.67 |
| openairmarket | 66.67 | 100.00 | 0.00 | 66.67 | 100.00 | 66.67 |
| park | 33.33 | 33.33 | 0.00 | 33.33 | 33.33 | 26.67 |
| quietstreet | 66.67 | 100.00 | 33.33 | 66.67 | 66.67 | 66.67 |
| restaurant | 66.67 | 0.00 | 66.67 | 0.00 | 33.33 | 33.33 |
| supermarket | 33.33 | 0.00 | 33.33 | 0.00 | 33.33 | 20.00 |
| tube | 100.00 | 33.33 | 33.33 | 66.67 | 66.67 | 60.00 |
| tubestation | 0.00 | 66.67 | 66.67 | 0.00 | 0.00 | 26.67 |
| Average | 63.33 | 56.67 | 46.67 | 50.00 | 60.00 | 55.33 |
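上表最後一行與最後一列的平均值可以由類別×fold的準確率矩陣直接復算(數值取自上表,行序與表中場景類一致):

```python
import numpy as np

# 表中的 類別 x fold 準確率(%),按 bus..tubestation 的行序
class_wise = np.array([
    [100.00, 100.00, 100.00, 100.00, 100.00],
    [100.00,  33.33,  33.33, 100.00,  66.67],
    [ 66.67, 100.00, 100.00,  66.67, 100.00],
    [ 66.67, 100.00,   0.00,  66.67, 100.00],
    [ 33.33,  33.33,   0.00,  33.33,  33.33],
    [ 66.67, 100.00,  33.33,  66.67,  66.67],
    [ 66.67,   0.00,  66.67,   0.00,  33.33],
    [ 33.33,   0.00,  33.33,   0.00,  33.33],
    [100.00,  33.33,  33.33,  66.67,  66.67],
    [  0.00,  66.67,  66.67,   0.00,   0.00],
])
fold_average = class_wise.mean(axis=0)    # 每個fold的平均(對應最後一行)
scene_average = class_wise.mean(axis=1)   # 每個類的平均(對應最後一列)
```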