dcase_util Tutorial


dcase_util is a Python utility collection for acoustic scene analysis; its documentation is in English, and this article is a translated digest of it for quick reference.

The dcase_util documentation describes a collection of utilities created for Detection and Classification of Acoustic Scenes and Events (DCASE). These utilities were originally created for the DCASE challenge baseline systems (2016 and 2017) and are bundled into a standalone library to allow their reuse in other research projects.

The main goal of these utilities is to streamline research code, making it more readable and easier to maintain. Most of the implemented utilities are related to audio datasets: handling metadata and various other forms of structured data, and providing a standardized API for audio datasets from various sources.

The main concepts in dcase_util are:

  • Dataset: a set of audio signals and the reference annotations associated with them.
  • Container: a class that stores data and, based on the type of data stored, provides meaningful and clean access to it.
  • Metadata: annotations for the data.
  • Repository: a container that stores multiple data containers.
  • Encoder: converts data from one type to another.
  • Processor: a class providing a uniform API for processing data.
  • ProcessingChain: connects multiple data processors into a chain, allowing complex data processing pipelines to be built.

Once your Python environment is set up, install dcase_util with the following pip command:
pip install dcase_util
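
A quick import check verifies the installation; a minimal sketch (the __version__ attribute is assumed here, following the usual Python packaging convention):

import dcase_util

# Print the installed version to confirm the package imports correctly.
print(dcase_util.__version__)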

下面是各個單元的使用教程

1. Containers

The library provides data containers to simplify workflows. These containers inherit from standard Python containers (e.g., object, list, and dict) so that they can be used together with other tools and libraries. The purpose of these containers is to wrap the data with useful methods for accessing and manipulating it, as well as for loading and storing it.

1.1 Basic usage

  • Four ways to load content from a file:
# 1
dict_container = dcase_util.containers.DictContainer(filename='test.yaml')
dict_container.load()

# 2
dict_container = dcase_util.containers.DictContainer()
dict_container.load(filename='test.yaml')

# 3
dict_container = dcase_util.containers.DictContainer(filename='test.yaml').load()

# 4
dict_container = dcase_util.containers.DictContainer().load(filename='test.yaml')
  • Save content to a file:
dict_container.save(filename='test.yaml')
  • Show container content by printing it to the console:
dict_container.show()
  • Print container content through the standard logging system:
dict_container.log()

If the logging system has not been initialized before the call, dcase_util.utils.setup_logging is used to initialize it with default parameters.
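
For example, a minimal sketch of initializing logging explicitly before calling log() (setup_logging is also used in the application example at the end of this tutorial):

import dcase_util

# Initialize the standard logging system with default parameters
dcase_util.utils.setup_logging()

dict_container = dcase_util.containers.DictContainer({'test': 1})

# Content now goes to the logging system instead of the console
dict_container.log()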

1.2 Dictionaries

dcase_util.containers.DictContainer is designed to make working with nested dictionaries somewhat easier than with the standard dict. It allows fields in nested dictionaries to be accessed through so-called dotted paths, or through lists of path parts.

  • Initialize the container with a dictionary:
dict_container = dcase_util.containers.DictContainer(
    {
        'test': {
            'field1': 1,
            'field2': 2,
        },
        'test2': 100,
        'test3': {
            'field1': {
                'fieldA': 1
            },
            'field2': {
                'fieldA': 1
            },
            'field3': {
                'fieldA': 1
            },
        }
    }
)
  • Initialize the container and load its content from a file:
dict_container = dcase_util.containers.DictContainer().load(filename='test.yaml')
  • Access fields through dotted paths:
# Field exists
value = dict_container.get_path('test.field1')

# Using wild card
values = dict_container.get_path('test3.*')

# Non existing field with default value
value = dict_container.get_path('test.fieldA', 'default_value')
  • Access fields through a list of path parts:
# Field exists
value = dict_container.get_path(['test', 'field1'])

# Non existing field with default value
value = dict_container.get_path(['test', 'fieldA'], 'default_value')
  • Set a field through a dotted path:
dict_container.set_path('test.field2', 200)
  • Get the dotted paths of all leaf nodes in the nested dictionary:
dict_container.get_leaf_path_list()
  • Get the dotted paths of all leaf nodes whose field name starts with 'field':
dict_container.get_leaf_path_list(target_field_startswith='field')
  • Save the container to a YAML file:
dict_container.save(filename='test.yaml')
  • Load container content from a YAML file:
dict_container.load(filename='test.yaml')

1.3 List of dictionaries

dcase_util.containers.ListDictContainer is a list for storing dcase_util.containers.DictContainer objects.

  • Initialize the container with a list of dictionaries:
listdict_container = dcase_util.containers.ListDictContainer(
    [
        {'field1': 1, 'field2': 2},
        {'field1': 10, 'field2': 20},
        {'field1': 100, 'field2': 200},
    ]
)
  • Access items in the list by key and value:
print(listdict_container.search(key='field1', value=10))
# DictContainer
# field1 : 10
# field2 : 20
  • Get the values of a specific field from all dictionaries:

print(listdict_container.get_field(field_name='field2'))
# [2, 20, 200]

1.4 Data Containers

  • Several data container types are available, including:
    • dcase_util.containers.DataArrayContainer, a data container for array data; internally the data is stored in a numpy.ndarray. (available)
    • dcase_util.containers.DataMatrix2DContainer, a data container for two-dimensional data matrices; internally the data is stored in a 2-D numpy.ndarray. (available)
    • dcase_util.containers.DataMatrix3DContainer, a data container for three-dimensional data matrices; internally the data is stored in a 3-D numpy.ndarray. (available)
    • dcase_util.containers.BinaryMatrixContainer, a data container for two-dimensional binary data matrices; internally the data is stored in a 2-D numpy.ndarray. (not available)
  • Initialize the container with a random 10x100 matrix and set the time resolution to 20 ms:
data_container = dcase_util.containers.DataMatrix2DContainer(
    data=numpy.random.rand(10, 100),
    time_resolution=0.02
)

When storing, for example, acoustic features, the time resolution corresponds to the hop length used in feature extraction.
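
As a quick illustration of that relationship (plain arithmetic, not a dcase_util API):

# With a 20 ms hop, 2.0 seconds of audio correspond to 100 feature frames.
time_resolution = 0.02  # seconds per frame (feature extraction hop length)
n_frames = int(2.0 / time_resolution)
print(n_frames)
# 100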

  • Access the data matrix directly:
print(data_container.data.shape)
# (10, 100)
  • Show container information:
data_container.show()
# DataMatrix2DContainer :: Class
# Data
# data : matrix (10,100)
# Dimensions
# time_axis : 1
# data_axis : 0
# Timing information
# time_resolution : 0.02 sec
# Meta
# stats : Calculated
# metadata : -
# processing_chain : -
# Duration
# Frames : 100
# Seconds : 2.00 sec

The container has a focus mechanism for flexibly capturing a part of the data matrix. The focus can be set based on time (in seconds, if the time resolution is defined) or based on frame indices.

  • Use the focus to get the part of the data between 0.5 and 1.0 seconds:
print(data_container.set_focus(start_seconds=0.5, stop_seconds=1.0).get_focused().shape)
# (10, 25)
  • Use the focus to get the part of the data between frames 10 and 50:
print(data_container.set_focus(start=10, stop=50).get_focused().shape)
# (10, 40)
  • Reset the focus and access the full data matrix:
data_container.reset_focus()
print(data_container.get_focused().shape)
# (10, 100)
  • Access frames 1, 2, 10, and 30:
data_container.get_frames(frame_ids=[1,2,10,30])
  • Access frames 1-5, keeping only the first value of each frame:
data_container.get_frames(frame_ids=[1,2,3,4,5], vector_ids=[0])
  • Transpose the matrix:
transposed_data = data_container.T
print(transposed_data.shape)
# (100, 10)
  • Plot the data:
data_container.plot()

dcase_util.containers.BinaryMatrixContainer serves the same purpose as DataMatrix2DContainer, but for binary content.
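
A minimal sketch, assuming the constructor mirrors DataMatrix2DContainer (data plus time_resolution); that mirroring is an assumption, not something stated above:

# Boolean 3x100 matrix as binary content (constructor arguments assumed
# to match DataMatrix2DContainer)
binary_container = dcase_util.containers.BinaryMatrixContainer(
    data=numpy.random.rand(3, 100) > 0.5,
    time_resolution=0.02
)
print(binary_container.data.shape)
# (3, 100)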

1.5 Repositories

dcase_util.containers.DataRepository and dcase_util.containers.FeatureRepository are containers that can be used to store multiple other data containers. A repository stores data with two levels of information: label and stream. The label is the higher-level key, and the stream is the second-level one.

For example, a repository can be used to store multiple different acoustic features all related to the same audio signal. Stream IDs can be used to store features extracted from different audio channels; the features can later be accessed using the extractor label and stream ID.

  • Initialize the container with data:
data_repository = dcase_util.containers.DataRepository(
    data={
        'label1': {
            'stream0': {
                'data': 100
            },
            'stream1': {
                'data': 200
            }
        },
        'label2': {
            'stream0': {
                'data': 300
            },
            'stream1': {
                'data': 400
            }
        }
    }
)
  • Show container information:

data_repository.show()
# DataRepository :: Class
# Repository info
# Item class : DataMatrix2DContainer
# Item count : 2
# Labels : ['label1', 'label2']
# Content
# [label1][stream1] : {'data': 200}
# [label1][stream0] : {'data': 100}
# [label2][stream1] : {'data': 400}
# [label2][stream0] : {'data': 300}
  • Access data in the repository:
data_repository.get_container(label='label1',stream_id='stream1')
# {'data': 200}
  • Set data:

data_repository.set_container(label='label3', stream_id='stream0', container={'data': 500})
data_repository.show()
# DataRepository :: Class
# Repository info
# Item class : DataMatrix2DContainer
# Item count : 3
# Labels : ['label1', 'label2', 'label3']
# Content
# [label1][stream1] : {'data': 200}
# [label1][stream0] : {'data': 100}
# [label2][stream1] : {'data': 400}
# [label2][stream0] : {'data': 300}
# [label3][stream0] : {'data': 500}

2. Audio

dcase_util.containers.AudioContainer is a data container for multi-channel audio. It reads several formats (WAV, FLAC, M4A, WEBM) and writes WAV and FLAC files. Downloading audio content directly from YouTube is also supported.

2.1 Creating a container

  • Create a two-channel audio container:
audio_container = dcase_util.containers.AudioContainer(fs=44100)
t = numpy.linspace(0, 2, 2 * audio_container.fs, endpoint=False)
x1 = numpy.sin(220 * 2 * numpy.pi * t)
x2 = numpy.sin(440 * 2 * numpy.pi * t)
audio_container.data = numpy.vstack([x1, x2])
  • Show container information:

audio_container.show()
# AudioContainer :: Class
# Sampling rate : 44100
# Channels : 2
# Duration
# Seconds : 2.00 sec
# Milliseconds : 2000.00 ms
# Samples : 88200 samples

2.2 Loading and saving

  • Load audio from a file:
audio_container = dcase_util.containers.AudioContainer().load(
    filename=dcase_util.utils.Example.audio_filename()
)
  • Show container information:

audio_container.show()
# AudioContainer :: Class
# Filename : acoustic_scene.flac
# Synced : Yes
# Sampling rate : 44100
# Channels : 2
# Duration
# Seconds : 10.00 sec
# Milliseconds : 10000.02 ms
# Samples : 441001 samples
  • Load content from YouTube:
audio_container = dcase_util.containers.AudioContainer().load_from_youtube(
    query_id='2ceUOv8A3FE',
    start=1,
    stop=5
)
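
Saving uses the same container API as the other containers; a minimal sketch (WAV and FLAC are the supported output formats, and the filename here is illustrative):

# Write the audio content back to disk as a WAV file
audio_container.save(filename='output.wav')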

2.3 Focus segment

The container has a focus mechanism for flexibly capturing a part of the audio data while keeping the full audio signal intact. The focus can be set based on time (in seconds, if the time resolution is defined) or based on sample indices. It can be set per channel or on the mixdown (mono mix) of the channels. The audio container content can be replaced by the focus segment by freezing it.

  • Use the focus to get the part of the data between 0.5 and 1.0 seconds:
print(audio_container.set_focus(start_seconds=0.5, stop_seconds=1.0).get_focused().shape)
# (2, 22050)
  • Use the focus to get 2 seconds of data starting at 5 seconds:
print(audio_container.set_focus(start_seconds=5, duration_seconds=2.0).get_focused().shape)
# (2, 88200)
  • Use the focus to get 2 seconds of data starting at 5 seconds, with the two stereo channels mixed down to mono:
print(audio_container.set_focus(start_seconds=5, duration_seconds=2.0, channel='mixdown').get_focused().shape)
# (88200,)
  • Use the focus to get 2 seconds of data starting at 5 seconds, from the left stereo channel:
print(audio_container.set_focus(start_seconds=5, duration_seconds=2.0, channel='left').get_focused().shape)
# (88200,)
  • Use the focus to get 2 seconds of data starting at 5 seconds, from the second audio channel (indexing starts from 0):
print(audio_container.set_focus(start_seconds=5, duration_seconds=2.0, channel=1).get_focused().shape)
# (88200,)
  • Use the focus to get the part of the data between samples 44100 and 88200:
print(audio_container.set_focus(start=44100, stop=88200).get_focused().shape)
# (2, 44100)
  • Reset the focus and access the full data matrix:
audio_container.reset_focus()
print(audio_container.get_focused().shape)
# (2, 441001)
  • Use the focus to get 2 seconds of data starting at 5 seconds, and freeze the segment:
audio_container.set_focus(start_seconds=5, duration_seconds=2.0).freeze()
print(audio_container.shape)
# (2, 88200)

2.4 Processing

  • Normalize the audio:
audio_container.normalize()
  • Resample the audio to a target sampling rate:
audio_container.resample(target_fs=16000)
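
These calls combine naturally; a minimal sketch chaining the two steps, assuming (as above) that both calls modify the container in place:

audio_container = dcase_util.containers.AudioContainer().load(
    filename=dcase_util.utils.Example.audio_filename()
)

# Normalize the audio, then resample it to 16 kHz
audio_container.normalize()
audio_container.resample(target_fs=16000)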

2.5 Visualization

  • Plot the waveform:
audio_container.plot_wave()
  • Plot the spectrogram:
audio_container.plot_spec()

3. Acoustic features

The library provides basic acoustic feature extractors: dcase_util.features.MelExtractor, dcase_util.features.MfccStaticExtractor, dcase_util.features.MfccDeltaExtractor, dcase_util.features.MfccAccelerationExtractor, dcase_util.features.ZeroCrossingRateExtractor, dcase_util.features.RMSEnergyExtractor, and dcase_util.features.SpectralCentroidExtractor.

3.1 Feature extraction

  • Extract mel-band energies for an audio signal (with default parameters):
# Get audio in a container, mixdown of a stereo signal
audio_container = dcase_util.containers.AudioContainer().load(
    filename=dcase_util.utils.Example.audio_filename()
).mixdown()

# Create extractor
mel_extractor = dcase_util.features.MelExtractor()

# Extract features
mels = mel_extractor.extract(audio_container)
  • Extract mel-band energies for a specific audio segment:
# Get audio in a container, mixdown of a stereo signal
audio_container = dcase_util.containers.AudioContainer().load(
    filename=dcase_util.utils.Example.audio_filename()
).mixdown()

# Set focus
audio_container.set_focus(start_seconds=1.0, stop_seconds=4.0)

# Create extractor
mel_extractor = dcase_util.features.MelExtractor()

# Extract features
mels = mel_extractor.extract(audio_container.get_focused())

# Plot
dcase_util.containers.DataMatrix2DContainer(
    data=mels,
    time_resolution=mel_extractor.hop_length_seconds
).plot()
  • Extract features directly from a numpy matrix:
# Create an audio signal
t = numpy.linspace(0, 2, 2 * 44100, endpoint=False)
x1 = numpy.sin(220 * 2 * numpy.pi * t)

# Create extractor
mel_extractor = dcase_util.features.MelExtractor()

# Extract features
mels = mel_extractor.extract(x1)

# Plot
dcase_util.containers.DataMatrix2DContainer(
    data=mels,
    time_resolution=mel_extractor.hop_length_seconds
).plot()

All audio extractors provided in the library work on mono audio signals.
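
As a sketch of using one of the other extractors listed above on the same mono signal (the MfccStaticExtractor parameters shown here are the ones used in the application example at the end of this tutorial):

# Get audio in a container, mixdown of a stereo signal
audio_container = dcase_util.containers.AudioContainer().load(
    filename=dcase_util.utils.Example.audio_filename()
).mixdown()

# Static MFCC extractor
mfcc_extractor = dcase_util.features.MfccStaticExtractor(
    fs=44100,
    win_length_seconds=0.04,
    hop_length_seconds=0.02,
    n_mfcc=14
)

# Extract features; result is a (14, number_of_frames) matrix
mfccs = mfcc_extractor.extract(audio_container)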

3.2 Visualization

  • Plot the extracted features:
# Get audio in a container, mixdown of a stereo signal
audio_container = dcase_util.containers.AudioContainer().load(
    filename=dcase_util.utils.Example.audio_filename()
).mixdown()

# Create extractor
mel_extractor = dcase_util.features.MelExtractor()

# Extract features
mels = mel_extractor.extract(audio_container)

# Plotting
dcase_util.containers.DataMatrix2DContainer(
    data=mels,
    time_resolution=mel_extractor.hop_length_seconds
).plot()

4. Data processing

4.1 Data manipulation

There are several utilities for manipulating data:

  • dcase_util.data.Normalizer, to calculate normalization factors and normalize data.
  • dcase_util.data.RepositoryNormalizer, to normalize a data repository in one go.
  • dcase_util.data.Aggregator, to aggregate data within a sliding processing window.
  • dcase_util.data.Sequencer, to split data matrices into sequences.
  • dcase_util.data.Stacker, to stack data matrices based on a given vector recipe.
  • dcase_util.data.Selector, to select segments of data based on events with onsets and offsets.
  • dcase_util.data.Masker, to mask segments of data based on events with onsets and offsets.

4.1.1 Normalization

The dcase_util.data.Normalizer class can be used to calculate normalization factors (mean and standard deviation) for data without reading all of it at once; intermediate statistics are accumulated while the data is read in small portions.

  • Calculate normalization factors file by file:
data = dcase_util.utils.Example.feature_container()

# Initialize normalizer
normalizer = dcase_util.data.Normalizer()

# Accumulate -- feed data per file in
normalizer.accumulate(data=data)

# After accumulation calculate normalization factors (mean + std)
normalizer.finalize()

# Save
normalizer.save(filename='norm_factors.cpickle')

# Load
normalizer = dcase_util.data.Normalizer().load(filename='norm_factors.cpickle')
  • Using a with statement:
data = dcase_util.utils.Example.feature_container()

# Accumulate
with dcase_util.data.Normalizer() as normalizer:
    normalizer.accumulate(data=data)

# Save
normalizer.save(filename='norm_factors.cpickle')
  • Initialize the normalizer with pre-calculated values:
data = dcase_util.utils.Example.feature_container()
normalizer = dcase_util.data.Normalizer(
    **data.stats
)
  • Normalize the data:
data = dcase_util.utils.Example.feature_container()
normalizer = dcase_util.data.Normalizer().load(filename='norm_factors.cpickle')
normalizer.normalize(data)

4.1.2 Aggregation

The data aggregator class (dcase_util.data.Aggregator) can be used to process data matrices within a sliding processing window. This processing stage can be used, for example, to collapse the data within a certain window length by calculating its mean and standard deviation, or to flatten the matrix into a single vector.

Supported processing methods: flatten, mean, std, cov, kurtosis, and skew. All of these processing methods can be combined.

  • Calculate mean and standard deviation in a 10-frame window, with a 1-frame hop:
data = dcase_util.utils.Example.feature_container()
print(data.shape)
# (40, 501)

data_aggregator = dcase_util.data.Aggregator(
    recipe=['mean', 'std'],
    win_length_frames=10,
    hop_length_frames=1,
)
data_aggregator.aggregate(data)
print(data.shape)
# (80, 501)
  • Flatten the data matrix within a 10-frame window into a single vector, with a 1-frame hop:
data = dcase_util.utils.Example.feature_container()
print(data.shape)
# (40, 501)

data_aggregator = dcase_util.data.Aggregator(
    recipe=['flatten'],
    win_length_frames=10,
    hop_length_frames=1,
)
data_aggregator.aggregate(data)
print(data.shape)
# (400, 501)

4.1.3 Sequencing

The Sequencer class (dcase_util.data.Sequencer) processes data matrices into sequences (images). Sequences can overlap, and the sequencing grid can be shifted between calls.

  • Create sequences:
data = dcase_util.utils.Example.feature_container()
print(data.shape)
# (40, 501)
data_sequencer = dcase_util.data.Sequencer(
    frames=10,
    hop_length_frames=10
)
sequenced_data = data_sequencer.sequence(data)
print(sequenced_data.shape)
# (40, 10, 50)

sequenced_data.show()
# DataMatrix3DContainer :: Class
# Data
# data : matrix (40,10,50)
# Dimensions
# time_axis : 1
# Timing information
# time_resolution : None
# Meta
# stats : Calculated
# metadata : -
# processing_chain : -
# Duration
# Frames : 10
# Data
# Dimensions
# time_axis : 1
# data_axis : 0
# sequence_axis : 2

4.1.4 Stacking

The Stacker class (dcase_util.data.Stacker) stacks data stored in a data repository based on a recipe. This class can be used, for example, to create a new feature matrix containing data extracted with several feature extractors. With the recipe you can select a full matrix, only part of the data vectors using start and end indices, or individual data rows.

Example:

# Load data repository
repo = dcase_util.utils.Example.feature_repository()

# Show labels in the repository
print(repo.labels)

# Select full matrix from 'mel' and with default stream (0) (40 mel bands).
data = dcase_util.data.Stacker(recipe='mel').stack(repo)
print(data.shape)
# (40, 501)

# Select full matrix from 'mel' and define stream 0 (40 mel bands).
data = dcase_util.data.Stacker(recipe='mel=0').stack(repo)
print(data.shape)
# (40, 501)

# Select full matrix from 'mel' and 'mfcc' with default stream (0) (40 mel bands + 20 mfccs).
data = dcase_util.data.Stacker(recipe='mel;mfcc').stack(repo)
print(data.shape)
# (60, 501)

# Select data from 'mfcc' matrix with default stream (0), and omit first coefficient (19 mfccs).
data = dcase_util.data.Stacker(recipe='mfcc=1-19').stack(repo)
print(data.shape)
# (19, 501)

# Select data from 'mfcc' matrix with default stream (0), select coefficients 1,5,7 (3 mfccs).
data = dcase_util.data.Stacker(recipe='mfcc=1,5,7').stack(repo)
print(data.shape)
# (3, 501)

4.2 Data encoding

Data encoders can be used to convert reference metadata into binary matrices.

  • One-hot
    The OneHotEncoder class (dcase_util.data.OneHotEncoder) can be used to create a binary matrix in which a single class is active throughout the signal. This encoder is suitable for multi-class single-label classification applications.

Example:

# Initialize encoder
onehot_encoder = dcase_util.data.OneHotEncoder(
    label_list=['class A', 'class B', 'class C'],
    time_resolution=0.02
)

# Encode
binary_matrix = onehot_encoder.encode(
    label='class B',
    length_seconds=10.0
)

# Visualize
binary_matrix.plot()
  • Many-hot
    The ManyHotEncoder class (dcase_util.data.ManyHotEncoder) can be used to create a binary matrix in which multiple classes are active throughout the signal. This encoder is suitable for multi-class multi-label classification applications such as audio tagging.

Example:

# Initialize encoder
manyhot_encoder = dcase_util.data.ManyHotEncoder(
    label_list=['class A', 'class B', 'class C'],
    time_resolution=0.02
)

# Encode
binary_matrix = manyhot_encoder.encode(
    label_list=['class A', 'class B'],
    length_seconds=10.0
)

# Visualize
binary_matrix.plot()
  • Event roll

The EventRollEncoder class (dcase_util.data.EventRollEncoder) can be used to create a binary matrix in which multiple events are active within specified time segments. This encoder is suitable for event detection applications.

Example:

# Metadata
meta = dcase_util.containers.MetaDataContainer([
    {
        'filename': 'test1.wav',
        'event_label': 'cat',
        'onset': 1.0,
        'offset': 3.0
    },
    {
        'filename': 'test1.wav',
        'event_label': 'dog',
        'onset': 2.0,
        'offset': 6.0
    },
    {
        'filename': 'test1.wav',
        'event_label': 'speech',
        'onset': 5.0,
        'offset': 8.0
    },
])

# Initialize encoder
event_roll_encoder = dcase_util.data.EventRollEncoder(
    label_list=meta.unique_event_labels,
    time_resolution=0.02
)

# Encode
event_roll = event_roll_encoder.encode(
    metadata_container=meta,
    length_seconds=10.0
)

# Visualize
event_roll.plot()

5. Metadata

The library provides the container dcase_util.containers.MetaDataContainer for handling metadata from most DCASE-related application areas: acoustic scene classification, sound event detection, and audio tagging.

In principle, the metadata is a list of dictionaries (meta items), and it can be used like a normal list.
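
For example, a short sketch of plain list-style usage (the field values are illustrative):

meta = dcase_util.containers.MetaDataContainer()

# Append items like a normal list
meta.append({'filename': 'file1.wav', 'scene_label': 'office'})
meta.append({'filename': 'file2.wav', 'scene_label': 'street'})

# Iterate like a normal list; items expose their fields as attributes
for item in meta:
    print(item.filename, item.scene_label)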

5.1 Creating a container

  • Initialize the metadata container with a list of acoustic scenes:
meta_container_scenes = dcase_util.containers.MetaDataContainer(
    [
        {
            'filename': 'file1.wav',
            'scene_label': 'office'
        },
        {
            'filename': 'file2.wav',
            'scene_label': 'street'
        },
        {
            'filename': 'file3.wav',
            'scene_label': 'home'
        }
    ]
)
  • Initialize the metadata container with a list of sound events:
meta_container_events = dcase_util.containers.MetaDataContainer(
    [
        {
            'filename': 'file1.wav',
            'event_label': 'speech',
            'onset': 10.0,
            'offset': 15.0,
        },
        {
            'filename': 'file1.wav',
            'event_label': 'footsteps',
            'onset': 23.0,
            'offset': 26.0,
        },
        {
            'filename': 'file2.wav',
            'event_label': 'speech',
            'onset': 2.0,
            'offset': 5.0,
        }
    ]
)
  • Initialize the metadata container with tags:
meta_container_tags = dcase_util.containers.MetaDataContainer(
    [
        {
            'filename': 'file1.wav',
            'tags': ['cat', 'dog']
        },
        {
            'filename': 'file2.wav',
            'tags': ['dog']
        },
        {
            'filename': 'file3.wav',
            'tags': ['dog', 'horse']
        }
    ]
)
  • Show content:
meta_container_scenes.show()
# MetaDataContainer :: Class
# Items : 3
# Unique
# Files : 3
# Scene labels : 3
# Event labels : 0
# Tags : 0
#
# Scene statistics
# Scene label Count
# -------------------- ------
# home 1
# office 1
# street 1

meta_container_events.show()
# MetaDataContainer :: Class
# Items : 3
# Unique
# Files : 2
# Scene labels : 0
# Event labels : 2
# Tags : 0
#
# Event statistics
# Event label Count Tot. Length Avg. Length
# -------------------- ------ ----------- -----------
# footsteps 1 3.00 3.00
# speech 2 8.00 4.00

meta_container_tags.show()
# MetaDataContainer :: Class
# Items : 3
# Unique
# Files : 3
# Scene labels : 0
# Event labels : 0
# Tags : 3
#
# Tag statistics
# Tag Count
# -------------------- ------
# cat 1
# dog 3
# horse 1
  • Show content including each individual meta item:
meta_container_scenes.show_all()
# MetaDataContainer :: Class
# Items : 3
# Unique
# Files : 3
# Scene labels : 3
# Event labels : 0
# Tags : 0
#
# Meta data
# Source Onset Offset Scene Event Tags Identifier
# -------------------- ------ ------ --------------- --------------- --------------- -----
# file1.wav - - office - - -
# file2.wav - - street - - -
# file3.wav - - home - - -
#
# Scene statistics
# Scene label Count
# -------------------- ------
# home 1
# office 1
# street 1

meta_container_events.show_all()
# MetaDataContainer :: Class
# Items : 3
# Unique
# Files : 2
# Scene labels : 0
# Event labels : 2
# Tags : 0
#
# Meta data
# Source Onset Offset Scene Event Tags Identifier
# -------------------- ------ ------ --------------- --------------- --------------- -----
# file1.wav 10.00 15.00 - speech - -
# file1.wav 23.00 26.00 - footsteps - -
# file2.wav 2.00 5.00 - speech - -
#
# Event statistics
# Event label Count Tot. Length Avg. Length
# -------------------- ------ ----------- -----------
# footsteps 1 3.00 3.00
# speech 2 8.00 4.00

5.2 Loading and saving

  • Save metadata to a file:
meta_container_events.save(filename='events.txt')
  • Load metadata from an annotation file:
meta_container_events = dcase_util.containers.MetaDataContainer().load(
    filename='events.txt'
)

5.3 Accessing data

  • Get the audio files mentioned in the metadata, and their count:
print(meta_container_events.unique_files)
# ['file1.wav', 'file2.wav']

print(meta_container_events.file_count)
# 2
  • Get the unique scene labels and their count:
print(meta_container_scenes.unique_scene_labels)
# ['home', 'office', 'street']

print(meta_container_scenes.scene_label_count)
# 3
  • Get the unique event labels used in the metadata and their count:
print(meta_container_events.unique_event_labels)
# ['footsteps', 'speech']

print(meta_container_events.event_label_count)
# 2
  • Get the unique tags used in the metadata and their count:
print(meta_container_tags.unique_tags)
# ['cat', 'dog', 'horse']

print(meta_container_tags.tag_count)
# 3

5.4 Filtering

  • Filter the metadata based on filename:
filtered = meta_container_events.filter(filename='file1.wav')
filtered.show_all()
# MetaDataContainer :: Class
# Items : 2
# Unique
# Files : 1
# Scene labels : 0
# Event labels : 2
# Tags : 0
#
# Meta data
# Source Onset Offset Scene Event Tags Identifier
# -------------------- ------ ------ --------------- --------------- --------------- -----
# file1.wav 10.00 15.00 - speech - -
# file1.wav 23.00 26.00 - footsteps - -
#
# Event statistics
# Event label Count Tot. Length Avg. Length
# -------------------- ------ ----------- -----------
# footsteps 1 3.00 3.00
# speech 1 5.00 5.00
  • Filter the metadata based on event label:
filtered = meta_container_events.filter(event_label='speech')
filtered.show_all()
# MetaDataContainer :: Class
# Items : 2
# Unique
# Files : 2
# Scene labels : 0
# Event labels : 1
# Tags : 0
#
# Meta data
# Source Onset Offset Scene Event Tags Identifier
# -------------------- ------ ------ --------------- --------------- --------------- -----
# file1.wav 10.00 15.00 - speech - -
# file2.wav 2.00 5.00 - speech - -
#
# Event statistics
# Event label Count Tot. Length Avg. Length
# -------------------- ------ ----------- -----------
# speech 2 8.00 4.00
  • Filter the metadata based on filename and event label:
filtered = meta_container_events.filter(filename='file1.wav', event_label='speech')
filtered.show_all()
# MetaDataContainer :: Class
# Items : 1
# Unique
# Files : 1
# Scene labels : 0
# Event labels : 1
# Tags : 0
#
# Meta data
# Source Onset Offset Scene Event Tags Identifier
# -------------------- ------ ------ --------------- --------------- --------------- -----
# file1.wav 10.00 15.00 - speech - -
#
# Event statistics
# Event label Count Tot. Length Avg. Length
# -------------------- ------ ----------- -----------
# speech 1 5.00 5.00
  • Filter based on a time segment, making the segment start the new zero time:
filtered = meta_container_events.filter_time_segment(filename='file1.wav', start=12, stop=24)
filtered.show_all()
# MetaDataContainer :: Class
# Items : 2
# Unique
# Files : 1
# Scene labels : 0
# Event labels : 2
# Tags : 0
#
# Meta data
# Source Onset Offset Scene Event Tags Identifier
# -------------------- ------ ------ --------------- --------------- --------------- -----
# file1.wav 0.00 3.00 - speech - -
# file1.wav 11.00 12.00 - footsteps - -
#
# Event statistics
# Event label Count Tot. Length Avg. Length
# -------------------- ------ ----------- -----------
# footsteps 1 1.00 1.00
# speech 1 3.00 3.00

5.5 Processing

  • Add a time offset to the onsets and offsets set in the meta items:
meta_container_events.add_time(time=10)
meta_container_events.show_all()
# MetaDataContainer :: Class
# Items : 3
# Unique
# Files : 2
# Scene labels : 0
# Event labels : 2
# Tags : 0
#
# Meta data
# Source Onset Offset Scene Event Tags Identifier
# -------------------- ------ ------ --------------- --------------- --------------- -----
# file1.wav 20.00 25.00 - speech - -
# file1.wav 33.00 36.00 - footsteps - -
# file2.wav 12.00 15.00 - speech - -
#
# Event statistics
# Event label Count Tot. Length Avg. Length
# -------------------- ------ ----------- -----------
# footsteps 1 3.00 3.00
# speech 2 8.00 4.00
  • Remove very short events, and merge events (with the same event label) that have small gaps between them:
meta_container_events = dcase_util.containers.MetaDataContainer(
    [
        {
            'filename': 'file1.wav',
            'event_label': 'speech',
            'onset': 1.0,
            'offset': 2.0,
        },
        {
            'filename': 'file1.wav',
            'event_label': 'speech',
            'onset': 2.05,
            'offset': 2.5,
        },
        {
            'filename': 'file1.wav',
            'event_label': 'speech',
            'onset': 5.1,
            'offset': 5.15,
        },
    ]
)
processed = meta_container_events.process_events(minimum_event_length=0.2, minimum_event_gap=0.1)
processed.show_all()
# MetaDataContainer :: Class
# Items : 1
# Unique
# Files : 1
# Scene labels : 0
# Event labels : 1
# Tags : 0
#
# Meta data
# Source Onset Offset Scene Event Tags Identifier
# -------------------- ------ ------ --------------- --------------- --------------- -----
# file1.wav 1.00 2.50 - speech - -
#
# Event statistics
# Event label Count Tot. Length Avg. Length
# -------------------- ------ ----------- -----------
# speech 1 1.50 1.50

5.6 Event roll

Convert an event list into an event roll (a binary matrix of event activity):

meta_container_events = dcase_util.containers.MetaDataContainer(
    [
        {
            'filename': 'file1.wav',
            'event_label': 'speech',
            'onset': 1.0,
            'offset': 2.0,
        },
        {
            'filename': 'file1.wav',
            'event_label': 'speech',
            'onset': 2.05,
            'offset': 2.5,
        },
        {
            'filename': 'file1.wav',
            'event_label': 'speech',
            'onset': 5.1,
            'offset': 5.15,
        },
        {
            'filename': 'file1.wav',
            'event_label': 'footsteps',
            'onset': 3.1,
            'offset': 4.15,
        },
        {
            'filename': 'file1.wav',
            'event_label': 'dog',
            'onset': 2.6,
            'offset': 3.6,
        },
    ]
)
event_roll = meta_container_events.to_event_roll()

# Plot
event_roll.plot()

6. Datasets

The dataset classes provide a unified interface to many differently organized audio datasets. A dataset is downloaded, extracted, and prepared for use the first time it is used.

Four types of datasets are provided:

  • Acoustic scene datasets, classes inherited from dcase_util.datasets.AcousticSceneDataset.
  • Sound event datasets, classes inherited from dcase_util.datasets.SoundEventDataset.
  • Sound event datasets with synthetic data creation, classes inherited from dcase_util.datasets.SyntheticSoundEventDataset.
  • Audio tagging datasets, classes inherited from dcase_util.datasets.AudioTaggingDataset.

To get a list of all available datasets:

print(dcase_util.datasets.dataset_list())
# Dataset list
# Class Name | Group | Remote | Local | Audio | Scenes | Events | Tags
# ------------------------------------------ | ----- | ------ | ------ | ----- | ------ | ------ | ----
# DCASE2013_Scenes_DevelopmentSet | scene | 344 MB | 849 MB | 100 | 10 | |
# TUTAcousticScenes_2016_DevelopmentSet | scene | 7 GB | 23 GB | 1170 | 15 | |
# TUTAcousticScenes_2016_EvaluationSet | scene | 2 GB | 5 GB | 390 | 15 | |
# TUTAcousticScenes_2017_DevelopmentSet | scene | 9 GB | 21 GB | 4680 | 15 | |
# TUTAcousticScenes_2017_EvaluationSet | scene | 3 GB | 7 GB | | | |
# DCASE2017_Task4tagging_DevelopmentSet | event | 5 MB | 24 GB | 56700 | 1 | 17 |
# DCASE2017_Task4tagging_EvaluationSet | event | 823 MB | 1 GB | | | |
# TUTRareSoundEvents_2017_DevelopmentSet | event | 7 GB | 28 GB | | | 3 |
# TUTRareSoundEvents_2017_EvaluationSet | event | 4 GB | 4 GB | | | 3 |
# TUTSoundEvents_2016_DevelopmentSet | event | 967 MB | 2 GB | 954 | 2 | 17 |
# TUTSoundEvents_2016_EvaluationSet | event | 449 MB | 989 MB | 511 | 2 | 17 |
# TUTSoundEvents_2017_DevelopmentSet | event | 1 GB | 2 GB | 659 | 1 | 6 |
# TUTSoundEvents_2017_EvaluationSet | event | 370 MB | 798 MB | | | |
# TUT_SED_Synthetic_2016 | event | 4 GB | 5 GB | | | |
# CHiMEHome_DomesticAudioTag_DevelopmentSet | tag | 3 GB | 9 GB | 1946 | 1 | | 7

6.1 Initializing a dataset

To download, extract, and prepare a dataset (in this case the dataset is placed in a temporary directory, and only the metadata-related files are downloaded):

import tempfile

db = dcase_util.datasets.TUTAcousticScenes_2016_DevelopmentSet(
    data_path=tempfile.gettempdir(),
    included_content_types=['meta']
)
db.initialize()
db.show()
# DictContainer :: Class
# audio_source : Field recording
# audio_type : Natural
# authors : Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen
# licence : free non-commercial
# microphone_model : Soundman OKM II Klassik/studio A3 electret microphone
# recording_device_model : Roland Edirol R-09
# title : TUT Acoustic Scenes 2016, development dataset
# url : https://zenodo.org/record/45739
#
# MetaDataContainer :: Class
# Filename : /tmp/TUT-acoustic-scenes-2016-development/meta.txt
# Items : 1170
# Unique
# Files : 1170
# Scene labels : 15
# Event labels : 0
# Tags : 0
#
# Scene statistics
# Scene label Count
# -------------------- ------
# beach 78
# bus 78
# cafe/restaurant 78
# car 78
# city_center 78
# forest_path 78
# grocery_store 78
# home 78
# library 78
# metro_station 78
# office 78
# park 78
# residential_area 78
# train 78
# tram 78

6.2 Cross-validation setup

Datasets are usually provided with a cross-validation setup.

Get the training material for fold 1:

training_material = db.train(fold=1)
training_material.show()
# MetaDataContainer :: Class
# Filename : /tmp/TUT-acoustic-scenes-2016-development/evaluation_setup/fold1_train.txt
# Items : 880
# Unique
# Files : 880
# Scene labels : 15
# Event labels : 0
# Tags : 0
#
# Scene statistics
# Scene label Count
# -------------------- ------
# beach 59
# bus 59
# cafe/restaurant 60
# car 58
# city_center 60
# forest_path 57
# grocery_store 59
# home 56
# library 57
# metro_station 59
# office 59
# park 58
# residential_area 59
# train 60
# tram 60
  • Get the testing material for fold 1 (material without reference data):
testing_material = db.test(fold=1)
testing_material.show()
# MetaDataContainer :: Class
# Filename : /tmp/TUT-acoustic-scenes-2016-development/evaluation_setup/fold1_test.txt
# Items : 290
# Unique
# Files : 290
# Scene labels : 0
# Event labels : 0
# Tags : 0
  • Get the testing material for fold 1 with full reference data:
eval_material = db.eval(fold=1)
eval_material.show()

# MetaDataContainer :: Class
# Filename : /tmp/TUT-acoustic-scenes-2016-development/evaluation_setup/fold1_evaluate.txt
# Items : 290
# Unique
# Files : 290
# Scene labels : 15
# Event labels : 0
# Tags : 0
#
# Scene statistics
# Scene label Count
# -------------------- ------
# beach 19
# bus 19
# cafe/restaurant 18
# car 20
# city_center 18
# forest_path 21
# grocery_store 19
# home 22
# library 21
# metro_station 19
# office 19
# park 20
# residential_area 19
# train 18
# tram 18
  • To get all of the data, set fold to None:
all_material = db.train(fold=None)
all_material.show()
# MetaDataContainer :: Class
# Filename : /tmp/TUT-acoustic-scenes-2016-development/meta.txt
# Items : 1170
# Unique
# Files : 1170
# Scene labels : 15
# Event labels : 0
# Tags : 0
#
# Scene statistics
# Scene label Count
# -------------------- ------
# beach 78
# bus 78
# cafe/restaurant 78
# car 78
# city_center 78
# forest_path 78
# grocery_store 78
# home 78
# library 78
# metro_station 78
# office 78
# park 78
# residential_area 78
# train 78
# tram 78
  • Iterate over all folds:
for fold in db.folds():
    train_material = db.train(fold=fold)

Most datasets do not provide a validation set split. However, the dataset classes provide a few ways to generate one from the training set, while preserving the data statistics and making sure that data from the same source does not end up in both the training and validation sets.

Generate a balanced validation set for fold 1 (balancing is done so that recordings from the same location are assigned to the same set):

training_files, validation_files = db.validation_split(
    fold=1,
    split_type='balanced',
    validation_amount=0.3
)
  • Generate a random (unbalanced) validation set for fold 1:
training_files, validation_files = db.validation_split(
    fold=1,
    split_type='random',
    validation_amount=0.3
)
  • Get the validation set provided by the dataset (the dataset used in this example does not provide one, so this call raises an error):
training_files, validation_files = db.validation_split(
    fold=1,
    split_type='dataset'
)
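
The split returns two plain file lists, and the training and validation lists should not overlap; a quick sanity check in plain Python:

# Training and validation file lists are disjoint
assert set(training_files).isdisjoint(validation_files)
print(len(training_files), len(validation_files))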

6.3 Metadata

  • Get the metadata associated with a given file:
items = db.file_meta(filename='audio/b086_150_180.wav')
print(items)
# MetaDataContainer :: Class
# Items : 1
# Unique
# Files : 1
# Scene labels : 1
# Event labels : 0
# Tags : 0
#
# Meta data
# Source Onset Offset Scene Event Tags Identifier
# -------------------- ------ ------ --------------- --------------- --------------- -----
# audio/b086_150_180.. - - grocery_store - - -
#
# Scene statistics
# Scene label Count
# -------------------- ------
# grocery_store 1

7. Processing chain

In addition to the basic utilities, the library provides a mechanism for chaining various data processing classes together. This makes it easier to build complex data processing pipelines. Data processing is done in processors, which are collected into a chain (a list of processors).

All processor classes use the dcase_util.processors.ProcessorMixin mixin together with a specific utility class. For example, dcase_util.processors.AudioReadingProcessor inherits from dcase_util.processors.ProcessorMixin and dcase_util.containers.AudioContainer.

  • Audio-related processors:

dcase_util.processors.AudioReadingProcessor, audio reading with multi-channel audio support.
dcase_util.processors.MonoAudioReadingProcessor, audio reading that mixes multi-channel audio down to a single channel.

  • Data manipulation processors:

dcase_util.processors.AggregationProcessor, aggregates data within a sliding processing window.
dcase_util.processors.SequencingProcessor, splits data matrices into sequences.
dcase_util.processors.NormalizationProcessor, normalizes data matrices.
dcase_util.processors.RepositoryNormalizationProcessor, normalizes data stored in a repository.
dcase_util.processors.StackingProcessor, stacks a new feature matrix based on data stored in a repository.

  • Data encoding processors:

dcase_util.processors.OneHotEncodingProcessor, one-hot encoding (classification).
dcase_util.processors.ManyHotEncodingProcessor, many-hot encoding (tagging).
dcase_util.processors.EventRollEncodingProcessor, event roll encoding (detection).

  • Feature extraction processors:

    • dcase_util.processors.RepositoryFeatureExtractorProcessor, extracts several feature types at once and stores them in a single repository. Supports multi-channel audio input.
    • dcase_util.processors.MelExtractorProcessor, extracts mel-band energies. Mono audio only.
    • dcase_util.processors.MfccStaticExtractorProcessor, extracts static MFCCs. Mono audio only.
    • dcase_util.processors.MfccDeltaExtractorProcessor, extracts delta MFCCs. Mono audio only.
    • dcase_util.processors.MfccAccelerationExtractorProcessor, extracts acceleration MFCCs. Mono audio only.
    • dcase_util.processors.ZeroCrossingRateExtractorProcessor, extracts zero-crossing rate. Mono audio only.
    • dcase_util.processors.RMSEnergyExtractorProcessor, extracts RMS energy. Mono audio only.
    • dcase_util.processors.SpectralCentroidExtractorProcessor, extracts spectral centroid. Mono audio only.
  • Metadata processors:

dcase_util.processors.MetadataReadingProcessor, reads metadata from files.

7.1 Feature extraction and processing

  • Extract mel-band energies for an audio file:
# Define processing chain
chain = dcase_util.processors.ProcessingChain([
    {
        'processor_name': 'dcase_util.processors.MonoAudioReadingProcessor',
        'init_parameters': {
            'fs': 44100
        }
    },
    {
        'processor_name': 'dcase_util.processors.MelExtractorProcessor',
        'init_parameters': {}
    }
])
# Run the processing chain
data = chain.process(filename=dcase_util.utils.Example().audio_filename())
data.show()
# FeatureContainer :: Class
# Data
# data : matrix (40,501)
# Dimensions
# time_axis : 1
# data_axis : 0
# Timing information
# time_resolution : 0.02 sec
# Meta
# stats : Calculated
# metadata : -
# processing_chain : ProcessingChain :: Class
# [0]
# DictContainer :: Class
# init_parameters
# _focus_channel : None
# _focus_start : 0
# _focus_stop : None
# channel_axis : 0
# data_synced_with_file : False
# filename : None
# fs : 44100
# mono : True
# time_axis : 1
# input_type : NONE
# output_type : AUDIO
# process_parameters
# filename : dcase_util/utils/example_data/acoustic_scene.flac
# focus_channel : None
# focus_duration_samples : (None,)
# focus_duration_sec : None
# focus_start_samples : None
# focus_start_sec : None
# focus_stop_samples : None
# focus_stop_sec : None
# processor_name : dcase_util.processors.MonoAudioReadingProcessor
#
# [1]
# DictContainer :: Class
# init_parameters
# eps : 2.22044604925e-16
# fmax : None
# fmin : 0
# fs : 44100
# hop_length_samples : 882
# hop_length_seconds : 0.02
# htk : False
# log : True
# n_mels : 40
# normalize_mel_bands : False
# win_length_samples : 1764
# win_length_seconds : 0.04
# input_type : AUDIO
# output_type : DATA_CONTAINER
# process_parameters
# filename : dcase_util/utils/example_data/acoustic_scene.flac
# processor_name : dcase_util.processors.MelExtractorProcessor
#
#
# Duration
# Frames : 501
# Seconds : 10.02 sec
  • Focus on a certain part of the audio:
# Define processing chain
chain = dcase_util.processors.ProcessingChain([
    {
        'processor_name': 'dcase_util.processors.MonoAudioReadingProcessor',
        'init_parameters': {
            'fs': 44100
        }
    },
    {
        'processor_name': 'dcase_util.processors.MelExtractorProcessor',
        'init_parameters': {}
    }
])
# Run the processing chain
data = chain.process(
    filename=dcase_util.utils.Example().audio_filename(),
    focus_start_seconds=1.0,
    duration_seconds=2.0
)
print(data.shape)
# (40, 101)

# Run the processing chain
data = chain.process(
    filename=dcase_util.utils.Example().audio_filename(),
    focus_start_samples=44100,
    focus_stop_samples=44100*2
)
print(data.shape)
# (40, 51)
  • Extract several different acoustic features for an audio file and form a combined data matrix:
# Define processing chain
chain = dcase_util.processors.ProcessingChain([
    {
        'processor_name': 'dcase_util.processors.MonoAudioReadingProcessor',
        'init_parameters': {
            'fs': 44100
        }
    },
    {
        'processor_name': 'dcase_util.processors.RepositoryFeatureExtractorProcessor',
        'init_parameters': {
            'parameters': {
                'mel': {},
                'mfcc': {}
            }
        }
    },
    {
        'processor_name': 'dcase_util.processors.StackingProcessor',
        'init_parameters': {
            'recipe': 'mel;mfcc=1-19'
        }
    }
])
# Run the processing chain
data = chain.process(filename=dcase_util.utils.Example().audio_filename())
data.show()
# FeatureContainer :: Class
# Data
# data : matrix (59,501)
# Dimensions
# time_axis : 1
# data_axis : 0
# Timing information
# time_resolution : 0.02 sec
# Meta
# stats : Calculated
# metadata : -
# processing_chain : ProcessingChain :: Class
# [0]
# DictContainer :: Class
# init_parameters
# _focus_channel : None
# _focus_start : 0
# _focus_stop : None
# channel_axis : 0
# data_synced_with_file : False
# filename : None
# fs : 44100
# mono : True
# time_axis : 1
# input_type : NONE
# output_type : AUDIO
# process_parameters
# filename : dcase_util/utils/example_data/acoustic_scene.flac
# focus_channel : None
# focus_duration_samples : (None,)
# focus_duration_sec : None
# focus_start_samples : None
# focus_start_sec : None
# focus_stop_samples : None
# focus_stop_sec : None
# processor_name : dcase_util.processors.MonoAudioReadingProcessor
#
# [1]
# DictContainer :: Class
# init_parameters
# parameters
# mel
# mfcc
# input_type : AUDIO
# output_type : DATA_REPOSITORY
# process_parameters
# processor_name : dcase_util.processors.RepositoryFeatureExtractorProcessor
#
# [2]
# DictContainer :: Class
# init_parameters
# hop : 1
# recipe : list (2)
# [0]
# label : mel
# [1]
# label : mfcc
# vector-index
# full : False
# selection : False
# start : 1
# stop : 20
# stream : 0
# input_type : DATA_REPOSITORY
# output_type : DATA_CONTAINER
# process_parameters
# processor_name : dcase_util.processors.StackingProcessor
#
#
# Duration
# Frames : 501
# Seconds : 10.02 sec
data.plot()
  • Extract several different acoustic features for an audio file, normalize them, form a combined data matrix, aggregate it along the time axis (context windowing), and split the data into sequences:
import numpy
# Normalization factors
mel_mean = numpy.array([
-3.26094211, -4.20447522, -4.57860912, -5.11036974, -5.33019526,
-5.48390484, -5.50473626, -5.54014946, -5.28249358, -5.12090705,
-5.21508926, -5.3824216 , -5.37758142, -5.38829567, -5.4912112 ,
-5.55352419, -5.72801733, -6.02412347, -6.41367833, -6.64073975,
-6.80493457, -6.8717373 , -6.88140949, -6.91464104, -7.00929399,
-7.13497673, -7.36417664, -7.73457445, -8.25007518, -8.79878143,
-9.22709866, -9.28843908, -9.57054527, -9.82846299, -9.85425306,
-9.90253041, -9.85194976, -9.62786338, -9.38480217, -9.18478766
])
mel_std = numpy.array([
0.3450398 , 0.47330394, 0.53112192, 0.57607313, 0.66710664,
0.70052532, 0.79045046, 0.81864229, 0.79422025, 0.76691708,
0.64798516, 0.59340713, 0.57756029, 0.64032687, 0.70226395,
0.75670044, 0.80861907, 0.79305124, 0.7289238 , 0.75346821,
0.77785602, 0.7350573 , 0.75137917, 0.77171676, 0.80314121,
0.78965339, 0.79256442, 0.82524546, 0.84596991, 0.76430333,
0.69690919, 0.69591269, 0.54718615, 0.5277196 , 0.61271734,
0.54482468, 0.42716334, 0.25561558, 0.08991936, 0.06402002
])

mfcc_mean = numpy.array([
-1.89603847e+02, 4.88930395e+01, -8.37911555e+00,
2.58522036e+00, 4.51964497e+00, -3.87312873e-01,
8.97250541e+00, 1.61597737e+00, 1.74111135e+00,
2.50223131e+00, 3.03385048e+00, 1.34561742e-01,
1.04119803e+00, -2.57486399e-01, 7.58245525e-01,
1.11375319e+00, 5.45536494e-01, 7.62699140e-01,
9.34355519e-01, 1.57158221e-01
])
mfcc_std = numpy.array([
15.94006483, 2.39593761, 4.78748908, 2.39555341,
2.66573364, 1.75496556, 2.75005027, 1.5436589 ,
1.81070379, 1.39476785, 1.22560606, 1.25575051,
1.34613239, 1.46778281, 1.19398649, 1.1590474 ,
1.1309816 , 1.12975486, 0.95503429, 1.01747647
])

# Define processing chain
chain = dcase_util.processors.ProcessingChain([
    {
        'processor_name': 'dcase_util.processors.MonoAudioReadingProcessor',
        'init_parameters': {
            'fs': 44100
        }
    },
    {
        'processor_name': 'dcase_util.processors.RepositoryFeatureExtractorProcessor',
        'init_parameters': {
            'parameters': {
                'mel': {},
                'mfcc': {}
            }
        }
    },
    {
        'processor_name': 'dcase_util.processors.RepositoryNormalizationProcessor',
        'init_parameters': {
            'parameters': {
                'mel': {
                    'mean': mel_mean,
                    'std': mel_std
                },
                'mfcc': {
                    'mean': mfcc_mean,
                    'std': mfcc_std
                }
            }
        }
    },
    {
        'processor_name': 'dcase_util.processors.StackingProcessor',
        'init_parameters': {
            'recipe': 'mel;mfcc=1-19'
        }
    },
    {
        'processor_name': 'dcase_util.processors.AggregationProcessor',
        'init_parameters': {
            'recipe': ['flatten'],
            'win_length_frames': 5,
            'hop_length_frames': 1,
        }
    },
    {
        'processor_name': 'dcase_util.processors.SequencingProcessor',
        'init_parameters': {
            'frames': 20,
            'hop_length_frames': 20,
            'padding': True
        }
    },
])
data = chain.process(filename=dcase_util.utils.Example().audio_filename())
data.show()
# DataMatrix3DContainer :: Class
# Data
# data : matrix (295,20,26)
# Dimensions
# time_axis : 1
# data_axis : 0
# sequence_axis : 2
# Timing information
# time_resolution : None
# Meta
# stats : Calculated
# metadata : -
# processing_chain : ProcessingChain :: Class
# [0]
# DictContainer :: Class
# init_parameters
# _focus_channel : None
# _focus_start : 0
# _focus_stop : None
# channel_axis : 0
# data_synced_with_file : False
# filename : None
# fs : 44100
# mono : True
# time_axis : 1
# input_type : NONE
# output_type : AUDIO
# process_parameters
# filename : dcase_util/utils/example_data/acoustic_scene.flac
# focus_channel : None
# focus_duration_samples : (None,)
# focus_duration_sec : None
# focus_start_samples : None
# focus_start_sec : None
# focus_stop_samples : None
# focus_stop_sec : None
# processor_name : dcase_util.processors.MonoAudioReadingProcessor
#
# [1]
# DictContainer :: Class
# init_parameters
# parameters
# mel
# mfcc
# input_type : AUDIO
# output_type : DATA_REPOSITORY
# process_parameters
# processor_name : dcase_util.processors.RepositoryFeatureExtractorProcessor
#
# [2]
# DictContainer :: Class
# init_parameters
# parameters
# mel
# mean : matrix (40,1)
# std : matrix (40,1)
# mfcc
# mean : matrix (20,1)
# std : matrix (20,1)
# input_type : DATA_REPOSITORY
# output_type : DATA_REPOSITORY
# process_parameters
# processor_name : dcase_util.processors.RepositoryNormalizationProcessor
#
# [3]
# DictContainer :: Class
# init_parameters
# hop : 1
# recipe : list (2)
# [0]
# label : mel
# [1]
# label : mfcc
# vector-index
# full : False
# selection : False
# start : 1
# stop : 20
# stream : 0
# input_type : DATA_REPOSITORY
# output_type : DATA_CONTAINER
# process_parameters
# processor_name : dcase_util.processors.StackingProcessor
#
# [4]
# DictContainer :: Class
# init_parameters
# hop_length_frames : 1
# recipe : list (1)
# [0] : flatten
# win_length_frames : 5
# input_type : DATA_CONTAINER
# output_type : DATA_CONTAINER
# process_parameters
# processor_name : dcase_util.processors.AggregationProcessor
#
# [5]
# DictContainer :: Class
# init_parameters
# frames : 20
# hop_length_frames : 20
# padding : True
# shift : 0
# shift_border : roll
# shift_max : None
# shift_step : 0
# input_type : DATA_CONTAINER
# output_type : DATA_CONTAINER
# process_parameters
# processor_name : dcase_util.processors.SequencingProcessor
#
#
# Duration
# Frames : 20

7.2 Metadata processing

  • Get an event roll:
import tempfile
tmp = tempfile.NamedTemporaryFile('r+', suffix='.txt', delete=False)
dcase_util.utils.Example.event_metadata_container().save(filename=tmp.name)

# Define processing chain
chain = dcase_util.processors.ProcessingChain([
    {
        'processor_name': 'dcase_util.processors.MetadataReadingProcessor',
        'init_parameters': {}
    },
    {
        'processor_name': 'dcase_util.processors.EventRollEncodingProcessor',
        'init_parameters': {
            'label_list': dcase_util.utils.Example.event_metadata_container().unique_event_labels,
            'time_resolution': 0.02,
        }
    }
])

# Do the processing
data = chain.process(
    filename=tmp.name,
    focus_filename='test1.wav'
)

# Plot data
data.plot()
  • Get an event roll for a focus segment:
import tempfile
tmp = tempfile.NamedTemporaryFile('r+', suffix='.txt', delete=False)
dcase_util.utils.Example.event_metadata_container().save(filename=tmp.name)

# Define processing chain
chain = dcase_util.processors.ProcessingChain([
    {
        'processor_name': 'dcase_util.processors.MetadataReadingProcessor',
        'init_parameters': {}
    },
    {
        'processor_name': 'dcase_util.processors.EventRollEncodingProcessor',
        'init_parameters': {
            'label_list': dcase_util.utils.Example.event_metadata_container().unique_event_labels,
            'time_resolution': 0.02,
        }
    }
])

# Do the processing
data = chain.process(
    filename=tmp.name,
    focus_filename='test1.wav',
    focus_start_seconds=2.0,
    focus_stop_seconds=6.5,
)

# Plot data
data.plot()

Application example

Acoustic scene classifier

This tutorial demonstrates how to build a simple acoustic scene classifier using the utilities in dcase_util. An acoustic scene classifier application usually contains the following stages:

  • Dataset initialization: the dataset is downloaded and prepared for use.
  • Feature extraction: acoustic features are extracted for all audio files in the development dataset and stored on disk for later access.
  • Feature normalization: the training material of each cross-validation fold is traversed and the mean and standard deviation of the acoustic features are calculated, so that the feature data can be normalized later.
  • Learning: the training material of each cross-validation fold is traversed and the acoustic models are learned.
  • Testing: the testing material of each cross-validation fold is traversed and a scene class is estimated for each sample.
  • Evaluation: the system output is evaluated against the ground truth.

This example uses the acoustic scene dataset published for DCASE2013 (10 scene classes), static MFCCs as features, and GMMs as the classifier. The example shows only minimal code; a real development system would need better parameterization to make system development easier.

The full code example can be found in examples/asc_gmm_simple.py.

  • Dataset initialization
    This example uses the acoustic scene dataset published for DCASE2013; the dataset class for handling it, dcase_util.datasets.DCASE2013_Scenes_DevelopmentSet, ships with dcase_util.

The dataset first needs to be downloaded, extracted to disk, and prepared for use:

import os
import dcase_util
# Setup logging
dcase_util.utils.setup_logging()

log = dcase_util.ui.FancyLogger()
log.title('Acoustic Scene Classification Example / GMM')

# Create dataset object and set dataset to be stored under 'data' directory.
db = dcase_util.datasets.DCASE2013_Scenes_DevelopmentSet(
    data_path='data'
)

# Initialize dataset (download, extract and prepare it).
db.initialize()

# Show dataset information
db.show()
# DictContainer :: Class
# audio_source : Field recording
# audio_type : Natural
# authors : D. Giannoulis, E. Benetos, D. Stowell, and M. D. Plumbley
# microphone_model : Soundman OKM II Klassik/studio A3 electret microphone
# recording_device_model : Unknown
# title : IEEE AASP CASA Challenge - Public Dataset for Scene Classification Task
# url : https://archive.org/details/dcase2013_scene_classification
#
# MetaDataContainer :: Class
# Filename : data/DCASE2013-acoustic-scenes-development/meta.txt
# Items : 100
# Unique
# Files : 100
# Scene labels : 10
# Event labels : 0
# Tags : 0
#
# Scene statistics
# Scene label Count
# -------------------- ------
# bus 10
# busystreet 10
# office 10
# openairmarket 10
# park 10
# quietstreet 10
# restaurant 10
# supermarket 10
# tube 10
# tubestation 10
  • Feature extraction
    Usually it is most efficient to extract features for all audio files once and store them on disk, rather than re-extracting them every time the acoustic features are needed. An example of how to do this:
log.section_header('Feature Extraction')

# Prepare feature extractor
extractor = dcase_util.features.MfccStaticExtractor(
    fs=44100,
    win_length_seconds=0.04,
    hop_length_seconds=0.02,
    n_mfcc=14
)
# Define feature storage path
feature_storage_path = os.path.join('system_data', 'features')

# Make sure path exists
dcase_util.utils.Path().create(feature_storage_path)

# Loop over all audio files in the dataset and extract features for them.
for audio_filename in db.audio_files:
    # Show some progress
    log.line(os.path.split(audio_filename)[1], indent=2)

    # Get filename for feature data from audio filename
    feature_filename = os.path.join(
        feature_storage_path,
        os.path.split(audio_filename)[1].replace('.wav', '.cpickle')
    )

    # Load audio data
    audio = dcase_util.containers.AudioContainer().load(
        filename=audio_filename,
        mono=True,
        fs=extractor.fs
    )

    # Extract features and store them into FeatureContainer, and save it to the disk
    features = dcase_util.containers.FeatureContainer(
        filename=feature_filename,
        data=extractor.extract(audio.data),
        time_resolution=extractor.hop_length_seconds
    ).save()

log.foot()
  • Feature normalization
    In this stage, the training material of each cross-validation fold is traversed, and the mean and standard deviation of the acoustic features are calculated. These normalization factors are used to normalize the feature data before it is used in the learning and testing stages.
log.section_header('Feature Normalization')

# Define normalization data storage path
normalization_storage_path = os.path.join('system_data', 'normalization')

# Make sure path exists
dcase_util.utils.Path().create(normalization_storage_path)

# Loop over all cross-validation folds and calculate mean and std for the training data
for fold in db.folds():
    # Show some progress
    log.line('Fold {fold:d}'.format(fold=fold), indent=2)

    # Get filename for the normalization factors
    fold_stats_filename = os.path.join(
        normalization_storage_path,
        'norm_fold_{fold:d}.cpickle'.format(fold=fold)
    )

    # Normalizer
    normalizer = dcase_util.data.Normalizer(filename=fold_stats_filename)

    # Loop through all training data
    for item in db.train(fold=fold):
        # Get feature filename
        feature_filename = os.path.join(
            feature_storage_path,
            os.path.split(item.filename)[1].replace('.wav', '.cpickle')
        )

        # Load feature matrix
        features = dcase_util.containers.FeatureContainer().load(
            filename=feature_filename
        )

        # Accumulate statistics
        normalizer.accumulate(features.data)

    # Finalize and save
    normalizer.finalize().save()

log.foot()
  • Model learning
    In this stage, the training material of each cross-validation fold is traversed, and the acoustic models are learned and stored.
log.section_header('Learning')

# Imports
from sklearn.mixture import GaussianMixture
import numpy

# Define model data storage path
model_storage_path = os.path.join('system_data', 'model')

# Make sure path exists
dcase_util.utils.Path().create(model_storage_path)

# Loop over all cross-validation folds and learn acoustic models
for fold in db.folds():
    # Show some progress
    log.line('Fold {fold:d}'.format(fold=fold), indent=2)

    # Get model filename
    fold_model_filename = os.path.join(
        model_storage_path,
        'model_fold_{fold:d}.cpickle'.format(fold=fold)
    )

    # Get filename for the normalizer
    fold_stats_filename = os.path.join(
        normalization_storage_path,
        'norm_fold_{fold:d}.cpickle'.format(fold=fold)
    )

    # Normalizer
    normalizer = dcase_util.data.Normalizer().load(filename=fold_stats_filename)

    # Collect class wise training data
    class_wise_data = {}
    for scene_label in db.scene_labels():
        class_wise_data[scene_label] = []

        # Loop through all training items from specific scene class
        for item in db.train(fold=fold).filter(scene_label=scene_label):
            # Get feature filename
            feature_filename = os.path.join(
                feature_storage_path,
                os.path.split(item.filename)[1].replace('.wav', '.cpickle')
            )

            # Load all features.
            features = dcase_util.containers.FeatureContainer().load(
                filename=feature_filename
            )

            # Normalize features.
            normalizer.normalize(features)

            # Store feature data.
            class_wise_data[scene_label].append(features.data)

    # Initialize model container.
    model = dcase_util.containers.DictContainer(filename=fold_model_filename)

    # Loop though all scene classes and train acoustic model for each
    for scene_label in db.scene_labels():
        # Show some progress
        log.line('[{scene_label}]'.format(scene_label=scene_label), indent=4)

        # Train acoustic model
        model[scene_label] = GaussianMixture(
            n_components=8
        ).fit(
            numpy.hstack(class_wise_data[scene_label]).T
        )

    # Save model to the disk
    model.save()

log.foot()
  • Testing
    In this stage, the testing material of each cross-validation fold is traversed, and a scene class is estimated for each test sample.
log.section_header('Testing')

# Define results data storage path
results_storage_path = os.path.join('system_data', 'results')

# Make sure path exists
dcase_util.utils.Path().create(results_storage_path)

# Loop over all cross-validation folds and test
for fold in db.folds():
    # Show some progress
    log.line('Fold {fold:d}'.format(fold=fold), indent=2)

    # Get model filename
    fold_model_filename = os.path.join(
        model_storage_path,
        'model_fold_{fold:d}.cpickle'.format(fold=fold)
    )

    # Load model
    model = dcase_util.containers.DictContainer().load(
        filename=fold_model_filename
    )

    # Get filename for the normalizer
    fold_stats_filename = os.path.join(
        normalization_storage_path,
        'norm_fold_{fold:d}.cpickle'.format(fold=fold)
    )

    # Normalizer
    normalizer = dcase_util.data.Normalizer().load(filename=fold_stats_filename)

    # Get results filename
    fold_results_filename = os.path.join(results_storage_path, 'res_fold_{fold:d}.txt'.format(fold=fold))

    # Initialize results container
    res = dcase_util.containers.MetaDataContainer(filename=fold_results_filename)

    # Loop through all test files from the current cross-validation fold
    for item in db.test(fold=fold):
        # Get feature filename
        feature_filename = os.path.join(
            feature_storage_path,
            os.path.split(item.filename)[1].replace('.wav', '.cpickle')
        )

        # Load all features.
        features = dcase_util.containers.FeatureContainer().load(
            filename=feature_filename
        )

        # Normalize features.
        normalizer.normalize(features)

        # Initialize log likelihoods matrix
        logls = numpy.ones((db.scene_label_count(), features.frames)) * -numpy.inf

        # Loop through all scene classes and get likelihood for each per frame
        for scene_label_id, scene_label in enumerate(db.scene_labels()):
            logls[scene_label_id] = model[scene_label].score_samples(features.data.T)

        # Accumulate log likelihoods
        accumulated_logls = dcase_util.data.ProbabilityEncoder().collapse_probabilities(
            probabilities=logls,
            operator='sum'
        )

        # Estimate scene label based on max likelihood.
        estimated_scene_label = dcase_util.data.ProbabilityEncoder(
            label_list=db.scene_labels()
        ).max_selection(
            probabilities=accumulated_logls
        )

        # Store result into results container
        res.append(
            {
                'filename': item.filename,
                'scene_label': estimated_scene_label
            }
        )

    # Save results container
    res.save()

log.foot()
  • Evaluation
    In this stage, the system output is evaluated against the ground truth provided with the dataset.
log.section_header('Evaluation')

# Imports
import sed_eval

all_res = []
overall = []
class_wise_results = numpy.zeros((len(db.folds()), len(db.scene_labels())))
for fold in db.folds():
    # Get results filename
    fold_results_filename = os.path.join(
        results_storage_path,
        'res_fold_{fold:d}.txt'.format(fold=fold)
    )

    # Get reference scenes
    reference_scene_list = db.eval(fold=fold)
    for item_id, item in enumerate(reference_scene_list):
        # Modify data for sed_eval
        reference_scene_list[item_id]['file'] = item.filename

    # Load estimated scenes
    estimated_scene_list = dcase_util.containers.MetaDataContainer().load(
        filename=fold_results_filename
    )
    for item_id, item in enumerate(estimated_scene_list):
        # Modify data for sed_eval
        estimated_scene_list[item_id]['file'] = item.filename

    # Initialize evaluator
    evaluator = sed_eval.scene.SceneClassificationMetrics(scene_labels=db.scene_labels())

    # Evaluate estimated against reference.
    evaluator.evaluate(
        reference_scene_list=reference_scene_list,
        estimated_scene_list=estimated_scene_list
    )

    # Get results
    results = dcase_util.containers.DictContainer(evaluator.results())

    # Store fold-wise results
    all_res.append(results)
    overall.append(results.get_path('overall.accuracy')*100)

    # Get scene class-wise results
    class_wise_accuracy = []
    for scene_label_id, scene_label in enumerate(db.scene_labels()):
        class_wise_accuracy.append(results.get_path(['class_wise', scene_label, 'accuracy', 'accuracy']))
        class_wise_results[fold-1, scene_label_id] = results.get_path(['class_wise', scene_label, 'accuracy', 'accuracy'])

# Form results table
cell_data = class_wise_results
scene_mean_accuracy = numpy.mean(cell_data, axis=0).reshape((1, -1))
cell_data = numpy.vstack((cell_data, scene_mean_accuracy))
fold_mean_accuracy = numpy.mean(cell_data, axis=1).reshape((-1, 1))
cell_data = numpy.hstack((cell_data, fold_mean_accuracy))

scene_list = db.scene_labels()
scene_list.extend(['Average'])
cell_data = [scene_list] + (cell_data*100).tolist()

column_headers = ['Scene']
for fold in db.folds():
    column_headers.append('Fold {fold:d}'.format(fold=fold))

column_headers.append('Average')

log.table(
    cell_data=cell_data,
    column_headers=column_headers,
    column_separators=[0, 5],
    row_separators=[10],
    indent=2
)
log.foot()
log.foot()
  • Results:
Scene            Fold 1   Fold 2   Fold 3   Fold 4   Fold 5   Average
---------------  -------  -------  -------  -------  -------  -------
bus               100.00   100.00   100.00   100.00   100.00   100.00
busystreet        100.00    33.33    33.33   100.00    66.67    66.67
office             66.67   100.00   100.00    66.67   100.00    86.67
openairmarket      66.67   100.00     0.00    66.67   100.00    66.67
park               33.33    33.33     0.00    33.33    33.33    26.67
quietstreet        66.67   100.00    33.33    66.67    66.67    66.67
restaurant         66.67     0.00    66.67     0.00    33.33    33.33
supermarket        33.33     0.00    33.33     0.00    33.33    20.00
tube              100.00    33.33    33.33    66.67    66.67    60.00
tubestation         0.00    66.67    66.67     0.00     0.00    26.67
---------------  -------  -------  -------  -------  -------  -------
Average            63.33    56.67    46.67    50.00    60.00    55.33
