Microphone-Array Sound Source Localization in Python: 2D and 3D
0 Basic terminology in audio signal processing
FT - Fourier transform: converts a signal between the time domain and the frequency domain.
FFT - fast Fourier transform: an efficient algorithm that computers use to evaluate the DFT.
DTFT - discrete-time Fourier transform: discrete in time, continuous in frequency.
DFT - discrete Fourier transform: discrete in both time and frequency.
The DFT is equivalent to sampling the continuous time-domain signal at equal intervals and then taking its Fourier transform. IFT, IFFT, IDFT, etc. denote the corresponding inverse transforms.
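The relationship between the DFT and the FFT can be checked numerically: the naive DFT sum and `numpy.fft.fft` produce the same coefficients, the FFT just computes them far faster. A minimal sketch (the signal and its length are arbitrary choices):

```python
import numpy as np

# Compare a direct evaluation of the DFT sum against the FFT.
N = 64
x = np.random.default_rng(1).standard_normal(N)
n = np.arange(N)

# Naive DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N), O(N^2)
dft = np.array([np.sum(x * np.exp(-2j * np.pi * k * n / N)) for k in range(N)])

# FFT: same result, O(N log N)
fft = np.fft.fft(x)

print(np.allclose(dft, fft))  # → True
```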
STFT - 短時(shí)傅里葉變換short-time Fourier transform:在信號(hào)做傅里葉變換之前乘一個(gè)時(shí)間有限的窗函數(shù) h(t),并假定非平穩(wěn)信號(hào)在分析窗的短時(shí)間隔內(nèi)是平穩(wěn)的,通過(guò)窗函數(shù) h(t)在時(shí)間軸上的移動(dòng),對(duì)信號(hào)進(jìn)行逐段分析得到信號(hào)的一組局部“頻譜”。STFT對(duì)聲音處理很重要,可以生成頻譜圖,詳細(xì)原理此STFT鏈接講的很清晰。
MFCC - mel-frequency cepstral coefficients. Mel frequency: a non-linear frequency scale based on the human ear's perception of equal pitch intervals; its relation to frequency in hertz is commonly written as m = 2595 · log10(1 + f/700).
Cepstrum: the spectrum obtained by taking the logarithm of a signal's spectrum and then applying the inverse Fourier transform.
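The mel/hertz mapping is easy to sanity-check in code; a minimal sketch using the common O'Shaughnessy form of the scale:

```python
import numpy as np

def hz_to_mel(f):
    # Commonly used form of the mel scale.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse mapping back to hertz.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# By construction, 1000 Hz lands very close to 1000 mel,
# and the two functions are exact inverses of each other.
print(hz_to_mel(1000.0))  # ≈ 1000
```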
DCT - discrete cosine transform: represents a finite sequence of data points as a sum of cosine functions oscillating at different frequencies.
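A minimal DCT sketch with SciPy (the data values are arbitrary): the type-II DCT turns a sequence into cosine coefficients, and the inverse transform recovers the original sequence exactly.

```python
import numpy as np
from scipy.fft import dct, idct

# A short data sequence and its orthonormal type-II DCT coefficients.
x = np.array([1.0, 2.0, 3.0, 4.0])
X = dct(x, norm='ortho')

# The first (DC) coefficient is proportional to the mean of the signal;
# the inverse DCT reconstructs the original sequence.
x_back = idct(X, norm='ortho')
print(X[0], x_back)
```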
See also: the amplitude spectrum, phase spectrum, energy spectrum, and other basics of speech signal processing.
1 Introduction
1.1 What is a microphone array?
麥克風(fēng)陣列是由一定數(shù)目的麥克風(fēng)組成,對(duì)聲場(chǎng)的空間特性進(jìn)行采樣并濾波的系統(tǒng)。目前常用的麥克風(fēng)陣列可以按布局形狀分為:線性陣列,平面陣列,以及立體陣列。其幾何構(gòu)型是按設(shè)計(jì)已知,所有麥克風(fēng)的頻率響應(yīng)一致,麥克風(fēng)的采樣時(shí)鐘也是同步的。
1.2 What microphone arrays are used for
麥克風(fēng)陣列一般用于:聲源定位,包括角度和距離的測(cè)量,抑制背景噪聲、干擾、混響、回聲,信號(hào)提取,信號(hào)分離。其中聲源定位技術(shù)利用麥克風(fēng)陣列計(jì)算聲源距離陣列的角度和距離,實(shí)現(xiàn)對(duì)目標(biāo)聲源的跟蹤。
2 The acoular library
The implementation here is based on the acoular library; the official manual provides detailed tutorials.
2.1 Installation
Installation via pip:

```shell
pip install acoular
```

Alternatively, install from source: download it from GitHub, enter the folder, and run:

```shell
python setup.py install
```

Then enter a Python session to check the installation:

```python
import acoular
acoular.demo.acoular_demo.run()
```

If a 64-microphone array with three simulated sound sources appears, the installation succeeded.
3 Two-dimensional localization
首先準(zhǔn)備陣列麥克風(fēng)的xml配置文件。就改麥格風(fēng)個(gè)數(shù)與空間坐標(biāo)。
<?xml version="1.0" encoding="utf-8"?> <MicArray name="array_64"> <pos Name="Point 1 " x=" 0.4 " y=" -0.1 " z=" 0 "/><pos Name="Point 2 " x=" 0.2 " y=" 0 " z=" 0 "/><pos Name="Point 3 " x=" 0.1 " y=" 0.1 " z=" 0 "/><pos Name="Point 4 " x=" -0.4 " y=" 0.4 " z=" 0 "/><pos Name="Point 5 " x=" -0.2 " y=" 0 " z=" 0 "/><pos Name="Point 6 " x=" -0.1 " y=" -0.2 " z=" 0 "/> </MicArray>準(zhǔn)備這個(gè)麥克風(fēng)的錄音文件,如果有的是USB陣列麥克風(fēng),首先連接上再查對(duì)應(yīng)的麥克風(fēng)ID。
```python
import pyaudio

# List input-capable audio devices
# (alternatively: cat /proc/asound/devices)
p = pyaudio.PyAudio()
info = p.get_host_api_info_by_index(0)
numdevices = info.get('deviceCount')
for i in range(numdevices):
    if p.get_device_info_by_host_api_device_index(0, i).get('maxInputChannels') > 0:
        print('INPUT DEVICE ID:', i, '-',
              p.get_device_info_by_host_api_device_index(0, i).get('name'))
```

Then record, saving in WAV format (the usual container for audio processing); adjust the sample rate and other parameters as needed:
```python
import argparse
import wave

import numpy as np
import pyaudio

p = pyaudio.PyAudio()

def record_voice(micid):
    # Open the microphone and set the stream format;
    # adjust rate and channels to match your device.
    stream = p.open(format=pyaudio.paInt16, channels=6, rate=16000,
                    input=True, frames_per_buffer=8000,
                    input_device_index=micid)
    return stream

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="Record audio to a WAV file; you need to pass the mic device ID.")
    parser.add_argument('micid', type=int, help="the ID of the mic device")
    args = parser.parse_args()
    stream = record_voice(args.micid)
    frames = []
    print('Recording started!')
    for _ in range(20):
        data = stream.read(8000, exception_on_overflow=False)
        # Convert to numpy to inspect the peak level of each 8000-frame chunk
        audio_data = np.frombuffer(data, dtype=np.short)
        print("current peak value:", np.max(np.abs(audio_data)))
        frames.append(data)
    print('Recording stopped!')
    wf = wave.open("./recordV.wav", 'wb')
    wf.setnchannels(6)  # must match the channel count used when recording
    wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wf.setframerate(16000)
    wf.writeframes(b''.join(frames))
    wf.close()
```

With the WAV file in hand, convert it to HDF5: acoular expects its audio input in .h5 format. Change the file name and sample rate to match your recording:
```python
import tables
import scipy.io.wavfile as wavfile

name_ = "your_audio_file_name"  # file name without the .wav extension
samplerate, data = wavfile.read(name_ + '.wav')

h5 = tables.open_file(name_ + ".h5", mode="w")
h5.create_earray('/', 'time_data', obj=data)
h5.set_node_attr('/time_data', 'sample_freq', samplerate)
h5.close()
```

With the .h5 audio file and the XML configuration file both ready, use acoular to localize the source:
```python
from os import path

import acoular
import pylab as plt

micgeofile = path.join('/home/......./array_6.xml')  # path to the mic geometry file

# Grid over which the source map is computed
rg = acoular.RectGrid(x_min=-1, x_max=1, y_min=-1, y_max=1,
                      z=0.3, increment=0.01)
mg = acoular.MicGeom(from_file=micgeofile)   # read mic positions
ts = acoular.TimeSamples(name='memory.h5')   # the h5 recording
print(ts.numsamples)
print(ts.numchannels)
print(ts.sample_freq)
print(ts.data)
# Framing and windowing: builds the cross-spectral matrix
ps = acoular.PowerSpectra(time_data=ts, block_size=128, window='Hanning')

plt.ion()  # switch on interactive plotting mode
print(mg.mpos[0], type(mg.mpos))
plt.plot(mg.mpos[0], mg.mpos[1], 'o')
plt.show()
plt.waitforbuttonpress()

env = acoular.Environment(c=346.04)
# Steering vector: basic class implementing a single-source transfer model
st = acoular.SteeringVector(grid=rg, mics=mg, env=env)
# Basic delay-and-sum beamforming in the frequency domain
bb = acoular.BeamformerBase(freq_data=ps, steer=st)
pm = bb.synthetic(2000, 2)
Lm = acoular.L_p(pm)

plt.figure()  # open new figure
plt.imshow(Lm.T, origin='lower', vmin=Lm.max() - 0.1, extent=rg.extend())
plt.colorbar()
plt.waitforbuttonpress()
```

Run it to see the localization map.
4 Three-dimensional localization
3D localization is somewhat slower. Change the xml and h5 paths; the grid resolution and the decibel display range can also be adjusted.
```python
# -*- coding: utf-8 -*-
"""
Example "3D beamforming" for the Acoular library.

Demonstrates a 3D beamforming setup with point sources.
Simulates data on a 64 channel array, subsequent beamforming
with CLEAN-SC on a 3D grid.

Copyright (c) 2019 Acoular Development Team. All rights reserved.
"""
from os import path

# imports from acoular
import acoular
from acoular import __file__ as bpath, L_p, MicGeom, PowerSpectra, \
    RectGrid3D, BeamformerBase, BeamformerCleansc, \
    SteeringVector, WNoiseGenerator, PointSource, SourceMixer

# other imports
from numpy import mgrid, arange, array, arccos, pi, cos, sin, sum
import mpl_toolkits.mplot3d
from pylab import figure, show, scatter, subplot, imshow, title, colorbar, \
    xlabel, ylabel

# ==============================================================================
# First, we define the microphone geometry.
# ==============================================================================
micgeofile = path.join('/home/sunshine/桌面/code_C_PY_2022/py/7.acoular庫mvdr實現音源定位/array_6.xml')
m = MicGeom(from_file=micgeofile)

# ==============================================================================
# Simulated sources from the original Acoular example, kept here commented
# out; we use the recorded h5 file instead.
# ==============================================================================
# sfreq = 51200
# duration = 1
# nsamples = duration*sfreq
# n1 = WNoiseGenerator( sample_freq=sfreq, numsamples=nsamples, seed=1 )
# n2 = WNoiseGenerator( sample_freq=sfreq, numsamples=nsamples, seed=2, rms=0.5 )
# n3 = WNoiseGenerator( sample_freq=sfreq, numsamples=nsamples, seed=3, rms=0.25 )
# p1 = PointSource( signal=n1, mics=m, loc=(-0.1,-0.1,0.3) )
# p2 = PointSource( signal=n2, mics=m, loc=(0.15,0,0.17) )
# p3 = PointSource( signal=n3, mics=m, loc=(0,0.1,0.25) )
# pa = SourceMixer( sources=[p1,p2,p3])

# ==============================================================================
# The 3D grid (very coarse to enable fast computation for this example)
# ==============================================================================
g = RectGrid3D(x_min=-0.2, x_max=0.2,
               y_min=-0.2, y_max=0.2,
               z_min=0.1, z_max=0.36, increment=0.02)

# ==============================================================================
# The following provides the cross spectral matrix and defines the CLEAN-SC
# beamformer. To be really fast, we restrict ourselves to only 10 frequencies
# in the range 2000 - 6000 Hz (5*400 - 15*400).
# ==============================================================================
pa = acoular.TimeSamples(name='memory.h5')  # read the h5 recording
f = PowerSpectra(time_data=pa, window='Hanning', overlap='50%',
                 block_size=128, ind_low=5, ind_high=16)
st = SteeringVector(grid=g, mics=m, steer_type='true location')
b = BeamformerCleansc(freq_data=f, steer=st)

# ==============================================================================
# Calculate the result for the 4 kHz octave band
# ==============================================================================
map = b.synthetic(2000, 2)

# ==============================================================================
# Display views of setup and result.
# For each view, the values along the respective axis are summed.
# Note that, while Acoular uses a left-oriented coordinate system,
# for display purposes the z-axis is inverted, plotting the data in
# a right-oriented coordinate system.
# ==============================================================================
fig = figure(1, (8, 8))

# plot the results
subplot(221)
map_z = sum(map, 2)
mx = L_p(map_z.max())
imshow(L_p(map_z.T), vmax=mx, vmin=mx-1, origin='lower',
       interpolation='nearest',
       extent=(g.x_min, g.x_max, g.y_min, g.y_max))
xlabel('x')
ylabel('y')
title('Top view (xy)')

subplot(223)
map_y = sum(map, 1)
imshow(L_p(map_y.T), vmax=mx, vmin=mx-1, origin='upper',
       interpolation='nearest',
       extent=(g.x_min, g.x_max, -g.z_max, -g.z_min))
xlabel('x')
ylabel('z')
title('Side view (xz)')

subplot(222)
map_x = sum(map, 0)
imshow(L_p(map_x), vmax=mx, vmin=mx-1, origin='lower',
       interpolation='nearest',
       extent=(-g.z_min, -g.z_max, g.y_min, g.y_max))
xlabel('z')
ylabel('y')
title('Side view (zy)')
colorbar()

# plot the setup (only meaningful with the simulated sources above)
# ax0 = fig.add_subplot((224), projection='3d')
# ax0.scatter(m.mpos[0],m.mpos[1],-m.mpos[2])
# source_locs=array([p1.loc,p2.loc,p3.loc]).T
# ax0.scatter(source_locs[0],source_locs[1],-source_locs[2])
# ax0.set_xlabel('x')
# ax0.set_ylabel('y')
# ax0.set_zlabel('z')
# ax0.set_title('Setup (mic and source positions)')

# only display result on screen if this script is run directly
if __name__ == '__main__':
    show()
```