Microphone Array Sound Localization — 2D and 3D Sound Localization in Python
0 Basic terminology in audio signal processing
FT - Fourier Transform: converts a signal between the time domain and the frequency domain.
FFT - Fast Fourier Transform: the efficient algorithm computers use to evaluate the DFT.
DTFT - Discrete-Time Fourier Transform: discrete in the time domain, continuous in the frequency domain.
DFT - Discrete Fourier Transform: discrete in both the time domain and the frequency domain. It is equivalent to sampling the continuous time-domain signal at equal intervals and then taking its Fourier transform. IFT, IFFT, IDFT, etc. denote the corresponding inverse transforms.
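The FFT/IFFT relationship above can be demonstrated in a few lines of NumPy (a minimal sketch; the test signal and sample rate are made up for illustration):

```python
import numpy as np

fs = 1000                       # sample rate in Hz (arbitrary for this demo)
t = np.arange(fs) / fs          # 1 second of samples
x = np.sin(2 * np.pi * 50 * t)  # a pure 50 Hz tone

X = np.fft.rfft(x)                      # FFT: fast evaluation of the DFT
freqs = np.fft.rfftfreq(len(x), 1/fs)   # frequency bin centers in Hz
peak = freqs[np.argmax(np.abs(X))]      # dominant frequency
print(peak)                             # -> 50.0

x_back = np.fft.irfft(X)                # inverse FFT recovers the signal
print(np.allclose(x, x_back))           # -> True
```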
STFT - Short-Time Fourier Transform: before taking the Fourier transform, the signal is multiplied by a time-limited window function h(t), on the assumption that the non-stationary signal is approximately stationary within the short analysis window. Sliding h(t) along the time axis and analyzing the signal segment by segment yields a set of local "spectra". The STFT is important in audio processing because it is what spectrograms are built from.
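A minimal STFT sketch using `scipy.signal.stft` (the test tone and parameters are made up; the 16 kHz rate matches the recordings used later in this post):

```python
import numpy as np
from scipy.signal import stft

fs = 16000                       # sample rate (assumed)
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)  # 1 s of a 440 Hz tone

# Hann window, 256-sample frames with the default 50% overlap
f, times, Z = stft(x, fs=fs, window='hann', nperseg=256)
print(Z.shape)  # (frequency bins, time frames): each column is one local spectrum

# In any frame, the peak bin should sit near 440 Hz
# (bin spacing is fs/nperseg = 62.5 Hz)
print(f[np.argmax(np.abs(Z[:, 10]))])
```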
MFCC - Mel-Frequency Cepstral Coefficients. Mel frequency: a nonlinear frequency scale based on the human ear's perception of equal pitch intervals. Its relationship to frequency in hertz is m = 2595 · log10(1 + f/700).
Cepstrum: the spectrum obtained by taking the logarithm of a signal's spectrum and then applying the inverse Fourier transform.
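The Hz/mel mapping can be written directly from the formula above (the helper names `hz_to_mel` and `mel_to_hz` are just illustrative):

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale: m = 2595 * log10(1 + f/700)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping: mel back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

print(hz_to_mel(1000))            # ~1000: the scale is anchored near 1000 Hz = 1000 mel
print(mel_to_hz(hz_to_mel(440)))  # round-trips back to 440.0
```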
DCT - Discrete Cosine Transform: represents a finite sequence of data points as a sum of cosine functions oscillating at different frequencies.
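The DCT is invertible, which can be checked with SciPy in a couple of lines (a minimal sketch; DCT-II with orthonormal scaling is the variant typically used in MFCC pipelines):

```python
import numpy as np
from scipy.fft import dct, idct

x = np.array([1.0, 2.0, 3.0, 4.0])
c = dct(x, type=2, norm='ortho')          # cosine-basis coefficients of x
x_back = idct(c, type=2, norm='ortho')    # inverse DCT reconstructs the sequence
print(np.allclose(x, x_back))             # -> True
```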
Further reading: magnitude spectrum, phase spectrum, energy spectrum, and other basics of speech signal processing (CSDN blog by IMU_YY).
1 Introduction
1.1 What is a microphone array?
A microphone array is a system of some number of microphones that spatially samples and filters the sound field. Common arrays are classified by layout into linear, planar, and volumetric (3D) arrays. The geometry is known by design, all microphones have matched frequency responses, and their sampling clocks are synchronized.
1.2 What microphone arrays are used for
Microphone arrays are typically used for: sound source localization (measuring both angle and distance); suppression of background noise, interference, reverberation, and echo; signal extraction; and signal separation. Source localization uses the array to compute the angle and distance of a source relative to the array, enabling tracking of the target source.
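Angle estimation rests on the time difference of arrival (TDOA) between microphone pairs. The sketch below estimates such a delay with GCC-PHAT, a standard TDOA technique (illustrative only, not the method acoular uses internally; the `gcc_phat` helper, test signal, and delay are made up):

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the delay of `sig` relative to `ref` via GCC-PHAT, in seconds."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15          # PHAT weighting: keep only the phase
    cc = np.fft.irfft(R, n=n)       # generalized cross-correlation
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)      # broadband source as heard by mic 1
delay_samples = 25
sig = np.roll(ref, delay_samples)    # mic 2 hears it 25 samples later
print(gcc_phat(sig, ref, fs) * fs)   # estimated delay in samples (should be ~25)
```

Given the delay τ, microphone spacing d, and sound speed c, the arrival angle for a far-field source follows from sin θ = c·τ/d.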
2 The acoular library
The implementation here is based on the acoular library; the official manual has detailed tutorials.
2.1 Installation
Install via pip:

```shell
pip install acoular
```

Alternatively, install from source: download it from GitHub, enter the folder, and run:

```shell
python setup.py install
```

Then enter a Python shell to check the installation:

```python
import acoular
acoular.demo.acoular_demo.run()
```

If a window appears showing a 64-microphone array with three simulated sources, the installation succeeded.
3 Two-dimensional localization
First, prepare the XML configuration file for the microphone array; only the number of microphones and their spatial coordinates need to be edited.

```xml
<?xml version="1.0" encoding="utf-8"?>
<MicArray name="array_64">
  <pos Name="Point 1 " x=" 0.4 " y=" -0.1 " z=" 0 "/>
  <pos Name="Point 2 " x=" 0.2 " y=" 0 " z=" 0 "/>
  <pos Name="Point 3 " x=" 0.1 " y=" 0.1 " z=" 0 "/>
  <pos Name="Point 4 " x=" -0.4 " y=" 0.4 " z=" 0 "/>
  <pos Name="Point 5 " x=" -0.2 " y=" 0 " z=" 0 "/>
  <pos Name="Point 6 " x=" -0.1 " y=" -0.2 " z=" 0 "/>
</MicArray>
```

Next, prepare a recording from this array. If you are using a USB array microphone, connect it first and look up its device ID:
```python
import pyaudio  # on Linux, `cat /proc/asound/devices` also lists sound devices

p = pyaudio.PyAudio()
info = p.get_host_api_info_by_index(0)
numdevices = info.get('deviceCount')
for i in range(numdevices):
    if p.get_device_info_by_host_api_device_index(0, i).get('maxInputChannels') > 0:
        print('INPUT DEVICE ID:', i, '-',
              p.get_device_info_by_host_api_device_index(0, i).get('name'))
```

Record and save the audio as WAV (the usual container for audio processing); adjust the sample rate and other parameters as needed:
```python
import argparse
import wave

import numpy as np
import pyaudio

p = pyaudio.PyAudio()

def record_voice(micid):
    # Open the microphone and set the stream format;
    # adjust rate and channels to match your array.
    stream = p.open(format=pyaudio.paInt16, channels=6, rate=16000,
                    input=True, frames_per_buffer=8000,
                    input_device_index=micid)
    return stream

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="Record audio to WAV; you need to pass the mic device ID.")
    parser.add_argument('micid', type=int, help="the ID of the mic device")
    args = parser.parse_args()
    stream = record_voice(args.micid)
    frames = []
    print('Recording started!')
    for _ in range(20):  # 20 chunks of 8000 frames = 10 s at 16 kHz
        data = stream.read(8000, exception_on_overflow=False)
        # Convert to numpy to monitor the peak level of each chunk
        audio_data = np.frombuffer(data, dtype=np.short)
        print('Current peak value:', np.max(np.abs(audio_data)))
        frames.append(data)
    print('Recording stopped!')
    wf = wave.open('./recordV.wav', 'wb')
    wf.setnchannels(6)  # must match the channel count used when opening the stream
    wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wf.setframerate(16000)
    wf.writeframes(b''.join(frames))
    wf.close()
```

With the WAV file in hand, convert it to HDF5, since acoular expects audio in .h5 format. Change the file name and sample rate to match your recording:
```python
import tables
import scipy.io.wavfile as wavfile

name_ = "your_audio_file_name"  # file name without the .wav extension
samplerate, data = wavfile.read(name_ + '.wav')

meh5 = tables.open_file(name_ + '.h5', mode='w')
meh5.create_earray('/', 'time_data', obj=data)
meh5.set_node_attr('/time_data', 'sample_freq', 16000)
meh5.close()
```

With the .h5 audio file and the XML configuration ready, use acoular to localize the source:
```python
from os import path

import acoular
import pylab as plt

micgeofile = path.join('/home/......./array_6.xml')  # microphone geometry file
# Grid over which the source map is computed
rg = acoular.RectGrid(x_min=-1, x_max=1, y_min=-1, y_max=1,
                      z=0.3, increment=0.01)
mg = acoular.MicGeom(from_file=micgeofile)   # read mic positions
ts = acoular.TimeSamples(name='memory.h5')   # input .h5 file
print(ts.numsamples)
print(ts.numchannels)
print(ts.sample_freq)
print(ts.data)
# Frame the signal and apply a Hanning window
ps = acoular.PowerSpectra(time_data=ts, block_size=128, window='Hanning')

plt.ion()  # switch on interactive plotting mode
print(mg.mpos[0], type(mg.mpos))
plt.plot(mg.mpos[0], mg.mpos[1], 'o')
plt.show()
plt.waitforbuttonpress()

env = acoular.Environment(c=346.04)  # speed of sound in m/s
# Steering vector: basic class implementing a single-source transmission model
st = acoular.SteeringVector(grid=rg, mics=mg, env=env)
# Basic delay-and-sum beamforming in the frequency domain
bb = acoular.BeamformerBase(freq_data=ps, steer=st)
pm = bb.synthetic(2000, 2)  # source map for the 2 kHz octave band
Lm = acoular.L_p(pm)        # convert to sound pressure level in dB

plt.figure()  # open new figure
plt.imshow(Lm.T, origin='lower', vmin=Lm.max() - 0.1, extent=rg.extend())
plt.colorbar()
plt.waitforbuttonpress()
```

Run it to see the resulting source map.
4 Three-dimensional localization
3D localization is slower. Edit the xml and h5 paths; the grid resolution and the decibel range can also be adjusted.
```python
# -*- coding: utf-8 -*-
"""
Example "3D beamforming" for Acoular library.

Demonstrates a 3D beamforming setup with point sources.
Simulates data on 64 channel array, subsequent beamforming
with CLEAN-SC on 3D grid.

Copyright (c) 2019 Acoular Development Team. All rights reserved.
"""
from os import path

# imports from acoular
import acoular
from acoular import __file__ as bpath, L_p, MicGeom, PowerSpectra,\
    RectGrid3D, BeamformerBase, BeamformerCleansc,\
    SteeringVector, WNoiseGenerator, PointSource, SourceMixer

# other imports
from numpy import mgrid, arange, array, arccos, pi, cos, sin, sum
import mpl_toolkits.mplot3d
from pylab import figure, show, scatter, subplot, imshow, title, colorbar,\
    xlabel, ylabel

#===============================================================================
# First, we define the microphone geometry.
#===============================================================================
micgeofile = path.join('/home/sunshine/桌面/code_C_PY_2022/py/7.acoular庫mvdr實現音源定位/array_6.xml')
m = MicGeom(from_file=micgeofile)

#===============================================================================
# Now, the sources (signals and types/positions) are defined.
# (Commented out: simulated sources from the original Acoular example.)
#===============================================================================
# sfreq = 51200
# duration = 1
# nsamples = duration*sfreq
# n1 = WNoiseGenerator( sample_freq=sfreq, numsamples=nsamples, seed=1 )
# n2 = WNoiseGenerator( sample_freq=sfreq, numsamples=nsamples, seed=2, rms=0.5 )
# n3 = WNoiseGenerator( sample_freq=sfreq, numsamples=nsamples, seed=3, rms=0.25 )
# p1 = PointSource( signal=n1, mics=m, loc=(-0.1,-0.1,0.3) )
# p2 = PointSource( signal=n2, mics=m, loc=(0.15,0,0.17) )
# p3 = PointSource( signal=n3, mics=m, loc=(0,0.1,0.25) )
# pa = SourceMixer( sources=[p1,p2,p3] )

#===============================================================================
# The 3D grid (very coarse to enable fast computation for this example)
#===============================================================================
g = RectGrid3D(x_min=-0.2, x_max=0.2, y_min=-0.2, y_max=0.2,
               z_min=0.1, z_max=0.36, increment=0.02)

#===============================================================================
# The following provides the cross spectral matrix and defines the
# CLEAN-SC beamformer. To be really fast, we restrict ourselves to
# only 10 frequencies in the range 2000 - 6000 Hz (5*400 - 15*400).
#===============================================================================
pa = acoular.TimeSamples(name='memory.h5')  # read the recorded .h5 file
f = PowerSpectra(time_data=pa, window='Hanning', overlap='50%',
                 block_size=128, ind_low=5, ind_high=16)
st = SteeringVector(grid=g, mics=m, steer_type='true location')
b = BeamformerCleansc(freq_data=f, steer=st)

#===============================================================================
# Calculate the result for the 4 kHz octave band
#===============================================================================
map = b.synthetic(2000, 2)

#===============================================================================
# Display views of setup and result.
# For each view, the values along the respective axis are summed.
# Note that, while Acoular uses a left-oriented coordinate system,
# for display purposes the z-axis is inverted, plotting the data in
# a right-oriented coordinate system.
#===============================================================================
fig = figure(1, (8, 8))

# plot the results
subplot(221)
map_z = sum(map, 2)
mx = L_p(map_z.max())
imshow(L_p(map_z.T), vmax=mx, vmin=mx-1, origin='lower',
       interpolation='nearest',
       extent=(g.x_min, g.x_max, g.y_min, g.y_max))
xlabel('x')
ylabel('y')
title('Top view (xy)')

subplot(223)
map_y = sum(map, 1)
imshow(L_p(map_y.T), vmax=mx, vmin=mx-1, origin='upper',
       interpolation='nearest',
       extent=(g.x_min, g.x_max, -g.z_max, -g.z_min))
xlabel('x')
ylabel('z')
title('Side view (xz)')

subplot(222)
map_x = sum(map, 0)
imshow(L_p(map_x), vmax=mx, vmin=mx-1, origin='lower',
       interpolation='nearest',
       extent=(-g.z_min, -g.z_max, g.y_min, g.y_max))
xlabel('z')
ylabel('y')
title('Side view (zy)')
colorbar()

# plot the setup (only useful with the simulated sources above)
# ax0 = fig.add_subplot((224), projection='3d')
# ax0.scatter(m.mpos[0], m.mpos[1], -m.mpos[2])
# source_locs = array([p1.loc, p2.loc, p3.loc]).T
# ax0.scatter(source_locs[0], source_locs[1], -source_locs[2])
# ax0.set_xlabel('x')
# ax0.set_ylabel('y')
# ax0.set_zlabel('z')
# ax0.set_title('Setup (mic and source positions)')

# only display result on screen if this script is run directly
if __name__ == '__main__':
    show()
```