
Python PCM decibels: detecting silence in a PCM audio file

Published: 2024/4/11


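A PCM file stores the raw binary stream of the sound waveform and has no file header.

(1) First determine the bit depth of each sample in the PCM file, usually 8 bits or 16 bits.

(2) Then determine whether the audio is mono or stereo. In stereo, the samples of the two channels are interleaved, so each channel's data has to be extracted separately.

(3) Then determine whether the samples are signed. For example, signed 16-bit samples range from -32768 to 32767.

(4) Determine whether the samples are stored big-endian or little-endian. This normally follows the byte order of the machine that recorded them. For details see http://blog.csdn.net/u013378306/article/details/78904238

(5) Using the four points above, parse the PCM file and convert it into a file of decimal values.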
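Steps (1) to (5) can be sketched in Python roughly as follows. This is a minimal sketch, assuming the four parameters are already known; the function name `decode_pcm` and its parameter names are made up for illustration, and only 8-bit and 16-bit samples are handled.

```python
import struct


def decode_pcm(raw, bits=16, channels=1, signed=True, big_endian=False):
    """Decode headerless PCM bytes into one list of integer samples per channel."""
    if bits not in (8, 16):
        raise ValueError("this sketch only handles 8-bit and 16-bit samples")
    width = bits // 8
    # struct format codes: b/B = signed/unsigned 8-bit, h/H = signed/unsigned 16-bit
    code = {(8, True): "b", (8, False): "B",
            (16, True): "h", (16, False): "H"}[(bits, signed)]
    order = ">" if big_endian else "<"
    count = len(raw) // width  # a trailing partial sample is ignored
    samples = struct.unpack(f"{order}{count}{code}", raw[:count * width])
    # Interleaved layout (L R L R ...): channel c is every `channels`-th sample.
    return [list(samples[c::channels]) for c in range(channels)]
```

For the 16-bit, mono, little-endian file used in this article, `decode_pcm(open("3.pcm", "rb").read())[0]` would give the list of sample values.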

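Note: for points 1-3, on Windows you can open the PCM file in the Cool Edit tool, set the parameters, and play the file to confirm them. You can also test with the following Java code:

The recording used in this example is 1 second of silence, then the spoken words "你好" ("hello"), then 2 seconds of silence. The PCM file can be downloaded from http://download.csdn.net/download/u013378306/10175068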

package test;

import java.io.File;
import java.nio.file.Files;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.SourceDataLine;

public class test {
    /**
     * Plays a raw PCM file using the format parameters set below.
     *
     * @param args unused
     * @throws Exception if the file cannot be read or no audio line is available
     */
    public static void main(String[] args) throws Exception {
        File file = new File("3.pcm");
        System.out.println(file.length());
        // Read the whole file; a single InputStream.read() call may return fewer bytes.
        byte[] audioData = Files.readAllBytes(file.toPath());
        int bufferSize = audioData.length;
        int offset = 0;

        float sampleRate = 20000;   // samples per second
        int sampleSizeInBits = 16;  // bits per sample
        int channels = 1;           // 1 for mono, 2 for stereo
        boolean signed = true;      // whether the samples are signed
        boolean bigEndian = false;  // byte order within a sample; false means little-endian

        AudioFormat af = new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, af, bufferSize);
        SourceDataLine sdl = (SourceDataLine) AudioSystem.getLine(info);
        sdl.open(af);
        sdl.start();
        // Gain hook: multiplying by 1 leaves the samples unchanged.
        for (int i = 0; i < audioData.length; i++) {
            audioData[i] *= 1;
        }
        // write() blocks until the requested bytes are queued and returns the count written.
        while (offset < audioData.length) {
            offset += sdl.write(audioData, offset, audioData.length - offset);
        }
        sdl.drain();  // wait until playback has finished
        sdl.close();
    }
}

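Once testing has confirmed the parameters, the PCM file can be parsed. The following Java code handles PCM with 16-bit samples, one channel, and little-endian storage, and writes the samples out as a file of decimal values.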

package test;

import java.io.File;
import java.io.FileWriter;
import java.math.BigInteger;
import java.nio.file.Files;

public class ffff {
    /**
     * Parses 16-bit, little-endian, mono PCM into a file of decimal values.
     *
     * @param args unused
     */
    public static void main(String[] args) {
        try {
            File file = new File("3.pcm");
            System.out.println(file.length());
            byte[] buffers = Files.readAllBytes(file.toPath());
            StringBuilder rs = new StringBuilder();
            // Step two bytes at a time; a trailing odd byte is ignored.
            for (int i = 0; i + 1 < buffers.length; i += 2) {
                byte[] bs = new byte[2];
                bs[0] = buffers[i + 1]; // little-endian: the high byte comes second
                bs[1] = buffers[i];
                int s = Integer.parseInt(binary(bs, 10));
                rs.append(' ').append(s);
            }
            writeFile(rs.toString());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void writeFile(String s) {
        try (FileWriter fw = new FileWriter("hello3.txt")) {
            fw.write(s, 0, s.length());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // BigInteger(byte[]) reads big-endian two's complement, so the sign is preserved.
    public static String binary(byte[] bytes, int radix) {
        return new BigInteger(bytes).toString(radix);
    }
}

After running it, open hello3.txt. At the start the amplitude is small, as shown below, and mostly stays under 100:

-15 -12 -18 -24 -17 -8 -8 -17 -22 -14 -5 -18 -47 -67 -60 -41 -28 -28 -23 -12 -6 -9 -13 -8 0 6 21 49 68 48 -2 -43 -47 -32 -22 -10 22 56

But while "你好" is being spoken, the amplitude becomes large:

-2507 -2585 -2600 -2596 -2620 -2670 -2703 -2674 -2581 -2468 -2378 -2305 -2200 -2018 -1774 -1523 -1307 -1127 -962 -806 -652 -505 -384 -313 -281 -241 -163

Then come the two seconds of silence, and the amplitude is small again:

5 3 0 -4 -5 -6 -6 -7 -7 -8 -9 -8 -10 -10 -11 -10 -11 -11 -11 -11 -11 -11 -10 -9 -7 -6 -3 -2 -2 -3 -3 -3 -1 2 4 4

The waveform can be plotted with the following Python code:

import pylab as pl

# Open in text mode. The original code used codecs.open("hello3.txt", "rb");
# "b" means binary mode, which is wrong for this text file.
with open("hello3.txt", "r") as file:
    text = file.read()

# Samples are separated by whitespace; split() drops the empty entries.
ays = [int(y) for y in text.split()]
# x axis: 1-based sample index
axs = list(range(1, len(ays) + 1))

pl.plot(axs, ays, "ro")  # use pylab to plot x and y
pl.show()  # show the plot on the screen

This produces the waveform plot.

The amplitudes here match what Audacity shows; the plot only enlarges them so they are easier to inspect by eye.

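Update, 2019-11-20:

With more practice, a better approach emerged: pick a time window and check whether the data inside that window shows any significant amplitude.

(My math is not great, so I will just pick a few symbols to explain.)

Let the window length be t and the audio sample rate be S. If the amplitude (or the computed decibel level) stays very small for a continuous span of t, that span can be treated as silence (no sound was recorded).

The data to check has length L = S * t, so the target is an array of length L. If the amplitude (decibel) values within that window are below a threshold, the window is considered approximately silent.

Example: with a sample rate of 16000, treat 2 seconds without sound as no input. That is, every value in an array of length 2 * 16000 is below the threshold.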
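The window check can be sketched in Python as below. This is only an illustration under stated assumptions: the function name `find_silent_windows` is hypothetical, the windows do not overlap, and the threshold is applied to the absolute sample value rather than to a decibel figure.

```python
def find_silent_windows(samples, sample_rate, window_seconds, threshold):
    """Return the start index of every window whose samples all stay below threshold."""
    window = int(sample_rate * window_seconds)  # L = S * t
    if window <= 0:
        raise ValueError("the window must cover at least one sample")
    silent = []
    # Walk over consecutive, non-overlapping windows of length L.
    for start in range(0, len(samples) - window + 1, window):
        if all(abs(s) < threshold for s in samples[start:start + window]):
            silent.append(start)
    return silent
```

With the example above, `find_silent_windows(samples, 16000, 2, threshold)` checks blocks of 32000 samples; a suitable threshold depends on the recording, and values around 100 fit the silent sections shown earlier in this article.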

Stack Overflow answer:

How can I detect silence when recording operation is started in Java?

Calculate the dB or RMS value for a group of sound frames and decide at what level it is considered to be 'silence'.

What is PCM data?

Data that is in Pulse-code modulation format.

How can I calculate PCM data in Java?

I do not understand that question. But guessing it has something to do with the speech-recognition tag, I have some bad news.

This might theoretically be done using the Java Speech API. But there are apparently no 'speech to text' implementations available for the API (only 'text to speech').

I have to calculate rms for speech-recognition project. But I do not know how can I calculate in Java.

For a single channel that is represented by signal sizes in a double ranging from -1 to 1, you might use this method.

/** Computes the RMS volume of a group of signal sizes ranging from -1 to 1. */
public double volumeRMS(double[] raw) {
    double sum = 0d;
    if (raw.length == 0) {
        return sum;
    } else {
        for (int ii = 0; ii < raw.length; ii++) {
            sum += raw[ii];
        }
    }
    double average = sum / raw.length;

    // Squared deviation from the mean, so a constant DC offset does not count as volume.
    double sumMeanSquare = 0d;
    for (int ii = 0; ii < raw.length; ii++) {
        sumMeanSquare += Math.pow(raw[ii] - average, 2d);
    }
    double averageMeanSquare = sumMeanSquare / raw.length;
    double rootMeanSquare = Math.sqrt(averageMeanSquare);

    return rootMeanSquare;
}

There is a byte buffer to save input values from the line, and what I should have to do with this buffer?

If using the volumeRMS(double[]) method, convert the byte values to an array of double values ranging from -1 to 1. ;)
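The byte-to-double conversion that the answer describes might look like this in Python. It is a sketch that assumes 16-bit signed little-endian mono data, as in this article's example file; the names `bytes_to_unit_floats` and `volume_rms` are illustrative, and `volume_rms` follows the same mean-removed calculation as the volumeRMS method.

```python
import math
import struct


def bytes_to_unit_floats(raw):
    """Convert 16-bit signed little-endian PCM bytes to floats in [-1, 1)."""
    count = len(raw) // 2  # a trailing odd byte is ignored
    ints = struct.unpack(f"<{count}h", raw[:count * 2])
    return [v / 32768.0 for v in ints]


def volume_rms(values):
    """RMS around the mean of a group of values ranging from -1 to 1."""
    if not values:
        return 0.0
    average = sum(values) / len(values)
    mean_square = sum((v - average) ** 2 for v in values) / len(values)
    return math.sqrt(mean_square)
```

A block of frames could then be judged silent when `volume_rms(bytes_to_unit_floats(block))` falls below a level chosen for the recording.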

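My own approach is to compute the decibel value of the audio; see the article "Calculating decibels from PCM audio data".

In many situations we need to display the live sound level in decibels. Three ways of computing it are listed below. (They assume 16-bit samples, that is, one short per sample, with a maximum value of 32767.)

1: Compute a level from the audio data and its size

First, sum the absolute value of every sample and divide by the number of samples to get the average energy of the sound.

Then scale that value proportionally from the range 0 to 32767 onto the range 0 to 100, which gives a quantized level.

In practice, speech sits in the lower energy range, so the quantized values mostly fall in the narrow range of about 1 to 20, which makes changes hard to notice.

So the value is amplified 5 times, and any result above 100 is clamped to 100.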

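#include <stdlib.h>

// Parameters: the sample data and the number of samples.
// Returns a 0-100 volume level (a scaled average amplitude, not a true decibel value).
#define VOLUMEMAX 32767

int SimpleCalculate_DB(short* pcmData, int sample)
{
    int ret = 0;
    if (sample > 0) {
        long long sum = 0;  // wide accumulator so long buffers cannot overflow
        short* pos = pcmData;
        for (int i = 0; i < sample; i++) {
            sum += abs(*pos);
            pos++;
        }
        // 500 = 100 (output range) * 5 (amplification)
        ret = (int)(sum * 500.0 / ((double)sample * VOLUMEMAX));
        if (ret >= 100) {
            ret = 100;
        }
    }
    return ret;
}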

2: Compute the root mean square (RMS), i.e. the energy value

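#include <cmath>
#include <cstddef>
#include <cstdint>

static const float kMaxSquaredLevel = 32768 * 32768;
constexpr float kMinLevel = 30.f;

// Returns the RMS level as a positive number of dB below full scale,
// rounded to an integer and clamped to the range [0, kMinLevel].
int Process(const int16_t* data, size_t length)
{
    if (length == 0) {
        return static_cast<int>(kMinLevel);  // no samples: report the quietest level
    }
    float sum_square_ = 0;
    size_t sample_count_ = 0;
    for (size_t i = 0; i < length; ++i) {
        sum_square_ += data[i] * data[i];
    }
    sample_count_ += length;
    float rms = sum_square_ / (sample_count_ * kMaxSquaredLevel);
    // 20log_10(x^0.5) = 10log_10(x)
    rms = 10 * std::log10(rms);
    if (rms < -kMinLevel)
        rms = -kMinLevel;
    rms = -rms;
    return static_cast<int>(rms + 0.5);
}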
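To see what numbers this method produces, here is a rough Python equivalent of the Process function. It is a sketch rather than the WebRTC implementation; the name `rms_level` is made up, and the constants copy the values used in the C++ code.

```python
import math

K_MAX_SQUARED_LEVEL = 32768 * 32768
K_MIN_LEVEL = 30.0


def rms_level(samples):
    """Return the RMS level as whole dB below full scale, clamped to 0..30."""
    if not samples:
        return int(K_MIN_LEVEL)
    mean_square = sum(s * s for s in samples) / (len(samples) * K_MAX_SQUARED_LEVEL)
    if mean_square <= 0:
        return int(K_MIN_LEVEL)  # pure digital silence: log10(0) is undefined
    db = 10 * math.log10(mean_square)  # same as 20 * log10(rms)
    db = max(db, -K_MIN_LEVEL)
    return int(-db + 0.5)
```

A constant full-scale signal gives 0, a signal at about one tenth of full scale (amplitude 3277) gives 20, and anything quieter than -30 dB, including the silent sections shown earlier, is reported as 30.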

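3: Take the largest amplitude in the audio data (the largest absolute value, 0-32767) and divide it by 1000 to get an index from 0 to 32. Then look up the level (0 to 9) stored at that index in a table. (Extracted from WebRTC.)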

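#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

const int8_t permutation[33] = {0,1,2,3,4,4,5,5,5,5,6,6,6,6,6,7,7,7,7,8,8,8,9,9,9,9,9,9,9,9,9,9,9};

// Returns the largest absolute sample value, capped at 32767.
int16_t WebRtcSpl_MaxAbsValueW16C(const int16_t* vector, size_t length)
{
    size_t i = 0;
    int absolute = 0, maximum = 0;
    for (i = 0; i < length; i++) {
        absolute = abs((int)vector[i]);
        if (absolute > maximum) {
            maximum = absolute;
        }
    }
    if (maximum > 32767) {  // abs(-32768) does not fit in int16_t
        maximum = 32767;
    }
    return (int16_t)maximum;
}

// State kept across calls. If these were locals reset to 0 on every call,
// _count would never reach 10 and the level would never be updated.
static int16_t _absMax = 0;
static int16_t _count = 0;
static int8_t _currentLevel = 0;

void ComputeLevel(const int16_t* data, size_t length)
{
    int16_t absValue = WebRtcSpl_MaxAbsValueW16C(data, length);
    if (absValue > _absMax)
        _absMax = absValue;
    // Refresh the level when the call counter reaches 10.
    if (_count++ == 10) {
        _count = 0;
        int32_t position = _absMax / 1000;
        if ((position == 0) && (_absMax > 250)) {
            position = 1;
        }
        _currentLevel = permutation[position];
        _absMax >>= 2;  // decay the stored peak
    }
}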
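The table lookup on its own can be tried out with a few lines of Python. This sketch deliberately leaves out the call counter and the peak decay, so it maps a single buffer straight to a level; the name `peak_level` is hypothetical.

```python
PERMUTATION = [0, 1, 2, 3, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7,
               8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9]


def peak_level(samples):
    """Map the largest absolute sample of one buffer to a level from 0 to 9."""
    # Cap at 32767 because abs(-32768) is outside the 16-bit positive range.
    peak = min(max((abs(s) for s in samples), default=0), 32767)
    position = peak // 1000
    if position == 0 and peak > 250:
        position = 1
    return PERMUTATION[position]
```

With the sample values from earlier in this article, the silent section (peaks under 100) maps to level 0, while the spoken section (peak around 2700) maps to level 2.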
