Computing Divergence in Python: Measuring Dataset Similarity with KL and JS Divergence

Published: 2023/12/14 python 38 豆豆


Original title: Measuring Dataset Similarity with KL & JS Divergence

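I. KL Divergence

1. What is KL divergence?

KL divergence, also called relative entropy, measures how one probability distribution differs from another. It is sometimes called the KL distance, but it is not a true distance, because it fails two of the conditions a distance must satisfy: (a) symmetry, since in general D(P||Q) ≠ D(Q||P); and (b) the triangle inequality.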
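The lack of symmetry is easy to check numerically. The following is a minimal sketch, not taken from the original article: the helper `kl_divergence` and the two distributions `P` and `Q` are hypothetical, and the helper assumes that Q(i) > 0 wherever P(i) > 0.

```python
import math


def kl_divergence(p, q):
    """Discrete KL(P||Q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Two hypothetical distributions over the same three outcomes.
P = [0.7, 0.2, 0.1]
Q = [0.4, 0.4, 0.2]

kl_pq = kl_divergence(P, Q)  # D(P||Q)
kl_qp = kl_divergence(Q, P)  # D(Q||P)
print("D(P||Q) =", kl_pq)
print("D(Q||P) =", kl_qp)
```

The two printed values differ, which is why the order of the arguments matters whenever KL divergence is used to compare two datasets.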

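2. What is it used for?

How well a model performs depends heavily on how the data is split. If the training set and the test set are not drawn from the same distribution, the model is unlikely to perform well on the test set, so it is important to compare the two distributions after splitting. For classification tasks, compare the probability mass functions of the labels; for regression tasks, compare their probability density functions. The more similar the two distributions are, the closer the relative entropy is to 0.

3. Implementation

a. Discrete labels: compute the probability of each label, then sum each probability multiplied by the corresponding log-ratio.

b. Continuous labels: split the range of x into many small intervals, compute the probability of each interval, and integrate (in practice, sum) the same log-ratio terms over the intervals.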
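Written out, cases (a) and (b) correspond to the standard definitions of KL divergence for a probability mass function and for a probability density function. The notation is introduced here for illustration: P and Q stand for the label distributions of the training and test sets, p(x) and q(x) for their densities, and P_k, Q_k for the probability mass each assigns to interval k. The final approximation is the usual histogram estimate and becomes more accurate as the intervals shrink.

```latex
% (a) discrete labels
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} P(i)\,\log\frac{P(i)}{Q(i)}

% (b) continuous labels, with the histogram approximation over intervals k
D_{\mathrm{KL}}(P \,\|\, Q) = \int p(x)\,\log\frac{p(x)}{q(x)}\,dx
  \;\approx\; \sum_{k} P_k \,\log\frac{P_k}{Q_k},
\qquad P_k = \int_{\text{interval } k} p(x)\,dx
```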

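from collections import Counter

import numpy as np
from sklearn import datasets

random_state = 32


def getData(n_classes, weights, n_features, n_samples):
    # Generate a synthetic classification dataset with the given class weights.
    train, label = datasets.make_classification(
        n_classes=n_classes, class_sep=2, weights=weights,
        n_features=n_features, n_samples=n_samples, random_state=random_state)
    return train, label


def computePdotLnP(p, q, I=True):
    # One term of the KL sum, p * ln(p / q).
    # I=True: discrete labels. I=False: continuous labels, where the term is
    # additionally scaled by the bin width 0.01 used in computeK_L.
    if I:
        return p * np.log(p / q)
    else:
        return p * np.log(p / q) * 0.01


def computeKL(train_label: np.ndarray, test_label: np.ndarray):
    # KL(train || test) over discrete labels, using relative frequencies.
    # Assumes every label in the training set also occurs in the test set,
    # otherwise q is 0 and the division fails.
    KlValue = 0.0
    n_train_label = Counter(train_label)
    n_test_label = Counter(test_label)
    for key, value in n_train_label.items():
        p = value / train_label.shape[0]
        q = n_test_label[key] / test_label.shape[0]
        KlValue += computePdotLnP(p, q, True)
    return KlValue


def computeSplitValue(label, splitValue):
    # Bin edges from min(label) up to max(label) + 0.1 in steps of splitValue.
    maxValue = np.max(label) + 0.1
    minValue = np.min(label)
    splitvalue = list(np.arange(minValue, maxValue, splitValue))
    return splitvalue


def dataDiscretization(train_label, label: np.ndarray, splitValue):
    # Count how many values of `label` fall into each bin [edge_i, edge_{i+1}).
    # The edges always come from train_label, so both sets share the same bins.
    splitvalue = computeSplitValue(train_label, splitValue)
    splitDict = {}
    for row in label:
        for i in range(len(splitvalue) - 1):
            if row >= splitvalue[i] and row < splitvalue[i + 1]:
                if splitvalue[i] in splitDict:
                    splitDict[splitvalue[i]] += 1
                else:
                    splitDict[splitvalue[i]] = 1
    return splitDict


def computeK_L(train_label: np.ndarray, test_label: np.ndarray):
    # Binned score for continuous labels with bin width 0.01.
    # Note: each term is scaled by 0.01 and taken in absolute value, so the
    # result is a rough dissimilarity score rather than the textbook KL value.
    n_train_label = dataDiscretization(train_label, train_label, 0.01)
    n_test_label = dataDiscretization(train_label, test_label, 0.01)
    KlValue = 0.0
    for key, value in n_train_label.items():
        try:
            p = value / train_label.shape[0]
            q = n_test_label[key] / test_label.shape[0]
            KlValue += np.abs(computePdotLnP(p, q, False))
        except KeyError:
            # The bin is empty in the test set, so q would be 0; skip it.
            pass
    return KlValue


train, label_train = getData(n_classes=2, weights=[0.1, 0.9], n_features=10, n_samples=3000)
test, label_test = getData(n_classes=2, weights=[0.6, 0.4], n_features=10, n_samples=3000)
print("KL divergence of classification labels", computeKL(label_train, label_test))

label_train_value = np.random.rand(100)
label_test_value = np.random.rand(100)
print("KL divergence of continuous labels", computeK_L(label_train_value, label_test_value))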
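For continuous labels, a bin that is empty in the test set makes the log-ratio undefined. One common workaround, sketched here as an alternative and not as the article's method, is to histogram both samples on shared equal-width bins and add a small constant to every count. The helper names, the bin count `n_bins`, and the smoothing constant `eps` are all assumptions chosen for illustration, and the sketch uses only the standard library.

```python
import math
import random


def histogram_probs(values, lo, hi, n_bins, eps=1e-6):
    """Share of values in each equal-width bin on [lo, hi], smoothed by eps."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi lands in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts) + eps * n_bins
    return [(c + eps) / total for c in counts]


def kl_continuous(train, test, n_bins=10):
    """Histogram estimate of KL(train || test) on bins shared by both samples."""
    lo = min(min(train), min(test))
    hi = max(max(train), max(test))
    if hi == lo:
        return 0.0  # all values identical, nothing to compare
    p = histogram_probs(train, lo, hi, n_bins)
    q = histogram_probs(test, lo, hi, n_bins)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


rng = random.Random(32)
same_a = [rng.random() for _ in range(1000)]          # uniform on [0, 1)
same_b = [rng.random() for _ in range(1000)]          # same distribution
shifted = [rng.random() * 0.5 for _ in range(1000)]   # uniform on [0, 0.5)

kl_same = kl_continuous(same_a, same_b)
kl_shifted = kl_continuous(same_a, shifted)
print("same distribution:     ", kl_same)
print("different distribution:", kl_shifted)
```

The estimate depends on both `n_bins` and `eps`: a smaller `eps` penalizes empty test bins more heavily, so the absolute value is mainly useful for comparing candidate splits under the same settings.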

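II. JS Divergence

1. JS divergence and KL divergence

As noted above, KL divergence is not symmetric, so it does not qualify as a distance. JS divergence is a refinement built on KL divergence that restores symmetry.

KL divergence: KL(P||Q) = sum_i P(i) * log(P(i) / Q(i))

JS divergence: JS(P||Q) = 0.5 * KL(P||M) + 0.5 * KL(Q||M), where M = (P + Q) / 2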
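Two properties of this definition can be checked directly: swapping P and Q leaves the value unchanged, and with the natural logarithm the value never exceeds ln 2, which is reached when the two distributions share no outcomes. The sketch below is illustrative only; the helpers `kl` and `js_divergence` and the example distributions are hypothetical.

```python
import math


def kl(p, q):
    """Discrete KL(P||Q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """JS(P||Q) = 0.5 * KL(P||M) + 0.5 * KL(Q||M) with M = (P + Q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


P = [0.1, 0.9]
Q = [0.6, 0.4]

js_pq = js_divergence(P, Q)
js_qp = js_divergence(Q, P)
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])

print("JS(P||Q) =", js_pq)
print("JS(Q||P) =", js_qp)
print("disjoint =", disjoint, " ln 2 =", math.log(2))
```

Because M is positive wherever P or Q is, JS divergence stays finite even when a label is missing from one of the two sets, which KL divergence does not.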

2. Python implementation (discrete variables)

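from collections import Counter

import numpy as np
from sklearn import datasets

random_state = 32


def getData(n_classes, weights, n_features, n_samples):
    # Generate a synthetic classification dataset with the given class weights.
    train, label = datasets.make_classification(
        n_classes=n_classes, class_sep=2, weights=weights,
        n_features=n_features, n_samples=n_samples, random_state=random_state)
    return train, label


def computePdotLnP(p, m):
    # One term of KL(P||M), p * ln(p / m); a label with p == 0 contributes 0.
    if p == 0:
        return 0.0
    return p * np.log(p / m)


def computeJS(train_label: np.ndarray, test_label: np.ndarray):
    # JS divergence between the label frequencies of the two sets.
    JsValue = 0.0
    n_train_label = Counter(train_label)
    n_test_label = Counter(test_label)
    # Iterate over labels seen in either set, so labels that appear only in
    # the test set are also counted.
    for key in set(n_train_label) | set(n_test_label):
        p = n_train_label[key] / train_label.shape[0]
        q = n_test_label[key] / test_label.shape[0]
        m = (p + q) / 2
        JsValue += 0.5 * computePdotLnP(p, m) + 0.5 * computePdotLnP(q, m)
    return JsValue


# Both calls use the same weights and random_state, so the two label sets
# should be identical and the JS divergence should be 0.
train, label_train = getData(n_classes=2, weights=[0.6, 0.4], n_features=10, n_samples=3000)
test, label_test = getData(n_classes=2, weights=[0.6, 0.4], n_features=10, n_samples=3000)
print("JS divergence of classification labels", computeJS(label_train, label_test))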

My knowledge is limited, so corrections are welcome.
