Comparing eigenvalue decomposition and SVD (singular value decomposition) in principal component analysis, with an implementation in R
> pca <- read.csv("D:/pca.csv")
> pca
    x1  x2 x3 x4
1   40 2.0  5 20
2   10 1.5  5 30
3  120 3.0 13 50
4  250 4.5 18  0
5  120 3.5  9 50
6   10 1.5 12 50
7   40 1.0 19 40
8  270 4.0 13 60
9  280 3.5 11 60
10 170 3.0  9 60
11 180 3.5 14 40
12 130 2.0 30 50
13 220 1.5 17 20
14 160 1.5 35 60
15 220 2.5 14 30
16 140 2.0 20 20
17 220 2.0 14 10
18  40 1.0 10  0
19  20 1.0 12 60
20 120 2.0 20  0
> P = scale(pca)  # standardize the raw data, building the matrix P
> P
            [,1]       [,2]       [,3]       [,4]
 [1,] -1.10251269 -0.3081296 -1.3477550 -0.7084466
 [2,] -1.44001658 -0.7821750 -1.3477550 -0.2513843
 [3,] -0.20250233  0.6399614 -0.2695510  0.6627404
 [4,]  1.26001451  2.0620978  0.4043265 -1.6225713
 [5,] -0.20250233  1.1140068 -0.8086530  0.6627404
 [6,] -1.44001658 -0.7821750 -0.4043265  0.6627404
 [7,] -1.10251269 -1.2562205  0.5391020  0.2056781
 [8,]  1.48501710  1.5880523 -0.2695510  1.1198028
 [9,]  1.59751839  1.1140068 -0.5391020  1.1198028
[10,]  0.36000414  0.6399614 -0.8086530  1.1198028
[11,]  0.47250544  1.1140068 -0.1347755  0.2056781
[12,] -0.09000104 -0.3081296  2.0216325  0.6627404
[13,]  0.92251062 -0.7821750  0.2695510 -0.7084466
[14,]  0.24750285 -0.7821750  2.6955100  1.1198028
[15,]  0.92251062  0.1659159 -0.1347755 -0.2513843
[16,]  0.02250026 -0.3081296  0.6738775 -0.7084466
[17,]  0.92251062 -0.3081296 -0.1347755 -1.1655090
[18,] -1.10251269 -1.2562205 -0.6738775 -1.6225713
[19,] -1.32751528 -1.2562205 -0.4043265  1.1198028
[20,] -0.20250233 -0.3081296  0.6738775 -1.6225713
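R's scale() above is a plain z-score standardization: subtract each column's mean and divide by its sample standard deviation. A minimal sketch of the same step in Python/NumPy (not the post's original code; only a few rows of the data are used for illustration):

```python
import numpy as np

# A few rows of the raw data from the table above (x1, x2, x3, x4).
pca = np.array([
    [ 40, 2.0,  5, 20],
    [ 10, 1.5,  5, 30],
    [120, 3.0, 13, 50],
    [250, 4.5, 18,  0],
], dtype=float)

# z-score standardization matching R's scale(): R divides by the
# sample standard deviation (denominator n-1), hence ddof=1.
P = (pca - pca.mean(axis=0)) / pca.std(axis=0, ddof=1)

print(P.mean(axis=0))         # each column is centered at ~0
print(P.std(axis=0, ddof=1))  # each column has unit sample sd
```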
> eigen(cov(P))  # eigenvalues and eigenvectors of the covariance matrix of P; the first column of $vectors (0.69996363, ...) is the eigenvector of the first eigenvalue (1.7182516), and so on
$values
[1] 1.7182516 1.0935358 0.9813470 0.2068656

$vectors
           [,1]        [,2]        [,3]       [,4]
[1,] 0.69996363  0.09501037 -0.24004879  0.6658833
[2,] 0.68979810 -0.28364662  0.05846333 -0.6635550
[3,] 0.08793923  0.90415870 -0.27031356 -0.3188955
[4,] 0.16277651  0.30498307  0.93053167  0.1208302
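In terms of this decomposition, the principal component scores are simply the standardized data projected onto the eigenvectors, and the variance of each score column equals its eigenvalue. A NumPy sketch of that projection, using random stand-in data of the same 20x4 shape (not the post's original code):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((20, 4))
P = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized data

# Eigendecomposition of the covariance matrix, sorted descending
# so the first component carries the most variance.
evals, evecs = np.linalg.eigh(np.cov(P, rowvar=False))
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

scores = P @ evecs                                  # principal component scores
# The sample variance of each score column equals its eigenvalue.
print(np.allclose(scores.var(axis=0, ddof=1), evals))
```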
Eigenvalue decomposition yields eigenvalues and eigenvectors: the eigenvalue measures how important a feature is, while the corresponding eigenvector describes what that feature is. Singular values σ behave much like eigenvalues: in the matrix Σ they are likewise arranged from largest to smallest, and they decay very quickly. In many cases the top 10%, or even 1%, of the singular values already account for more than 99% of the total.
> svd(cov(P))  # SVD applied to the same matrix, i.e. the covariance matrix (a square matrix) of the standardized data
$d
[1] 1.7182516 1.0935358 0.9813470 0.2068656

$u
            [,1]        [,2]        [,3]       [,4]
[1,] -0.69996363  0.09501037 -0.24004879 -0.6658833
[2,] -0.68979810 -0.28364662  0.05846333  0.6635550
[3,] -0.08793923  0.90415870 -0.27031356  0.3188955
[4,] -0.16277651  0.30498307  0.93053167 -0.1208302

$v
            [,1]        [,2]        [,3]       [,4]
[1,] -0.69996363  0.09501037 -0.24004879 -0.6658833
[2,] -0.68979810 -0.28364662  0.05846333  0.6635550
[3,] -0.08793923  0.90415870 -0.27031356  0.3188955
[4,] -0.16277651  0.30498307  0.93053167 -0.1208302
The result matches the eigendecomposition: the singular values equal the eigenvalues (the vectors differ only in sign, which is arbitrary), and the left singular vectors equal the right singular vectors. This agrees with the theory:
http://blog.csdn.net/wangzhiqing3/article/details/7446444
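For a symmetric positive semi-definite matrix such as cov(P), this equivalence is easy to check numerically. A small NumPy sketch on random stand-in data (not the post's original code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4))
C = np.cov(X, rowvar=False)       # a symmetric PSD 4x4 matrix

evals, evecs = np.linalg.eigh(C)  # eigendecomposition
U, s, Vt = np.linalg.svd(C)       # SVD of the same matrix

# For a symmetric PSD matrix the singular values are the eigenvalues,
# and left and right singular vectors coincide up to column signs.
print(np.allclose(np.sort(s), np.sort(evals)))
print(np.allclose(np.abs(U), np.abs(Vt.T)))
```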
2. Singular value decomposition
The discussion above covered decomposing a square matrix, but in LSA we need to decompose the term-document matrix, which is clearly not square. This is where singular value decomposition comes in; its derivation builds on the square-matrix decomposition described above.
Suppose C is an M x N matrix, U is an M x M matrix whose columns are the orthonormal eigenvectors of CC^T, and V is an N x N matrix whose columns are the orthonormal eigenvectors of C^T C; let r be the rank of C. Then the singular value decomposition exists:

C = U Σ V^T

where Σ is an M x N matrix whose first r diagonal entries are the singular values σ1 ≥ σ2 ≥ ... ≥ σr > 0, with all other entries zero.
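The definition can be verified numerically: the factorization reconstructs C exactly, and the nonzero eigenvalues of C^T C (and of CC^T) are the squared singular values. A NumPy sketch on an arbitrary non-square matrix (illustration data, not from the post):

```python
import numpy as np

rng = np.random.default_rng(1)
C = rng.standard_normal((5, 3))   # an arbitrary M x N (non-square) matrix

U, s, Vt = np.linalg.svd(C, full_matrices=True)

# Build the M x N Sigma with the singular values on its diagonal
# and reconstruct C = U * Sigma * V^T.
Sigma = np.zeros_like(C)
np.fill_diagonal(Sigma, s)
print(np.allclose(C, U @ Sigma @ Vt))

# Eigenvalues of C^T C agree with the squared singular values.
print(np.allclose(np.sort(np.linalg.eigvalsh(C.T @ C)), np.sort(s**2)))
```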
SVD (singular value decomposition) is a decomposition method that applies to any matrix; it handles a general m x n matrix. (To be continued...)
> svd(P)  # SVD applied to the standardized data matrix itself (20 x 4)
$d
[1] 5.713736 4.558199 4.318054 1.982535

$u
              [,1]        [,2]        [,3]        [,4]
 [1,]  0.213188874 -0.31854661  0.01117934 -0.09356334
 [2,]  0.298743593 -0.26550125 -0.09966088 -0.02040249
 [3,] -0.067184471 -0.05316902 -0.17961537 -0.19846043
 [4,] -0.363306085 -0.13041863  0.41709916 -0.43090590
 [5,] -0.116116981 -0.18960342 -0.21978181 -0.27040771
 [6,]  0.258181271 -0.01720108 -0.23759348 -0.11644174
 [7,]  0.272565880  0.17588846 -0.05485743 -0.02402906
 [8,] -0.401395312 -0.04641079 -0.19713543  0.07886498
 [9,] -0.353798967 -0.06803484 -0.20133714  0.31867233
[10,] -0.140818406 -0.11779839 -0.28058876  0.10504376
[11,] -0.196159553 -0.07244560 -0.04157562 -0.17994125
[12,] -0.001770250  0.46264984 -0.01709488 -0.21189013
[13,] -0.002549413  0.07396825  0.23141706  0.48510618
[14,] -0.009279184  0.66343445 -0.04822489 -0.02040610
[15,] -0.123807056 -0.03464961  0.09477346  0.26067355
[16,]  0.044254060  0.10591148  0.20027671 -0.04088441
[17,] -0.040535151 -0.06631368  0.29818366  0.36362322
[18,]  0.343318979 -0.18704239  0.26319302  0.05965465
[19,]  0.288607918  0.04522412 -0.32341707  0.10786442
[20,]  0.097860256  0.04005869  0.38476035 -0.17217051

$v
            [,1]        [,2]        [,3]       [,4]
[1,] -0.69996363  0.09501037  0.24004879  0.6658833
[2,] -0.68979810 -0.28364662 -0.05846333 -0.6635550
[3,] -0.08793923  0.90415870  0.27031356 -0.3188955
[4,] -0.16277651  0.30498307 -0.93053167  0.1208302

Singular values and latent semantic indexing (LSI):

Book <- read.csv("D:/Book.csv")
Book
K = as.matrix(data.frame(Book))
svd(K)
# Note: the original post never shows the assignments of kk and v. From the
# context, kk presumably holds the document coordinates (the left singular
# vectors, svd(K)$u) and v the term coordinates (the right singular vectors,
# svd(K)$v); K would also need to exclude the label column X to be numeric.
rownames(kk) = Book$X
kk
rownames(v) = paste('T', 1:9, sep = '')
plot(rnorm, xlim = c(-0.8, 0), ylim = c(-0.8, 0.6), lty = 0)  # empty canvas
points(v[, 3], v[, 2], col = 'red')     # terms
points(kk[, 3], kk[, 2], col = 'blue')  # documents
text(kk[, 3], kk[, 2], Book$X)
text(v[, 3], v[, 2], paste('T', 1:9, sep = ''))
The output shows that the right singular matrix $v is the eigenvector matrix of the covariance matrix of the standardized data obtained earlier (columns may differ in sign).
SVD can therefore compress both the columns (variables) and the rows (cases).
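This two-way compression can be sketched with a truncated SVD: keeping only the top k singular triplets yields an n x k representation of the cases (row compression), a p x k representation of the variables (column compression), and the best rank-k approximation of the data. A NumPy illustration on random stand-in data of the same 20x4 shape (not the post's original code):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((20, 4))   # stand-in for the 20 x 4 matrix P

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2                              # keep only the top-2 singular triplets
rows_k = U[:, :k] * s[:k]          # 20 x k: compressed cases (rows)
cols_k = Vt[:k, :].T * s[:k]       # 4 x k:  compressed variables (columns)

X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation
print(rows_k.shape, cols_k.shape)
print(np.linalg.matrix_rank(X_k))
```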
Reposted from: https://my.oschina.net/u/1272414/blog/190032