
Principal Component Analysis - Now Explained in Your Own Terms


The caption in the online magazine “WIRED” caught my eye one night a few months ago. When I focused my eyes on it, it read: “Can everything be explained to everyone in terms they can understand? In 5 Levels, an expert scientist explains a high-level subject in five different layers of complexity - first to a child, then a teenager, then an undergrad majoring in the same subject, a grad student, and finally, a colleague”.


Curious, I clicked on the link and started learning about exciting new concepts I could finally grasp with my own level of understanding. Music, Biology, Physics, Medicine - all seemed clear that night. Needless to say, I couldn’t stop watching the series and went to bed very very late.


Screenshot of WIRED website, showing the collection of “5-levels” concepts (image: screenshot)

I actually started writing this article while working on a more technical piece. From a few paragraphs in the text, it grew in size until I felt my original article could no longer hold its weight. Could I explain the key concepts to peers and to co-workers, as well as to children and people without mathematical orientation? How far along will readers follow the explanations?


Let’s find out :)


1) Child


Sometimes, when we learn new things, we are told lots of facts and might be shown a drawing or a table with numbers. Seeing a lot of numbers and tables can be confusing, so it would be really helpful if we could reach the same conclusions, just using less of these facts, tables, and numbers.


Principal Component Analysis (or PCA for short) is what we call an algorithm: a set of instructions to follow. If we represent all our facts and tables using numbers, following these instructions will allow us to represent them using fewer numbers.


If we convert these numbers back into facts, we can still draw the same conclusions. But because we drew them using fewer facts than before, performing PCA just helped us be less confused.


2) High-school aged teenager


Our math studies focus on a few key areas that give us a basis for mathematical understanding and orientation. High school math often means learning Calculus, which deals with subjects like function analysis and analytic geometry. Let’s use them to better explain Principal Component Analysis.


You probably encountered functions like this one at one time or another in high school (image by author)

Functions are special objects in mathematics that give us an output value when we feed them an input value. This makes them very useful in learning certain relationships in data. A key technique taught in high school is how to draw the graph of a function by performing a short analysis of its properties. We use something from Calculus called the Derivative: the derivative is just the slope of a function at a given point. We choose a few key points inside the area of our graph, find the values our function takes at these points, and calculate the derivative at these points to learn the slope of the function there. Then we use the points and slopes we just got to draw an approximation of the function’s shape.

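The sketching recipe above can be mimicked with a few lines of code. A minimal sketch, where the example function f and the step size h are arbitrary illustrative choices:

```python
def slope(f, x, h=1e-6):
    # Central-difference approximation of the slope of f at x
    return (f(x + h) - f(x - h)) / (2 * h)

def f(x):
    return x ** 3 - 3 * x  # an example "high school" function

# Evaluate the function and its slope at a few key points,
# just like the hand-drawn graph analysis described above
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  f(x)={f(x):+.2f}  slope={slope(f, x):+.2f}")
```

Plotting the sampled points with those slopes gives the rough shape of the curve, exactly as the hand-drawn method does.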

But sometimes functions can get evil… (image by author)

In the real world, sometimes we have lots of functions and even strange things like functions that output multiple values at the same time or functions that require multiple inputs to give us values. And if you are the kind of person that finds high school functions dreadful, imagine how dreadful these multi-number functions can be — even for skilled adults! Principal component analysis is a technique that takes the output for all these functions and gives us close approximations using much less of these numbers. Fewer numbers often mean memorizing less data, smaller and cheaper ways to store them, and less of that “being confused” feeling we get when we see so many numbers we don’t even know where to start.


3) First-year University student


During your studies, you’ve learned all about linear algebra, statistics, and probability. You’ve dealt with a few of these “real world” functions that input and output multiple values and learned that they work with these things called vectors and matrices. You’ve also learned all about random variables, sample values, means, variances, distributions, covariances, correlation, and all the rest of the statistics techno-babble.



Principal Component Analysis relies on a technique called “Singular value decomposition” (SVD for short). For now let’s treat it as what’s called a “black box” in computer science: an unknown function that gives us some desired output once we feed it with the required input. If we collect lots of data (from observations, experimentation, monitoring etc…) and store it in matrix form, we can input this matrix to our SVD function and get a matrix of smaller dimensions that allows us to represent our data with fewer values.

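That black-box view can be sketched in a few lines of numpy; the 100×5 data matrix here is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))       # 100 measurements of 5 variables

C = np.cov(X, rowvar=False)         # 5x5 covariance matrix of the data
U, s, Vt = np.linalg.svd(C)         # the "black box": SVD of the covariance matrix

X_reduced = X @ U[:, :2]            # represent each row with 2 numbers instead of 5
print(X_reduced.shape)              # (100, 2)
```

The output matrix has fewer columns, so the same rows of data are now represented with fewer values.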

While for now, it might make sense to keep the SVD as a black box, it could be interesting to investigate the input and output matrices further. And it turns out the SVD gives us extra-meaningful outputs for a special kind of matrix called a “covariance matrix”. Being an undergrad student, you’ve already dealt with covariance matrices, but you might be thinking: “What does my data have to do with them?”


Left - Eugenio Beltrami; Right - Camille Jordan; two late-19th century mathematicians who independently discovered SVD (images: public domain)

Our data actually has quite a lot to do with covariance matrices. If we sort our data into distinct categories, we can group related values in a vector representing that category. Each one of these vectors can also be seen as a random variable, containing m samples (which makes m the length of the vector). If we concatenate these vectors together, we can form a matrix X of n random variables, or a random vector with n scalars, holding m measurements of that vector.


The matrix of concatenated vectors, representing our data, can then be used to calculate the covariance matrix for our random vector. This matrix is then provided as an input for our SVD, which provides us with a unitary matrix as an output.

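Concretely, that construction looks like this; the two category vectors (heights and weights) are invented here purely for illustration:

```python
import numpy as np

# Two "categories" of related values, each a vector of m = 5 measurements
heights = np.array([1.60, 1.72, 1.81, 1.68, 1.75])
weights = np.array([55.0, 68.0, 80.0, 62.0, 71.0])

X = np.column_stack([heights, weights])  # concatenate the vectors into a matrix
C = np.cov(X, rowvar=False)              # covariance matrix of the random vector

print(C.shape)                           # (2, 2)
print(np.allclose(C, C.T))               # True - covariance matrices are symmetric
```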

Without diving in too deep into unitary matrices, the output matrix has a neat property: we can pick the first k column vectors from the matrix to generate a new matrix U, and multiply our original matrix X with the matrix U to obtain a lower-dimensional matrix. It turns out this matrix is the “best” representation of the data stored in X when mapped to a lower dimension.


4) Bachelor of Science


Graduating with an academic degree, you now have the mathematical background needed to properly explain Principal component analysis. But just writing down the math won’t be helpful if we don’t understand the intuition behind it or the problem we are trying to solve.


The problem PCA tries to solve is nicknamed the “curse of dimensionality”: when we try to train machine learning algorithms, the rule of thumb is that the more data we collect, the better our predictions become. But with every new data feature (implying we now have an additional random vector) the dimension of the vector space spanned by the feature vectors increases by one. The larger our vector space, the longer it takes to train our learning algorithms, and the higher the chance some of the data might be redundant.



To solve this, researchers and mathematicians have constantly been trying to find techniques that perform “dimensionality reduction”: embedding our vector space in another space with a lower dimension. The inherent problem in dimensionality reductions is that for every decrease in the dimension of the vector space, we essentially discard one of the random vectors spanning the original vector space. That is because the basis for the new space has one vector less, making one of our random vectors a linear combination of the others, and therefore redundant in training our algorithm.


Of course, we do have naïve techniques for dimensionality reduction (like just discarding random vectors and checking which lost vector has the smallest effect on the accuracy of the algorithm). But the question asked is “How can I lose the least information when performing dimensionality reduction?” And it turns out the answer is “By using PCA”.


Formulating this mathematically, if we represent our data in matrix form as we did before, with n random variables having m measurements each, we represent each sample as the row xi. We are therefore searching for a linear map T: ℝn → ℝk that minimizes the Euclidean distances between xi and T(xi)

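One plausible way to write this objective down, restricting T to orthonormal projections (an assumption this article does not state explicitly, but the standard PCA formulation):

```latex
\min_{\substack{W \in \mathbb{R}^{n \times k} \\ W^{\top} W = I_k}}
\; \sum_{i=1}^{m} \left\lVert x_i - W W^{\top} x_i \right\rVert_2^2
```

Here $W W^{\top} x_i$ is the reconstruction of $x_i$ from its $k$-dimensional representation $T(x_i) = W^{\top} x_i$, so the sum measures the total distance the samples moved.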

What are we trying to do? To minimize information loss during dimensionality reduction

The intuition behind this formulation is that if we represent the images of our vectors as n-dimensional vectors in the original vector space, then the less they have moved from their original positions, the smaller the loss of information. Why does distance imply loss of information? Because the closer a representation of a vector is to its original location, the more accurate the representation.


We can say T1 is a more accurate projection than T2 because it is closer to the original vector Xi (image by author)

How do we find that linear map? It turns out that the map is provided by the singular value decomposition! Recall that if we invoke SVD we obtain a unitary matrix U that satisfies the decomposition SVD(M) = U*Σ*V.


Due to the isomorphism between the linear map space and the matrix space, we can see the matrix U as representing a linear map from ℝn to ℝn:


The vector space of linear maps from Rn to Rk is isomorphic to the matrix space R(n×k)

How exactly does SVD provide us with the function U? SVD is essentially a generalization of an important theorem in linear algebra called the Spectral theorem. While the spectral theorem can only be applied to normal matrices (satisfying MM* = M*M, where M* is the conjugate transpose of M), SVD generalizes that result to any arbitrary matrix M, by performing a matrix decomposition into three matrices: SVD(M) = U*Σ*V.


According to the Spectral theorem, U is a unitary matrix whose column vectors are an orthonormal basis for Rn, with each column vector spanning an eigenspace orthogonal to the other eigenspaces. If we denote that basis as B = {u1, u2 … un}, then because U diagonalizes the matrix M, U can also be viewed as a transformation matrix between the standard basis and the basis B.


Illustration of a spectral decomposition of an arbitrary linear map into 3 orthogonal projections, each of which projects Xi onto a matching eigenspace nested within R3 (image by author)

Of course, any subset Bk = {u1, u2 … uk} is an orthonormal basis for ℝk. This means that performing the multiplication M * U⁻¹ (similar to multiplying U * M) corresponds, via the isomorphism above, to applying a linear map T: ℝn → ℝk. The theorems behind SVD generalize this result to any given matrix. And while these matrices aren’t necessarily diagonalizable (or even square), the results regarding the unitary matrices still hold true for U and V. All we have left is to perform the multiplication X’ = Cov(X) * U, performing the dimensionality reduction we were searching for. We can now use the reduced matrix X’ for any purpose we’d have used the original matrix X for.

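The whole recipe can be sketched in numpy. Note one hedge: in everyday practice it is usually the centered data matrix itself, not its covariance matrix, that gets projected onto the first k eigenvectors; the shapes and random data below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))            # 200 measurements of 4 variables

Xc = X - X.mean(axis=0)                  # center each variable
C = np.cov(Xc, rowvar=False)             # 4x4 covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # ascending eigenvalues of a symmetric matrix

k = 2
Uk = eigvecs[:, ::-1][:, :k]             # eigenvectors of the k largest eigenvalues
X_prime = Xc @ Uk                        # the reduced, k-dimensional data
print(X_prime.shape)                     # (200, 2)
```

The variance of the first reduced column equals the largest eigenvalue of the covariance matrix, which is the connection the next section makes precise.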

5) Expert data scientist


As an expert data scientist with a hefty amount of experience in the research and development of learning algorithms, you probably already know of PCA and don’t need abridged descriptions of its internal workings or of applying it in practice. I do find, however, that many times, even when we have been applying an algorithm for a while, we don’t always know the math that makes it work.


If you’ve followed along with the previous explanations, we still have two questions left: Why does the matrix U represent the linear map T which minimizes the loss of information? And how does SVD provide us with such a matrix U?


The answer to the first question lies in an important result in linear algebra called the Principal Axis theorem. It states that every ellipsoid (or hyper-ellipsoid equivalent) shape in analytic geometry can be represented as a quadratic form Q: ℝn → ℝ of the shape Q(x) = x’Ax (here x’ denotes the transpose of x), such that A is diagonalizable and has a spectral decomposition into mutually orthogonal eigenspaces. The eigenvectors obtained form an orthonormal basis of ℝn, and have the property of matching the axes of the hyper-ellipsoid. That’s neat!



The Principal Axis theorem shows a fundamental connection between linear algebra and analytic geometry. When plotting our matrix X and visualizing the individual values, we can plot a (roughly) hyper-ellipsoid shape containing our data points. We can then plot a regression line for each dimension of our data, trying to minimize the distance to the orthogonal projection of the data points onto the regression line. Due to the properties of hyper-ellipsoids, these lines are exactly their axes. Therefore, the regression lines correspond to the spanning eigenvectors obtained from the spectral decomposition of the quadratic form Q.


The relationship and tradeoff between a vector, a possible orthogonal projection, and the projection’s orthogonal complement (image by author)

Reviewing a diagram similar to the one used when formulating the problem, notice the tradeoff between the length of an orthogonal projection to the length of its orthogonal complement: The smaller the orthogonal complement, the larger the orthogonal projection. Even though X is written in matrix form, we must not forget it is a random vector, and as such its components exhibit variance between their measurements. Since P(x) is a projection, it inevitably loses some information about x, which accumulates to a loss of variance.


In order to minimize the variance lost (i.e. retain the maximum possible variance) across the orthogonal projections, we must minimize the lengths of the orthogonal complements. Since the regression lines coinciding with our data’s principal axes minimize the orthogonal complements of the vectors representing our measurements, they also maximize the amount of variance kept from the original vectors. This is exactly why SVD “minimizes the loss of information”: because it maximizes the variance kept.

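That maximal-variance claim is easy to probe numerically: the variance of the data projected onto the leading eigenvector should beat the variance along any other unit direction. The anisotropic random data and the number of trial directions below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3)) @ np.diag([3.0, 1.0, 0.3])  # stretched, anisotropic data
C = np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(C)
w_top = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue

def variance_along(w):
    # Variance of the data projected onto the unit direction w
    return w @ C @ w

for _ in range(1000):
    w = rng.normal(size=3)
    w /= np.linalg.norm(w)             # a random unit direction
    assert variance_along(w) <= variance_along(w_top) + 1e-12
print("no random direction retained more variance than the principal axis")
```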

Now that we’ve discussed the theory behind SVD, how does it do its magic? SVD takes a matrix X as its input, and decomposes it into three matrices: U, Σ, and V, that satisfy X = U*Σ*V. Since SVD is a decomposition rather than a theorem, let’s perform it step by step on the given matrix X.


First, we’re going to introduce a new term: a singular vector of a matrix X is any unit vector v that satisfies the equation X*v = σ*u for some unit vector u and scalar σ ≥ 0 (its matching singular value).


Since X ∈ ℝ(m×n), we have no guarantees as to its dimensions or its contents. X might not have eigenvalues (since eigenvalues are only defined for square matrices), but it always has at least ρ different singular vectors (with ρ(X) being the rank of X), each with a matching singular value.


We can then use X to construct a symmetric matrix XT*X.


Since the transpose of a real matrix is also its adjoint, the singular values of XT are the complex conjugates of the singular values of X. This makes each eigenvalue of XT*X the square of the matching singular value of X:


λi(XT*X) = σi(X)²: the eigenvalues of XT*X are the squares of the singular values of X
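This relationship is easy to check on a small random matrix; numpy and the 6×4 shape are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(6, 4))

s = np.linalg.svd(X, compute_uv=False)    # singular values of X, descending
eig = np.linalg.eigvalsh(X.T @ X)[::-1]   # eigenvalues of X^T X, descending

print(np.allclose(eig, s ** 2))           # True
```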

Because XT*X is symmetric, it is also normal. Therefore, XT*X is orthogonally diagonalizable: There exists a diagonal matrix Σ2 which can be written as a product of three matrices:


XT*X = V*Σ²*VT (orthogonal diagonalization of XT*X)

where V is an orthonormal matrix made of eigenvectors of XTX. We will mark these vectors as


Bv = {v1, v2 … vn}, an orthonormal basis of ℝn, obtained using the orthogonal diagonalization of XT*X

We now construct a new group of vectors Bu = {u1, u2 … un},


where we define each member as ui = (1/σi)*X*vi.


Notice that because Bv is an orthonormal group, for every 1 ≤ i ≤ j ≤ n we have viT*vj = δij (1 if i = j, else 0).


Therefore, we can prove that:


uiT*uj = (1/(σi*σj))*viT*XT*X*vj = (σj²/(σi*σj))*viT*vj = δij (Key equation #1: because Bv is an orthonormal group, Bu is also an orthonormal group)
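A quick numerical check of this construction; a full-column-rank random X is assumed so that every σi is nonzero:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(6, 4))              # full column rank almost surely

lam, V = np.linalg.eigh(X.T @ X)         # eigenpairs of X^T X (columns of V are the v_i)
sigma = np.sqrt(lam)                     # the matching singular values of X

U = (X @ V) / sigma                      # u_i = (1/sigma_i) * X v_i, column by column
print(np.allclose(U.T @ U, np.eye(4)))   # True: Bu is an orthonormal group
```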

In addition, ui is also an eigenvector of X*XT: This is because


X*XT*ui = (1/σi)*X*XT*X*vi = (1/σi)*X*(σi²*vi) = σi²*ui (Key equation #2: Bu consists of eigenvectors of X*XT)

We can now complete the proof by expressing the relationship between Bu and Bv in matrix form:


X*vi = σi*ui for every i, i.e. X*V = U*Σ (the relationship between ui and vi, expressed in matrix form)

Then by standard matrix multiplication, U*Σ = X*V, and multiplying on the right by V⁻¹ = VT, it immediately follows that X = U*Σ*VT.


The result we just achieved is impressive, but remember we constructed Bu as a group of n vectors using the vectors of Bv, meaning U∈R(m×n) while Σ∈ R(n×n). While this is a valid multiplication, U is not a square matrix and therefore cannot be a unitary matrix. To solve this, we “pad” the matrix Σ with zeros to achieve an m×n shape and extend U to an m×m shape using the Gram-Schmidt process.

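numpy's `full_matrices=True` option reflects exactly this padding and extension; a sketch with an assumed 6×4 matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 6, 4
X = rng.normal(size=(m, n))

U, s, Vt = np.linalg.svd(X, full_matrices=True)  # U extended to m×m, Vt is n×n
Sigma = np.zeros((m, n))                         # Σ "padded" with zeros to m×n
np.fill_diagonal(Sigma, s)

print(U.shape, Sigma.shape, Vt.shape)            # (6, 6) (6, 4) (4, 4)
print(np.allclose(U @ Sigma @ Vt, X))            # True: the decomposition reconstructs X
```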

Now that we’ve finished the math part (phew…), we can start drawing some neat connections. First, while SVD can technically decompose any matrix, and we could just feed in the raw data matrix X, the Principal Axis theorem only works for diagonalizable matrices.


Second, the Principal axis theorem maximizes the retained variance in our matrix by performing an orthogonal projection onto the matrix’s eigenvectors. But who said our matrix captured a good amount of variance in the first place?


To answer these questions, and bring this article to an end, we will first restate that capturing variance between random variables is done by using a covariance matrix. And because the covariance matrix is symmetric and positive semi-definite, it is orthogonally diagonalizable AND has a spectral decomposition. We can then rewrite the formula for the covariance matrix using this neat equation, obtained by applying SVD to the matrices multiplied to form the covariance matrix:


Cov(X) ∝ XT*X = V*Σ²*VT (Key equation #3: notice this is very similar to the orthogonal diagonalization of XT*X)

This is exactly why we said that SVD gives us extra-meaningful outputs when being applied to the covariance matrix of X:


a) Not only is the singular value decomposition of Cov(X) identical to its spectral decomposition, it also diagonalizes it!


b) The diagonalizing matrix V is an orthonormal matrix made of unit eigenvectors of Cov(X), which is used to perform the principal component analysis of X!


c) Using Cov(X) captures the maximum amount of variance in our data, and by projecting it onto the basis eigenvectors associated with the k largest eigenvalues of Cov(X), we lose the smallest possible amount of variance when reducing our data by (n-k) dimensions


d) The Principal Axis theorem ensures that we minimize the projection error from ℝn → ℝk when performing PCA using Cov(X) and V.

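Claims a) and b) can be probed numerically: for a covariance matrix, the SVD's singular values coincide with the spectral decomposition's eigenvalues, and the decomposition reconstructs (diagonalizes) the matrix. numpy and the random data shapes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 3))
C = np.cov(X, rowvar=False)                 # symmetric, positive semi-definite

U, s, Vt = np.linalg.svd(C)                 # singular value decomposition of Cov(X)
eigvals = np.linalg.eigvalsh(C)[::-1]       # spectral decomposition eigenvalues, descending

print(np.allclose(s, eigvals))              # True: SVD == spectral decomposition here
print(np.allclose(U @ np.diag(s) @ Vt, C))  # True: and it diagonalizes Cov(X)
```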

Translated from: https://medium.com/@yonatanalon/principal-component-analysis-now-explained-in-your-own-terms-6f7a4af1da8
