Cracking Principal Components Analysis (PCA) — Part 2

Hyeon Gu Kim
6 min read · Feb 2, 2022
Photo by fabio on Unsplash

This blog is based on Professor Tom Sager’s Unsupervised Learning class

In real-world applications, the story is the most important part of an analysis. However sophisticated a technique is, it is meaningless if we don't know how to interpret its output. Now that we have a better understanding of Principal Component Analysis (PCA) from Part 1, we will move on to the most significant part: the interpretation of PCA.

Interpretation of PCA using correlation (loading) and component coefficients

As we discussed in the previous blog, PCA captures most of the information in the old variables with new variables by rotating the dataset to view it from different perspectives. To interpret PCA, we first need to know how much each of the original variables correlates with a component. Then, to give the component its meaning, we look to the meaning of those original variables that correlate highly with it.

“Loading is the correlation between an original variable and a principal component.”

The correlation between an original variable and a principal component is called a loading. For the components’ meaning, we can take a look at the loading matrix:

Loading Matrix

The columns of the matrix represent the principal components (Prin1, Prin2, Prin3) and the rows represent the original variables (A, B, C). We can observe that every original variable has a high loading on Prin1, which signifies that Prin1 is our most significant principal component. Note that the matrix also shows the correlations between the U, V, W variables (the original variables before standardization) and the components, since standardization leaves correlations unchanged.
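If you want to reproduce a loading matrix like this yourself, here is a minimal sketch in Python. It uses scikit-learn and made-up data with columns named U, V, W; those names, the simulated numbers, and the library choice are my own assumptions, not part of the original class example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data with three correlated original variables U, V, W
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["U", "V", "W"])
df["V"] += 0.8 * df["U"]          # inject correlation so PCA has structure to find
df["W"] -= 0.5 * df["U"]

# Standardize (these are the A, B, C variables in the text) and rotate with PCA
Z = StandardScaler().fit_transform(df)
pca = PCA(n_components=3).fit(Z)
scores = pca.transform(Z)          # principal component scores

# Loading = correlation between each original variable and each principal component
loadings = np.array([[np.corrcoef(Z[:, j], scores[:, k])[0, 1] for k in range(3)]
                     for j in range(3)])
print(pd.DataFrame(loadings, index=df.columns,
                   columns=["Prin1", "Prin2", "Prin3"]).round(3))
```

Because correlations are scale-invariant, the same loading matrix comes out whether we correlate the components with U, V, W or with their standardized versions, which is exactly the point made above.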

What about Prin2? Prin2 has a low loading on B and a high loading on C. Therefore, Prin2 is a contrast component. A contrast component is a component that captures an inverse relationship between original variables: data points that score high on Prin2 have high C and low B values, while data points that score low on Prin2 have low C and high B values.

Prin3 doesn't have any high loadings, so it would likely be dropped from the analysis.

Another point to note is that the columns of the loading matrix and the columns of the principal component coefficient matrix are proportional. That is,

Relationship between loading matrix and PC coefficients

Notice that when we square those constants of proportionality, the numbers are precisely the squared lengths of the PC axes (Prin1, Prin2, Prin3), which are also the variances of the principal components of the standardized variables (A, B, C), i.e., the eigenvalues. So the loadings are proportional to the component coefficients, which means the coefficients may also be used to interpret the meaning of the components, as long as it is recognized that the coefficients are not the actual correlations. To sum up, we can use either the correlations (loadings) or the component coefficients for interpretation.
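A one-line derivation of that proportionality, in my own notation (standardized variable A_j with variance 1, eigenvector coefficient a_jk, and eigenvalue \lambda_k for component k):

$$\text{loading}_{jk} = \operatorname{Corr}(A_j,\ \text{Prin}_k) = \frac{\operatorname{Cov}(A_j,\ \text{Prin}_k)}{\sqrt{\operatorname{Var}(A_j)}\,\sqrt{\operatorname{Var}(\text{Prin}_k)}} = \frac{\lambda_k\, a_{jk}}{1 \cdot \sqrt{\lambda_k}} = a_{jk}\sqrt{\lambda_k}$$

So each column of the loading matrix is the corresponding coefficient column scaled by the square root of that component's eigenvalue, and squaring the constant of proportionality recovers the eigenvalue.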

Principal Components & Eigenvectors

It is worth noting that the principal components are actually the eigenvectors of the correlation matrix of the original data (U,V,W). In other words,

Principal Components = eigenvectors with largest eigenvalues

And the squared lengths of the PC axes are actually the eigenvalues of the correlation matrix of the original data (U, V, W). The fact that eigenvectors and eigenvalues are easy for computers to calculate makes principal components an eminently practical technique for working with data.
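To check this equivalence numerically, here is a minimal sketch that continues the hypothetical U, V, W data from the earlier snippet; the use of numpy.linalg.eigh and the n-versus-n−1 caveat below are my own additions, not something from the class notes.

```python
import numpy as np

# Correlation matrix of the original U, V, W data (same as the covariance of Z)
corr = np.corrcoef(Z, rowvar=False)

# Eigen-decomposition; eigh returns eigenvalues in ascending order, so reverse them
eigvals, eigvecs = np.linalg.eigh(corr)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

print(eigvals)     # eigenvalues = squared lengths of the PC axes (they sum to 3 here)
print(eigvecs)     # columns match pca.components_.T up to sign
# pca.explained_variance_ differs from eigvals only by the n vs. n-1 variance convention
```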

To visualize the relationship between correlations and component coefficients, take a look at the example that Professor Sager mentioned in class. Given data on certain companies (Private Equity, Profit, Growth), we ran PCA on the data, and below are my messy notes on the results:

My messy notes on the results of PCA

Okay, let's dig into the notes a bit, because I know they're messy! First, look at the loading matrix. We can see that Prin1 has the highest correlations (or loadings). Now look at the component coefficient matrix. Prin1 also has the highest coefficients (its column of the eigenvector matrix). The remaining principal components behave the same way. This indicates that the loading matrix and the coefficient matrix are proportional column by column, and that the coefficient matrix is orthonormal.

For the coefficient matrix, the sum of squares of each row and the sum of squares of each column equal one. This is because the matrix is orthonormal. On the other hand, the sums of squares of the loading matrix's columns are the eigenvalues. Here, the eigenvalues are the same as the variances of the principal components.

Interestingly enough, when we take the ratio of loadings to coefficients (loading/eigenvector entry) across the variables within one principal component, we get the same ratio for every variable, with a different constant for each principal component. Moreover, when we square those constants, we get back the eigenvalues shown in the loading matrix (e.g. 1.48² ≈ 2.20)! It's not over yet! When we add up all of the eigenvalues, we get the total variance, which is the same as the number of variables (in this case 3) and also equals the sum of the squared lengths of the principal component axes. You can see that these values in the matrices are all related!
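The same bookkeeping can be checked in code. This sketch again continues the hypothetical data from above, so the specific numbers will differ from the 1.48 and 2.20 in my notes; the coefficient matrix here is taken from scikit-learn's pca.components_, which is an assumption about tooling rather than how we computed things in class.

```python
import numpy as np

coeffs = pca.components_.T            # component coefficients: rows = variables, columns = PCs

# Within each column, loading / coefficient is a single constant whose square is the eigenvalue
# (entries with near-zero coefficients can make this ratio numerically noisy)
ratios = loadings / coeffs
print(ratios.round(3))                # each column repeats one value (sqrt of the eigenvalue)
print((ratios[0, :] ** 2).round(3))   # ≈ eigvals from the correlation matrix

# Orthonormality of the coefficient matrix: row and column sums of squares are all 1
print((coeffs ** 2).sum(axis=0).round(3), (coeffs ** 2).sum(axis=1).round(3))

# Column sums of squares of the loading matrix are the eigenvalues,
# and the eigenvalues add up to the number of variables (the total variance)
print((loadings ** 2).sum(axis=0).round(3))
print((loadings ** 2).sum().round(3))  # ≈ 3
```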

To demonstrate the relationship between the loading matrix and the component coefficient matrix mathematically, we can use simple regression. For example, say the Private Equity variable (PE) is our y variable and Prin1 is our x variable. Then we can recover the coefficient of PE from the loading of PE using the formula for the regression slope (beta):

Computing the coefficient of PE with the loading

0.6087 is indeed very close to PE's coefficient in Prin1!
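In my own notation, the beta property used in the notes works out as follows: the slope of the simple regression of the standardized PE on Prin1 is

$$\beta = \frac{\operatorname{Cov}(PE,\ \text{Prin}_1)}{\operatorname{Var}(\text{Prin}_1)} = \frac{\text{loading}_{PE,1}\,\sqrt{\lambda_1}}{\lambda_1} = \frac{\text{loading}_{PE,1}}{\sqrt{\lambda_1}},$$

which is exactly PE's eigenvector coefficient on Prin1; plugging the loading of PE and Prin1's eigenvalue from the notes into this formula is what produces the 0.6087 figure above.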

Furthermore, we can also build interpretation from the principal component score data. Below are the PC scores, which are computed from the eigenvectors and the standardized variables:

Principal Component Scores for each PCs

A principal component can be seen as a line with a minimum and a maximum. The scores above represent how far each data point is from each principal component's average (which is zero). For instance, the Prin1 score of -0.7830 means that this data point (Exxon) is 0.7830 below the average of Prin1. So if we find the minimum PC score on Prin1, we can find which company is not doing well. In the Prin1 case, the lowest scores belong to companies called Unisys and Texaco, which are definitely not in good shape. The picture below helps to illustrate the minimum and maximum property of a principal component:
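Separately, here is a minimal sketch of how such scores arise, continuing the hypothetical data from the earlier snippets; in the class example the rows would be labeled with company names such as Exxon, Unisys, or Texaco, which my made-up data does not have.

```python
import pandas as pd

# PC scores = standardized data projected onto the eigenvectors (component coefficients)
score_df = pd.DataFrame(Z @ coeffs, columns=["Prin1", "Prin2", "Prin3"], index=df.index)

print(score_df["Prin1"].mean().round(6))   # each component's average score is 0
print(score_df["Prin1"].nsmallest(3))      # the rows scoring lowest on Prin1
```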

Can PCA really substitute for all the data? PCA vs. all-variable regression

To revisit, the two main aspects of the PCA rotation are:

Orthonormality: preserves the geometry of the original data in the sense that the distances and angles between all points are the same after rotation as before (M*M^T = Identity matrix)

Variance Maximization: each new dimension is chosen to carry as much of the remaining variance as possible.

If PCA is used, the new dimensions will be uncorrelated (because the rotation is orthonormal), and PCA will highlight the important new dimensions that carry the most variance as well as the unimportant new dimensions that carry minimal variance.

PCA can also naturally prevent multicollinearity, since the principal components are uncorrelated.

To answer whether PCA can really substitute for all the data the way all-variable regression uses it, we can run two regression models, one with all the original variables and one with the principal components obtained from PCA. Interestingly enough, both models have the same R² and RMSE!
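Here is a minimal sketch of that comparison, continuing the hypothetical data from above. The response variable y is simulated purely for illustration, and I keep all three components; with every component retained the rotation spans the same space as the original variables, which is why the fits match.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical response variable, made up for illustration only
y = 2.0 * Z[:, 0] - 1.0 * Z[:, 2] + rng.normal(scale=0.5, size=len(Z))

models = {"all original variables": Z,            # standardized U, V, W
          "all principal components": Z @ coeffs} # the PC scores

for name, X in models.items():
    m = LinearRegression().fit(X, y)
    rmse = np.sqrt(mean_squared_error(y, m.predict(X)))
    print(name, "R2 =", round(m.score(X, y), 6), "RMSE =", round(rmse, 6))
# Both lines print identical R2 and RMSE because the PCs are just a rotation of the x's.
```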

However, there is one thing we need to be cautious about when using PCA: PCA does not consider the y variable at all; it only uses the x variables. For this reason, a principal component with a low eigenvalue could still turn out to be a significant principal component, because when we run the regression on the components chosen from PCA we do take the y variable into account. In that case, we need to go back to the PCA step and reinterpret the principal component that the regression found significant.
