{"id":1122,"date":"2019-11-08T17:16:05","date_gmt":"2019-11-09T00:16:05","guid":{"rendered":"http:\/\/www.zhuoyao.net\/?p=1122"},"modified":"2019-11-08T17:16:05","modified_gmt":"2019-11-09T00:16:05","slug":"principal-component-methods-in-r-practical-guide","status":"publish","type":"post","link":"https:\/\/zhuoyao.net\/index.php\/2019\/11\/08\/principal-component-methods-in-r-practical-guide\/","title":{"rendered":"Principal Component Methods in R: Practical Guide"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">\nPCA &#8211; Principal Component Analysis Essentials\n\n\n<\/h2>\n\n\n\n<p><a href=\"http:\/\/www.sthda.com\/english\/user\/profile\/1\">&nbsp;kassambara&nbsp;<\/a>|&nbsp;\n&nbsp;23\/09\/2017&nbsp;|\n&nbsp;&nbsp;169015\n&nbsp;|&nbsp;<a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#comments-list\">&nbsp;Comments (37)<\/a>\n&nbsp;|&nbsp;&nbsp;<a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/\">Principal Component Methods in R: Practical Guide<\/a>\n&nbsp;|&nbsp; \n<a href=\"http:\/\/www.sthda.com\/english\/articles\/tag\/multivariate-analysis\/\">Multivariate Analysis<\/a><\/p>\n\n\n\n<p><strong>Principal component analysis<\/strong> (<strong>PCA<\/strong>) \nallows us to summarize and to visualize the information in a data set \ncontaining individuals\/observations described by multiple \ninter-correlated quantitative variables. Each variable could be \nconsidered as a different dimension. 
If you have more than 3 variables \nin your data set, it can be very difficult to visualize a \nmulti-dimensional hyperspace.<\/p>\n\n\n\n<p>Principal component analysis is used to extract the important \ninformation from a multivariate data table and to express this \ninformation as a set of a few new variables called <strong>principal components<\/strong>.\n These new variables correspond to linear combinations of the \noriginals. The number of principal components is less than or equal to \nthe number of original variables.<\/p>\n\n\n\n<p>The information in a given data set corresponds to the <code>total variation<\/code>\n it contains. The goal of PCA is to identify directions (or principal \ncomponents) along which the variation in the data is maximal.<\/p>\n\n\n\n<p>In other words, PCA reduces the dimensionality of a multivariate data\n set to two or three principal components, which can be visualized \ngraphically, with minimal loss of information.<\/p>\n\n\n\n<p>\nIn this chapter, we describe the basic idea of PCA and demonstrate how \nto compute and visualize PCA using R software. 
Additionally, we\u2019ll show \nhow to reveal the most important variables that explain the variations \nin a data set.\n<br><\/p>\n\n\n\n<p>Contents:<br><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#basics\">Basics<\/a><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#computation\">Computation<\/a><ul><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#r-packages\">R packages<\/a><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#pca-data-format\">Data format<\/a><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#data-standardization\">Data standardization<\/a><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#r-code\">R code<\/a><\/li><\/ul><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#visualization-and-interpretation\">Visualization and Interpretation<\/a><ul><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#eigenvalues-variances\">Eigenvalues \/ Variances<\/a><\/li><li><a 
href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#graph-of-variables\">Graph of variables<\/a><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#dimension-description\">Dimension description<\/a><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#graph-of-individuals\">Graph of individuals<\/a><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#graph-customization\">Graph customization<\/a><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#biplot\">Biplot<\/a><\/li><\/ul><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#supplementary-elements\">Supplementary elements<\/a><ul><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#definition-and-types\">Definition and types<\/a><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#specification-in-pca\">Specification in PCA<\/a><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#quantitative-variables\">Quantitative variables<\/a><\/li><li><a 
href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#individuals\">Individuals<\/a><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#qualitative-variables\">Qualitative variables<\/a><\/li><\/ul><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#filtering-results\">Filtering results<\/a><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#exporting-pca-results\">Exporting results<\/a><ul><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#export-plots-to-pdfpng-files\">Export plots to PDF\/PNG files<\/a><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#export-results-to-txtcsv-files\">Export results to txt\/csv files<\/a><\/li><\/ul><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#summary\">Summary<\/a><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#further-reading\">Further reading<\/a><\/li><li><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/#references\">References<\/a><\/li><\/ul>\n\n\n\n<p>The Book:<a 
href=\"http:\/\/www.sthda.com\/english\/web\/5-bookadvisor\/50-practical-guide-to-principal-component-methods-in-r\/\">\n          <br>\n      Practical Guide to Principal Component Methods in R\n      <\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Basics<\/h2>\n\n\n\n<p>Understanding the details of PCA requires knowledge of linear \nalgebra. Here, we\u2019ll explain only the basics, with simple graphical \nrepresentations of the data.<\/p>\n\n\n\n<p>In Plot 1A below, the data are represented in the X-Y coordinate \nsystem. The dimension reduction is achieved by identifying the principal\n directions, called principal components, in which the data varies.<\/p>\n\n\n\n<p>PCA assumes that the directions with the largest variances are the most \u201cimportant\u201d (i.e., the most principal).<\/p>\n\n\n\n<p>In the figure below, the <code>PC1 axis<\/code> is the <code>first principal direction<\/code> along which the samples show the largest variation. The <code>PC2 axis<\/code> is the <code>second most important direction<\/code>, and it is <code>orthogonal<\/code> to the PC1 axis.<\/p>\n\n\n\n<p>The dimensionality of our two-dimensional data can be reduced to a \nsingle dimension by projecting each sample onto the first principal \ncomponent (Plot 1B).<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-scatter-plot-data-mining-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-scatter-plot-data-mining-2.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>Technically speaking, the amount of variance retained by each principal component is measured by the so-called <strong>eigenvalue<\/strong>.<\/p>\n\n\n\n<p>Note that the PCA method is particularly useful when the variables 
\nwithin the data set are highly correlated. Correlation indicates that \nthere is redundancy in the data. Due to this redundancy, PCA can be used\n to reduce the original variables into a smaller number of new variables\n ( = <strong>principal components<\/strong>) explaining most of the variance in the original variables.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-unnamed-chunk-3-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-unnamed-chunk-3-2.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>\nTaken together, the main purposes of principal component analysis are to:\n<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>\nidentify hidden patterns in a data set,\n<\/li><li>\nreduce the dimensionality of the data by removing the noise and redundancy in the data,\n<\/li><li>\nidentify correlated variables.\n<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Computation<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">R packages<\/h3>\n\n\n\n<p>Several functions from different packages are available in the <em>R software<\/em> for computing PCA:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><em>prcomp<\/em>() and <em>princomp<\/em>() [built-in R <em>stats<\/em> package],<\/li><li><em>PCA<\/em>() [<em>FactoMineR<\/em> package],<\/li><li><em>dudi.pca<\/em>() [<em>ade4<\/em> package],<\/li><li>and <em>epPCA<\/em>() [<em>ExPosition<\/em> package]<\/li><\/ul>\n\n\n\n<p>No matter what function you decide to use, you can easily extract and\n visualize the results of PCA using R functions provided in the <code>factoextra<\/code> R package.<\/p>\n\n\n\n<p>\nHere, we\u2019ll use the two packages FactoMineR (for the analysis) and factoextra (for ggplot2-based 
visualization).\n<\/p>\n\n\n\n<p>Install the two packages as follow:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>install.packages(c(\"FactoMineR\", \"factoextra\"))<\/code><\/pre>\n\n\n\n<p>Load them in R, by typing this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>library(\"FactoMineR\")\nlibrary(\"factoextra\")<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Data format<\/h3>\n\n\n\n<p>We\u2019ll use the demo data sets <code>decathlon2<\/code> from the <em>factoextra<\/em> package:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>data(decathlon2)\n# head(decathlon2)<\/code><\/pre>\n\n\n\n<p>As illustrated in Figure 3.1, the data used here describes athletes\u2019 \nperformance during two sporting events (Desctar and OlympicG). It \ncontains 27 individuals (athletes) described by 13 variables.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/images\/principal-component-methods\/pca-decathlon-300dpi.png\" alt=\"Principal component analysis data format\"\/><\/figure>\n\n\n\n<p>\nNote that, only some of these individuals and variables will be used to \nperform the principal component analysis. 
The coordinates of the \nremaining individuals and variables on the factor map will be predicted \nafter the PCA.\n<\/p>\n\n\n\n<p>In PCA terminology, our data contains:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>\n<em>Active individuals<\/em> (in light blue, rows 1:23): Individuals that are used during the principal component analysis.\n<\/li><li>\n<em>Supplementary individuals<\/em> (in dark blue, rows 24:27): The \ncoordinates of these individuals will be predicted using the PCA \ninformation and parameters obtained with the active individuals\/variables.\n<\/li><li>\n<em>Active variables<\/em> (in pink, columns 1:10): Variables that are used for the principal component analysis.\n<\/li><li>\n<em>Supplementary variables<\/em>: As with supplementary individuals, the coordinates of these variables will also be predicted. These can be:\n<ul><li>\n<em>Supplementary continuous variables<\/em> (red): Columns 11 and 12 corresponding respectively to the rank and the points of athletes.\n<\/li><li>\n<em>Supplementary qualitative variables<\/em> (green): Column 13 \ncorresponding to the two athletic meetings (2004 Olympic Games or 2004\n Decastar). This is a categorical (or factor) variable. 
It can be\n used to color individuals by groups.\n<\/li><\/ul>\n<\/li><\/ul>\n\n\n\n<p>We start by subsetting active individuals and active variables for the principal component analysis:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>decathlon2.active &lt;- decathlon2[1:23, 1:10]\nhead(decathlon2.active[, 1:6], 4)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##         X100m Long.jump Shot.put High.jump X400m X110m.hurdle\n## SEBRLE   11.0      7.58     14.8      2.07  49.8         14.7\n## CLAY     10.8      7.40     14.3      1.86  49.4         14.1\n## BERNARD  11.0      7.23     14.2      1.92  48.9         15.0\n## YURKOV   11.3      7.09     15.2      2.10  50.4         15.3<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Data standardization<\/h3>\n\n\n\n<p>In principal component analysis, variables are often scaled \n(i.e.&nbsp;standardized). This is particularly recommended when variables are\n measured in different scales (e.g., kilograms, kilometers, centimeters, \n\u2026); otherwise, the PCA outputs obtained will be severely affected. <\/p>\n\n\n\n<p>The goal is to make the variables comparable. Generally, variables are\n scaled to have i) standard deviation one and ii) mean zero.<\/p>\n\n\n\n<p>The standardization of data is an approach widely used in the context\n of gene expression data analysis before PCA and clustering analysis. We\n might also want to scale the data when the mean and\/or the standard \ndeviation of variables are largely different.<\/p>\n\n\n\n<p>When scaling variables, the data can be transformed as follows: <code>(x - mean(x)) \/ sd(x)<\/code>, where mean(x) is the mean of the x values and sd(x) is the standard deviation (SD).<\/p>\n\n\n\n<p>The R base function <code>scale()<\/code> can be used to standardize the data. 
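<\/p>\n\n\n\n<p>To see concretely what this standardization does (a toy sketch, not part of the original tutorial; the matrix <code>m<\/code> below is made up for illustration), you can compare <code>scale()<\/code> against the manual formula:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Toy matrix for illustration\nm &lt;- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)\n# Manual standardization: (x - mean(x)) \/ sd(x), column by column\nmanual &lt;- sweep(sweep(m, 2, colMeans(m), \"-\"), 2, apply(m, 2, sd), \"\/\")\n# scale() gives the same result\nall.equal(as.vector(scale(m)), as.vector(manual))\n## [1] TRUE<\/code><\/pre>\n\n\n\n<p>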
It \ntakes a numeric matrix as an input and performs the scaling on the \ncolumns.<\/p>\n\n\n\n<p>\nNote that, by default, the function <strong>PCA<\/strong>() [in <em>FactoMineR<\/em>] standardizes the data automatically during the PCA, so you don\u2019t need to do this transformation before the PCA.\n<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">R code<\/h3>\n\n\n\n<p>The function <em>PCA<\/em>() [<em>FactoMineR<\/em> package] can be used. A simplified format is:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>PCA(X, scale.unit = TRUE, ncp = 5, graph = TRUE)<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\"><li><code>X<\/code>: a data frame. Rows are individuals and columns are numeric variables.<\/li><li><code>scale.unit<\/code>: a logical value. If <em>TRUE<\/em>,\n the data are scaled to unit variance before the analysis. This \nstandardization to the same scale prevents some variables from becoming \ndominant just because of their large measurement units. It makes the \nvariables comparable.<\/li><li><code>ncp<\/code>: number of dimensions kept in the final results.<\/li><li><code>graph<\/code>: a logical value. 
If TRUE, a graph is displayed.<\/li><\/ul>\n\n\n\n<p>The R code below computes principal component analysis on the active individuals\/variables:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>library(\"FactoMineR\")\nres.pca &lt;- PCA(decathlon2.active, graph = FALSE)<\/code><\/pre>\n\n\n\n<p>The output of the function <em>PCA<\/em>() is a list, including the following components:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>print(res.pca)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## **Results for the Principal Component Analysis (PCA)**\n## The analysis was performed on 23 individuals, described by 10 variables\n## *The results are available in the following objects:\n## \n##    name               description                          \n## 1  \"$eig\"             \"eigenvalues\"                        \n## 2  \"$var\"             \"results for the variables\"          \n## 3  \"$var$coord\"       \"coord. for the variables\"           \n## 4  \"$var$cor\"         \"correlations variables - dimensions\"\n## 5  \"$var$cos2\"        \"cos2 for the variables\"             \n## 6  \"$var$contrib\"     \"contributions of the variables\"     \n## 7  \"$ind\"             \"results for the individuals\"        \n## 8  \"$ind$coord\"       \"coord. for the individuals\"         \n## 9  \"$ind$cos2\"        \"cos2 for the individuals\"           \n## 10 \"$ind$contrib\"     \"contributions of the individuals\"   \n## 11 \"$call\"            \"summary statistics\"                 \n## 12 \"$call$centre\"     \"mean of the variables\"              \n## 13 \"$call$ecart.type\" \"standard error of the variables\"    \n## 14 \"$call$row.w\"      \"weights for the individuals\"        \n## 15 \"$call$col.w\"      \"weights for the variables\"<\/code><\/pre>\n\n\n\n<p>\nThe object created by the function <em>PCA<\/em>() contains a great deal of information, organized in different lists and matrices. 
These values are described in the next section.\n<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Visualization and Interpretation<\/h2>\n\n\n\n<p>We\u2019ll use the <em>factoextra<\/em> R package to help in the \ninterpretation of PCA. No matter what function you decide to use \n[stats::prcomp(), FactoMineR::PCA(), ade4::dudi.pca(), ExPosition::epPCA()], you can easily extract and visualize the results \nof PCA using R functions provided in the <em>factoextra<\/em> R package.\n<\/p>\n\n\n\n<p>These functions include:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><code>get_eigenvalue(res.pca)<\/code>: Extract the eigenvalues\/variances of principal components<\/li><li><code>fviz_eig(res.pca)<\/code>: Visualize the eigenvalues<\/li><li><code>get_pca_ind(res.pca)<\/code>, <code>get_pca_var(res.pca)<\/code>: Extract the results for individuals and variables, respectively.<\/li><li><code>fviz_pca_ind(res.pca)<\/code>, <code>fviz_pca_var(res.pca)<\/code>: Visualize the results for individuals and variables, respectively.<\/li><li><code>fviz_pca_biplot(res.pca)<\/code>: Make a biplot of individuals and variables.<\/li><\/ul>\n\n\n\n<p>In the next sections, we\u2019ll illustrate each of these functions.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Eigenvalues \/ Variances<\/h3>\n\n\n\n<p>As described in previous sections, the <em>eigenvalues<\/em> measure the amount of variation retained by each principal component. <em>Eigenvalues<\/em>\n are large for the first PCs and small for the subsequent PCs. That is, \nthe first PCs correspond to the directions with the maximum amount of \nvariation in the data set. <\/p>\n\n\n\n<p>We examine the eigenvalues to determine the number of principal \ncomponents to be considered. 
The eigenvalues and the proportion of \nvariances (i.e., information) retained by the principal components (PCs)\n can be extracted using the function <em>get_eigenvalue<\/em>() [<em>factoextra<\/em> package].<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>library(\"factoextra\")\neig.val &lt;- get_eigenvalue(res.pca)\neig.val<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##        eigenvalue variance.percent cumulative.variance.percent\n## Dim.1       4.124            41.24                        41.2\n## Dim.2       1.839            18.39                        59.6\n## Dim.3       1.239            12.39                        72.0\n## Dim.4       0.819             8.19                        80.2\n## Dim.5       0.702             7.02                        87.2\n## Dim.6       0.423             4.23                        91.5\n## Dim.7       0.303             3.03                        94.5\n## Dim.8       0.274             2.74                        97.2\n## Dim.9       0.155             1.55                        98.8\n## Dim.10      0.122             1.22                       100.0<\/code><\/pre>\n\n\n\n<p>The sum of all the eigenvalues gives a total variance of 10.<\/p>\n\n\n\n<p>The proportion of variation explained by each eigenvalue is given in \nthe second column. For example, 4.124 divided by 10 equals 0.4124; that is, \nabout 41.24% of the variation is explained by this first eigenvalue. The\n cumulative percentage explained is obtained by adding the successive \nproportions of variation explained to obtain the running total. For \ninstance, 41.242% plus 18.385% equals 59.627%, and so forth. 
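<\/p>\n\n\n\n<p>This arithmetic is easy to verify directly (a quick check, not part of the original tutorial, using the eigenvalues printed above):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ev &lt;- c(4.124, 1.839, 1.239, 0.819, 0.702, 0.423, 0.303, 0.274, 0.155, 0.122)\nsum(ev)                               # total variance: 10\nround(cumsum(ev \/ sum(ev) * 100), 1)  # 41.2 59.6 72.0 80.2 87.2 91.5 94.5 97.2 98.8 100.0<\/code><\/pre>\n\n\n\n<p>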
Therefore, \nabout 59.627% of the variation is explained by the first two eigenvalues\n together.<\/p>\n\n\n\n<p>Eigenvalues can be used to determine the number of principal components to retain after PCA (Kaiser 1961): <\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>\n\nAn <em>eigenvalue<\/em> &gt; 1 indicates that the PC accounts for more \nvariance than is accounted for by one of the original variables in standardized\n data. This is commonly used as a cutoff point for which PCs are \nretained. This holds true only when the data are standardized.\n\n<\/li><li>\n\nYou can also limit the number of components to the number that accounts \nfor a certain fraction of the total variance. For example, if you are \nsatisfied with 70% of the total variance explained, then use the number \nof components needed to achieve that.\n\n<\/li><\/ul>\n\n\n\n<p>Unfortunately, there is no well-accepted objective way to decide how \nmany principal components are enough. This will depend on the specific \nfield of application and the specific data set. In practice, we tend to \nlook at the first few principal components in order to find interesting \npatterns in the data.<\/p>\n\n\n\n<p>In our analysis, the first three principal components explain 72% of the variation. This is an acceptably large percentage.<\/p>\n\n\n\n<p>An alternative method to determine the number of principal components\n is to look at a Scree Plot, which is the plot of eigenvalues ordered \nfrom largest to smallest. The number of components is determined at \nthe point beyond which the remaining eigenvalues are all relatively \nsmall and of comparable size (Jolliffe 2002; Peres-Neto, Jackson, and Somers 2005).<\/p>\n\n\n\n<p>The scree plot can be produced using the function <code>fviz_eig()<\/code> or <code>fviz_screeplot()<\/code> [<em>factoextra<\/em> package]. 
<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_eig(res.pca, addlabels = TRUE, ylim = c(0, 50))<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-eigenvalue-screeplot-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>\nFrom the plot above, we might want to stop at the fifth principal \ncomponent. 87% of the information (variance) contained in the data is \nretained by the first five principal components.\n<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Graph of variables<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Results<\/h4>\n\n\n\n<p>A simple method to extract the results, for variables, from a PCA output is to use the function <code>get_pca_var()<\/code> [<em>factoextra<\/em>\n package]. This function provides a list of matrices containing all the \nresults for the active variables (coordinates, correlation between \nvariables and axes, squared cosine and contributions).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>var &lt;- get_pca_var(res.pca)\nvar<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## Principal Component Analysis Results for variables\n##  ===================================================\n##   Name       Description                                    \n## 1 \"$coord\"   \"Coordinates for the variables\"                \n## 2 \"$cor\"     \"Correlations between variables and dimensions\"\n## 3 \"$cos2\"    \"Cos2 for the variables\"                       \n## 4 \"$contrib\" \"contributions of the variables\"<\/code><\/pre>\n\n\n\n<p>The components of <code>get_pca_var()<\/code> can be used in the plot of variables as follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><code>var$coord<\/code>: coordinates of variables to create a scatter plot<\/li><li><code>var$cos2<\/code>: represents the quality \nof representation for variables on the factor map. 
It\u2019s calculated as \nthe squared coordinates: var.cos2 = var.coord * var.coord.<\/li><li><code>var$contrib<\/code>: contains the \ncontributions (in percentage) of the variables to the principal \ncomponents. The contribution of a variable (var) to a given principal \ncomponent is (in percentage): (var.cos2 * 100) \/ (total cos2 of the \ncomponent).<\/li><\/ul>\n\n\n\n<p>\nNote that it\u2019s possible to plot variables and to color them according \nto either i) their quality on the factor map (cos2) or ii) their \ncontribution values to the principal components (contrib).\n<\/p>\n\n\n\n<p>The different components can be accessed as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Coordinates\nhead(var$coord)\n# Cos2: quality on the factor map\nhead(var$cos2)\n# Contributions to the principal components\nhead(var$contrib)<\/code><\/pre>\n\n\n\n<p>In this section, we describe how to visualize variables and draw \nconclusions about their correlations. Next, we highlight variables \naccording to either i) their quality of representation on the factor map\n or ii) their contributions to the principal components.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Correlation circle<\/h4>\n\n\n\n<p>The correlation between a variable and a principal component (PC) is \nused as the coordinates of the variable on the PC. The representation of\n variables differs from the plot of the observations: The observations \nare represented by their projections, but the variables are represented \nby their correlations (Abdi and Williams 2010). 
<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Coordinates of variables\nhead(var$coord, 4)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##            Dim.1   Dim.2  Dim.3   Dim.4  Dim.5\n## X100m     -0.851 -0.1794  0.302  0.0336 -0.194\n## Long.jump  0.794  0.2809 -0.191 -0.1154  0.233\n## Shot.put   0.734  0.0854  0.518  0.1285 -0.249\n## High.jump  0.610 -0.4652  0.330  0.1446  0.403<\/code><\/pre>\n\n\n\n<p>To plot variables, type this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_pca_var(res.pca, col.var = \"black\")<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-variables-correlation-circle-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>The plot above is also known as a variable correlation plot. It shows \nthe relationships between all variables. It can be interpreted as \nfollows:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Positively correlated variables are grouped together.<\/li><li>Negatively correlated variables are positioned on opposite sides of the plot origin (opposed quadrants).\n<\/li><li>The distance between variables and the origin measures the quality \nof the variables on the factor map. Variables that are away from the \norigin are well represented on the factor map.<\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Quality of representation<\/h4>\n\n\n\n<p>The quality of representation of the variables on the factor map is called <strong>cos2<\/strong> (square cosine, squared coordinates). 
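<\/p>\n\n\n\n<p>Because the cos2 values are simply squared coordinates (var.cos2 = var.coord * var.coord), they can be checked against the coordinates printed above; for instance, for X100m on Dim.1 (a quick check, not part of the original tutorial):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>coord &lt;- -0.851    # Dim.1 coordinate of X100m, from var$coord above\nround(coord^2, 3)  # 0.724, matching var$cos2[\"X100m\", \"Dim.1\"]<\/code><\/pre>\n\n\n\n<p>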
You can access the cos2 as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>head(var$cos2, 4)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##           Dim.1   Dim.2  Dim.3   Dim.4  Dim.5\n## X100m     0.724 0.03218 0.0909 0.00113 0.0378\n## Long.jump 0.631 0.07888 0.0363 0.01331 0.0544\n## Shot.put  0.539 0.00729 0.2679 0.01650 0.0619\n## High.jump 0.372 0.21642 0.1090 0.02089 0.1622<\/code><\/pre>\n\n\n\n<p>You can visualize the cos2 of variables on all the dimensions using the corrplot package:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>library(\"corrplot\")\ncorrplot(var$cos2, is.corr=FALSE)<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-pca-variable-cos2-corrplot-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>It\u2019s also possible to create a bar plot of the variables\u2019 cos2 using the function <code>fviz_cos2()<\/code> [in factoextra]:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Total cos2 of variables on Dim.1 and Dim.2\nfviz_cos2(res.pca, choice = \"var\", axes = 1:2)<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-pca-variable-cos2-fviz_cos2-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>\nNote that:\n<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>\n\nA high cos2 indicates a good representation of the variable on the \nprincipal component. In this case the variable is positioned close to \nthe circumference of the correlation circle.\n\n<\/li><li>\n\nA low cos2 indicates that the variable is not perfectly represented by \nthe PCs. 
In this case the variable is close to the center of the circle.\n\n<\/li><\/ul>\n\n\n\n<p>For a given variable, the sum of the cos2 on all the principal components is equal to one.<\/p>\n\n\n\n<p>If a variable is perfectly represented by only two principal \ncomponents (Dim.1 &amp; Dim.2), the sum of the cos2 on these two PCs is \nequal to one. In this case the variables will be positioned on the \ncircle of correlations.<\/p>\n\n\n\n<p>For some of the variables, more than 2 components might be required \nto perfectly represent the data. In this case the variables are \npositioned inside the circle of correlations.<\/p>\n\n\n\n<p>In summary:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>\nThe cos2 values are used to estimate the quality of the representation.\n<\/li><li>\nThe closer a variable is to the circle of correlations, the better its \nrepresentation on the factor map (and the more important it is for \ninterpreting these components).\n<\/li><li>\nVariables that are close to the center of the plot are less important for the first components.\n<\/li><\/ul>\n\n\n\n<p>It\u2019s possible to color variables by their cos2 values using the argument <code>col.var = \"cos2\"<\/code>. This produces a color gradient. In this case, the argument <code>gradient.cols<\/code> can be used to provide custom colors. 
For instance, <code>gradient.cols = c(\"white\", \"blue\", \"red\")<\/code> means that:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>variables with low cos2 values will be colored in \u201cwhite\u201d<\/li><li>variables with mid cos2 values will be colored in \u201cblue\u201d<\/li><li>variables with high cos2 values will be colored in \u201cred\u201d<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code># Color by cos2 values: quality on the factor map\nfviz_pca_var(res.pca, col.var = \"cos2\",\n             gradient.cols = c(\"#00AFBB\", \"#E7B800\", \"#FC4E07\"), \n             repel = TRUE # Avoid text overlapping\n             )<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-prcomp-correlation-circle-colors-cos2-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>Note that, it\u2019s also possible to change the transparency of the variables according to their cos2 values using the option <code>alpha.var = \"cos2\"<\/code>. For example, type this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Change the transparency by cos2 values\nfviz_pca_var(res.pca, alpha.var = \"cos2\")<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Contributions of variables to PCs<\/h4>\n\n\n\n<p>The contributions of variables to the variability of a given principal component are expressed as percentages. 
<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Variables that are correlated with PC1 (i.e., \nDim.1) and PC2 (i.e., \nDim.2) are the most important in explaining the variability in the data \nset.\n<\/li><li>Variables that are not correlated with any PC, or that are correlated only with the \nlast dimensions, have low contributions and might be removed\n to simplify the overall analysis.<\/li><\/ul>\n\n\n\n<p>The contribution of variables can be extracted as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>head(var$contrib, 4)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##           Dim.1  Dim.2 Dim.3 Dim.4 Dim.5\n## X100m     17.54  1.751  7.34 0.138  5.39\n## Long.jump 15.29  4.290  2.93 1.625  7.75\n## Shot.put  13.06  0.397 21.62 2.014  8.82\n## High.jump  9.02 11.772  8.79 2.550 23.12<\/code><\/pre>\n\n\n\n<p>\nThe larger the value of the contribution, the more the variable contributes to the component.\n<\/p>\n\n\n\n<p>It\u2019s possible to use the function <code>corrplot()<\/code> [corrplot package] to highlight the most contributing variables for each dimension:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>library(\"corrplot\")\ncorrplot(var$contrib, is.corr=FALSE)    <\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-variable-contribution-corrplot-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>The function <code>fviz_contrib()<\/code> [factoextra package] can be used to draw a bar plot of variable \ncontributions. If your data contains many variables, you can decide to \nshow only the top contributing variables. 
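These percentages are easy to recompute by hand: for a PCA on scaled data, a variable's contribution to a component is its squared loading rescaled so that each column sums to 100%. A base-R sketch (again with USArrests as a stand-in data set):

```r
# Variable contributions (%) from base R: for a scaled PCA they are
# just the squared loadings expressed as percentages.
res     <- prcomp(USArrests, scale. = TRUE)
contrib <- 100 * res$rotation^2

round(contrib, 2)  # rows: variables; columns: components
colSums(contrib)   # each column sums to 100
```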
The R code below shows the top\n 10 variables contributing to the principal components:\n<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Contributions of variables to PC1\nfviz_contrib(res.pca, choice = \"var\", axes = 1, top = 10)\n# Contributions of variables to PC2\nfviz_contrib(res.pca, choice = \"var\", axes = 2, top = 10)<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-variable-contribution-bar-plot-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-variable-contribution-bar-plot-2.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>The total contribution to PC1 and PC2 is obtained with the following R code:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_contrib(res.pca, choice = \"var\", axes = 1:2, top = 10)<\/code><\/pre>\n\n\n\n<p>The red dashed line on the graph above indicates the expected average\n contribution. If the contribution of the variables were uniform, the \nexpected value would be 1\/length(variables) = 1\/10 = 10%. For a given \ncomponent, a variable with a contribution larger than this cutoff can \nbe considered important in contributing to the component.<\/p>\n\n\n\n<p>Note that the total contribution of a given variable to two principal components, say PC1 and PC2, is\n calculated as contrib = [(C1 * Eig1) + (C2 * Eig2)]\/(Eig1 + Eig2), \nwhere<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>C1 and C2 are the contributions of the variable on PC1 and PC2, respectively<\/li><li>Eig1 and Eig2 are the eigenvalues of PC1 and PC2, respectively. 
\nRecall that eigenvalues measure the amount of variation retained by each\n PC.<\/li><\/ul>\n\n\n\n<p>In this case, the expected average contribution (cutoff) is \ncalculated as follows: As mentioned above, if the contributions of the 10\n variables were uniform, the expected average contribution on a given PC\n would be 1\/10 = 10%. The expected average contribution of a variable \nfor PC1 and PC2 is: [(10 * Eig1) + (10 * Eig2)]\/(Eig1 + Eig2), which again equals 10%.<\/p>\n\n\n\n<p>\nIt can be seen that the variables &#8211; X100m, Long.jump and Pole.vault &#8211; contribute the most to dimensions 1 and 2.\n<\/p>\n\n\n\n<p>The most important (i.e., most contributing) variables can be highlighted on the correlation plot as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_pca_var(res.pca, col.var = \"contrib\",\n             gradient.cols = c(\"#00AFBB\", \"#E7B800\", \"#FC4E07\")\n             )<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-variable-contribution-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>Note that, it\u2019s also possible to change the transparency of variables according to their contrib values using the option <code>alpha.var = \"contrib\"<\/code>. For example, type this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Change the transparency by contrib values\nfviz_pca_var(res.pca, alpha.var = \"contrib\")<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Color by a custom continuous variable<\/h4>\n\n\n\n<p>In the previous sections, we showed how to color variables by their \ncontributions and their cos2. Note that, it\u2019s possible to color \nvariables by any custom continuous variable. 
The coloring variable \nshould have the same length as the number of active variables in the PCA\n (here n = 10).<\/p>\n\n\n\n<p>For example, type this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Create a random continuous variable of length 10\nset.seed(123)\nmy.cont.var &lt;- rnorm(10)\n# Color variables by the continuous variable\nfviz_pca_var(res.pca, col.var = my.cont.var,\n             gradient.cols = c(\"blue\", \"yellow\", \"red\"),\n             legend.title = \"Cont.Var\")<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-pca-color-variables-by-a-custom-continuous-variable-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Color by groups<\/h4>\n\n\n\n<p>It\u2019s also possible to change the color of variables by groups defined by a qualitative\/categorical variable, also called <code>factor<\/code> in R terminology.<\/p>\n\n\n\n<p>As we don\u2019t have any grouping variable in our data sets for classifying variables, we\u2019ll create one.<\/p>\n\n\n\n<p>In the following demo example, we start by classifying the variables into 3 groups using the <code>kmeans<\/code> clustering algorithm. 
Next, we use the clusters returned by the kmeans algorithm to color variables.<\/p>\n\n\n\n<p>\nNote that, if you are interested in learning clustering, we previously \npublished a book named \u201cPractical Guide To Cluster Analysis in R\u201d (<a href=\"https:\/\/goo.gl\/DmJ5y5\">https:\/\/goo.gl\/DmJ5y5<\/a>).\n<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Create a grouping variable using kmeans\n# Create 3 groups of variables (centers = 3)\nset.seed(123)\nres.km &lt;- kmeans(var$coord, centers = 3, nstart = 25)\ngrp &lt;- as.factor(res.km$cluster)\n# Color variables by groups\nfviz_pca_var(res.pca, col.var = grp, \n             palette = c(\"#0073C2FF\", \"#EFC000FF\", \"#868686FF\"),\n             legend.title = \"Cluster\")<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-pca-color-variables-by-groups-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>\nNote that, to change the color of groups the argument palette should be \nused. To change gradient colors, the argument gradient.cols should be \nused.\n<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Dimension description<\/h3>\n\n\n\n<p>In the section @ref(pca-variable-contributions), we described how to \nhighlight variables according to their contributions to the principal \ncomponents.<\/p>\n\n\n\n<p>Note also that the function <code>dimdesc()<\/code> [in FactoMineR], for dimension description, can be used to identify the\n variables most significantly associated with a given principal \ncomponent. 
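Conceptually, dimdesc() correlates each variable with the individuals' coordinates on a component and tests those correlations. A base-R sketch of the same idea (USArrests as a stand-in data set, first component only):

```r
# What dimdesc() computes, conceptually: the correlation of each
# variable with the component scores, plus a test p-value.
res    <- prcomp(USArrests, scale. = TRUE)
scores <- res$x[, 1]  # individuals' coordinates on the first component

desc <- t(sapply(USArrests, function(v) {
  ct <- cor.test(v, scores)
  c(correlation = unname(ct$estimate), p.value = ct$p.value)
}))
desc[order(desc[, "p.value"]), ]  # sorted by p-value, as in dimdesc()
```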
The <code>dimdesc()<\/code> function can be used as follows:\n<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>res.desc &lt;- dimdesc(res.pca, axes = c(1,2), proba = 0.05)\n# Description of dimension 1\nres.desc$Dim.1<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## $quanti\n##              correlation  p.value\n## Long.jump          0.794 6.06e-06\n## Discus             0.743 4.84e-05\n## Shot.put           0.734 6.72e-05\n## High.jump          0.610 1.99e-03\n## Javeline           0.428 4.15e-02\n## X400m             -0.702 1.91e-04\n## X110m.hurdle      -0.764 2.20e-05\n## X100m             -0.851 2.73e-07<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code># Description of dimension 2\nres.desc$Dim.2<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## $quanti\n##            correlation  p.value\n## Pole.vault       0.807 3.21e-06\n## X1500m           0.784 9.38e-06\n## High.jump       -0.465 2.53e-02<\/code><\/pre>\n\n\n\n<p>In the output above, <em>$quanti<\/em> denotes results for quantitative variables. Note that variables are sorted by the p-value of the correlation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Graph of individuals<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Results<\/h4>\n\n\n\n<p>The results for individuals can be extracted using the function <code>get_pca_ind()<\/code> [<em>factoextra<\/em> package]. 
Like <code>get_pca_var()<\/code>, the function <code>get_pca_ind()<\/code>\n provides a list of matrices containing all the results for the \nindividuals (coordinates, correlation between individuals and axes, \nsquared cosine and contributions).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ind &lt;- get_pca_ind(res.pca)\nind<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## Principal Component Analysis Results for individuals\n##  ===================================================\n##   Name       Description                       \n## 1 \"$coord\"   \"Coordinates for the individuals\" \n## 2 \"$cos2\"    \"Cos2 for the individuals\"        \n## 3 \"$contrib\" \"contributions of the individuals\"<\/code><\/pre>\n\n\n\n<p>To access the different components, use this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Coordinates of individuals\nhead(ind$coord)\n# Quality of individuals\nhead(ind$cos2)\n# Contributions of individuals\nhead(ind$contrib)<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Plots: quality and contribution<\/h4>\n\n\n\n<p>The function <code>fviz_pca_ind()<\/code> is used to produce the graph of individuals. 
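For intuition about where these matrices come from: the individuals' coordinates are simply the component scores, and an individual's cos2 is the squared coordinate divided by its squared distance to the centre of the cloud. A base-R sketch (USArrests as a stand-in data set):

```r
# Individuals' coordinates and cos2 reproduced with base R.
res       <- prcomp(USArrests, scale. = TRUE)
ind.coord <- res$x                                # scores = coordinates
ind.cos2  <- ind.coord^2 / rowSums(ind.coord^2)   # quality per component

head(ind.cos2, 3)
```

Each row of ind.cos2 sums to one, mirroring the property already seen for variables.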
To create a simple plot, type this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_pca_ind(res.pca)<\/code><\/pre>\n\n\n\n<p>Like variables, it\u2019s also possible to color individuals by their cos2 values:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_pca_ind(res.pca, col.ind = \"cos2\", \n             gradient.cols = c(\"#00AFBB\", \"#E7B800\", \"#FC4E07\"),\n             repel = TRUE # Avoid text overlapping (slow if many points)\n             )<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-graph-individuals-cos2-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>\nNote that, individuals that are similar are grouped together on the plot.\n<\/p>\n\n\n\n<p>You can also change the point size according to the cos2 of the corresponding individuals:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_pca_ind(res.pca, pointsize = \"cos2\", \n             pointshape = 21, fill = \"#E7B800\",\n             repel = TRUE # Avoid text overlapping (slow if many points)\n             )<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-graph-individuals-point-size-by-cos2-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>To change both point size and color by cos2, try this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_pca_ind(res.pca, col.ind = \"cos2\", pointsize = \"cos2\",\n             gradient.cols = c(\"#00AFBB\", \"#E7B800\", \"#FC4E07\"),\n             repel = TRUE # Avoid text overlapping (slow if many points)\n             )<\/code><\/pre>\n\n\n\n<p>To create a bar plot of the quality of representation (cos2) of individuals on the factor map, you can use the function <code>fviz_cos2()<\/code> as previously described for variables:<\/p>\n\n\n\n<pre 
class=\"wp-block-code\"><code>fviz_cos2(res.pca, choice = \"ind\")<\/code><\/pre>\n\n\n\n<p>To visualize the contribution of individuals to the first two principal components, type this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Total contribution on PC1 and PC2\nfviz_contrib(res.pca, choice = \"ind\", axes = 1:2)<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-individuals-contribution-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Color by a custom continuous variable<\/h4>\n\n\n\n<p>As for variables, individuals can be colored by any custom continuous variable by specifying the argument <code>col.ind<\/code>.<\/p>\n\n\n\n<p>For example, type this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Create a random continuous variable of length 23,\n# Same length as the number of active individuals in the PCA\nset.seed(123)\nmy.cont.var &lt;- rnorm(23)\n# Color individuals by the continuous variable\nfviz_pca_ind(res.pca, col.ind = my.cont.var,\n             gradient.cols = c(\"blue\", \"yellow\", \"red\"),\n             legend.title = \"Cont.Var\")<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Color by groups<\/h4>\n\n\n\n<p>Here, we describe how to color individuals by group. Additionally, we show how to add <em>concentration ellipses<\/em> and <em>confidence ellipses<\/em> by groups. 
For this, we\u2019ll use the iris data as a demo data set.<\/p>\n\n\n\n<p>The iris data set looks like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>head(iris, 3)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n## 1          5.1         3.5          1.4         0.2  setosa\n## 2          4.9         3.0          1.4         0.2  setosa\n## 3          4.7         3.2          1.3         0.2  setosa<\/code><\/pre>\n\n\n\n<p>The column \u201cSpecies\u201d will be used as the grouping variable. We start by computing the principal component analysis as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># The variable Species (index = 5) is removed\n# before PCA analysis\niris.pca &lt;- PCA(iris[,-5], graph = FALSE)<\/code><\/pre>\n\n\n\n<p>In the R code below, the argument <code>habillage<\/code> or <code>col.ind<\/code> can be used to specify the factor variable for coloring the individuals by groups.<\/p>\n\n\n\n<p>To add a concentration ellipse around each group, specify the argument <code>addEllipses = TRUE<\/code>. 
The argument <code>palette<\/code> can be used to change group colors.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_pca_ind(iris.pca,\n             geom.ind = \"point\", # show points only (but not \"text\")\n             col.ind = iris$Species, # color by groups\n             palette = c(\"#00AFBB\", \"#E7B800\", \"#FC4E07\"),\n             addEllipses = TRUE, # Concentration ellipses\n             legend.title = \"Groups\"\n             )<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-individuals-factor-map-color-by-groups-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>\nTo remove the group mean point, specify the argument <em>mean.point = FALSE<\/em>.\n<\/p>\n\n\n\n<p>\nIf you want confidence ellipses instead of concentration ellipses, use <em>ellipse.type = \u201cconfidence\u201d<\/em>.\n<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Add confidence ellipses\nfviz_pca_ind(iris.pca, geom.ind = \"point\", col.ind = iris$Species, \n             palette = c(\"#00AFBB\", \"#E7B800\", \"#FC4E07\"),\n             addEllipses = TRUE, ellipse.type = \"confidence\",\n             legend.title = \"Groups\"\n             )<\/code><\/pre>\n\n\n\n<p>Note that, allowed values for palette include:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>\u201cgrey\u201d for grey color palettes;<\/li><li>brewer palettes e.g. 
\u201cRdBu\u201d, \u201cBlues\u201d, \u2026; To view all, type this in R: <code>RColorBrewer::display.brewer.all()<\/code>.<\/li><li>custom color palette e.g.&nbsp;c(\u201cblue\u201d, \u201cred\u201d);<\/li><li>and scientific journal palettes from <code>ggsci R package<\/code>, e.g.: \u201cnpg\u201d, \u201caaas\u201d, \u201clancet\u201d, \u201cjco\u201d, \u201cucscgb\u201d, \u201cuchicago\u201d, \u201csimpsons\u201d and \u201crickandmorty\u201d.<\/li><\/ul>\n\n\n\n<p>For example, to use the jco (Journal of Clinical Oncology) color palette, type this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_pca_ind(iris.pca,\n             label = \"none\", # hide individual labels\n             habillage = iris$Species, # color by groups\n             addEllipses = TRUE, # Concentration ellipses\n             palette = \"jco\"\n             )<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Graph customization<\/h3>\n\n\n\n<p>Note that, <code>fviz_pca_ind()<\/code> and <code>fviz_pca_var()<\/code> and related functions are wrappers around the core function <code>fviz()<\/code> [in <em>factoextra<\/em>]. fviz() is a wrapper around the function <code>ggscatter()<\/code> [in ggpubr]. Therefore, further arguments to be passed to the functions\n fviz() and ggscatter() can be specified in fviz_pca_ind() and \nfviz_pca_var().\n<\/p>\n\n\n\n<p>Here, we present some of these additional arguments to customize the PCA graph of variables and individuals.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Dimensions<\/h4>\n\n\n\n<p>By default, variables\/individuals are represented on dimensions 1 and\n 2. 
If you want to visualize them on dimensions 2 and 3, for example, \nyou should specify the argument <code>axes = c(2, 3)<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Variables on dimensions 2 and 3\nfviz_pca_var(res.pca, axes = c(2, 3))\n# Individuals on dimensions 2 and 3\nfviz_pca_ind(res.pca, axes = c(2, 3))<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Plot elements: point, text, arrow<\/h4>\n\n\n\n<p>The argument <code>geom<\/code> (for geometry) and its derivatives are used to specify the graphical elements to be used for plotting.<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><code>geom.var<\/code>: a text specifying the \ngeometry to be used for plotting variables. Allowed values are a \ncombination of c(\u201cpoint\u201d, \u201carrow\u201d, \u201ctext\u201d).<\/li><\/ol>\n\n\n\n<ul class=\"wp-block-list\"><li>Use <code>geom.var = \"point\"<\/code> to show only points;<\/li><li>Use <code>geom.var = \"text\"<\/code> to show only text labels;<\/li><li>Use <code>geom.var = c(\"point\", \"text\")<\/code> to show both points and text labels<\/li><li>Use <code>geom.var = c(\"arrow\", \"text\")<\/code> to show arrows and labels (default).<\/li><\/ul>\n\n\n\n<p>For example, type this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Show variable points and text labels\nfviz_pca_var(res.pca, geom.var = c(\"point\", \"text\"))<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\"><li><code>geom.ind<\/code>: a text specifying the geometry to be used for plotting individuals. 
Allowed values are a combination of c(\u201cpoint\u201d, \u201ctext\u201d).<\/li><\/ol>\n\n\n\n<ul class=\"wp-block-list\"><li>Use <code>geom.ind = \"point\"<\/code> to show only points;<\/li><li>Use <code>geom.ind = \"text\"<\/code> to show only text labels;<\/li><li>Use <code>geom.ind = c(\"point\", \"text\")<\/code> to show both point and text labels (default)<\/li><\/ul>\n\n\n\n<p>For example, type this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Show individuals text labels only\nfviz_pca_ind(res.pca, geom.ind = \"text\")<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Size and shape of plot elements<\/h4>\n\n\n\n<ol class=\"wp-block-list\"><li><code>labelsize<\/code>: font size for the text labels, e.g.: <code>labelsize = 4<\/code>.<\/li><li><code>pointsize<\/code>: the size of points, e.g.: <code>pointsize = 1.5<\/code>.<\/li><li><code>arrowsize<\/code>: the size of arrows. Controls the thickness of arrows, e.g.: <code>arrowsize = 0.5<\/code>.<\/li><li><code>pointshape<\/code>: the shape of points, <code>pointshape = 21<\/code>. Type <code>ggpubr::show_point_shapes()<\/code> to see available point shapes.<\/li><\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code># Change the size of arrows and labels\nfviz_pca_var(res.pca, arrowsize = 1, labelsize = 5, \n             repel = TRUE)\n# Change point size, shape and fill color\n# Change labelsize\nfviz_pca_ind(res.pca, \n             pointsize = 3, pointshape = 21, fill = \"lightblue\",\n             labelsize = 5, repel = TRUE)<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Ellipses<\/h4>\n\n\n\n<p>As we described in the previous section @ref(color-ind-by-groups), \nwhen coloring individuals by groups, you can add point concentration \nellipses using the argument <code>addEllipses = TRUE<\/code>.<\/p>\n\n\n\n<p>Note that, the argument <code>ellipse.type<\/code> can be used to change the type of ellipses. 
Possible values are:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><code>\"convex\"<\/code>: plots the convex hull of a set of points.<\/li><li><code>\"confidence\"<\/code>: plots confidence ellipses around group mean points, as computed by the function <code>coord.ellipse()<\/code> [in FactoMineR].<\/li><li><code>\"t\"<\/code>: assumes a multivariate t-distribution.<\/li><li><code>\"norm\"<\/code>: assumes a multivariate normal distribution.<\/li><li><code>\"euclid\"<\/code>: draws a circle with the\n radius equal to level, representing the Euclidean distance from the \ncenter. This ellipse probably won\u2019t appear circular unless <code>coord_fixed()<\/code> is applied.<\/li><\/ul>\n\n\n\n<p>The argument <code>ellipse.level<\/code> is also\n available to change the size of the concentration ellipse (the normal \nprobability level). For example, specify ellipse.level = 0.95 or ellipse.level =\n 0.66.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Add confidence ellipses\nfviz_pca_ind(iris.pca, geom.ind = \"point\", \n             col.ind = iris$Species, # color by groups\n             palette = c(\"#00AFBB\", \"#E7B800\", \"#FC4E07\"),\n             addEllipses = TRUE, ellipse.type = \"confidence\",\n             legend.title = \"Groups\"\n             )\n# Convex hull\nfviz_pca_ind(iris.pca, geom.ind = \"point\",\n             col.ind = iris$Species, # color by groups\n             palette = c(\"#00AFBB\", \"#E7B800\", \"#FC4E07\"),\n             addEllipses = TRUE, ellipse.type = \"convex\",\n             legend.title = \"Groups\"\n             )<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Group mean points<\/h4>\n\n\n\n<p>When coloring individuals by groups (section \n@ref(color-ind-by-groups)), the mean points of groups (barycenters) are \nalso displayed by default.<\/p>\n\n\n\n<p>To remove the mean points, use the argument <code>mean.point = FALSE<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_pca_ind(iris.pca,\n             geom.ind = \"point\", # show points 
only (but not \"text\")\n             col.ind = iris$Species, # color by groups\n             legend.title = \"Groups\",\n             mean.point = FALSE)<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Axis lines<\/h4>\n\n\n\n<p>The argument <code>axes.linetype<\/code> can be \nused to specify the line type of axes. Default is \u201cdashed\u201d. Allowed \nvalues include \u201cblank\u201d, \u201csolid\u201d, \u201cdotted\u201d, etc. To see all possible \nvalues type <code>ggpubr::show_line_types()<\/code> in R.<\/p>\n\n\n\n<p>To remove axis lines, use axes.linetype = \u201cblank\u201d:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_pca_var(res.pca, axes.linetype = \"blank\")<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Graphical parameters<\/h4>\n\n\n\n<p>To easily change the graphical parameters of any ggplot, you can use the function <a href=\"http:\/\/www.sthda.com\/english\/rpkgs\/ggpubr\/reference\/ggpar.html\">ggpar()<\/a> [ggpubr package].<\/p>\n\n\n\n<p>The graphical parameters that can be changed using ggpar() include:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Main titles, axis labels and legend titles<\/li><li>Legend position. Possible values: \u201ctop\u201d, \u201cbottom\u201d, \u201cleft\u201d, \u201cright\u201d, \u201cnone\u201d.<\/li><li>Color palette.<\/li><li>Themes. 
Allowed values include: theme_gray(), theme_bw(), theme_minimal(), theme_classic(), theme_void().<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>ind.p &lt;- fviz_pca_ind(iris.pca, geom = \"point\", col.ind = iris$Species)\nggpubr::ggpar(ind.p,\n              title = \"Principal Component Analysis\",\n              subtitle = \"Iris data set\",\n              caption = \"Source: factoextra\",\n              xlab = \"PC1\", ylab = \"PC2\",\n              legend.title = \"Species\", legend.position = \"top\",\n              ggtheme = theme_gray(), palette = \"jco\"\n              )<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-graphical-parameters-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Biplot<\/h3>\n\n\n\n<p>To make a simple biplot of individuals and variables, type this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_pca_biplot(res.pca, repel = TRUE,\n                col.var = \"#2E9FDF\", # Variables color\n                col.ind = \"#696969\"  # Individuals color\n                )<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-prcomp-biplot-factoextra-data-mining-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>\nNote that, a biplot is only useful when the data set contains a small number of\n variables and individuals; otherwise the final plot \nwould be unreadable.\n<\/p>\n\n\n\n<p>\nNote also that, the coordinates of individuals and those of variables are not \nconstructed in the same space. 
Therefore, in the biplot, you should \nmainly focus on the direction of variables but not on their absolute \npositions on the plot.\n<\/p>\n\n\n\n<p>\nRoughly speaking, a biplot can be interpreted as follows:\n<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>\nan individual that is on the same side as a given variable has a high value for this variable;\n<\/li><li>\nan individual that is on the opposite side of a given variable has a low value for this variable.\n<\/li><\/ul>\n\n\n\n<p>Now, using the <code>iris.pca<\/code> output, let\u2019s:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>make a biplot of individuals and variables<\/li><li>change the color of individuals by groups: col.ind = iris$Species<\/li><li>show only the labels for variables: <code>label = \"var\"<\/code> or use <code>geom.ind = \"point\"<\/code><\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_pca_biplot(iris.pca, \n                col.ind = iris$Species, palette = \"jco\", \n                addEllipses = TRUE, label = \"var\",\n                col.var = \"black\", repel = TRUE,\n                legend.title = \"Species\") <\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-biplot-change-color-groups-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>In the following example, we want to color both individuals and \nvariables by groups. The trick is to use pointshape = 21 for individual \npoints. This particular point shape can be filled by a color using the \nargument <code>fill.ind<\/code>. The border line color of individual points is set to \u201cblack\u201d using <code>col.ind<\/code>. 
To color variables by groups, the argument <code>col.var<\/code> will be used.<\/p>\n\n\n\n<p>To customize individual and variable colors, we use the helper functions <code>fill_palette()<\/code> and <code>color_palette()<\/code> [in ggpubr package].<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_pca_biplot(iris.pca, \n                # Fill individuals by groups\n                geom.ind = \"point\",\n                pointshape = 21,\n                pointsize = 2.5,\n                fill.ind = iris$Species,\n                col.ind = \"black\",\n                # Color variable by groups\n                col.var = factor(c(\"sepal\", \"sepal\", \"petal\", \"petal\")),\n                \n                legend.title = list(fill = \"Species\", color = \"Clusters\"),\n                repel = TRUE        # Avoid label overplotting\n             )+\n  ggpubr::fill_palette(\"jco\")+      # Individual fill color\n  ggpubr::color_palette(\"npg\")      # Variable colors<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-color-individuals-and-variables-by-groups-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>Another complex example is to color individuals by groups (discrete \ncolor) and variables by their contributions to the principal components \n(gradient colors). 
Additionally, we\u2019ll change the transparency of \nvariables by their contributions using the argument <code>alpha.var<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_pca_biplot(iris.pca, \n                # Individuals\n                geom.ind = \"point\",\n                fill.ind = iris$Species, col.ind = \"black\",\n                pointshape = 21, pointsize = 2,\n                palette = \"jco\",\n                addEllipses = TRUE,\n                # Variables\n                alpha.var =\"contrib\", col.var = \"contrib\",\n                gradient.cols = \"RdYlBu\",\n                \n                legend.title = list(fill = \"Species\", color = \"Contrib\",\n                                    alpha = \"Contrib\")\n                )<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-color-individuals-by-groups-and-variables-by-contributions-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Supplementary elements<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Definition and types<\/h3>\n\n\n\n<p>As described above (section @ref(pca-data-format)), the <code>decathlon2<\/code> data set contains <em>supplementary continuous variables<\/em> (quanti.sup, columns 11:12), <em>supplementary qualitative variables<\/em> (quali.sup, column 13) and <em>supplementary individuals<\/em> (ind.sup, rows 24:27).<\/p>\n\n\n\n<p>Supplementary variables and individuals are not used for the \ndetermination of the principal components. 
Their coordinates are \npredicted using only the information provided by the principal \ncomponent analysis performed on the active variables\/individuals.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Specification in PCA<\/h3>\n\n\n\n<p>To specify supplementary individuals and variables, the function <em>PCA<\/em>() can be used as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>PCA(X, ind.sup = NULL, \n    quanti.sup = NULL, quali.sup = NULL, graph = TRUE)<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\"><li><code>X<\/code> : a data frame. Rows are individuals and columns are numeric variables.<\/li><li><code>ind.sup<\/code> : a numeric vector specifying the indexes of the supplementary individuals<\/li><li><code>quanti.sup<\/code>, <code>quali.sup<\/code> : numeric vectors specifying, respectively, the indexes of the supplementary quantitative and qualitative variables<\/li><li><code>graph<\/code> : a logical value. If TRUE, a graph is displayed.<\/li><\/ul>\n\n\n\n<p>For example, type this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>res.pca &lt;- PCA(decathlon2, ind.sup = 24:27, \n               quanti.sup = 11:12, quali.sup = 13, graph = FALSE)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Quantitative variables<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Predicted results (coordinates, correlation and cos2) for the supplementary quantitative variables:<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>res.pca$quanti.sup<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>## $coord\n##         Dim.1   Dim.2  Dim.3   Dim.4   Dim.5\n## Rank   -0.701 -0.2452 -0.183  0.0558 -0.0738\n## Points  0.964  0.0777  0.158 -0.1662 -0.0311\n## \n## $cor\n##         Dim.1   Dim.2  Dim.3   Dim.4   Dim.5\n## Rank   -0.701 -0.2452 -0.183  0.0558 -0.0738\n## Points  0.964  0.0777  0.158 -0.1662 -0.0311\n## \n## $cos2\n##        Dim.1   Dim.2  Dim.3   Dim.4   Dim.5\n## Rank   0.492 0.06012 0.0336 0.00311 0.00545\n## Points 0.929 0.00603 0.0250 0.02763 
0.00097<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\"><li>Visualize all variables (active and supplementary ones):<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_pca_var(res.pca)<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-quantitative-supplementary-variable-data-mining-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>\nNote that, by default, supplementary quantitative variables are shown in blue with dashed lines.\n<\/p>\n\n\n\n<p>Further arguments to customize the plot:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Change color of variables\nfviz_pca_var(res.pca,\n             col.var = \"black\",     # Active variables\n             col.quanti.sup = \"red\" # Suppl. quantitative variables\n             )\n# Hide active variables on the plot, \n# show only supplementary variables\nfviz_pca_var(res.pca, invisible = \"var\")\n# Hide supplementary variables\nfviz_pca_var(res.pca, invisible = \"quanti.sup\")<\/code><\/pre>\n\n\n\n<p>\nWith <em>fviz_pca_var<\/em>(), the supplementary quantitative \nvariables are displayed automatically on the correlation circle plot. \nNote that you can also add the quanti.sup variables manually, using the \nfviz_add() function, for further customization. 
An example is shown \nbelow.\n<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Plot of active variables\np &lt;- fviz_pca_var(res.pca, invisible = \"quanti.sup\")\n# Add supplementary quantitative variables\nfviz_add(p, res.pca$quanti.sup$coord, \n         geom = c(\"arrow\", \"text\"), \n         color = \"red\")<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Individuals<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Predicted results for the supplementary individuals (<code>ind.sup<\/code>):<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>res.pca$ind.sup<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\"><li>Visualize all individuals (active and supplementary ones). On the \ngraph, you can also add the supplementary qualitative variables (<code>quali.sup<\/code>), whose coordinates are accessible using <em>res.pca$quali.sup$coord<\/em>.<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>p &lt;- fviz_pca_ind(res.pca, col.ind.sup = \"blue\", repel = TRUE)\np &lt;- fviz_add(p, res.pca$quali.sup$coord, color = \"red\")\np<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-supplementary-individuals-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>\nSupplementary individuals are shown in blue. The levels of the supplementary qualitative variable are shown in red.\n<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Qualitative variables<\/h3>\n\n\n\n<p>In the previous section, we showed that you can add the supplementary qualitative variables to the plot of individuals using <code>fviz_add()<\/code>.<\/p>\n\n\n\n<p>Note that the supplementary qualitative variables can also be used \nfor coloring individuals by groups. 
This can help to interpret the data.\n The <code>decathlon2<\/code> data set contains a <em>supplementary qualitative variable<\/em> at column 13 corresponding to the type of competition.<\/p>\n\n\n\n<p>The results concerning the supplementary qualitative variable are:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>res.pca$quali.sup<\/code><\/pre>\n\n\n\n<p>To color individuals by a supplementary qualitative variable, the argument <code>habillage<\/code>\n is used to specify the index of the supplementary qualitative variable.\n Historically, this argument name comes from the FactoMineR package. \nIt\u2019s a French word meaning \u201cdressing\u201d in English. To keep consistency \nbetween FactoMineR and factoextra, we decided to keep the same argument \nname.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_pca_ind(res.pca, habillage = 13,\n             addEllipses = TRUE, ellipse.type = \"confidence\",\n             palette = \"jco\", repel = TRUE) <\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.sthda.com\/english\/sthda-upload\/figures\/principal-component-methods\/006-principal-component-analysis-supplementary-qualitative-variable-1.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>\nRecall that, to remove the mean points of groups, specify the argument <em>mean.point = FALSE<\/em>.\n<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Filtering results<\/h2>\n\n\n\n<p>If you have many individuals\/variables, it\u2019s possible to visualize only some of them using the arguments <code>select.ind<\/code> and <code>select.var<\/code>.<\/p>\n\n\n\n<p><code>select.ind, select.var:<\/code> a selection of individuals\/variables to be plotted. 
Allowed values are <em>NULL<\/em> or a <em>list<\/em> containing the arguments name, cos2 or contrib:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><code>name<\/code>: a character vector containing the names of the individuals\/variables to be plotted<\/li><li><code>cos2<\/code>: if cos2 is in [0, 1], e.g. 0.6, then individuals\/variables with a cos2 &gt; 0.6 are plotted; if cos2 &gt; 1, e.g. 5, then the top 5 active individuals\/variables and the top 5 supplementary columns\/rows with the highest cos2 are plotted<\/li><li><code>contrib<\/code>: if contrib &gt; 1, e.g. 5, then the top 5 individuals\/variables with the highest contributions are plotted<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code># Visualize variables with cos2 > 0.6\nfviz_pca_var(res.pca, select.var = list(cos2 = 0.6))\n# Top 5 active variables with the highest cos2\nfviz_pca_var(res.pca, select.var = list(cos2 = 5))\n# Select by names\nname &lt;- list(name = c(\"Long.jump\", \"High.jump\", \"X100m\"))\nfviz_pca_var(res.pca, select.var = name)\n# Top 5 contributing individuals and variables\nfviz_pca_biplot(res.pca, select.ind = list(contrib = 5), \n               select.var = list(contrib = 5),\n               ggtheme = theme_minimal())<\/code><\/pre>\n\n\n\n<p>\nWhen the selection is done according to the contribution values, \nsupplementary individuals\/variables are not shown because they don\u2019t \ncontribute to the construction of the axes.\n<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Exporting results<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Export plots to PDF\/PNG files<\/h3>\n\n\n\n<p>The <code>factoextra<\/code> package produces ggplot2-based graphs. 
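<\/p>\n\n\n\n<p>Because each plot returned by factoextra is an ordinary ggplot object, one simple option is <em>ggsave<\/em>() [in ggplot2 package]; the plot and file names below are placeholders chosen for illustration:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Save a single plot directly; width and height are in inches\nlibrary(ggplot2)\nmyplot &lt;- fviz_pca_ind(res.pca)\nggsave(\"pca-individuals.pdf\", plot = myplot, width = 7, height = 5)<\/code><\/pre>\n\n\n\n<p>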
To save any ggplot, the standard R code is as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Print the plot to a pdf file\npdf(\"myplot.pdf\")\nprint(myplot)\ndev.off()<\/code><\/pre>\n\n\n\n<p>In the following examples, we\u2019ll show you how to save the different graphs into pdf or png files.<\/p>\n\n\n\n<p>The first step is to create the plots you want as R objects:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Scree plot\nscree.plot &lt;- fviz_eig(res.pca)\n# Plot of individuals\nind.plot &lt;- fviz_pca_ind(res.pca)\n# Plot of variables\nvar.plot &lt;- fviz_pca_var(res.pca)<\/code><\/pre>\n\n\n\n<p>Next, the plots can be exported into a single pdf file as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>pdf(\"PCA.pdf\") # Create a new pdf device\nprint(scree.plot)\nprint(ind.plot)\nprint(var.plot)\ndev.off() # Close the pdf device<\/code><\/pre>\n\n\n\n<p>\nNote that the above R code will create the PDF file in your \ncurrent working directory. To see the path of your current working \ndirectory, type getwd() in the R console.\n<\/p>\n\n\n\n<p>To print each plot to a specific png file, the R code looks like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Print scree plot to a png file\npng(\"pca-scree-plot.png\")\nprint(scree.plot)\ndev.off()\n# Print variables plot to a png file\npng(\"pca-variables.png\")\nprint(var.plot)\ndev.off()\n# Print individuals plot to a png file\npng(\"pca-individuals.png\")\nprint(ind.plot)\ndev.off()<\/code><\/pre>\n\n\n\n<p>Another alternative for exporting ggplots is the function <code>ggexport()<\/code> [in ggpubr package]. We like ggexport() because it\u2019s very simple: with one line of R code, it allows us to export individual plots to a file (pdf, eps or png), one plot per page. It can also arrange the plots (2 plots per page, for example) before exporting them. 
The examples below \ndemonstrate how to export ggplots using ggexport().\n<\/p>\n\n\n\n<p>Export individual plots to a pdf file (one plot per page):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>library(ggpubr)\nggexport(plotlist = list(scree.plot, ind.plot, var.plot), \n         filename = \"PCA.pdf\")<\/code><\/pre>\n\n\n\n<p>Arrange and export. Specify <code>nrow<\/code> and <code>ncol<\/code> to display multiple plots on the same page:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ggexport(plotlist = list(scree.plot, ind.plot, var.plot), \n         nrow = 2, ncol = 2,\n         filename = \"PCA.pdf\")<\/code><\/pre>\n\n\n\n<p>Export plots to png files. If you specify a list of plots, then \nmultiple png files will be automatically created, one file per plot.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ggexport(plotlist = list(scree.plot, ind.plot, var.plot),\n         filename = \"PCA.png\")<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Export results to txt\/csv files<\/h3>\n\n\n\n<p>All the outputs of the PCA (individuals\/variables coordinates, \ncontributions, etc.) can be exported at once, into a TXT\/CSV file, using \nthe function <code>write.infile()<\/code> [in the <em>FactoMineR<\/em> package]:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Export into a TXT file\nwrite.infile(res.pca, \"pca.txt\", sep = \"\\t\")\n# Export into a CSV file\nwrite.infile(res.pca, \"pca.csv\", sep = \";\")<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<p>In conclusion, we described how to perform and interpret principal component analysis (PCA). We computed PCA using the <strong>PCA<\/strong>() function [FactoMineR]. 
Next, we used the <strong>factoextra<\/strong> R package to produce ggplot2-based visualizations of the PCA results.<\/p>\n\n\n\n<p>There are other functions [packages] to compute PCA in R:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Using <strong>prcomp<\/strong>() [stats]<\/li><\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>res.pca &lt;- prcomp(iris[, -5], scale. = TRUE)<\/code><\/pre>\n\n\n\n<p>Read more: <a href=\"http:\/\/www.sthda.com\/english\/wiki\/pca-using-prcomp-and-princomp\">http:\/\/www.sthda.com\/english\/wiki\/pca-using-prcomp-and-princomp<\/a><\/p>\n\n\n\n<ol start=\"2\" class=\"wp-block-list\"><li>Using <strong>princomp<\/strong>() [stats]<\/li><\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>res.pca &lt;- princomp(iris[, -5], cor = TRUE)<\/code><\/pre>\n\n\n\n<p>Read more: <a href=\"http:\/\/www.sthda.com\/english\/wiki\/pca-using-prcomp-and-princomp\">http:\/\/www.sthda.com\/english\/wiki\/pca-using-prcomp-and-princomp<\/a><\/p>\n\n\n\n<ol start=\"3\" class=\"wp-block-list\"><li>Using <strong>dudi.pca<\/strong>() [ade4]<\/li><\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>library(\"ade4\")\nres.pca &lt;- dudi.pca(iris[, -5], scannf = FALSE, nf = 5)<\/code><\/pre>\n\n\n\n<p>Read more: <a href=\"http:\/\/www.sthda.com\/english\/wiki\/pca-using-ade4-and-factoextra\">http:\/\/www.sthda.com\/english\/wiki\/pca-using-ade4-and-factoextra<\/a><\/p>\n\n\n\n<ol start=\"4\" class=\"wp-block-list\"><li>Using <strong>epPCA<\/strong>() [ExPosition]<\/li><\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>library(\"ExPosition\")\nres.pca &lt;- epPCA(iris[, -5], graphs = FALSE)<\/code><\/pre>\n\n\n\n<p>No matter which of the functions listed above you decide to use, the \nfactoextra package can handle the output for creating beautiful plots \nsimilar to what we described in the previous sections for FactoMineR:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fviz_eig(res.pca)     # Scree plot\nfviz_pca_ind(res.pca) # Graph of individuals\nfviz_pca_var(res.pca) # Graph of 
variables<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Further reading<\/h2>\n\n\n\n<p>For the mathematical background behind PCA, refer to the following video courses, articles and books:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Principal component analysis (article) (Abdi and Williams 2010). <a href=\"https:\/\/goo.gl\/1Vtwq1\">https:\/\/goo.gl\/1Vtwq1<\/a>.<\/li><li>Principal Component Analysis Course Using FactoMineR (video course). <a href=\"https:\/\/goo.gl\/VZJsnM\">https:\/\/goo.gl\/VZJsnM<\/a><\/li><li>Exploratory Multivariate Analysis by Example Using R (book) (Husson, Le, and Pag\u00e8s 2017).<\/li><li>Principal Component Analysis (book) (Jolliffe 2002).<\/li><\/ul>\n\n\n\n<p>See also:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>PCA using <strong>prcomp<\/strong>() and <strong>princomp<\/strong>() (tutorial). <a href=\"http:\/\/www.sthda.com\/english\/wiki\/pca-using-prcomp-and-princomp\">http:\/\/www.sthda.com\/english\/wiki\/pca-using-prcomp-and-princomp<\/a><\/li><li>PCA using <strong>ade4<\/strong> and <strong>factoextra<\/strong> (tutorial). <a href=\"http:\/\/www.sthda.com\/english\/wiki\/pca-using-ade4-and-factoextra\">http:\/\/www.sthda.com\/english\/wiki\/pca-using-ade4-and-factoextra<\/a><\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">References<\/h2>\n\n\n\n<p>Abdi, Herv\u00e9, and Lynne J. Williams. 2010. \u201cPrincipal Component Analysis.\u201d <em>WIREs Computational Statistics<\/em> 2: 433\u201359. <a href=\"http:\/\/staff.ustc.edu.cn\/~zwp\/teach\/MVA\/abdi-awPCA2010.pdf\">http:\/\/staff.ustc.edu.cn\/~zwp\/teach\/MVA\/abdi-awPCA2010.pdf<\/a>.<\/p>\n\n\n\n<p>Husson, Francois, Sebastien Le, and J\u00e9r\u00f4me Pag\u00e8s. 2017. <em>Exploratory Multivariate Analysis by Example Using R<\/em>. 2nd ed. Boca Raton, Florida: Chapman &amp; Hall\/CRC. <a href=\"http:\/\/factominer.free.fr\/bookV2\/index.html\">http:\/\/factominer.free.fr\/bookV2\/index.html<\/a>.<\/p>\n\n\n\n<p>Jolliffe, I.T. 2002. 
<em>Principal Component Analysis<\/em>. 2nd ed. New York: Springer-Verlag. <a href=\"https:\/\/goo.gl\/SB86SR\">https:\/\/goo.gl\/SB86SR<\/a>.<\/p>\n\n\n\n<p>Kaiser, Henry F. 1961. \u201cA Note on Guttman\u2019s Lower Bound for the Number of Common Factors.\u201d <em>British Journal of Statistical Psychology<\/em> 14: 1\u20132.<\/p>\n\n\n\n<p>Peres-Neto, Pedro R., Donald A. Jackson, and Keith M. Somers. 2005. \n\u201cHow Many Principal Components? Stopping Rules for Determining the \nNumber of Non-Trivial Axes Revisited.\u201d <em>Computational Statistics &amp; Data Analysis<\/em> 49: 974\u201397.<\/p>\n\n\n\n<p><a href=\"http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/\">http:\/\/www.sthda.com\/english\/articles\/31-principal-component-methods-in-r-practical-guide\/112-pca-principal-component-analysis-essentials\/<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>PCA &#8211; Principal Component Analysis Essentials &nbsp;kassambara&nbsp;|&nbsp; &nbsp;23\/09\/2017&nbsp;| &nbsp;&nbsp;169015 &nbsp;|&nbsp;&nbsp;Comments (37) &nbsp;|&nbsp;&nbsp;Principal Component Methods in R: Practical Guide &nbsp;|&nbsp; Multivariate Analysis Principal component analysis (PCA)&hellip; 
<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[],"class_list":["post-1122","post","type-post","status-publish","format-standard","hentry","category-r"],"_links":{"self":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/1122","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/comments?post=1122"}],"version-history":[{"count":0,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/posts\/1122\/revisions"}],"wp:attachment":[{"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/media?parent=1122"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/categories?post=1122"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zhuoyao.net\/index.php\/wp-json\/wp\/v2\/tags?post=1122"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}