NMST539 | Lab Session 6

Factor Analysis with R

LS 2017 | Tuesday 18/04/2017

Rmd file (UTF8 encoding)

The R-software is available for download from the website: https://www.r-project.org

A user-friendly interface (one of many): RStudio.

Manuals and introduction into R (in Czech or English):

  • Bína, V., Komárek, A. a Komárková, L.: Jak na jazyk R. (PDF súbor)
  • Komárek, A.: Základy práce s R. (PDF súbor)
  • Kulich, M.: Velmi stručný úvod do R. (PDF súbor)
  • De Vries, A. a Meys, J.: R for Dummies. (ISBN-13: 978-1119055808)

Introduction into FA

Factor analysis is an effective statistical method for dimensionaly reduction especially in situation where one needs to use the data for further analysis and calculations (e.g. regression). It is used to describe the overall correlation among all variables however, using a potentially lower number of variables which are called factors. These factors are, however, unobserved random variables.

The factor analysis approach searches for similar covariates with respect to their mutuall correlation. All variables with mutually high correlation are represented with a factor (or a linear combination of factors) instead.

In some sense the factor analysis approach can be considered to be a generalization of the classical principal component analysis (PCA) with one great advantage at the end - much more convenient interpretation of the factors.

In the statistical software R there is function factanal() available under the standard R instalation. Beside that, there are many more additinal functions and packages which can be downloaded and installed in R (e.g. Factanal() function in the ‘FAiR’ package; fa.promax() function in the ‘psych’ package;).

For our purposes we mainly use the standard function factanal(). Let us again recall the biological metrics data from the Czech republic. The data represent 65 different river localities in the Czech Republic where on each locality there are various biological metrics assessed (17 metrics in total).

rm(list = ls())
bioData <- read.csv("http://msekce.karlin.mff.cuni.cz/~maciak/NMST539/bioData.csv", header = T)

The correlation structure (wich will be later assessed using the factor analysis approach) can be either estimated using a standard variance covariance matrix (command var(bioData[,2:18])) or it can be visualized using the corrplot() function from the ‘corrplot’ package instead.

corrplot(cor(bioData[,2:18]), method="ellipse")