**Introduction to Factor Analysis (FA)**

Factor analysis is an effective statistical method for dimensionality reduction, especially in situations where the data are needed for further analysis and calculations (e.g. regression). It describes the overall correlation among all variables using a potentially lower number of variables, which are called factors. These factors are, however, unobserved random variables.

The factor analysis approach searches for similar covariates with respect to their mutual correlation. All variables with mutually high correlation are then represented by a factor (or a linear combination of factors) instead.
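As a sketch of what "represented by a factor" means, the usual orthogonal factor model can be written as follows. The symbols are not taken from the text above and are introduced here as assumptions: $X$ is the vector of $p$ observed variables with mean $\mu$, $F$ is the vector of $k \le p$ unobserved factors with zero mean and identity covariance matrix, $\Lambda$ is the $p \times k$ matrix of loadings, and $\varepsilon$ is a noise vector uncorrelated with $F$ and with diagonal covariance matrix $\Psi$.

$$
X = \mu + \Lambda F + \varepsilon, \qquad \operatorname{Var}(X) = \Lambda \Lambda^\top + \Psi .
$$

Under these assumptions, all correlation among the observed variables is carried by the shared factors through $\Lambda \Lambda^\top$, while $\Psi$ contributes only variable-specific variance.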

In some sense, the factor analysis approach can be considered a generalization of classical principal component analysis (PCA), with one great advantage: a much more convenient interpretation of the factors.

In the statistical software R, the function `factanal()` is available under the standard R installation. Besides that, there are many additional functions and packages which can be downloaded and installed in R (e.g. the `Factanal()` function in the ‘FAiR’ package, or the `fa.promax()` function in the ‘psych’ package).
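To give a first impression of how `factanal()` is called, here is a minimal sketch on simulated data. The data, the variable names (`F1`, `F2`, `x1` to `x6`) and the choice of two factors are assumptions made for illustration only; they are unrelated to the data analysed in this text.

```r
# Sketch: fit a two-factor model with factanal() on simulated data.
set.seed(1)
n <- 500

# Two latent (unobserved) factors.
F1 <- rnorm(n)
F2 <- rnorm(n)

# Six observed variables: x1-x3 are driven by F1, x4-x6 by F2,
# each with its own independent noise term.
X <- data.frame(
  x1 = 0.9 * F1 + rnorm(n, sd = 0.4),
  x2 = 0.8 * F1 + rnorm(n, sd = 0.5),
  x3 = 0.7 * F1 + rnorm(n, sd = 0.6),
  x4 = 0.9 * F2 + rnorm(n, sd = 0.4),
  x5 = 0.8 * F2 + rnorm(n, sd = 0.5),
  x6 = 0.7 * F2 + rnorm(n, sd = 0.6)
)

# Maximum likelihood factor analysis; varimax rotation is the default.
fit <- factanal(X, factors = 2)

# Estimated loadings (small values suppressed) and uniquenesses.
print(fit$loadings, cutoff = 0.3)
fit$uniquenesses
```

In such a setting one would expect the first three variables to load mainly on one factor and the last three on the other, which is exactly the grouping of mutually correlated variables described above.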

For our purposes we mainly use the standard function `factanal()`. Let us again recall the biological metrics data from the Czech Republic. The data represent 65 different river localities in the Czech Republic; at each locality, various biological metrics are assessed (17 metrics in total).

```
rm(list = ls())  # clear the workspace
# load the biological metrics data (65 localities, 17 metrics)
bioData <- read.csv("http://msekce.karlin.mff.cuni.cz/~maciak/NMST539/bioData.csv", header = TRUE)
```

The correlation structure (which will later be assessed using the factor analysis approach) can either be estimated using the standard variance-covariance matrix (command `var(bioData[,2:18])`) or visualized using the `corrplot()` function from the ‘corrplot’ package.

```
library(corrplot)
corrplot(cor(bioData[,2:18]), method="ellipse")
```
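The variance-covariance matrix and the correlation matrix used above carry the same dependence structure, only on different scales. The following sketch illustrates this on the built-in `iris` data, used here purely as a stand-in (an assumption, so that the example runs without downloading anything); the same calls should apply to `bioData[,2:18]`.

```r
# Sketch on a built-in stand-in data set (not the bioData from the text).
X <- iris[, 1:4]

S <- var(X)        # variance-covariance matrix
R <- cov2cor(S)    # rescale covariances to correlations

# The rescaled covariance matrix agrees with the correlation matrix.
all.equal(R, cor(X))

# Correlations are scale-free: standardizing the columns does not change them.
all.equal(cor(scale(X)), cor(X))
```

Because correlations do not depend on the measurement units of the individual variables, the correlation matrix is often the more convenient object to inspect when the variables are measured on very different scales.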