NMST539 | Lab Session 6

Factor Analysis with R

LS 2017 | Tuesday 18/04/2017

Rmd file (UTF8 encoding)

The R software is available for download from the website: https://www.r-project.org

A user-friendly interface (one of many): RStudio.

Manuals and introductory texts for R (in Czech or English):

  • Bína, V., Komárek, A. and Komárková, L.: Jak na jazyk R. (PDF file)
  • Komárek, A.: Základy práce s R. (PDF file)
  • Kulich, M.: Velmi stručný úvod do R. (PDF file)
  • De Vries, A. and Meys, J.: R for Dummies. (ISBN-13: 978-1119055808)

Introduction to FA

Factor analysis is an effective statistical method for dimensionality reduction, especially in situations where one needs to use the data for further analysis and calculations (e.g. regression). It describes the overall correlation structure among all variables using a potentially lower number of variables, the so-called factors. These factors are, however, unobserved random variables.

The factor analysis approach searches for groups of covariates which are similar with respect to their mutual correlation. All variables with mutually high correlations are then represented by one factor (or a linear combination of factors) instead.

In some sense the factor analysis approach can be considered a generalization of the classical principal component analysis (PCA), with one great advantage at the end: a much more convenient interpretation of the factors.

In the statistical software R, the function factanal() is available as part of the standard R installation. Besides that, there are many additional functions and packages which can be downloaded and installed in R (e.g. the Factanal() function in the ‘FAiR’ package, or the fa.promax() function in the ‘psych’ package).

For our purposes we mainly use the standard function factanal(). Let us again recall the biological metrics data from the Czech Republic. The data represent 65 different river localities in the Czech Republic; at each locality, various biological metrics were assessed (17 metrics in total).

rm(list = ls())
bioData <- read.csv("http://msekce.karlin.mff.cuni.cz/~maciak/NMST539/bioData.csv", header = T)
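
A quick sanity check of the data dimensions (a minimal sketch; the exact column names depend on the downloaded file):

dim(bioData)     # should be 65 rows (localities) and 18 columns (locality ID + 17 metrics)
names(bioData)   # the biological metrics are in columns 2 to 18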



The correlation structure (which will later be assessed using the factor analysis approach) can either be estimated using the standard variance-covariance matrix (command var(bioData[,2:18])) or it can be visualized using the corrplot() function from the ‘corrplot’ package instead.

library(corrplot)
corrplot(cor(bioData[,2:18]), method="ellipse")

The idea behind the factanal() function is to consider a \(p\)-dimensional (random) vector \(\boldsymbol{X}\) and to express this vector using a set of \(k\) factors, where we require that \(k \ll p\) (dimensionality reduction). The model fitted by the factor analysis approach when applying the factanal() function in R is the following:

\[ \boldsymbol{X} = \Lambda \boldsymbol{F} + \boldsymbol{e}, \]

where \(\Lambda\) is a \(p\times k\) dimensional matrix of so-called factor loadings, \(\boldsymbol{F}\) is a \(k\)-dimensional (random) vector representing the \(k\) factors (common factors, or scores respectively) and \(\boldsymbol{e}\) is an approximation error (the specific factors).

The tricky part in this expression is that besides the random vector \(\boldsymbol{X}\), no other quantity is directly observed. The problem, as such, is unsolvable unless we pose some additional restrictions. This is done by specifying a variance-covariance structure for the quantities which appear in the expression above:

  • The scores \(\boldsymbol{F}\) are required to be uncorrelated with unit variance (\(Var (\boldsymbol{F}) = \mathbb{I}_{k});\)
  • The error terms \(\boldsymbol{e}\) are mutually independent with a diagonal variance-covariance matrix (\(Var (\boldsymbol{e}) = \boldsymbol{\Psi}\), where \(var (e_{i}) = \psi_{ii}\));
  • The correlation matrix of \(\boldsymbol{X}\) is decomposed as \(\Sigma = \Lambda \Lambda^{T} + \boldsymbol{\Psi}\).
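
This decomposition can be checked numerically once a model is fitted. A minimal sketch (fa below is a three-factor fit of the kind computed later in this session; loadings and uniquenesses are standard components of the factanal() output):

fa <- factanal(bioData[,2:18], factors = 3)
Lambda <- fa$loadings                                       # p x k matrix of estimated loadings
Sigma.hat <- Lambda %*% t(Lambda) + diag(fa$uniquenesses)   # fitted correlation matrix
max(abs(Sigma.hat - cor(bioData[,2:18])))                   # discrepancy of the 3-factor approximation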



Note


  • Factor analysis is not unique: any orthogonal rotation of the factors (with the corresponding rotation of the loadings) yields the same fitted correlation matrix (see the sketch below);
  • Factor analysis is invariant with respect to the scale of the variables;
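
A minimal numerical illustration of the first point (with an arbitrary loading matrix and a random orthogonal rotation, nothing specific to our data):

set.seed(1)
Lambda <- matrix(rnorm(17 * 3), 17, 3)    # an arbitrary 17 x 3 loading matrix
G <- qr.Q(qr(matrix(rnorm(9), 3, 3)))     # a random orthogonal 3 x 3 rotation matrix
## the rotated loadings Lambda G reproduce exactly the same matrix Lambda Lambda^T
max(abs(Lambda %*% t(Lambda) - (Lambda %*% G) %*% t(Lambda %*% G)))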



Since the factors are not unique with respect to rotation, it is useful to choose a rotation which makes sense. The automatic procedure which tries to find factors such that the original covariates can be split into (nearly) disjoint sets is called the varimax procedure.
To use the varimax procedure for detecting the right rotation we can use the additional parameter ‘rotation=“varimax”’ when calling the function factanal() in R.

fa1 <- factanal(bioData[,2:18], factors = 3, rotation="varimax")
print(fa1, digits=2, cutoff=.6, sort=TRUE)
## 
## Call:
## factanal(x = bioData[, 2:18], factors = 3, rotation = "varimax")
## 
## Uniquenesses:
##    SaprInd      Lital       RETI     EPTAbu       Marg   Metaritr 
##       0.15       0.09       0.05       0.05       0.03       0.02 
##     JepAbu  Epiritral Hyporitral     Ntaxon     Nceled     Bindex 
##       0.10       0.07       0.39       0.21       0.09       0.19 
##     EPTTax     PosAbu    Spasaci      MDS_1      MDS_2 
##       0.10       0.35       0.11       0.13       0.45 
## 
## Loadings:
##            Factor1 Factor2 Factor3
## SaprInd    -0.79                  
## Lital       0.94                  
## RETI        0.94                  
## EPTAbu      0.90                  
## Metaritr    0.96                  
## JepAbu      0.68            0.65  
## Epiritral   0.89                  
## Hyporitral  0.71                  
## PosAbu      0.79                  
## Spasaci     0.94                  
## MDS_1       0.91                  
## Marg                0.98          
## Ntaxon              0.65          
## Nceled              0.94          
## Bindex              0.79          
## EPTTax              0.83          
## MDS_2                       0.71  
## 
##                Factor1 Factor2 Factor3
## SS loadings       8.96    4.05    1.41
## Proportion Var    0.53    0.24    0.08
## Cumulative Var    0.53    0.77    0.85
## 
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 414.98 on 88 degrees of freedom.
## The p-value is 6.88e-44

Is such a factor representation sufficient? Compare the model with some others with an increasing number of factors:

fa2 <- factanal(bioData[,2:18], factors = 4, rotation="varimax")
fa3 <- factanal(bioData[,2:18], factors = 5, rotation="varimax")
fa4 <- factanal(bioData[,2:18], factors = 9, rotation="varimax")
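
A quick first comparison (a small sketch): the p-values of the likelihood-ratio test reported by factanal() (its PVAL component) for the four fits above:

sapply(list(k3 = fa1, k4 = fa2, k5 = fa3, k9 = fa4), function(f) f$PVAL)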

There are several approaches for deciding on the number of factors which should be used. For instance, it is easy to see that for each element \(X_{j}\) of the random vector \(\boldsymbol{X} = (X_{1}, \dots, X_{p})^\top\) it holds that

\[ var(X_{j}) =h_{j}^2 + \psi_{j}, \qquad \textrm{where} \qquad h_{j}^2 = \sum_{\ell = 1}^{k} q_{j \ell}^2, \]

and \(\boldsymbol{\Psi} = \textrm{Diag}\{\psi_{1}, \dots, \psi_{p}\}\), and \(\Lambda = \{q_{j \ell}\}\) with \(j = 1, \dots, p\) and \(\ell = 1, \dots, k\). The quantity \(h_{j}^2\) is the part of the variability of \(X_{j}\) which is explained by the common factors in \(\boldsymbol{F}\) (also called the communality), while the second quantity, \(\psi_{j}\), is the variability which is left unexplained (also called the specific variability).

To decide how many factors are needed, one can base the decision on a comparison of the communality and the specific variability. It is obvious that for \(k = p\) the following holds: \(h_{j}^2 = var(X_{j})\) and \(\psi_{j} = 0\).
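
Both quantities are easy to extract from a fitted factanal() object; a minimal sketch using the fit fa1 from above:

h2  <- rowSums(fa1$loadings^2)                 # communalities h_j^2
psi <- fa1$uniquenesses                        # specific variabilities psi_j
round(cbind(h2, psi, total = h2 + psi), 2)     # each row sums to one (scaled variables)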

Alternative approaches for judging the right number of factors:
  • Statistical pre-analysis
    One usually runs e.g. a principal component analysis to determine how many factors should be enough (in order to have some reasonable proportion of the overall variability explained);

  • Expert judgement
    Sometimes (given a suitable interpretation) the estimated factors correspond nicely to some latent variables which are not observed but can be identified by expert judgement.

  • Maximum Likelihood Test
    Using the likelihood approach we can also test the null hypothesis \(H_{0}\) that \(\Sigma = \Lambda\Lambda^\top + \boldsymbol{\Psi}\) against a general alternative \(H_{1}\) specifying no restrictions on the variance-covariance matrix \(\Sigma\). The likelihood ratio test is given by the test statistic \[ T = - 2 \log \Bigg[\frac{\textrm{maximum likelihood under $H_{0}$}}{\textrm{maximum likelihood under $H_{1}$}}\Bigg], \] which under \(H_{0}\) asymptotically follows a \(\chi^2\) distribution with \(\frac{1}{2}\big[(p - k)^2 - (p + k)\big]\) degrees of freedom.



Note

  • Another option for determining the optimal number of factors to be extracted in the analysis is to use the tools from the library ‘nFactors’ (the library can be installed by install.packages("nFactors"));

library(nFactors)
ev <- eigen(cor(bioData[,2:18]))    # eigenvalues of the correlation matrix
ap <- parallel(subject=nrow(bioData[,2:18]), var=ncol(bioData[,2:18]), rep=100, cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)    # scree and parallel-analysis criteria

plotnScree(nS)

load <- fa1$loadings[,1:2]    # loadings on the first two factors
plot(load, type="n")          # set up an empty plot
text(load, labels=names(bioData[,2:18]), cex=.7)    # add the variable names

Alternatively, using the library FactoMineR and the function PCA() we can obtain the complete factor map (a covariate map showing the mutual correlations).

library(FactoMineR)
result <- PCA(bioData[,2:18])



Note


  • The factanal() function in R uses scaled variables as the starting point for the factor analysis - the variance-covariance matrix of \(\boldsymbol{X}\) is replaced by the corresponding correlation matrix with ones on its diagonal.
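
In line with this note, supplying the correlation matrix directly should reproduce the same fit (a minimal sketch; covmat and n.obs are standard arguments of factanal()):

fa.cor <- factanal(covmat = cor(bioData[,2:18]), factors = 3, n.obs = nrow(bioData))
max(abs(fa.cor$loadings - fa1$loadings))   # should be zero up to numerical error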


Now we have an idea that three factors might be enough. In order to obtain the corresponding score values (the values of \(\boldsymbol{F}\) in the expression above, one for each of the \(n = 65\) observations) from the factanal() function, we need to specify an additional parameter:

fa1 <- factanal(bioData[,2:18], factors = 3, scores = "regression")
fa1$scores
##           Factor1     Factor2      Factor3
##  [1,] -0.12449662  0.49318396  0.019226166
##  [2,] -0.33849930 -0.39230318  0.119550138
##  [3,] -0.22044557 -0.38580832  2.320568152
##  [4,]  0.82566318  1.20827606 -0.940891112
##  [5,] -0.13932079  0.89310776 -0.532355972
##  [6,] -1.01455286 -0.65600386 -0.375228204
##  [7,] -0.82343873 -0.62783914 -0.701274829
##  [8,] -0.98728818 -0.13782643 -0.433312546
##  [9,]  0.22418720  0.88632819  0.795206993
## [10,]  0.52292639  1.06261823  0.295526977
## [11,]  0.29559045  1.25010730 -0.529517247
## [12,] -0.50645369 -0.62925864 -0.373956123
## [13,] -0.93565845 -1.35045706 -0.637327245
## [14,] -0.95854352 -0.20085958  1.192711934
## [15,]  0.42955162 -0.44374222 -1.634492803
## [16,]  1.52669131  0.90258023 -0.647334942
## [17,] -1.06358562 -0.98644111 -0.614191669
## [18,] -0.17714883  0.29705426 -0.273963895
## [19,] -1.17296128  0.15482177  0.740586340
## [20,] -0.78876331  1.64768437  1.135756173
## [21,] -0.68592393  1.86494676  0.622906404
## [22,]  0.03378541  0.51636339 -0.492220100
## [23,] -0.98102088  0.67287079  0.410026081
## [24,]  1.72144792  0.93062040 -1.616832483
## [25,] -0.76259043 -1.35110582 -0.851702050
## [26,] -0.91691053 -1.09563523 -0.287746278
## [27,] -0.28310899 -0.08405725 -0.554590635
## [28,]  1.01603809  0.64177486 -1.212012177
## [29,]  0.02890208  0.83487030  0.220552649
## [30,]  2.69443073 -1.16900885  0.606585871
## [31,] -0.47082315 -0.18615905  2.222434682
## [32,]  1.95090263 -1.29573724  1.303667307
## [33,]  0.28558011  1.04412770  0.364248451
## [34,] -0.18909235  0.97786635  0.090795648
## [35,] -0.48400090 -0.41407901 -0.459370022
## [36,] -0.86086894 -1.09247855 -0.356923055
## [37,] -0.83806848 -0.92300383 -0.815412673
## [38,] -0.91971280 -0.62821762 -0.098009986
## [39,] -0.87317702 -0.29454466 -0.077311663
## [40,]  0.96502394 -1.12911936  4.327243670
## [41,] -0.63704689  0.20145093  0.197835134
## [42,] -0.19959392  1.30938359  0.667994311
## [43,]  3.12996486 -1.62033577  0.044354682
## [44,]  1.20880189  0.64341909 -0.149924195
## [45,]  0.06135524 -0.02822842 -0.557196522
## [46,] -0.86225779 -1.61696484 -1.552055930
## [47,] -0.61532943 -0.45390266 -0.419664831
## [48,] -0.14042956  0.55569570 -0.113699662
## [49,] -0.76637088  0.22722578 -0.178657943
## [50,]  0.92871029  1.82419989 -0.448391534
## [51,]  1.52938447  1.37527251 -1.001758261
## [52,] -0.06380299 -2.13721348  0.714719247
## [53,] -0.48147547  0.55671249  1.028660417
## [54,] -0.01773075 -0.25909705 -0.961153139
## [55,]  0.44621941  0.62532458 -0.231747386
## [56,] -0.48055946  0.70571466  0.712530342
## [57,] -0.57932009 -0.16888336  0.267753908
## [58,]  0.41294090  0.99097762 -0.110134319
## [59,] -0.54167359  0.01836471  0.755755149
## [60,] -0.08414099  0.39657781  0.326141043
## [61,] -0.92105906 -1.35457565 -0.853275093
## [62,]  2.50167388 -1.84503963 -0.802830680
## [63,] -0.84252936 -0.72197336 -0.191797001
## [64,]  1.79639695 -1.48006272  0.583465441
## [65,]  0.21360641  1.45044090  0.001460895

and the corresponding loadings (the matrix \(\Lambda\) in the expression above) are obtained as

fa1$loadings
## 
## Loadings:
##            Factor1 Factor2 Factor3
## SaprInd    -0.794  -0.457  -0.109 
## Lital       0.939   0.164         
## RETI        0.941   0.203  -0.142 
## EPTAbu      0.903           0.357 
## Marg                0.984         
## Metaritr    0.959   0.193  -0.150 
## JepAbu      0.684  -0.115   0.646 
## Epiritral   0.886   0.182  -0.340 
## Hyporitral  0.710   0.184   0.275 
## Ntaxon     -0.572   0.654  -0.190 
## Nceled      0.102   0.936  -0.151 
## Bindex      0.421   0.789         
## EPTTax      0.456   0.834         
## PosAbu      0.785          -0.159 
## Spasaci     0.939                 
## MDS_1       0.907   0.190   0.113 
## MDS_2      -0.165  -0.129   0.714 
## 
##                Factor1 Factor2 Factor3
## SS loadings      8.961   4.053   1.409
## Proportion Var   0.527   0.238   0.083
## Cumulative Var   0.527   0.766   0.848



Application in Regression or SEM
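
The regression application is straightforward: the estimated factor scores can serve directly as covariates in a linear model. A purely illustrative sketch using the scores from fa1 above (in a real application the response should be a variable which was not itself used to construct the factors):

## illustration only: SaprInd was itself one of the inputs of the factor analysis
summary(lm(bioData$SaprInd ~ fa1$scores))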

Structural equation models (SEM) are a statistical modelling technique with great potential: they combine a structural part, capturing the causal dependencies between endogenous and exogenous variables, with a measurement model, capturing the relations between the latent variables and their observed indicators (which is also the role of the latent factors \(F\) in the factor analysis).

In this situation it is common to first draw a schematic diagram of the assumed underlying structure. The diagram is later used to specify the final model.

An example of such a diagram is below.



In the R software there is the package ‘sem’ available for structural equation modelling.

library("sem")

and a simple example using the dataset ‘PoliticalDemocracy’ from the package ‘lavaan’:

library("lavaan")
## This is lavaan 0.5-23.1097
## lavaan is BETA software! Please report any bugs.
## 
## Attaching package: 'lavaan'
## The following objects are masked from 'package:sem':
## 
##     cfa, sem
data(PoliticalDemocracy)
PoliticalDemocracy
##       y1       y2        y3        y4        y5       y6       y7
## 1   2.50 0.000000  3.333333  0.000000  1.250000 0.000000 3.726360
## 2   1.25 0.000000  3.333333  0.000000  6.250000 1.100000 6.666666
## 3   7.50 8.800000  9.999998  9.199991  8.750000 8.094061 9.999998
## 4   8.90 8.800000  9.999998  9.199991  8.907948 8.127979 9.999998
## 5  10.00 3.333333  9.999998  6.666666  7.500000 3.333333 9.999998
## 6   7.50 3.333333  6.666666  6.666666  6.250000 1.100000 6.666666
## 7   7.50 3.333333  6.666666  6.666666  5.000000 2.233333 8.271257
## 8   7.50 2.233333  9.999998  1.496333  6.250000 3.333333 9.999998
## 9   2.50 3.333333  3.333333  3.333333  6.250000 3.333333 3.333333
## 10 10.00 6.666666  9.999998  8.899991  8.750000 6.666666 9.999998
## 11  7.50 3.333333  9.999998  6.666666  8.750000 3.333333 9.999998
## 12  7.50 3.333333  6.666666  6.666666  8.750000 3.333333 6.666666
## 13  7.50 3.333333  9.999998  6.666666  7.500000 3.333333 6.666666
## 14  7.50 7.766664  9.999998  6.666666  7.500000 0.000000 9.999998
## 15  7.50 9.999998  3.333333 10.000000  7.500000 6.666666 9.999998
## 16  7.50 9.999998  9.999998  7.766666  7.500000 1.100000 6.666666
## 17  2.50 3.333333  6.666666  6.666666  5.000000 1.100000 6.666666
## 18  1.25 0.000000  3.333333  3.333333  1.250000 3.333333 3.333333
## 19 10.00 9.999998  9.999998 10.000000  8.750000 9.999998 9.999998
## 20  7.50 3.333299  3.333333  6.666666  7.500000 2.233299 6.666666
## 21 10.00 9.999998  9.999998 10.000000 10.000000 9.999998 9.999998
## 22  1.25 0.000000  0.000000  0.000000  2.500000 0.000000 0.000000
## 23  2.50 0.000000  3.333333  3.333333  2.500000 0.000000 3.333333
## 24  7.50 6.666666  9.999998 10.000000  7.500000 6.666666 9.999998
## 25  8.50 9.999998  6.666666  6.666666  8.750000 9.999998 7.351018
## 26  6.10 0.000000  5.400000  3.333333  0.000000 0.000000 4.696028
## 27  3.30 0.000000  6.666666  3.333333  6.250000 0.000000 6.666666
## 28  2.90 3.333333  6.666666  3.333333  2.385559 0.000000 3.177568
## 29  9.20 0.000000  9.900000  3.333333  7.609660 0.000000 8.118828
## 30  6.90 0.000000  6.666666  3.333333  4.226033 0.000000 0.000000
## 31  2.90 0.000000  3.333333  3.333333  5.000000 0.000000 3.333333
## 32  2.00 0.000000  0.000000  0.000000  0.000000 0.000000 0.000000
## 33  5.00 0.000000  3.333333  3.333333  5.000000 0.000000 3.333333
## 34  5.00 0.000000  9.999998  3.333333  0.000000 0.000000 3.333333
## 35  4.10 9.999998  4.700000  6.666666  3.750000 0.000000 7.827667
## 36  6.30 9.999998  9.999998  6.666666  6.250000 2.233333 6.666666
## 37  5.20 4.999998  6.600000  3.333333  3.633403 1.100000 3.314128
## 38  5.00 3.333333  6.400000  6.666666  2.844997 0.000000 4.429657
## 39  3.10 4.999998  4.200000  5.000000  3.750000 0.000000 6.164304
## 40  4.10 9.999998  6.666666  3.333333  5.000000 0.000000 4.938089
## 41  5.00 9.999998  6.666666  1.666666  5.000000 0.000000 6.666666
## 42  5.00 7.700000  6.666666  8.399997  6.250000 4.358243 9.999998
## 43  5.00 6.200000  9.999998  6.060997  5.000000 2.782771 6.666666
## 44  5.60 4.900000  0.000000  0.000000  6.555647 4.055463 6.666666
## 45  5.70 4.800000  0.000000  0.000000  6.555647 4.055463 0.000000
## 46  7.50 9.999998  7.900000  6.666666  3.750000 9.999998 7.631891
## 47  2.50 0.000000  6.666666  3.333333  2.500000 0.000000 0.000000
## 48  8.90 9.999998  9.700000  6.666666  5.000000 9.999998 9.556024
## 49  7.60 0.000000 10.000000  0.000000  5.000000 1.100000 6.666666
## 50  7.80 9.999998  6.666666  6.666666  5.000000 3.333333 6.666666
## 51  2.50 0.000000  6.666666  3.333333  5.000000 0.000000 6.666666
## 52  3.80 0.000000  5.100000  0.000000  3.750000 0.000000 6.666666
## 53  5.00 3.333333  3.333333  2.233333  5.000000 3.333333 6.666666
## 54  6.25 3.333333  9.999998  2.955702  6.250000 5.566663 9.999998
## 55  1.25 0.000000  3.333333  0.000000  2.500000 0.000000 0.000000
## 56  1.25 0.000000  4.700000  0.736999  2.500000 0.000000 3.333333
## 57  1.25 0.000000  6.666666  0.000000  2.500000 0.000000 5.228375
## 58  7.50 7.766664  9.999998  6.666666  7.500000 3.333333 9.999998
## 59  2.50 0.000000  6.666666  4.433333  5.000000 0.000000 6.666666
## 60  7.50 9.999998  9.999998 10.000000  8.750000 9.999998 9.999998
## 61  1.25 0.000000  0.000000  0.000000  1.250000 0.000000 0.000000
## 62  1.25 0.000000  0.000000  0.000000  0.000000 0.000000 0.000000
## 63  2.50 0.000000  0.000000  0.000000  0.000000 0.000000 6.666666
## 64  6.25 2.233299  6.666666  2.970332  3.750000 3.333299 6.666666
## 65  7.50 9.999998  9.999998 10.000000  7.500000 9.999998 9.999998
## 66  5.00 0.000000  6.100000  0.000000  5.000000 3.333333 9.999998
## 67  7.50 9.999998  9.999998 10.000000  3.750000 9.999998 9.999998
## 68  4.90 2.233333  9.999998  0.000000  5.000000 0.000000 3.621989
## 69  5.00 0.000000  8.200000  0.000000  5.000000 0.000000 0.000000
## 70  2.90 3.333333  6.666666  3.333333  2.500000 3.333333 6.666666
## 71  5.40 9.999998  6.666666  3.333333  3.750000 6.666666 6.666666
## 72  7.50 8.800000  9.999998  6.066666  7.500000 6.666666 9.999998
## 73  7.50 7.000000  9.999998  6.852998  7.500000 6.348340 6.666666
## 74 10.00 6.666666  9.999998 10.000000 10.000000 6.666666 9.999998
## 75  3.75 3.333333  0.000000  0.000000  1.250000 3.333333 0.000000
##           y8       x1       x2       x3
## 1   3.333333 4.442651 3.637586 2.557615
## 2   0.736999 5.384495 5.062595 3.568079
## 3   8.211809 5.961005 6.255750 5.224433
## 4   4.615086 6.285998 7.567863 6.267495
## 5   6.666666 5.863631 6.818924 4.573679
## 6   0.368500 5.533389 5.135798 3.892270
## 7   1.485166 5.308268 5.075174 3.316213
## 8   6.666666 5.347108 4.852030 4.263183
## 9   3.333333 5.521461 5.241747 4.115168
## 10 10.000000 5.828946 5.370638 4.446216
## 11  6.666666 5.916202 6.423247 3.791545
## 12  6.666666 5.398163 6.246107 4.535708
## 13 10.000000 6.622736 7.872074 4.906154
## 14  0.000000 5.204007 5.225747 4.561047
## 15 10.000000 5.509388 6.202536 4.586286
## 16  6.666666 5.262690 5.820083 3.948911
## 17  0.368500 4.700480 5.023881 4.394491
## 18  3.333333 5.209486 4.465908 4.510268
## 19 10.000000 5.916202 6.732211 5.829084
## 20  2.948164 6.523562 6.992096 6.424591
## 21 10.000000 6.238325 6.746412 5.741711
## 22  0.000000 5.976351 6.712956 5.948168
## 23  3.333333 5.631212 5.937536 5.686755
## 24  7.766666 6.033086 6.093570 4.611429
## 25  6.666666 6.196444 6.704414 5.475261
## 26  3.333333 4.248495 2.708050 1.740830
## 27  3.333333 5.141664 4.564348 2.255134
## 28  1.116666 4.174387 3.688879 3.046927
## 29  3.333333 4.382027 2.890372 1.711279
## 30  0.000000 4.290459 1.609438 1.001674
## 31  3.333333 4.934474 4.234107 1.418971
## 32  0.000000 3.850148 1.945910 2.345229
## 33  3.333333 5.181784 4.394449 3.167167
## 34  0.744370 5.062595 4.595120 3.834970
## 35  6.666666 4.691348 4.143135 2.255134
## 36  2.955702 4.248495 3.367296 3.217506
## 37  3.333333 5.564520 5.236442 2.677633
## 38  1.485166 4.727388 3.610918 1.418971
## 39  3.333333 4.143135 2.302585 1.418971
## 40  2.233333 4.317488 4.955827 4.249888
## 41  0.368500 5.141664 4.430817 3.046927
## 42  4.141377 4.488636 3.465736 2.013579
## 43  4.974739 4.615121 4.941642 2.255134
## 44  3.821796 3.850148 2.397895 1.740830
## 45  0.000000 3.970292 2.397895 1.050741
## 46  6.666666 3.784190 3.091042 2.113313
## 47  0.000000 3.806662 2.079442 2.137561
## 48  6.666666 4.532599 3.610918 1.587802
## 49  1.099999 5.117994 4.934474 3.834970
## 50  6.666666 5.049856 5.111988 4.381490
## 51  3.333333 5.393628 5.638355 4.169451
## 52  1.485166 4.477337 3.931826 2.474671
## 53  5.566663 5.257495 5.840642 5.001796
## 54  6.666666 5.379897 5.505332 3.299937
## 55  0.000000 5.298317 6.274762 4.381490
## 56  3.333333 4.859812 5.669881 3.537416
## 57  0.000000 4.969813 5.564520 4.510268
## 58  6.666666 6.011267 6.253829 5.001796
## 59  1.485166 5.075174 5.252273 5.350708
## 60 10.000000 6.736967 7.125283 6.330518
## 61  0.000000 5.225747 5.451038 3.167167
## 62  0.000000 4.025352 1.791759 2.657972
## 63  2.948164 4.234107 2.708050 2.474671
## 64  3.333333 4.644391 5.564520 3.046927
## 65 10.000000 4.418841 4.941642 3.380653
## 66  3.333333 4.262680 4.219508 4.368462
## 67 10.000000 4.875197 4.700480 3.834970
## 68  3.333333 4.189655 1.386294 1.418971
## 69  0.000000 4.521789 4.127134 2.113313
## 70  3.333333 4.653960 3.555348 1.881917
## 71  1.485166 4.477337 3.091042 1.987909
## 72  6.666666 5.337538 5.631212 3.491004
## 73  7.508044 6.129050 6.403574 5.001796
## 74 10.000000 5.003946 4.962845 3.976994
## 75  0.000000 4.488636 4.897840 2.867566

with the corresponding model structure which we want to fit.

First, we define the model:

model <- '
  # measurement model
    ind60 =~ x1 + x2 + x3
    dem60 =~ y1 + y2 + y3 + y4
    dem65 =~ y5 + y6 + y7 + y8
  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60
  # residual correlations
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
'

where the residual covariances between two observed variables are specified using the lavaan operator ~~ (as in variable ~~ variable),

and finally, we can fit the model:

fit <- sem(model, data=PoliticalDemocracy)
summary(fit, standardized=TRUE)
## lavaan (0.5-23.1097) converged normally after  68 iterations
## 
##   Number of observations                            75
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               38.125
##   Degrees of freedom                                35
##   P-value (Chi-square)                           0.329
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   ind60 =~                                                              
##     x1                1.000                               0.670    0.920
##     x2                2.180    0.139   15.742    0.000    1.460    0.973
##     x3                1.819    0.152   11.967    0.000    1.218    0.872
##   dem60 =~                                                              
##     y1                1.000                               2.223    0.850
##     y2                1.257    0.182    6.889    0.000    2.794    0.717
##     y3                1.058    0.151    6.987    0.000    2.351    0.722
##     y4                1.265    0.145    8.722    0.000    2.812    0.846
##   dem65 =~                                                              
##     y5                1.000                               2.103    0.808
##     y6                1.186    0.169    7.024    0.000    2.493    0.746
##     y7                1.280    0.160    8.002    0.000    2.691    0.824
##     y8                1.266    0.158    8.007    0.000    2.662    0.828
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   dem60 ~                                                               
##     ind60             1.483    0.399    3.715    0.000    0.447    0.447
##   dem65 ~                                                               
##     ind60             0.572    0.221    2.586    0.010    0.182    0.182
##     dem60             0.837    0.098    8.514    0.000    0.885    0.885
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .y1 ~~                                                                 
##    .y5                0.624    0.358    1.741    0.082    0.624    0.296
##  .y2 ~~                                                                 
##    .y4                1.313    0.702    1.871    0.061    1.313    0.273
##    .y6                2.153    0.734    2.934    0.003    2.153    0.356
##  .y3 ~~                                                                 
##    .y7                0.795    0.608    1.308    0.191    0.795    0.191
##  .y4 ~~                                                                 
##    .y8                0.348    0.442    0.787    0.431    0.348    0.109
##  .y6 ~~                                                                 
##    .y8                1.356    0.568    2.386    0.017    1.356    0.338
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .x1                0.082    0.019    4.184    0.000    0.082    0.154
##    .x2                0.120    0.070    1.718    0.086    0.120    0.053
##    .x3                0.467    0.090    5.177    0.000    0.467    0.239
##    .y1                1.891    0.444    4.256    0.000    1.891    0.277
##    .y2                7.373    1.374    5.366    0.000    7.373    0.486
##    .y3                5.067    0.952    5.324    0.000    5.067    0.478
##    .y4                3.148    0.739    4.261    0.000    3.148    0.285
##    .y5                2.351    0.480    4.895    0.000    2.351    0.347
##    .y6                4.954    0.914    5.419    0.000    4.954    0.443
##    .y7                3.431    0.713    4.814    0.000    3.431    0.322
##    .y8                3.254    0.695    4.685    0.000    3.254    0.315
##     ind60             0.448    0.087    5.173    0.000    1.000    1.000
##    .dem60             3.956    0.921    4.295    0.000    0.800    0.800
##    .dem65             0.172    0.215    0.803    0.422    0.039    0.039




Homework Assignment

(Deadline: Lab Session 8 / 25.04.2017)

Use the command below to install the SMSdata package with various multivariate datasets:

install.packages(pkgs="http://www.karlin.mff.cuni.cz/~hlavka/sms2/SMSdata_1.0.tar.gz", repos=NULL, type="source")

Choose one dataset of your choice from the list of all available datasets in the package:

data(package = "SMSdata") 

There are 21 different datasets and you can load any of them by typing and running data(dataset_name), for instance:

data(plasma) 
  • Consider the same dataset to which you applied the principal component analysis and now apply the factor analysis;
  • Compare the outcomes of both PCA and FA;
  • Choose one covariate as the dependent variable and try to use the regression modelling approach to model the expectation of this covariate using the factors obtained from the factor analysis. Try to find some reasonable interpretation of the final model.