Letný semester 2024 | Cvičenie 9 | 23.04.2024
HTML Markdown – Rmd source file (UTF encoding)
The idea is to consider a general version of the linear regression model where \[ \boldsymbol{Y} \sim N(\mathbb{X}\boldsymbol{\beta}, \sigma^2 \mathbb{W}^{-1}), \] where \(\boldsymbol{Y} = (Y_1, \dots, Y_n)^\top\) is the response vector for \(n \in \mathbb{N}\) independent subjects, \(\mathbb{X}\) is the corresponding model matrix (of a full rank \(p \in \mathbb{N}\)) and \(\boldsymbol{\beta} \in \mathbb{R}^p\) is the vector of the unknown parameters.
Note, that the normality assumption above is only used as a simplification but it is not needed to fit the model (without normality the results and the inference will hold asymptoticaly).
Considering the variance structure, reflected in the matrix \(\mathbb{W} \in \mathbb{R}^{n \times n}\) (a diagonal matrix because the subjets are mutually independent) can can distinguish two cases:
The first case leads to the so-called general linear model (not a generalized linear model, which is a different class of models) and the second case leads to a regression model under heteroscedastic errors. Both cases will be briefly addressed below.
We start with the dataset
The information provided in the data stand for the: \(i\) serial number; \(t\) week of the pregnancy; \(n\) number of measurements used to calculate the average head size; and \(y\) the average head size obtained from the particular number of measurements
library("mffSM")
data(Hlavy, package = "mffSM")
### Exploratory analysis
head(Hlavy)
## i t n y
## 1 1 16 2 5.100
## 2 2 18 3 5.233
## 3 3 19 9 4.744
## 4 4 20 10 5.110
## 5 5 21 11 5.236
## 6 6 22 20 5.740
dim(Hlavy)
## [1] 26 4
summary(Hlavy)
## i t n y
## Min. : 1.00 Min. :16.00 Min. : 2.00 Min. :4.744
## 1st Qu.: 7.25 1st Qu.:23.25 1st Qu.:20.25 1st Qu.:5.871
## Median :13.50 Median :29.50 Median :47.00 Median :7.776
## Mean :13.50 Mean :29.46 Mean :39.85 Mean :7.461
## 3rd Qu.:19.75 3rd Qu.:35.75 3rd Qu.:60.00 3rd Qu.:8.980
## Max. :26.00 Max. :42.00 Max. :85.00 Max. :9.633
plot(y ~ t, data = Hlavy, pch = 23, col = "red3", bg = "orange")
The simple scatterplot above can be, however, substantially improved by accounting also for reliability of each observation (higher number of measurements used to obtain the average also means higher reliability). This can be obtain, for instance, by the following code:
with(Hlavy, quantile(n, prob = seq(0, 1, by = 0.2)))
## 0% 20% 40% 60% 80% 100%
## 2 11 31 53 60 85
Hlavy <- transform(Hlavy, fn = cut(n, breaks = c(0, 10, 30, 50, 60, 90),
labels = c("<=10", "11-30", "31-50", "51-60", ">60")))
with(Hlavy, summary(fn))
## <=10 11-30 31-50 51-60 >60
## 5 5 5 6 5
plotData <- function(){
PAL <- rev(heat_hcl(length(levels(Hlavy[, "fn"])), c.=c(80, 30), l=c(30, 90),
power=c(1/5, 1.3)))
names(PAL) <- levels(Hlavy[, "fn"])
plot(y ~ t, data = Hlavy, pch = 23, col = "black", bg = PAL[fn])
legend(17, 9, legend = levels(Hlavy[, "fn"]), title = "n", fill = PAL)
abline(v = 27, col = "lightblue", lty = 2)
}
plotData()
It is reasonable to assume, that for each measurement we have some model of the form \[ \widetilde{Y}_i | \boldsymbol{X}_i \sim N(\boldsymbol{X}_i^\top\boldsymbol{\beta}, \sigma^2) \] which is a homoscedastic model. Unfortunattely, we do not observe \(\widetilde{Y}_i\) directly, but, instead, there a a set of independent measurements \(\widetilde{Y}_{ij}\) for \(j = 1, \dots, n_i\) (all measured conditionally on \(\boldsymbol{X}_i\)) and we only observe the overall average \(Y_i = \frac{1}{n_i}\sum_{j = 1}^{n_i} \widetilde{Y}_{ij}\).
It also holds, that \[
Y_i \sim N(\boldsymbol{X}_i^\top\boldsymbol{\beta}, \sigma^2/{n_i}),
\] or, considering all data together, we have \[
\boldsymbol{Y} \sim N_n(\mathbb{X}\boldsymbol{\beta}, \sigma^2 \mathbb{W}^{-1}),
\] where \(\mathbb{W}^{-1}\) is a diagonal matrix \(\mathbb{W}^{-1} = diag(1/n_1, \dots, 1/n_n)\) (thus, the variance structure is known from the design of the experiment).
In addition, practical motivation behind the model is the following: Practitioners say that the relationship \(y ~ t\) could be described by a piecewise quadratic function with \(t = 27\) being a point where the quadratic $relationship changes its shape.
From the practical point of view, the fitted curve should be continuous at \(t = 27\) and perhaps also smooth (i.e., with continuous at least the first derivative) as it is biologically not defensible to claim that at \(t = 27\) the growth has a jump in the speed (rate) of the growth.
As far as different observations have different credibility, we will use the general linear model with the known matrix \(\mathbb{W}\). The model can be fitted as follows:
Hlavy <- transform(Hlavy, t27 = t - 27)
### Grid to calculate the fitted regression function
tgrid <- seq(16, 42, length = 100)
pdata <- data.frame(t27 = tgrid - 27)
summary(mCont <- lm(y ~ t27 + I(t27^2) + (t27 > 0):t27 + (t27 > 0):I(t27^2),
weights = n, data = Hlavy))
##
## Call:
## lm(formula = y ~ t27 + I(t27^2) + (t27 > 0):t27 + (t27 > 0):I(t27^2),
## data = Hlavy, weights = n)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -0.88212 -0.35216 0.01232 0.37917 0.85320
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.049749 0.042425 166.170 < 2e-16 ***
## t27 0.381177 0.030288 12.585 3.01e-11 ***
## I(t27^2) 0.016214 0.003943 4.112 0.000497 ***
## t27:t27 > 0TRUE -0.063531 0.039426 -1.611 0.122017
## I(t27^2):t27 > 0TRUE -0.026786 0.003776 -7.094 5.35e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.477 on 21 degrees of freedom
## Multiple R-squared: 0.9968, Adjusted R-squared: 0.9962
## F-statistic: 1652 on 4 and 21 DF, p-value: < 2.2e-16
Can you manually reconstruct the estimated values for the mean structure and, also, the estimated values for the variance parameters?
### Some naive calculation
W <- diag(Hlavy$n);
Y <- Hlavy$y;
X <- model.matrix(mCont)
### Estimated regression coefficients
(betahat <- solve(t(X)%*%W%*%X)%*%(t(X)%*%W%*%Y));
## [,1]
## (Intercept) 7.04974918
## t27 0.38117667
## I(t27^2) 0.01621412
## t27:t27 > 0TRUE -0.06353122
## I(t27^2):t27 > 0TRUE -0.02678567
coef(mCont)
## (Intercept) t27 I(t27^2)
## 7.04974918 0.38117667 0.01621412
## t27:t27 > 0TRUE I(t27^2):t27 > 0TRUE
## -0.06353122 -0.02678567
### Generalized fitted values
(Yg <- X%*%betahat);
## [,1]
## 1 4.818714
## 2 4.932503
## 3 5.038040
## 4 5.176004
## 5 5.346398
## 6 5.549219
## 7 5.784468
## 8 6.052146
## 9 6.352252
## 10 6.684787
## 11 7.049749
## 12 7.356823
## 13 7.642754
## 14 7.907542
## 15 8.151186
## 16 8.373688
## 17 8.575046
## 18 8.755262
## 19 8.914334
## 20 9.052263
## 21 9.169049
## 22 9.264692
## 23 9.339192
## 24 9.392549
## 25 9.424762
## 26 9.435833
fitted(mCont)
## 1 2 3 4 5 6 7 8
## 4.818714 4.932503 5.038040 5.176004 5.346398 5.549219 5.784468 6.052146
## 9 10 11 12 13 14 15 16
## 6.352252 6.684787 7.049749 7.356823 7.642754 7.907542 8.151186 8.373688
## 17 18 19 20 21 22 23 24
## 8.575046 8.755262 8.914334 9.052263 9.169049 9.264692 9.339192 9.392549
## 25 26
## 9.424762 9.435833
### Generalized residual sum of squares
(RSS <- t(Y - Yg)%*%W%*%(Y - Yg));
## [,1]
## [1,] 4.777559
### Generalized mean square
RSS/(length(Y)-ncol(X))
## [,1]
## [1,] 0.2275028
(summary(mCont)$sigma)^2
## [1] 0.2275028
### Be careful that Multiple R-squared: 0.9968 reported in the output
### "summary(mCont)" is not equal to
sum((Yg-mean(Yg))^2)/sum((Y-mean(Y))^2)
## [1] 1.005601
### Instead, it equals the following quantity that is more difficult
### to interpret
barYw <- sum(Y*Hlavy$n)/sum(Hlavy$n);
### weigthed mean of the original observations
### something as "generalized" regression sum of squares
SSR <- sum((Yg - barYw)^2 * Hlavy$n)
### something as "generalized" total variance
SST <- sum((Y - barYw)^2 * Hlavy$n);
### "generalized" R squared
SSR/SST
## [1] 0.9968318
summary(mCont)$r.squared
## [1] 0.9968318
Compare the general linear model above with the simple model below. What are the main differences between these two models? Can you explain them?
iii <- rep(1:nrow(Hlavy), Hlavy$n)
Hlavy2 <- Hlavy[iii,]; # new dataset
summary(mCont <- lm(y ~ t27 + I(t27^2) + (t27 > 0):t27 + (t27 > 0):I(t27^2), data = Hlavy2))
##
## Call:
## lm(formula = y ~ t27 + I(t27^2) + (t27 > 0):t27 + (t27 > 0):I(t27^2),
## data = Hlavy2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.29404 -0.04505 -0.01469 0.04125 0.30050
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.0497492 0.0060548 1164.32 <2e-16 ***
## t27 0.3811767 0.0043227 88.18 <2e-16 ***
## I(t27^2) 0.0162141 0.0005628 28.81 <2e-16 ***
## t27:t27 > 0TRUE -0.0635312 0.0056268 -11.29 <2e-16 ***
## I(t27^2):t27 > 0TRUE -0.0267857 0.0005389 -49.70 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06807 on 1031 degrees of freedom
## Multiple R-squared: 0.9968, Adjusted R-squared: 0.9968
## F-statistic: 8.11e+04 on 4 and 1031 DF, p-value: < 2.2e-16
For a general heteroscedastic model it is usually assumed that
\[
Y_i | \boldsymbol{X}_i \sim N(\boldsymbol{X}_i^\top\boldsymbol{\beta}, \sigma^2(\boldsymbol{X}_i))
\] where \(\sigma^2(\cdot)\) is some positive (unknown) function of the response variables \(\boldsymbol{X}_i \in \mathbb{R}^p\). Note, that the normality assumption is here only given for simplification but it is not required.
In the following, we will consider the datasat that comes from the project MANA (Maturita na necisto) launched in the Czech Republic in 2005. This project was a part of a bigger project whose goal was to prepare a new form of graduation exam (“statni maturita”).
This dataset contains the results in English language at grammar schools.
load(url("http://msekce.karlin.mff.cuni.cz/~maciak/NMSA407/mana.RData"))
The variables in the data are: region
(1 = Praha; 2 = Stredocesky; 3 = Jihocesky;4 = Plzensky; =5 = Karlovarsky; 6 = Ustecky; 7 = Liberecky; 8 = Kralovehradecky; 9 = Pardubicky; 10 = Vysocina; 11 = Jihomoravsky; 12 = Olomoucky; 13 = Zlinsky; 14 = Moravskoslezsky), specialization
(1 = education; 2 = social science; 3 = languages; 4 = law; 5 = math-physics; 6 = engineering; 7 = science; 8 = medicine; 9 = economics; 10 = agriculture; 11 = art), gender
(0 = female; 1 = male), graduation
(0 = no; 1 = yes), entry.exam
(0 = no; 1 = yes), score
(score in English language in MANA test), avg.mark
(aaverage mark (grade) from all subjects at the last report card), and mark.eng
(average mark (grade) from the English language at the last 5 report cards).
For better intuition, we can transform the data as follows:
mana <- transform(mana, fregion = factor(region, levels = 1:14,
labels = c("A", "S", "C", "P", "K", "U", "L", "H", "E",
"J", "B", "M", "Z", "T")),
fspec = factor(specialization, levels = 1:11,
labels = c("Educ", "Social", "Lang",
"Law", "Mat-Phys", "Engin",
"Science", "Medic",
"Econom", "Agric", "Art")),
fgender = factor(gender, levels = 0:1,
labels = c("Female", "Male")),
fgrad = factor(graduation, levels = 0:1,
labels = c("No", "Yes")),
fentry = factor(entry.exam, levels = 0:1,
labels = c("No", "Yes")))
summary(mana)
## region specialization gender graduation
## Min. : 1.000 Min. : 1.000 Min. :0.000 Min. :0.0000
## 1st Qu.: 3.000 1st Qu.: 2.000 1st Qu.:0.000 1st Qu.:1.0000
## Median : 7.000 Median : 6.000 Median :0.000 Median :1.0000
## Mean : 7.462 Mean : 5.496 Mean :0.415 Mean :0.9585
## 3rd Qu.:11.000 3rd Qu.: 8.000 3rd Qu.:1.000 3rd Qu.:1.0000
## Max. :14.000 Max. :11.000 Max. :1.000 Max. :1.0000
##
## entry.exam score avg.mark mark.eng
## Min. :0.0000 Min. : 5.00 Min. :1.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:40.00 1st Qu.:1.480 1st Qu.:1.400
## Median :0.0000 Median :44.00 Median :1.640 Median :2.000
## Mean :0.4536 Mean :43.02 Mean :1.645 Mean :2.176
## 3rd Qu.:1.0000 3rd Qu.:48.00 3rd Qu.:1.810 3rd Qu.:2.800
## Max. :1.0000 Max. :52.00 Max. :2.470 Max. :4.200
##
## fregion fspec fgender fgrad fentry
## T : 923 Social : 855 Female:3044 No : 216 No :2843
## S : 665 Econom : 846 Male :2159 Yes:4987 Yes:2360
## B : 634 Engin : 638
## A : 599 Medic : 564
## L : 534 Science: 557
## C : 474 Educ : 513
## (Other):1374 (Other):1230
For illustration purposes we will fit a simple (additive) model to explain the average mark given the region and the specialization.
mAddit <- lm(avg.mark ~ fregion + fspec, data = mana, x = T)
summary(mAddit)
##
## Call:
## lm(formula = avg.mark ~ fregion + fspec, data = mana, x = T)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.68845 -0.16441 -0.00517 0.15618 0.84024
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.77616 0.01432 124.065 < 2e-16 ***
## fregionS 0.00887 0.01334 0.665 0.5063
## fregionC -0.01336 0.01455 -0.918 0.3585
## fregionP 0.02008 0.02197 0.914 0.3607
## fregionK -0.02198 0.02098 -1.048 0.2948
## fregionU 0.05678 0.02049 2.772 0.0056 **
## fregionL 0.02388 0.01411 1.692 0.0907 .
## fregionH 0.04861 0.02104 2.310 0.0209 *
## fregionE 0.07980 0.02049 3.894 9.98e-05 ***
## fregionJ -0.03138 0.01942 -1.616 0.1062
## fregionB 0.03236 0.01348 2.400 0.0164 *
## fregionM 0.02588 0.02159 1.199 0.2306
## fregionZ 0.01653 0.01880 0.879 0.3794
## fregionT 0.00846 0.01250 0.677 0.4984
## fspecSocial -0.15080 0.01325 -11.382 < 2e-16 ***
## fspecLang -0.26387 0.01700 -15.520 < 2e-16 ***
## fspecLaw -0.15020 0.01534 -9.789 < 2e-16 ***
## fspecMat-Phys -0.22192 0.01926 -11.525 < 2e-16 ***
## fspecEngin -0.17752 0.01403 -12.655 < 2e-16 ***
## fspecScience -0.10411 0.01448 -7.191 7.36e-13 ***
## fspecMedic -0.14369 0.01446 -9.938 < 2e-16 ***
## fspecEconom -0.16751 0.01330 -12.591 < 2e-16 ***
## fspecAgric -0.07115 0.03120 -2.281 0.0226 *
## fspecArt -0.15527 0.02030 -7.650 2.38e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.236 on 5179 degrees of freedom
## Multiple R-squared: 0.07124, Adjusted R-squared: 0.06712
## F-statistic: 17.27 on 23 and 5179 DF, p-value: < 2.2e-16
modelMatrix <- mAddit$x
plotLM(mAddit)
In the diagnostic plots, there are some obvious issues with heteroscedasticity. We will use the sandwich estimate of the variance matrix to account for this heteroscedasticty.
library(sandwich)
### Estimate of the variance matrix of the regression coefficients
(VHC <- vcovHC(mAddit, type = "HC3"))
## (Intercept) fregionS fregionC fregionP
## (Intercept) 0.0002035881 -1.045279e-04 -1.065979e-04 -1.052227e-04
## fregionS -0.0001045279 1.942935e-04 1.020741e-04 1.023074e-04
## fregionC -0.0001065979 1.020741e-04 2.077926e-04 1.022329e-04
## fregionP -0.0001052227 1.023074e-04 1.022329e-04 4.862487e-04
## fregionK -0.0001034781 1.011661e-04 1.006027e-04 1.010982e-04
## fregionU -0.0001151940 1.022038e-04 1.021223e-04 1.022390e-04
## fregionL -0.0001054516 1.022997e-04 1.022098e-04 1.026960e-04
## fregionH -0.0001069183 1.017729e-04 1.019779e-04 1.022026e-04
## fregionE -0.0001119843 1.034123e-04 1.036049e-04 1.040351e-04
## fregionJ -0.0001020705 1.011792e-04 1.015031e-04 1.019322e-04
## fregionB -0.0001056148 1.015463e-04 1.015702e-04 1.016549e-04
## fregionM -0.0001238900 1.022109e-04 1.023576e-04 1.023937e-04
## fregionZ -0.0001021461 1.021408e-04 1.021596e-04 1.025831e-04
## fregionT -0.0001119894 1.021415e-04 1.022646e-04 1.022399e-04
## fspecSocial -0.0001095004 5.186032e-06 4.850734e-06 5.542872e-06
## fspecLang -0.0001018653 -4.794218e-06 -1.832688e-08 -3.318818e-06
## fspecLaw -0.0001158160 1.136511e-05 1.754327e-05 1.720455e-05
## fspecMat-Phys -0.0001075772 -1.297224e-07 5.170287e-06 -1.690925e-07
## fspecEngin -0.0001039184 3.516317e-07 2.732096e-06 -6.940539e-06
## fspecScience -0.0001035725 -2.615603e-07 7.836906e-07 5.838505e-06
## fspecMedic -0.0001053234 -1.747645e-06 7.730410e-07 -7.512734e-07
## fspecEconom -0.0001117251 5.388386e-06 7.172107e-06 4.979715e-06
## fspecAgric -0.0001261814 2.243536e-05 2.673643e-05 2.654674e-05
## fspecArt -0.0001120803 5.012924e-06 5.279016e-06 1.006513e-06
## fregionK fregionU fregionL fregionH
## (Intercept) -1.034781e-04 -1.151940e-04 -1.054516e-04 -1.069183e-04
## fregionS 1.011661e-04 1.022038e-04 1.022997e-04 1.017729e-04
## fregionC 1.006027e-04 1.021223e-04 1.022098e-04 1.019779e-04
## fregionP 1.010982e-04 1.022390e-04 1.026960e-04 1.022026e-04
## fregionK 4.261994e-04 1.006884e-04 1.008589e-04 1.004968e-04
## fregionU 1.006884e-04 4.576749e-04 1.024748e-04 1.026034e-04
## fregionL 1.008589e-04 1.024748e-04 1.988494e-04 1.020482e-04
## fregionH 1.004968e-04 1.026034e-04 1.020482e-04 4.092964e-04
## fregionE 1.014683e-04 1.040431e-04 1.037727e-04 1.032234e-04
## fregionJ 1.018529e-04 1.014398e-04 1.018628e-04 1.006620e-04
## fregionB 1.004206e-04 1.020086e-04 1.018307e-04 1.018632e-04
## fregionM 1.016880e-04 1.042711e-04 1.025255e-04 1.021591e-04
## fregionZ 1.015116e-04 1.016879e-04 1.025097e-04 1.020868e-04
## fregionT 1.005988e-04 1.031143e-04 1.022234e-04 1.022581e-04
## fspecSocial 6.123929e-06 1.533598e-05 8.484251e-06 7.011846e-06
## fspecLang -6.607376e-06 7.452497e-06 -7.832722e-06 -3.764169e-06
## fspecLaw 8.453615e-07 2.295827e-05 1.590225e-05 1.080191e-05
## fspecMat-Phys 4.253874e-06 1.057390e-05 -6.122380e-07 1.458788e-05
## fspecEngin 4.393692e-06 5.525088e-06 -1.484394e-06 -9.184423e-08
## fspecScience 4.344921e-07 1.130438e-05 -2.864310e-07 4.852612e-06
## fspecMedic 7.851661e-07 1.821728e-05 2.995977e-07 5.228142e-06
## fspecEconom 8.542314e-06 1.594485e-05 5.500519e-06 5.011322e-06
## fspecAgric 2.221538e-05 2.828049e-05 1.965951e-05 3.473386e-05
## fspecArt -1.858998e-05 2.214965e-05 3.824366e-06 1.167659e-05
## fregionE fregionJ fregionB fregionM
## (Intercept) -1.119843e-04 -1.020705e-04 -1.056148e-04 -1.238900e-04
## fregionS 1.034123e-04 1.011792e-04 1.015463e-04 1.022109e-04
## fregionC 1.036049e-04 1.015031e-04 1.015702e-04 1.023576e-04
## fregionP 1.040351e-04 1.019322e-04 1.016549e-04 1.023937e-04
## fregionK 1.014683e-04 1.018529e-04 1.004206e-04 1.016880e-04
## fregionU 1.040431e-04 1.014398e-04 1.020086e-04 1.042711e-04
## fregionL 1.037727e-04 1.018628e-04 1.018307e-04 1.025255e-04
## fregionH 1.032234e-04 1.006620e-04 1.018632e-04 1.021591e-04
## fregionE 4.707723e-04 1.025186e-04 1.029284e-04 1.054622e-04
## fregionJ 1.025186e-04 3.853793e-04 1.009154e-04 1.013285e-04
## fregionB 1.029284e-04 1.009154e-04 1.946619e-04 1.021131e-04
## fregionM 1.054622e-04 1.013285e-04 1.021131e-04 5.880047e-04
## fregionZ 1.034574e-04 1.016605e-04 1.016424e-04 1.012773e-04
## fregionT 1.048254e-04 1.008371e-04 1.020891e-04 1.033797e-04
## fspecSocial 9.663411e-06 7.831133e-07 4.647422e-06 2.423658e-05
## fspecLang 4.964611e-07 -7.758178e-06 -4.279974e-06 1.646022e-05
## fspecLaw 2.230938e-05 6.582371e-06 1.292575e-05 3.297904e-05
## fspecMat-Phys 1.127995e-05 -9.293771e-06 8.187614e-06 2.023267e-05
## fspecEngin -3.150392e-06 8.207013e-07 7.844985e-07 1.964833e-05
## fspecScience -9.814639e-07 3.705295e-06 5.278403e-07 1.788432e-05
## fspecMedic 5.092648e-06 -2.162083e-06 2.990129e-06 2.586856e-05
## fspecEconom 2.132032e-05 5.159279e-06 6.527332e-06 2.983524e-05
## fspecAgric 3.732227e-05 1.594359e-05 1.920988e-05 3.202834e-05
## fspecArt 1.461463e-05 -1.306452e-05 1.087469e-05 7.813656e-06
## fregionZ fregionT fspecSocial fspecLang
## (Intercept) -1.021461e-04 -1.119894e-04 -1.095004e-04 -1.018653e-04
## fregionS 1.021408e-04 1.021415e-04 5.186032e-06 -4.794218e-06
## fregionC 1.021596e-04 1.022646e-04 4.850734e-06 -1.832688e-08
## fregionP 1.025831e-04 1.022399e-04 5.542872e-06 -3.318818e-06
## fregionK 1.015116e-04 1.005988e-04 6.123929e-06 -6.607376e-06
## fregionU 1.016879e-04 1.031143e-04 1.533598e-05 7.452497e-06
## fregionL 1.025097e-04 1.022234e-04 8.484251e-06 -7.832722e-06
## fregionH 1.020868e-04 1.022581e-04 7.011846e-06 -3.764169e-06
## fregionE 1.034574e-04 1.048254e-04 9.663411e-06 4.964611e-07
## fregionJ 1.016605e-04 1.008371e-04 7.831133e-07 -7.758178e-06
## fregionB 1.016424e-04 1.020891e-04 4.647422e-06 -4.279974e-06
## fregionM 1.012773e-04 1.033797e-04 2.423658e-05 1.646022e-05
## fregionZ 3.365638e-04 1.019333e-04 5.623728e-06 -1.101705e-05
## fregionT 1.019333e-04 1.590440e-04 1.156647e-05 7.722615e-06
## fspecSocial 5.623728e-06 1.156647e-05 1.657535e-04 1.027061e-04
## fspecLang -1.101705e-05 7.722615e-06 1.027061e-04 2.849076e-04
## fspecLaw 7.272372e-06 1.555445e-05 1.033034e-04 1.025057e-04
## fspecMat-Phys -2.082997e-06 1.186696e-05 1.026083e-04 1.034841e-04
## fspecEngin -2.203131e-06 5.951689e-06 1.020894e-04 1.028708e-04
## fspecScience -2.909723e-06 2.070255e-06 1.020857e-04 1.021834e-04
## fspecMedic -5.812652e-06 6.077983e-06 1.027525e-04 1.032205e-04
## fspecEconom 1.287623e-06 1.600694e-05 1.034019e-04 1.036670e-04
## fspecAgric 2.868416e-05 2.981068e-05 1.040363e-04 1.028865e-04
## fspecArt -1.308125e-06 2.532588e-05 1.039312e-04 1.050192e-04
## fspecLaw fspecMat-Phys fspecEngin fspecScience
## (Intercept) -1.158160e-04 -1.075772e-04 -1.039184e-04 -1.035725e-04
## fregionS 1.136511e-05 -1.297224e-07 3.516317e-07 -2.615603e-07
## fregionC 1.754327e-05 5.170287e-06 2.732096e-06 7.836906e-07
## fregionP 1.720455e-05 -1.690925e-07 -6.940539e-06 5.838505e-06
## fregionK 8.453615e-07 4.253874e-06 4.393692e-06 4.344921e-07
## fregionU 2.295827e-05 1.057390e-05 5.525088e-06 1.130438e-05
## fregionL 1.590225e-05 -6.122380e-07 -1.484394e-06 -2.864310e-07
## fregionH 1.080191e-05 1.458788e-05 -9.184423e-08 4.852612e-06
## fregionE 2.230938e-05 1.127995e-05 -3.150392e-06 -9.814639e-07
## fregionJ 6.582371e-06 -9.293771e-06 8.207013e-07 3.705295e-06
## fregionB 1.292575e-05 8.187614e-06 7.844985e-07 5.278403e-07
## fregionM 3.297904e-05 2.023267e-05 1.964833e-05 1.788432e-05
## fregionZ 7.272372e-06 -2.082997e-06 -2.203131e-06 -2.909723e-06
## fregionT 1.555445e-05 1.186696e-05 5.951689e-06 2.070255e-06
## fspecSocial 1.033034e-04 1.026083e-04 1.020894e-04 1.020857e-04
## fspecLang 1.025057e-04 1.034841e-04 1.028708e-04 1.021834e-04
## fspecLaw 2.336240e-04 1.030371e-04 1.019179e-04 1.023498e-04
## fspecMat-Phys 1.030371e-04 3.565494e-04 1.023021e-04 1.018275e-04
## fspecEngin 1.019179e-04 1.023021e-04 2.016054e-04 1.015482e-04
## fspecScience 1.023498e-04 1.018275e-04 1.015482e-04 2.065269e-04
## fspecMedic 1.030240e-04 1.032130e-04 1.025095e-04 1.025562e-04
## fspecEconom 1.039564e-04 1.035211e-04 1.027397e-04 1.024010e-04
## fspecAgric 1.051990e-04 1.040546e-04 1.024381e-04 1.022359e-04
## fspecArt 1.049214e-04 1.056042e-04 1.029795e-04 1.020804e-04
## fspecMedic fspecEconom fspecAgric fspecArt
## (Intercept) -1.053234e-04 -1.117251e-04 -1.261814e-04 -1.120803e-04
## fregionS -1.747645e-06 5.388386e-06 2.243536e-05 5.012924e-06
## fregionC 7.730410e-07 7.172107e-06 2.673643e-05 5.279016e-06
## fregionP -7.512734e-07 4.979715e-06 2.654674e-05 1.006513e-06
## fregionK 7.851661e-07 8.542314e-06 2.221538e-05 -1.858998e-05
## fregionU 1.821728e-05 1.594485e-05 2.828049e-05 2.214965e-05
## fregionL 2.995977e-07 5.500519e-06 1.965951e-05 3.824366e-06
## fregionH 5.228142e-06 5.011322e-06 3.473386e-05 1.167659e-05
## fregionE 5.092648e-06 2.132032e-05 3.732227e-05 1.461463e-05
## fregionJ -2.162083e-06 5.159279e-06 1.594359e-05 -1.306452e-05
## fregionB 2.990129e-06 6.527332e-06 1.920988e-05 1.087469e-05
## fregionM 2.586856e-05 2.983524e-05 3.202834e-05 7.813656e-06
## fregionZ -5.812652e-06 1.287623e-06 2.868416e-05 -1.308125e-06
## fregionT 6.077983e-06 1.600694e-05 2.981068e-05 2.532588e-05
## fspecSocial 1.027525e-04 1.034019e-04 1.040363e-04 1.039312e-04
## fspecLang 1.032205e-04 1.036670e-04 1.028865e-04 1.050192e-04
## fspecLaw 1.030240e-04 1.039564e-04 1.051990e-04 1.049214e-04
## fspecMat-Phys 1.032130e-04 1.035211e-04 1.040546e-04 1.056042e-04
## fspecEngin 1.025095e-04 1.027397e-04 1.024381e-04 1.029795e-04
## fspecScience 1.025562e-04 1.024010e-04 1.022359e-04 1.020804e-04
## fspecMedic 1.997466e-04 1.034978e-04 1.030147e-04 1.042026e-04
## fspecEconom 1.034978e-04 1.644054e-04 1.049520e-04 1.050798e-04
## fspecAgric 1.030147e-04 1.049520e-04 7.368809e-04 1.056533e-04
## fspecArt 1.042026e-04 1.050798e-04 1.056533e-04 4.969342e-04
And we can compare the result with manually recontructed estimate
X <- model.matrix(mAddit);
Bread <- solve(t(X) %*% X) %*% t(X)
Meat <- diag(residuals(mAddit)^2 / (1 - hatvalues(mAddit))^2)
Bread %*% Meat %*% t(Bread)
## (Intercept) fregionS fregionC fregionP
## (Intercept) 0.0002035881 -1.045279e-04 -1.065979e-04 -1.052227e-04
## fregionS -0.0001045279 1.942935e-04 1.020741e-04 1.023074e-04
## fregionC -0.0001065979 1.020741e-04 2.077926e-04 1.022329e-04
## fregionP -0.0001052227 1.023074e-04 1.022329e-04 4.862487e-04
## fregionK -0.0001034781 1.011661e-04 1.006027e-04 1.010982e-04
## fregionU -0.0001151940 1.022038e-04 1.021223e-04 1.022390e-04
## fregionL -0.0001054516 1.022997e-04 1.022098e-04 1.026960e-04
## fregionH -0.0001069183 1.017729e-04 1.019779e-04 1.022026e-04
## fregionE -0.0001119843 1.034123e-04 1.036049e-04 1.040351e-04
## fregionJ -0.0001020705 1.011792e-04 1.015031e-04 1.019322e-04
## fregionB -0.0001056148 1.015463e-04 1.015702e-04 1.016549e-04
## fregionM -0.0001238900 1.022109e-04 1.023576e-04 1.023937e-04
## fregionZ -0.0001021461 1.021408e-04 1.021596e-04 1.025831e-04
## fregionT -0.0001119894 1.021415e-04 1.022646e-04 1.022399e-04
## fspecSocial -0.0001095004 5.186032e-06 4.850734e-06 5.542872e-06
## fspecLang -0.0001018653 -4.794218e-06 -1.832688e-08 -3.318818e-06
## fspecLaw -0.0001158160 1.136511e-05 1.754327e-05 1.720455e-05
## fspecMat-Phys -0.0001075772 -1.297224e-07 5.170287e-06 -1.690925e-07
## fspecEngin -0.0001039184 3.516317e-07 2.732096e-06 -6.940539e-06
## fspecScience -0.0001035725 -2.615603e-07 7.836906e-07 5.838505e-06
## fspecMedic -0.0001053234 -1.747645e-06 7.730410e-07 -7.512734e-07
## fspecEconom -0.0001117251 5.388386e-06 7.172107e-06 4.979715e-06
## fspecAgric -0.0001261814 2.243536e-05 2.673643e-05 2.654674e-05
## fspecArt -0.0001120803 5.012924e-06 5.279016e-06 1.006513e-06
## fregionK fregionU fregionL fregionH
## (Intercept) -1.034781e-04 -1.151940e-04 -1.054516e-04 -1.069183e-04
## fregionS 1.011661e-04 1.022038e-04 1.022997e-04 1.017729e-04
## fregionC 1.006027e-04 1.021223e-04 1.022098e-04 1.019779e-04
## fregionP 1.010982e-04 1.022390e-04 1.026960e-04 1.022026e-04
## fregionK 4.261994e-04 1.006884e-04 1.008589e-04 1.004968e-04
## fregionU 1.006884e-04 4.576749e-04 1.024748e-04 1.026034e-04
## fregionL 1.008589e-04 1.024748e-04 1.988494e-04 1.020482e-04
## fregionH 1.004968e-04 1.026034e-04 1.020482e-04 4.092964e-04
## fregionE 1.014683e-04 1.040431e-04 1.037727e-04 1.032234e-04
## fregionJ 1.018529e-04 1.014398e-04 1.018628e-04 1.006620e-04
## fregionB 1.004206e-04 1.020086e-04 1.018307e-04 1.018632e-04
## fregionM 1.016880e-04 1.042711e-04 1.025255e-04 1.021591e-04
## fregionZ 1.015116e-04 1.016879e-04 1.025097e-04 1.020868e-04
## fregionT 1.005988e-04 1.031143e-04 1.022234e-04 1.022581e-04
## fspecSocial 6.123929e-06 1.533598e-05 8.484251e-06 7.011846e-06
## fspecLang -6.607376e-06 7.452497e-06 -7.832722e-06 -3.764169e-06
## fspecLaw 8.453615e-07 2.295827e-05 1.590225e-05 1.080191e-05
## fspecMat-Phys 4.253874e-06 1.057390e-05 -6.122380e-07 1.458788e-05
## fspecEngin 4.393692e-06 5.525088e-06 -1.484394e-06 -9.184423e-08
## fspecScience 4.344921e-07 1.130438e-05 -2.864310e-07 4.852612e-06
## fspecMedic 7.851661e-07 1.821728e-05 2.995977e-07 5.228142e-06
## fspecEconom 8.542314e-06 1.594485e-05 5.500519e-06 5.011322e-06
## fspecAgric 2.221538e-05 2.828049e-05 1.965951e-05 3.473386e-05
## fspecArt -1.858998e-05 2.214965e-05 3.824366e-06 1.167659e-05
## fregionE fregionJ fregionB fregionM
## (Intercept) -1.119843e-04 -1.020705e-04 -1.056148e-04 -1.238900e-04
## fregionS 1.034123e-04 1.011792e-04 1.015463e-04 1.022109e-04
## fregionC 1.036049e-04 1.015031e-04 1.015702e-04 1.023576e-04
## fregionP 1.040351e-04 1.019322e-04 1.016549e-04 1.023937e-04
## fregionK 1.014683e-04 1.018529e-04 1.004206e-04 1.016880e-04
## fregionU 1.040431e-04 1.014398e-04 1.020086e-04 1.042711e-04
## fregionL 1.037727e-04 1.018628e-04 1.018307e-04 1.025255e-04
## fregionH 1.032234e-04 1.006620e-04 1.018632e-04 1.021591e-04
## fregionE 4.707723e-04 1.025186e-04 1.029284e-04 1.054622e-04
## fregionJ 1.025186e-04 3.853793e-04 1.009154e-04 1.013285e-04
## fregionB 1.029284e-04 1.009154e-04 1.946619e-04 1.021131e-04
## fregionM 1.054622e-04 1.013285e-04 1.021131e-04 5.880047e-04
## fregionZ 1.034574e-04 1.016605e-04 1.016424e-04 1.012773e-04
## fregionT 1.048254e-04 1.008371e-04 1.020891e-04 1.033797e-04
## fspecSocial 9.663411e-06 7.831133e-07 4.647422e-06 2.423658e-05
## fspecLang 4.964611e-07 -7.758178e-06 -4.279974e-06 1.646022e-05
## fspecLaw 2.230938e-05 6.582371e-06 1.292575e-05 3.297904e-05
## fspecMat-Phys 1.127995e-05 -9.293771e-06 8.187614e-06 2.023267e-05
## fspecEngin -3.150392e-06 8.207013e-07 7.844985e-07 1.964833e-05
## fspecScience -9.814639e-07 3.705295e-06 5.278403e-07 1.788432e-05
## fspecMedic 5.092648e-06 -2.162083e-06 2.990129e-06 2.586856e-05
## fspecEconom 2.132032e-05 5.159279e-06 6.527332e-06 2.983524e-05
## fspecAgric 3.732227e-05 1.594359e-05 1.920988e-05 3.202834e-05
## fspecArt 1.461463e-05 -1.306452e-05 1.087469e-05 7.813656e-06
## fregionZ fregionT fspecSocial fspecLang
## (Intercept) -1.021461e-04 -1.119894e-04 -1.095004e-04 -1.018653e-04
## fregionS 1.021408e-04 1.021415e-04 5.186032e-06 -4.794218e-06
## fregionC 1.021596e-04 1.022646e-04 4.850734e-06 -1.832688e-08
## fregionP 1.025831e-04 1.022399e-04 5.542872e-06 -3.318818e-06
## fregionK 1.015116e-04 1.005988e-04 6.123929e-06 -6.607376e-06
## fregionU 1.016879e-04 1.031143e-04 1.533598e-05 7.452497e-06
## fregionL 1.025097e-04 1.022234e-04 8.484251e-06 -7.832722e-06
## fregionH 1.020868e-04 1.022581e-04 7.011846e-06 -3.764169e-06
## fregionE 1.034574e-04 1.048254e-04 9.663411e-06 4.964611e-07
## fregionJ 1.016605e-04 1.008371e-04 7.831133e-07 -7.758178e-06
## fregionB 1.016424e-04 1.020891e-04 4.647422e-06 -4.279974e-06
## fregionM 1.012773e-04 1.033797e-04 2.423658e-05 1.646022e-05
## fregionZ 3.365638e-04 1.019333e-04 5.623728e-06 -1.101705e-05
## fregionT 1.019333e-04 1.590440e-04 1.156647e-05 7.722615e-06
## fspecSocial 5.623728e-06 1.156647e-05 1.657535e-04 1.027061e-04
## fspecLang -1.101705e-05 7.722615e-06 1.027061e-04 2.849076e-04
## fspecLaw 7.272372e-06 1.555445e-05 1.033034e-04 1.025057e-04
## fspecMat-Phys -2.082997e-06 1.186696e-05 1.026083e-04 1.034841e-04
## fspecEngin -2.203131e-06 5.951689e-06 1.020894e-04 1.028708e-04
## fspecScience -2.909723e-06 2.070255e-06 1.020857e-04 1.021834e-04
## fspecMedic -5.812652e-06 6.077983e-06 1.027525e-04 1.032205e-04
## fspecEconom 1.287623e-06 1.600694e-05 1.034019e-04 1.036670e-04
## fspecAgric 2.868416e-05 2.981068e-05 1.040363e-04 1.028865e-04
## fspecArt -1.308125e-06 2.532588e-05 1.039312e-04 1.050192e-04
## fspecLaw fspecMat-Phys fspecEngin fspecScience
## (Intercept) -1.158160e-04 -1.075772e-04 -1.039184e-04 -1.035725e-04
## fregionS 1.136511e-05 -1.297224e-07 3.516317e-07 -2.615603e-07
## fregionC 1.754327e-05 5.170287e-06 2.732096e-06 7.836906e-07
## fregionP 1.720455e-05 -1.690925e-07 -6.940539e-06 5.838505e-06
## fregionK 8.453615e-07 4.253874e-06 4.393692e-06 4.344921e-07
## fregionU 2.295827e-05 1.057390e-05 5.525088e-06 1.130438e-05
## fregionL 1.590225e-05 -6.122380e-07 -1.484394e-06 -2.864310e-07
## fregionH 1.080191e-05 1.458788e-05 -9.184423e-08 4.852612e-06
## fregionE 2.230938e-05 1.127995e-05 -3.150392e-06 -9.814639e-07
## fregionJ 6.582371e-06 -9.293771e-06 8.207013e-07 3.705295e-06
## fregionB 1.292575e-05 8.187614e-06 7.844985e-07 5.278403e-07
## fregionM 3.297904e-05 2.023267e-05 1.964833e-05 1.788432e-05
## fregionZ 7.272372e-06 -2.082997e-06 -2.203131e-06 -2.909723e-06
## fregionT 1.555445e-05 1.186696e-05 5.951689e-06 2.070255e-06
## fspecSocial 1.033034e-04 1.026083e-04 1.020894e-04 1.020857e-04
## fspecLang 1.025057e-04 1.034841e-04 1.028708e-04 1.021834e-04
## fspecLaw 2.336240e-04 1.030371e-04 1.019179e-04 1.023498e-04
## fspecMat-Phys 1.030371e-04 3.565494e-04 1.023021e-04 1.018275e-04
## fspecEngin 1.019179e-04 1.023021e-04 2.016054e-04 1.015482e-04
## fspecScience 1.023498e-04 1.018275e-04 1.015482e-04 2.065269e-04
## fspecMedic 1.030240e-04 1.032130e-04 1.025095e-04 1.025562e-04
## fspecEconom 1.039564e-04 1.035211e-04 1.027397e-04 1.024010e-04
## fspecAgric 1.051990e-04 1.040546e-04 1.024381e-04 1.022359e-04
## fspecArt 1.049214e-04 1.056042e-04 1.029795e-04 1.020804e-04
## fspecMedic fspecEconom fspecAgric fspecArt
## (Intercept) -1.053234e-04 -1.117251e-04 -1.261814e-04 -1.120803e-04
## fregionS -1.747645e-06 5.388386e-06 2.243536e-05 5.012924e-06
## fregionC 7.730410e-07 7.172107e-06 2.673643e-05 5.279016e-06
## fregionP -7.512734e-07 4.979715e-06 2.654674e-05 1.006513e-06
## fregionK 7.851661e-07 8.542314e-06 2.221538e-05 -1.858998e-05
## fregionU 1.821728e-05 1.594485e-05 2.828049e-05 2.214965e-05
## fregionL 2.995977e-07 5.500519e-06 1.965951e-05 3.824366e-06
## fregionH 5.228142e-06 5.011322e-06 3.473386e-05 1.167659e-05
## fregionE 5.092648e-06 2.132032e-05 3.732227e-05 1.461463e-05
## fregionJ -2.162083e-06 5.159279e-06 1.594359e-05 -1.306452e-05
## fregionB 2.990129e-06 6.527332e-06 1.920988e-05 1.087469e-05
## fregionM 2.586856e-05 2.983524e-05 3.202834e-05 7.813656e-06
## fregionZ -5.812652e-06 1.287623e-06 2.868416e-05 -1.308125e-06
## fregionT 6.077983e-06 1.600694e-05 2.981068e-05 2.532588e-05
## fspecSocial 1.027525e-04 1.034019e-04 1.040363e-04 1.039312e-04
## fspecLang 1.032205e-04 1.036670e-04 1.028865e-04 1.050192e-04
## fspecLaw 1.030240e-04 1.039564e-04 1.051990e-04 1.049214e-04
## fspecMat-Phys 1.032130e-04 1.035211e-04 1.040546e-04 1.056042e-04
## fspecEngin 1.025095e-04 1.027397e-04 1.024381e-04 1.029795e-04
## fspecScience 1.025562e-04 1.024010e-04 1.022359e-04 1.020804e-04
## fspecMedic 1.997466e-04 1.034978e-04 1.030147e-04 1.042026e-04
## fspecEconom 1.034978e-04 1.644054e-04 1.049520e-04 1.050798e-04
## fspecAgric 1.030147e-04 1.049520e-04 7.368809e-04 1.056533e-04
## fspecArt 1.042026e-04 1.050798e-04 1.056533e-04 4.969342e-04
all.equal(Bread %*% Meat %*% t(Bread), VHC)
## [1] TRUE
This can be conpared with the classical estimate of the variance covariance matrix—in the following, we only focuss on some portion of the matrix.
VHC[1:5, 1:5]
## (Intercept) fregionS fregionC fregionP
## (Intercept) 0.0002035881 -0.0001045279 -0.0001065979 -0.0001052227
## fregionS -0.0001045279 0.0001942935 0.0001020741 0.0001023074
## fregionC -0.0001065979 0.0001020741 0.0002077926 0.0001022329
## fregionP -0.0001052227 0.0001023074 0.0001022329 0.0004862487
## fregionK -0.0001034781 0.0001011661 0.0001006027 0.0001010982
## fregionK
## (Intercept) -0.0001034781
## fregionS 0.0001011661
## fregionC 0.0001006027
## fregionP 0.0001010982
## fregionK 0.0004261994
vcov(mAddit)[1:5, 1:5]
## (Intercept) fregionS fregionC fregionP
## (Intercept) 2.049583e-04 -1.009075e-04 -9.704506e-05 -9.933825e-05
## fregionS -1.009075e-04 1.780801e-04 9.400668e-05 9.440175e-05
## fregionC -9.704506e-05 9.400668e-05 2.116711e-04 9.436525e-05
## fregionP -9.933825e-05 9.440175e-05 9.436525e-05 4.825169e-04
## fregionK -9.449827e-05 9.342499e-05 9.366535e-05 9.394418e-05
## fregionK
## (Intercept) -9.449827e-05
## fregionS 9.342499e-05
## fregionC 9.366535e-05
## fregionP 9.394418e-05
## fregionK 4.400577e-04
Note, that some variance component are overestimated by the standard estimate (function vcov()
) and some others or uderestimated.
(modelMatrix %*% VHC %*% t(modelMatrix))[1:5, 1:5]
## 1 2 3 4 5
## 1 6.143465e-04 2.644008e-06 -7.700916e-06 -1.319871e-05 -7.700916e-06
## 2 2.644008e-06 1.410302e-04 8.037152e-05 8.117887e-05 8.037152e-05
## 3 -7.700916e-06 8.037152e-05 1.430679e-04 8.120846e-05 1.430679e-04
## 4 -1.319871e-05 8.117887e-05 8.120846e-05 1.874579e-04 8.120846e-05
## 5 -7.700916e-06 8.037152e-05 1.430679e-04 8.120846e-05 1.430679e-04