Some functions from the packages listed below are used inside this tutorial:
library("car")
library("nlme")
Data used in this tutorial come from a text by Luc Duchateau, Paul Janssen, and John Rowlands that is available at a webpage of International Livestock Research Institute in Nairobi, Kenya.
The following dataset corresponds to Table 1.2.2 from the above mentioned text and comes from a randomised block design
study that aimed to find out whether low (L
) or high (H
) dose of a drug Berenil leads to different results in a treatment
of trypanosomosis (parasite in cattle transmitted by a tse-tse fly). Three (\(I = 3\)) herds, each consisting of
10 cows were divided (each of them) into two parts (each consisting of 5 cows). One half of each herd was treated
by a low dose of Berenil, the second half of the herd was treated by a high dose of Berenil. Severity of the disease
was measured by so called packed cell volume (PCV, proportion of the volume of blood serum consisting of red blood cells).
The primary outcome of the study was then increase of PCV (PCV.diff
) over a specified time period.
dt122 <- data.frame(herd = gl(3, 10),
dose = rep(gl(2, 5, labels = c("L", "H")), 3),
PCV.diff = c(0.9, 2.0, 2.0, 2.2, 2.0, 7.4, 6.8, 7.1, 6.7, 7.9,
2.3, 2.7, 1.3, 1.6, 2.1, 6.6, 7.1, 6.2, 7.8, 7.2,
3.4, 2.7, 2.8, 3.0, 3.4, 8.8, 8.3, 7.9, 8.2, 8.1))
print(dt122)
## herd dose PCV.diff
## 1 1 L 0.9
## 2 1 L 2.0
## 3 1 L 2.0
## 4 1 L 2.2
## 5 1 L 2.0
## 6 1 H 7.4
## 7 1 H 6.8
## 8 1 H 7.1
## 9 1 H 6.7
## 10 1 H 7.9
## 11 2 L 2.3
## 12 2 L 2.7
## 13 2 L 1.3
## 14 2 L 1.6
## 15 2 L 2.1
## 16 2 H 6.6
## 17 2 H 7.1
## 18 2 H 6.2
## 19 2 H 7.8
## 20 2 H 7.2
## 21 3 L 3.4
## 22 3 L 2.7
## 23 3 L 2.8
## 24 3 L 3.0
## 25 3 L 3.4
## 26 3 H 8.8
## 27 3 H 8.3
## 28 3 H 7.9
## 29 3 H 8.2
## 30 3 H 8.1
Object dt122
is a classical data.frame which can be worked with in a standard way.
Some basic exploratory plots useful in a context of comparison of the effect of low/high dose of Berenil on PCV.diff
,
while taking into account a grouped structure of the data are the following:
COL <- rep(rep(c("red", "darkblue"), each = 5), 3)
BG <- rep(rep(c("pink", "skyblue"), each = 5), 3)
COL2 <- rep(c("pink", "skyblue"), 3)
par(mfrow = c(1, 2), bty = "n")
plot(PCV.diff ~ as.numeric(herd:dose), data = dt122, xaxt = "n", xlab = "", col = COL, bg = BG, pch = 23)
axis(1, at = 1:6, labels = paste("H", rep(1:3, each = 2), "/", rep(c("L", "H"), 3), sep = ""))
plot(PCV.diff ~ as.factor(herd:dose), data = dt122, xaxt = "n", xlab = "", col = COL2)
axis(1, at = 1:6, labels = paste("H", rep(1:3, each = 2), "/", rep(c("L", "H"), 3), sep = ""))
par(mfrow = c(1, 1))
What can be concluded from here?
More possibilities to calculate descriptive statistics and draw plots
are obtained if data are stored as a groupedData
structure (capabilities of the nlme
package):
gdt122 <- groupedData(formula = PCV.diff ~ 1 | herd / dose, data = dt122)
print(gdt122)
## Grouped Data: PCV.diff ~ 1 | herd/dose
## <environment: 0x98ecba4>
## herd dose PCV.diff
## 1 1 L 0.9
## 2 1 L 2.0
## 3 1 L 2.0
## 4 1 L 2.2
## 5 1 L 2.0
## 6 1 H 7.4
## 7 1 H 6.8
## 8 1 H 7.1
## 9 1 H 6.7
## 10 1 H 7.9
## 11 2 L 2.3
## 12 2 L 2.7
## 13 2 L 1.3
## 14 2 L 1.6
## 15 2 L 2.1
## 16 2 H 6.6
## 17 2 H 7.1
## 18 2 H 6.2
## 19 2 H 7.8
## 20 2 H 7.2
## 21 3 L 3.4
## 22 3 L 2.7
## 23 3 L 2.8
## 24 3 L 3.0
## 25 3 L 3.4
## 26 3 H 8.8
## 27 3 H 8.3
## 28 3 H 7.9
## 29 3 H 8.2
## 30 3 H 8.1
summary(gdt122)
## herd dose PCV.diff
## 1:10 L:15 Min. :0.900
## 2:10 H:15 1st Qu.:2.225
## 3:10 Median :4.800
## Mean :4.883
## 3rd Qu.:7.350
## Max. :8.800
gsummary(gdt122)
## herd dose PCV.diff
## 1/L 1 L 1.82
## 1/H 1 H 7.18
## 2/L 2 L 2.00
## 2/H 2 H 6.98
## 3/L 3 L 3.06
## 3/H 3 H 8.26
gsummary(gdt122, FUN = mean)
## herd dose PCV.diff
## 1/L 1 L 1.82
## 1/H 1 H 7.18
## 2/L 2 L 2.00
## 2/H 2 H 6.98
## 3/L 3 L 3.06
## 3/H 3 H 8.26
gsummary(gdt122, FUN = sd)
## herd dose PCV.diff
## 1/L 1 L 0.5215362
## 1/H 1 H 0.4868265
## 2/L 2 L 0.5567764
## 2/H 2 H 0.6099180
## 3/L 3 L 0.3286335
## 3/H 3 H 0.3361547
plot(gdt122)
summary(m0 <- aov(PCV.diff ~ herd + dose + herd:dose, data = dt122))
## Df Sum Sq Mean Sq F value Pr(>F)
## herd 2 9.05 4.52 19.225 1.04e-05 ***
## dose 1 201.24 201.24 855.140 < 2e-16 ***
## herd:dose 2 0.18 0.09 0.387 0.683
## Residuals 24 5.65 0.24
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(m0.noInt <- aov(PCV.diff ~ herd + dose, data = dt122))
## Df Sum Sq Mean Sq F value Pr(>F)
## herd 2 9.05 4.52 20.18 5.13e-06 ***
## dose 1 201.24 201.24 897.48 < 2e-16 ***
## Residuals 26 5.83 0.22
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(m0.noInt, type = "II")
## Anova Table (Type II tests)
##
## Response: PCV.diff
## Sum Sq Df F value Pr(>F)
## herd 9.049 2 20.177 5.133e-06 ***
## dose 201.243 1 897.482 < 2.2e-16 ***
## Residuals 5.830 26
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lm0.noInt <- lm(PCV.diff ~ herd + dose, data = dt122)
summary(lm0.noInt)
##
## Call:
## lm(formula = PCV.diff ~ herd + dose, data = dt122)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.0100 -0.2975 0.0350 0.3050 0.8100
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.9100 0.1729 11.046 2.57e-11 ***
## herd2 -0.0100 0.2118 -0.047 0.963
## herd3 1.1600 0.2118 5.478 9.57e-06 ***
## doseH 5.1800 0.1729 29.958 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4735 on 26 degrees of freedom
## Multiple R-squared: 0.973, Adjusted R-squared: 0.9699
## F-statistic: 312.6 on 3 and 26 DF, p-value: < 2.2e-16
confint(lm0.noInt)["doseH",]
## 2.5 % 97.5 %
## 4.824581 5.535419
We will now consider herds as a random sample (of size 3) from a population of herds and will treat corresponding
parameters as random variables (random effects) rather than fixed quantities (fixed effects). In the following, let
\(Y_{i,j,k}\) denote PCV.diff
for cow \(k\) (\(k = 1,…,5 = K\)) in part of the herd \(i\) (\(i = 1, 2, 3 = I\)) being treated by dose
\(j\) (\(j = 1, 2 = J\)). Fully, we could write \(k = k(i,j)\) and \(j = j(i)\) since with respect to the design of the study,
cows are nested within a dose which is nested within a herd.
Classical mixed-effects ANOVA model corresponding to the design of this study is the following \[ Y_{i,j,k} = \mu + b_i^A + \beta_j^B + b_{i,j}^{AB} + \varepsilon_{i,j,k}, \] where
contr.treatment
: \(\beta_1^B = 0\),
or with contr.sum
: \(\beta_1^B + \beta_2^B = 0\));All random terms are assumed to be mutually uncorrelated.
That is, variability of response is decomposed into three sources (\(\sigma^2\) \(=\) variability between cows coming from the same herd and being treated by the same dose, \(d_2^2\) \(=\) variability between doses inside a herd, \(d_1^2\) \(=\) variability between herds.
ANOVA table for the above model:
summary(m1 <- aov(PCV.diff ~ dose + Error(herd/dose), data = dt122))
##
## Error: herd
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 2 9.049 4.524
##
## Error: herd:dose
## Df Sum Sq Mean Sq F value Pr(>F)
## dose 1 201.24 201.24 2211 0.000452 ***
## Residuals 2 0.18 0.09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 24 5.648 0.2353
In the output:
Can you find a test for significance of the fixed effect of dose in the above output? How to interprete the results of this test?
The mean squares that correspond to the random components of the model satisfy the following (try to derive as an optional homework): \[ \begin{array}{rcl} \mbox{E}(\mbox{MS}_A) & = & \sigma^2 \;+\; K\,d_2^2 \;+\; J\cdot K\,d_1^2, \ \mbox{E}(\mbox{MS}_{AB}) & = & \sigma^2 \;+\; K\,d_2^2, \ \mbox{E}(\mbox{MS}_{e}) & = & \sigma^2. \ \end{array} \] Using above expressions, unbiased estimators of all variance components can easily be derived.
Here is unbiased estimate of \(\sigma^2\) and related estimate of \(\sigma\):
MSe <- deviance(m1$Within) / m1$Within$df.residual
sigmaHat <- sqrt(MSe)
data.frame(MSe = MSe, sigmaHat = sigmaHat)
## MSe sigmaHat
## 1 0.2353333 0.4851117
Here is unbiased estimate of \(d_1^2\) and related estimate of \(d_1\):
I <- 3
J <- 2
K <- 5
MSA <- deviance(m1$herd) / m1$herd$df.residual
MSAB <- deviance(m1$"herd:dose") / m1$"herd:dose"$df.residual
d1sqHat <- (MSA - MSAB) / (J * K)
d1Hat <- sqrt(d1sqHat)
data.frame(MSA = MSA, MSAB = MSAB, d1sqHat = d1sqHat, d1Hat = d1Hat)
## MSA MSAB d1sqHat d1Hat
## 1 4.524333 0.091 0.4433333 0.6658328
Here is unbiased estimate of \(d_2^2\):
(d2sqHat <- (MSAB - MSe) / K)
## [1] -0.02886667
To fit a linear mixed model by (RE)ML method, capabilities of the package nlme
can be used. In a context
of the current study, be aware of the fact that asymptotic is somehow questionable here since
only \(I = 3\) herds are available.
The same model as above, now fitted by the REML method (default to lme
function):
m2lme <- lme(PCV.diff ~ dose, random = ~1 | herd/dose, data = dt122)
summary(m2lme)
## Linear mixed-effects model fit by REML
## Data: dt122
## AIC BIC logLik
## 59.02353 65.68455 -24.51177
##
## Random effects:
## Formula: ~1 | herd
## (Intercept)
## StdDev: 0.6557517
##
## Formula: ~1 | dose %in% herd
## (Intercept) Residual
## StdDev: 1.591771e-05 0.4735301
##
## Fixed effects: PCV.diff ~ dose
## Value Std.Error DF t-value p-value
## (Intercept) 2.293333 0.3978511 24 5.764301 0.0000
## doseH 5.180000 0.1729088 2 29.957998 0.0011
## Correlation:
## (Intr)
## doseH -0.217
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.17303689 -0.65651639 0.07550478 0.60397762 1.67043567
##
## Number of Observations: 30
## Number of Groups:
## herd dose %in% herd
## 3 6
fixef(m2lme)
## (Intercept) doseH
## 2.293333 5.180000
ranef(m2lme)
## Level: herd
## (Intercept)
## 1 -0.3643349
## 2 -0.3738393
## 3 0.7381742
##
## Level: dose %in% herd
## (Intercept)
## 1/L -6.158230e-10
## 1/H 4.011472e-10
## 2/L 4.548455e-10
## 2/H -6.751214e-10
## 3/L 1.609775e-10
## 3/H 2.739742e-10
intervals(m2lme)
## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 1.472209 2.293333 3.114458
## doseH 4.436034 5.180000 5.923966
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: herd
## lower est. upper
## sd((Intercept)) 0.2338472 0.6557517 1.838852
## Level: dose
## lower est. upper
## sd((Intercept)) 0 1.591771e-05 Inf
##
## Within-group standard error:
## lower est. upper
## 0.3608343 0.4735301 0.6214231
Apparently, there are some problems in estimation of \(d_2^2\). We are approaching a boundary of the parameter space. See also a negative value of its unbiased estimate.
Also the ML method can be used:
m2MLlme <- lme(PCV.diff ~ dose, random = ~1 | herd/dose, data = dt122, method = "ML")
summary(m2MLlme)
## Linear mixed-effects model fit by maximum likelihood
## Data: dt122
## AIC BIC logLik
## 57.06219 64.06818 -23.5311
##
## Random effects:
## Formula: ~1 | herd
## (Intercept)
## StdDev: 0.5291783
##
## Formula: ~1 | dose %in% herd
## (Intercept) Residual
## StdDev: 1.113661e-05 0.4646783
##
## Fixed effects: PCV.diff ~ dose
## Value Std.Error DF t-value p-value
## (Intercept) 2.293333 0.3397556 24 6.74995 0.0000
## doseH 5.180000 0.1756319 2 29.49350 0.0011
## Correlation:
## (Intr)
## doseH -0.258
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.23260293 -0.66583662 0.07333894 0.59731185 1.68408520
##
## Number of Observations: 30
## Number of Groups:
## herd dose %in% herd
## 3 6
intervals(m2MLlme)
## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 1.615889 2.293333 2.970777
## doseH 4.449941 5.180000 5.910059
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: herd
## lower est. upper
## sd((Intercept)) 0.2233861 0.5291783 1.253568
## Level: dose
## lower est. upper
## sd((Intercept)) 0 1.113661e-05 Inf
##
## Within-group standard error:
## lower est. upper
## 0.3558910 0.4646783 0.6067192
The three models below are almost the same. Nevertheless, they (slightly) differ. How?
gdt122.2 <- groupedData(formula = PCV.diff ~ 1 | herd/dose, data = dt122)
gdt122.34 <- groupedData(formula = PCV.diff ~ 1 | herd, data = dt122)
m2 <- lme(PCV.diff ~ dose, random = ~1 | herd/dose, data = gdt122.2)
m3 <- lme(PCV.diff ~ dose, random = ~1 + dose | herd, data = gdt122.34)
m4 <- lme(PCV.diff ~ dose, random = pdDiag(~ 1 + dose), data = gdt122.34)
summary(m2)
## Linear mixed-effects model fit by REML
## Data: gdt122.2
## AIC BIC logLik
## 59.02353 65.68455 -24.51177
##
## Random effects:
## Formula: ~1 | herd
## (Intercept)
## StdDev: 0.6557517
##
## Formula: ~1 | dose %in% herd
## (Intercept) Residual
## StdDev: 1.591771e-05 0.4735301
##
## Fixed effects: PCV.diff ~ dose
## Value Std.Error DF t-value p-value
## (Intercept) 2.293333 0.3978511 24 5.764301 0.0000
## doseH 5.180000 0.1729088 2 29.957998 0.0011
## Correlation:
## (Intr)
## doseH -0.217
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.17303689 -0.65651639 0.07550478 0.60397762 1.67043567
##
## Number of Observations: 30
## Number of Groups:
## herd dose %in% herd
## 3 6
summary(m3)
## Linear mixed-effects model fit by REML
## Data: gdt122.34
## AIC BIC logLik
## 61.02353 69.01676 -24.51177
##
## Random effects:
## Formula: ~1 + dose | herd
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 6.557517e-01 (Intr)
## doseH 2.439662e-05 0
## Residual 4.735301e-01
##
## Fixed effects: PCV.diff ~ dose
## Value Std.Error DF t-value p-value
## (Intercept) 2.293333 0.3978511 26 5.764301 0
## doseH 5.180000 0.1729088 26 29.957998 0
## Correlation:
## (Intr)
## doseH -0.217
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.17303689 -0.65651640 0.07550478 0.60397763 1.67043567
##
## Number of Observations: 30
## Number of Groups: 3
summary(m4)
## Linear mixed-effects model fit by REML
## Data: gdt122.34
## AIC BIC logLik
## 59.02353 65.68455 -24.51177
##
## Random effects:
## Formula: ~1 + dose | herd
## Structure: Diagonal
## (Intercept) doseH Residual
## StdDev: 0.6557517 2.5681e-05 0.4735301
##
## Fixed effects: PCV.diff ~ dose
## Value Std.Error DF t-value p-value
## (Intercept) 2.293333 0.3978511 26 5.764301 0
## doseH 5.180000 0.1729088 26 29.957998 0
## Correlation:
## (Intr)
## doseH -0.217
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.17303689 -0.65651639 0.07550478 0.60397762 1.67043567
##
## Number of Observations: 30
## Number of Groups: 3
ranef(m2)
## Level: herd
## (Intercept)
## 1 -0.3643349
## 2 -0.3738393
## 3 0.7381742
##
## Level: dose %in% herd
## (Intercept)
## 1/L -6.158230e-10
## 1/H 4.011472e-10
## 2/L 4.548455e-10
## 2/H -6.751214e-10
## 3/L 1.609775e-10
## 3/H 2.739742e-10
ranef(m3)
## (Intercept) doseH
## 2 -0.3738393 -2.431589e-09
## 1 -0.3643349 1.181571e-10
## 3 0.7381742 2.313431e-09
ranef(m4)
## (Intercept) doseH
## 2 -0.3738393 -1.757296e-09
## 1 -0.3643349 1.044159e-09
## 3 0.7381742 7.131364e-10
m2
is, in fact, a model with two-levels of nesting,
m3
and m4
are specified as classical (single-level) linear mixed models.
Can you write mathematically the three models?
Models m2
and m3
or m4
have (among other things) somehow different interpretation
of the “interaction” random effects “\(b_{i,j}^{AB}\)”.