NMST432 Advanced Regression Models

Tutorial on grouped data analysis in R

Grouped data structures

We will need library("nlme") to access functions for longitudinal data analysis and library("lattice") for high-level plotting functions.

The examples use dataset Orthodont available in the package nlme. Access it by calling data(Orthodont, package = "nlme").

This data frame contains the following columns:

distance A numeric vector of distances (mm) from the pituitary (hypophysis gland) to the pterygomaxillary fissure (see the picture below). These distances are measured on x-ray images of the skull.
age A numeric vector of ages of the subject (yr).
Subject An ordered factor indicating the subject on which the measurement was made. The levels are labelled M01 to M16 for the males and F01 to F13 for the females. The ordering is by increasing average distance within sex.
Sex A factor with levels Male and Female.

pterygomaxillary fissure

There are four measurements of distance on each subject, obtained at ages 8, 10, 12, and 14. The goal of the analysis is to describe how the distance changes with age, how the growth varies between subjects and how it depends on gender.

Construction of groupedData objects

As you can see by checking

str(Orthodont)

## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame':   108 obs. of  4 variables:
##  $ distance: num  26 25 29 31 21.5 22.5 23 26.5 23 22.5 ...
##  $ age     : num  8 10 12 14 8 10 12 14 8 10 ...
##  $ Subject : Ord.factor w/ 27 levels "M16"<"M05"<"M02"<..: 15 15 15 15 3 3 3 3 7 7 ...
##  $ Sex     : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "outer")=Class 'formula'  language ~Sex
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##  - attr(*, "formula")=Class 'formula'  language distance ~ age | Subject
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##  - attr(*, "labels")=List of 2
##   ..$ x: chr "Age"
##   ..$ y: chr "Distance from pituitary to pterygomaxillary fissure"
##  - attr(*, "units")=List of 2
##   ..$ x: chr "(yr)"
##   ..$ y: chr "(mm)"
##  - attr(*, "FUN")=function (x)  
##   ..- attr(*, "source")= chr "function (x) max(x, na.rm = TRUE)"
##  - attr(*, "order.groups")= logi TRUE

head(Orthodont)

## Grouped Data: distance ~ age | Subject
##   distance age Subject  Sex
## 1     26.0   8     M01 Male
## 2     25.0  10     M01 Male
## 3     29.0  12     M01 Male
## 4     31.0  14     M01 Male
## 5     21.5   8     M02 Male
## 6     22.5  10     M02 Male

the dataset has an additional special structure. It is a groupedData object. Let us explain how such objects can be constructed.

Remove the groupedData structure from Orthodont by creating a classical data.frame called Ort:

Ort <- as.data.frame(Orthodont)
str(Ort)

## 'data.frame':    108 obs. of  4 variables:
##  $ distance: num  26 25 29 31 21.5 22.5 23 26.5 23 22.5 ...
##  $ age     : num  8 10 12 14 8 10 12 14 8 10 ...
##  $ Subject : Ord.factor w/ 27 levels "M16"<"M05"<"M02"<..: 15 15 15 15 3 3 3 3 7 7 ...
##  $ Sex     : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...

head(Ort)

##   distance age Subject  Sex
## 1     26.0   8     M01 Male
## 2     25.0  10     M01 Male
## 3     29.0  12     M01 Male
## 4     31.0  14     M01 Male
## 5     21.5   8     M02 Male
## 6     22.5  10     M02 Male

Now recreate a grouped data object it by running the command

Ort2 <- groupedData(formula = distance ~ age | Subject, 
                 data = Ort, outer = ~Sex,
                 labels = list(x = "Age", y = "Distance from pituitary to pterygomaxillary fissure"),
                 units = list(x = "[yr]", y = "[mm]"))
str(Ort2)

## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame':   108 obs. of  4 variables:
##  $ distance: num  26 25 29 31 21.5 22.5 23 26.5 23 22.5 ...
##  $ age     : num  8 10 12 14 8 10 12 14 8 10 ...
##  $ Subject : Ord.factor w/ 27 levels "M16"<"M05"<"M02"<..: 15 15 15 15 3 3 3 3 7 7 ...
##  $ Sex     : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "formula")=Class 'formula'  language distance ~ age | Subject
##   .. ..- attr(*, ".Environment")=<environment: 0x906d1a8> 
##  - attr(*, "labels")=List of 2
##   ..$ x: chr "Age"
##   ..$ y: chr "Distance from pituitary to pterygomaxillary fissure"
##  - attr(*, "units")=List of 2
##   ..$ x: chr "[yr]"
##   ..$ y: chr "[mm]"
##  - attr(*, "outer")=Class 'formula'  language ~Sex
##   .. ..- attr(*, ".Environment")=<environment: 0x906d1a8> 
##  - attr(*, "FUN")=function (x)  
##  - attr(*, "order.groups")= logi TRUE

head(Ort2)

## Grouped Data: distance ~ age | Subject
## <environment: 0x906d1a8>
##   distance age Subject  Sex
## 1     26.0   8     M01 Male
## 2     25.0  10     M01 Male
## 3     29.0  12     M01 Male
## 4     31.0  14     M01 Male
## 5     21.5   8     M02 Male
## 6     22.5  10     M02 Male

This call to the groupedData function specifies the following:

formula defines the grouping structure, the response variable and the primary covariate (usually time in case of longitudinal data). The groups are specified by listing the variable Subject after the vertical bar at the end of the formula. The primary covariate in this case is age.
outer defines additional outer covariates of interest. Outer covariate is a covariate measured at the group level (all measurements within the same group share the same value of the outer covariate). [The opposite: inner covariate.]
labels and units arguments specify the labels and units for the response (component y) and the primary covariate (component x). These arguments are optional.

See also help(groupedData). Functions available in library("nlme") are able to use the structure specified by the groupedData object.

Exploratory analysis of grouped data

The standard R functions for descriptive analysis may not be useful for describing grouped data because they do not take into account the group structure. Problems may be caused, e.g., by the repetition of the values of outer variables on each subject. For example:

summary(Ort)

##     distance          age          Subject       Sex    
##  Min.   :16.50   Min.   : 8.0   M16    : 4   Male  :64  
##  1st Qu.:22.00   1st Qu.: 9.5   M05    : 4   Female:44  
##  Median :23.75   Median :11.0   M02    : 4              
##  Mean   :24.02   Mean   :11.0   M11    : 4              
##  3rd Qu.:26.00   3rd Qu.:12.5   M07    : 4              
##  Max.   :31.50   Max.   :14.0   M08    : 4              
##                                 (Other):84

Instead of using summary, one can use gsummary on groupedData object to see summaries of variables by group. Continuous variables are summarized by mean, which can be changed by providing the argument FUN.

gsummary(Ort2)

##     distance age Subject    Sex
## M16   23.000  11     M16   Male
## M05   23.000  11     M05   Male
## M02   23.375  11     M02   Male
## M11   23.625  11     M11   Male
## M07   23.750  11     M07   Male
## M08   23.875  11     M08   Male
## M03   24.250  11     M03   Male
## M12   24.250  11     M12   Male
## M13   24.250  11     M13   Male
## M14   24.875  11     M14   Male
## M09   25.125  11     M09   Male
## M15   25.875  11     M15   Male
## M06   26.375  11     M06   Male
## M04   26.625  11     M04   Male
## M01   27.750  11     M01   Male
## M10   29.500  11     M10   Male
## F10   18.500  11     F10 Female
## F09   21.125  11     F09 Female
## F06   21.125  11     F06 Female
## F01   21.375  11     F01 Female
## F05   22.625  11     F05 Female
## F07   23.000  11     F07 Female
## F02   23.000  11     F02 Female
## F08   23.375  11     F08 Female
## F03   23.750  11     F03 Female
## F04   24.875  11     F04 Female
## F11   26.375  11     F11 Female

gsummary(Ort2, FUN = max)

##     distance age Subject    Sex
## M16     25.0  14     M16   Male
## M05     26.0  14     M05   Male
## M02     26.5  14     M02   Male
## M11     25.0  14     M11   Male
## M07     26.5  14     M07   Male
## M08     25.5  14     M08   Male
## M03     27.5  14     M03   Male
## M12     28.0  14     M12   Male
## M13     29.5  14     M13   Male
## M14     26.0  14     M14   Male
## M09     31.0  14     M09   Male
## M15     30.0  14     M15   Male
## M06     28.5  14     M06   Male
## M04     27.5  14     M04   Male
## M01     31.0  14     M01   Male
## M10     31.5  14     M10   Male
## F10     19.5  14     F10 Female
## F09     22.0  14     F09 Female
## F06     22.5  14     F06 Female
## F01     23.0  14     F01 Female
## F05     23.5  14     F05 Female
## F07     25.0  14     F07 Female
## F02     25.5  14     F02 Female
## F08     24.0  14     F08 Female
## F03     26.0  14     F03 Female
## F04     26.5  14     F04 Female
## F11     28.0  14     F11 Female

This output can be further summarized across all groups by the function summary:

summary(gsummary(Ort2))

##     distance          age        Subject       Sex    
##  Min.   :18.50   Min.   :11   M16    : 1   Male  :16  
##  1st Qu.:23.00   1st Qu.:11   M05    : 1   Female:11  
##  Median :23.75   Median :11   M02    : 1              
##  Mean   :24.02   Mean   :11   M11    : 1              
##  3rd Qu.:25.00   3rd Qu.:11   M07    : 1              
##  Max.   :29.50   Max.   :11   M08    : 1              
##                               (Other):21

Compare this to the output of

summary(Ort2)

##     distance          age          Subject       Sex    
##  Min.   :16.50   Min.   : 8.0   M16    : 4   Male  :64  
##  1st Qu.:22.00   1st Qu.: 9.5   M05    : 4   Female:44  
##  Median :23.75   Median :11.0   M02    : 4              
##  Mean   :24.02   Mean   :11.0   M11    : 4              
##  3rd Qu.:26.00   3rd Qu.:12.5   M07    : 4              
##  Max.   :31.50   Max.   :14.0   M08    : 4              
##                                 (Other):84

The package nlme adds methods for creating subject-level plots of groupedData objects.

plot(Ort2)

plot of chunk unnamed-chunk-9

Note that such plot is only useful if the number of subjects is not too high. With a high number of subjects, user-tailored plots must be used! Another strategy is to select (randomly) a reasonable number of subjects and then to create plots for only this subset.

A two-panel plot by gender is obtained by calling

plot(Ort2, outer = TRUE, aspect = "fill")

plot of chunk unnamed-chunk-10

In case there are more subject-specific covariates in the data, only some of them can be selected to split the plot:

plot(Ort2, outer = ~Sex, aspect = "fill")

plot of chunk unnamed-chunk-11

For more details, see help(plot.nfnGroupedData).

Building a linear mixed model for longitudinal dara

It is not appropriate to analyze longitudinal and in general grouped data by a linear model using least squares. Nevertheless, we will still try this model to compare the results with the analysis by the linear mixed model we will consider later. Fitting a classical linear model can (sometimes!!!) be useful in an exploratory part of the analysis. A classical linear model is useful exploratory tool in two ways:

Using ordinary least squares (OLS) applied to the full data set while ignoring dependencies. This basically means fitting a marginal model in which we (mostly incorrectly) assume homoscedastic and independent errors. This is especially useful to decide (at least preliminary) on the fixed effects part of the model based on residual plots used for a classical linear model. Nevertheless, do not forget that formal tests are in general invalid due to correlated errors which are also not necessarily homoscedastic.
Using OLS applied separately to each subject. This is especially useful to decide (again at least preliminary) on the random effects part of the model. When doing so, we can
Use residuals from point 1 as outcome (= detrended observations);
Assume (and estimate) common residual variance for all subjects (which is basically achieved by considering Subject as a categorical covariate (factor) in the linear model).

In case there are no time-dependent regressors except time and no or only few (and categorical) outer regressors, steps 1 and 2 can be done almost together.

To obtain a meaningful interpretation of the intercept, we subtract 8 (the age at the initial measurement) from age. Then the intercept estimates the expected value at the start of the study.

OLS applied to full-data set

Let us first fit a classical linear model while assuming a linear trend over time with possibly different parameters of the line for male and female.

fitlm <- lm(distance ~ I(age - 8) * Sex, data = Ort2)
summary(fitlm)

## 
## Call:
## lm(formula = distance ~ I(age - 8) * Sex, data = Ort2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.6156 -1.3219 -0.1682  1.3299  5.2469 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           22.6156     0.4721  47.907  < 2e-16 ***
## I(age - 8)             0.7844     0.1262   6.217 1.07e-08 ***
## SexFemale             -1.4065     0.7396  -1.902    0.060 .  
## I(age - 8):SexFemale  -0.3048     0.1977  -1.542    0.126    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.257 on 104 degrees of freedom
## Multiple R-squared:  0.4227, Adjusted R-squared:  0.4061 
## F-statistic: 25.39 on 3 and 104 DF,  p-value: 2.108e-12

Classical basic residual plots:

par(mfrow = c(2, 2))
plot(fitlm, pch = 23, col = "darkblue", bg = "skyblue")

plot of chunk unnamed-chunk-13

par(mfrow = c(1, 1))

Especially from the first plot we can conclude that the linear trend with parameters of the lines depending on Sex is a reasonable model for the marginal evolution of the response, i.e., for the fixed effects part of the subsequent linear mixed model.

One more classical plot (mainly to see how the trends for male and female differ):

COL <- c("darkblue", "red")
BG <- c("skyblue", "pink")
PCH <- c(24, 25)
names(COL) <- names(BG) <- names(PCH) <- c("Male", "Female")
plot(distance ~ age, col = COL[Sex], pch = PCH[Sex], bg = BG[Sex], data = Ort2)
abline(lm(distance ~ age, data = subset(Ort2, Sex == "Female")), col="red", lwd = 2)
abline(lm(distance ~ age, data = subset(Ort2, Sex == "Male")), col = "blue", lwd = 2)

plot of chunk unnamed-chunk-14

However, the residuals for measurements taken on the same subject are correlated, as the following plot testifies.

bwplot(Ort2$Subject ~ residuals(fitlm))

plot of chunk unnamed-chunk-15

But we should not be surprised by that…

OLS applied separately to each subject

Let us now explore subject-specific evolutions of the response over time. In our dataset, we have just one outer covariate (Sex) which, moreover, has only two levels. Hence it is not really necessary to detrend the outcomes before fitting subject-specific models. We can easily take Sex into account at later stage as will be shown.

We will now consider fitting subject-specific regression lines to these data. At the same time, we will assume the same residual variance for all subjects. The subject-specific regression lines are obtained from the function lmList.

fitslm <- lmList(Ort2)

The information on how to identify the subjects and what models to fit is extracted from the groupedData structure of Ort2. The structure of the model can also be specified explicitely (and must be specified in this way if a simpler linear model is to be fitted than that implied by the groupedData structure). The lmList function can also be used with data stored in a standard data.frame:

fitslm <- lmList(distance ~ age | Subject, data = Ort)

The estimated subject-specific coefficients (intercepts and slopes) can be summarized and the residuals plotted easily.

summary(fitslm)

## Call:
##   Model: distance ~ age | Subject 
##    Data: Ort 
## 
## Coefficients:
##    (Intercept) 
##     Estimate Std. Error   t value     Pr(>|t|)
## M16    16.95   3.288173 5.1548379 3.695247e-06
## M05    13.65   3.288173 4.1512411 1.181678e-04
## M02    14.85   3.288173 4.5161854 3.458934e-05
## M11    20.05   3.288173 6.0976106 1.188838e-07
## M07    14.95   3.288173 4.5465974 3.116705e-05
## M08    19.75   3.288173 6.0063745 1.665712e-07
## M03    16.00   3.288173 4.8659237 1.028488e-05
## M12    13.25   3.288173 4.0295930 1.762580e-04
## M13     2.80   3.288173 0.8515366 3.982319e-01
## M14    19.10   3.288173 5.8086964 3.449588e-07
## M09    14.40   3.288173 4.3793313 5.509579e-05
## M15    13.50   3.288173 4.1056231 1.373664e-04
## M06    18.95   3.288173 5.7630783 4.078189e-07
## M04    24.70   3.288173 7.5117696 6.081644e-10
## M01    17.30   3.288173 5.2612799 2.523621e-06
## M10    21.25   3.288173 6.4625549 3.065505e-08
## F10    13.55   3.288173 4.1208291 1.306536e-04
## F09    18.10   3.288173 5.5045761 1.047769e-06
## F06    17.00   3.288173 5.1700439 3.499774e-06
## F01    17.25   3.288173 5.2460739 2.665260e-06
## F05    19.60   3.288173 5.9607565 1.971127e-07
## F07    16.95   3.288173 5.1548379 3.695247e-06
## F02    14.20   3.288173 4.3185072 6.763806e-05
## F08    21.45   3.288173 6.5233789 2.443813e-08
## F03    14.40   3.288173 4.3793313 5.509579e-05
## F04    19.65   3.288173 5.9759625 1.863600e-07
## F11    18.95   3.288173 5.7630783 4.078189e-07
##    age 
##     Estimate Std. Error   t value     Pr(>|t|)
## M16    0.550  0.2929338 1.8775576 6.584707e-02
## M05    0.850  0.2929338 2.9016799 5.361639e-03
## M02    0.775  0.2929338 2.6456493 1.065760e-02
## M11    0.325  0.2929338 1.1094659 2.721458e-01
## M07    0.800  0.2929338 2.7309929 8.511442e-03
## M08    0.375  0.2929338 1.2801529 2.059634e-01
## M03    0.750  0.2929338 2.5603058 1.328807e-02
## M12    1.000  0.2929338 3.4137411 1.222240e-03
## M13    1.950  0.2929338 6.6567951 1.485652e-08
## M14    0.525  0.2929338 1.7922141 7.870160e-02
## M09    0.975  0.2929338 3.3283976 1.577941e-03
## M15    1.125  0.2929338 3.8404587 3.247135e-04
## M06    0.675  0.2929338 2.3042752 2.508117e-02
## M04    0.175  0.2929338 0.5974047 5.527342e-01
## M01    0.950  0.2929338 3.2430540 2.030113e-03
## M10    0.750  0.2929338 2.5603058 1.328807e-02
## F10    0.450  0.2929338 1.5361835 1.303325e-01
## F09    0.275  0.2929338 0.9387788 3.520246e-01
## F06    0.375  0.2929338 1.2801529 2.059634e-01
## F01    0.375  0.2929338 1.2801529 2.059634e-01
## F05    0.275  0.2929338 0.9387788 3.520246e-01
## F07    0.550  0.2929338 1.8775576 6.584707e-02
## F02    0.800  0.2929338 2.7309929 8.511442e-03
## F08    0.175  0.2929338 0.5974047 5.527342e-01
## F03    0.850  0.2929338 2.9016799 5.361639e-03
## F04    0.475  0.2929338 1.6215270 1.107298e-01
## F11    0.675  0.2929338 2.3042752 2.508117e-02
## 
## Residual standard error: 1.31004 on 54 degrees of freedom

plot(fitslm)

plot of chunk unnamed-chunk-19

However, we would rather adjust the intercept to the expected distance at the age 8, so we redefine the fitted model as follows:

fitslm2 <- lmList(distance ~ I(age - 8), data = Ort2)

fitslm2 <- lmList(distance ~ I(age - 8) | Subject, data = Ort)

summary(fitslm2)

## Call:
##   Model: distance ~ I(age - 8) | Subject 
##    Data: Ort 
## 
## Coefficients:
##    (Intercept) 
##     Estimate Std. Error  t value     Pr(>|t|)
## M16    21.35   1.096058 19.47890 4.359278e-26
## M05    20.45   1.096058 18.65778 3.319674e-25
## M02    21.05   1.096058 19.20519 8.514661e-26
## M11    22.65   1.096058 20.66497 2.596148e-27
## M07    21.35   1.096058 19.47890 4.359278e-26
## M08    22.75   1.096058 20.75621 2.100698e-27
## M03    22.00   1.096058 20.07194 1.046935e-26
## M12    21.25   1.096058 19.38766 5.444880e-26
## M13    18.40   1.096058 16.78744 4.366662e-23
## M14    23.30   1.096058 21.25800 6.639625e-28
## M09    22.20   1.096058 20.25441 6.794018e-27
## M15    22.50   1.096058 20.52812 3.571686e-27
## M06    24.35   1.096058 22.21598 7.809554e-29
## M04    26.10   1.096058 23.81261 2.589060e-30
## M01    24.90   1.096058 22.71778 2.621421e-29
## M10    27.25   1.096058 24.86183 3.054764e-31
## F10    17.15   1.096058 15.64699 1.031736e-21
## F09    20.30   1.096058 18.52092 4.686286e-25
## F06    20.00   1.096058 18.24721 9.391685e-25
## F01    20.25   1.096058 18.47530 5.259209e-25
## F05    21.80   1.096058 19.88946 1.618187e-26
## F07    21.35   1.096058 19.47890 4.359278e-26
## F02    20.60   1.096058 18.79463 2.355967e-25
## F08    22.85   1.096058 20.84744 1.701036e-27
## F03    21.20   1.096058 19.34205 6.087007e-26
## F04    23.45   1.096058 21.39486 4.867852e-28
## F11    24.35   1.096058 22.21598 7.809554e-29
##    I(age - 8) 
##     Estimate Std. Error   t value     Pr(>|t|)
## M16    0.550  0.2929338 1.8775576 6.584707e-02
## M05    0.850  0.2929338 2.9016799 5.361639e-03
## M02    0.775  0.2929338 2.6456493 1.065760e-02
## M11    0.325  0.2929338 1.1094659 2.721458e-01
## M07    0.800  0.2929338 2.7309929 8.511442e-03
## M08    0.375  0.2929338 1.2801529 2.059634e-01
## M03    0.750  0.2929338 2.5603058 1.328807e-02
## M12    1.000  0.2929338 3.4137411 1.222240e-03
## M13    1.950  0.2929338 6.6567951 1.485652e-08
## M14    0.525  0.2929338 1.7922141 7.870160e-02
## M09    0.975  0.2929338 3.3283976 1.577941e-03
## M15    1.125  0.2929338 3.8404587 3.247135e-04
## M06    0.675  0.2929338 2.3042752 2.508117e-02
## M04    0.175  0.2929338 0.5974047 5.527342e-01
## M01    0.950  0.2929338 3.2430540 2.030113e-03
## M10    0.750  0.2929338 2.5603058 1.328807e-02
## F10    0.450  0.2929338 1.5361835 1.303325e-01
## F09    0.275  0.2929338 0.9387788 3.520246e-01
## F06    0.375  0.2929338 1.2801529 2.059634e-01
## F01    0.375  0.2929338 1.2801529 2.059634e-01
## F05    0.275  0.2929338 0.9387788 3.520246e-01
## F07    0.550  0.2929338 1.8775576 6.584707e-02
## F02    0.800  0.2929338 2.7309929 8.511442e-03
## F08    0.175  0.2929338 0.5974047 5.527342e-01
## F03    0.850  0.2929338 2.9016799 5.361639e-03
## F04    0.475  0.2929338 1.6215270 1.107298e-01
## F11    0.675  0.2929338 2.3042752 2.508117e-02
## 
## Residual standard error: 1.31004 on 54 degrees of freedom

Let us now plot the estimated intercepts and slopes against each other.

pairs(fitslm2, id = 0.01, adj = -0.5)

plot of chunk unnamed-chunk-23

The arguments id and adj are used to identify outliers. The subject M13 had an unusually large slope (fast skull growth). The intercepts and slopes, however, do not seem to be highly correlated. This would be different if we did not shift age by 8 years:

pairs(fitslm, id = 0.01, adj = -0.5)

plot of chunk unnamed-chunk-24

This plot shows a relatively large correlation between the intercepts and slopes.

Next, we will calculate and plot confidence intervals for the subject-specific intercepts and slopes.

plot(intervals(fitslm2))

plot of chunk unnamed-chunk-25

This plot tells us better how much the intercepts and slopes vary between individuals. It seems that the variability in the slopes is not as large as that in the intercepts.

Even better insight into the structure of the subject-specific coefficients is obtained if we use detrended outcomes. In our case, we may want to adjust the outcomes for Sex. This means, we take residuals from the overall OLS as outcomes when fitting the subject-specific models.

Ort  <- transform(Ort, resdistance = residuals(fitlm))

resfitslm  <- lmList(resdistance ~ age | Subject, data = Ort)
resfitslm2 <- lmList(resdistance ~ I(age - 8) | Subject, data = Ort)

plot(resfitslm)

plot of chunk unnamed-chunk-28

pairs(resfitslm, id = 0.01, adj = -0.5)

plot of chunk unnamed-chunk-29

pairs(resfitslm2, id = 0.01, adj = -0.5)

plot of chunk unnamed-chunk-30

plot(intervals(resfitslm2))

plot of chunk unnamed-chunk-31

Fitting linear mixed models to grouped data

The fitting function

The R function for fitting linear mixed models is lme (from package nlme). The function accepts two arguments that specify the structure of the model: the argument fixed for the mean structure (regression matrix \(\mathbb{X}_i\)) and the argument random for the random effects (regression matrix \(\mathbb{Z}_i\)).

For example, the model with random intercept and random slope (in age, starting at 8 years) for Orthodont data has the formula \[ Y_{ij}=\beta_1+(t_j-8)\beta_2+b_{i1}+(t_j-8)b_{i2}+\varepsilon_{ij}, \] where \(Y_{ij}\) is the \(j\)-th distance measurement on the \(i\)-th subject, \(t_j\) is the age at the \(j\)-th measurement (the same for all subjects), \(\beta_1\) is the population mean distance at 8 years, \(\beta_2\) is the mean growth in distance in one year, \(b_{i1}\) is the zero-mean random intercept for the \(i\)-th subject (the deviation of \(Y_{ij}-\varepsilon_{ij}\) from \(\beta_1\) at the age 8), \(b_{i2}\) is the zero-mean random slope for the \(i\)-th subject (the deviation of the growth rate from \(\beta_2\)), and \(\varepsilon_{ij}\) are uncorrelated random errors.

In the function lme, this is specified as

fit.1 <- lme(fixed = distance ~ I(age - 8), random = ~ I(age - 8) | Subject, data = Ort2)

Notice that the argument random does not specify the response in the formula, but includes the specification of groups (variable Subject after the vertical bar). The results of the fit can be displayed by the function summary:

summary(fit.1)

## Linear mixed-effects model fit by REML
##  Data: Ort2 
##        AIC      BIC    logLik
##   454.6367 470.6173 -221.3183
## 
## Random effects:
##  Formula: ~I(age - 8) | Subject
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev   Corr  
## (Intercept) 1.886629 (Intr)
## I(age - 8)  0.226428 0.209 
## Residual    1.310039       
## 
## Fixed effects: distance ~ I(age - 8) 
##                 Value Std.Error DF  t-value p-value
## (Intercept) 22.042593 0.4199079 80 52.49387       0
## I(age - 8)   0.660185 0.0712533 80  9.26533       0
##  Correlation: 
##            (Intr)
## I(age - 8) -0.208
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.223105126 -0.493761342  0.007316612  0.472151020  3.916033672 
## 
## Number of Observations: 108
## Number of Groups: 27

We can see in two separate tables the estimates of standard deviations of the random effects, and estimates, standard errors, and Wald test statistics for the fixed effects . The first line of the output shows the estimation method for random effects. The default is REML. ML fit is obtained by specifying the argument method = "ML" in lme(). For example,

fixef(fit.1)

## (Intercept)  I(age - 8) 
##  22.0425926   0.6601852

the random effects (empirical Bayes) estimates by

ranef(fit.1)

##     (Intercept)   I(age - 8)
## M16  -0.7385868 -0.068853784
## M05  -0.9718653  0.025600396
## M02  -0.6114390  0.014507868
## M11  -0.0601169 -0.118826058
## M07  -0.3287723  0.034900066
## M08   0.1024078 -0.094736351
## M03   0.1129163  0.035852391
## M12  -0.0814824  0.114564208
## M13  -0.8201976  0.413669111
## M14   0.7913862 -0.014119883
## M09   0.6428730  0.135908731
## M15   1.1304470  0.208177854
## M06   1.8831736  0.083191273
## M04   2.4733754 -0.065885016
## M01   2.7770593  0.215684665
## M10   4.3424093  0.211146622
## F10  -4.2861070 -0.250590709
## F09  -2.0352814 -0.218041865
## F06  -2.1130409 -0.186557138
## F01  -1.9116365 -0.178209794
## F05  -0.8268549 -0.167957799
## F07  -0.7385868 -0.068853784
## F02  -0.9329855  0.009858033
## F08  -0.1448821 -0.174400493
## F03  -0.3676520  0.050642429
## F04   0.8302660 -0.029862247
## F11   1.8831736  0.083191273

The sum of fixed and random effects (estimated subject-specific intercepts and slopes) is calculated by

coef(fit.1)

##     (Intercept) I(age - 8)
## M16    21.30401  0.5913314
## M05    21.07073  0.6857856
## M02    21.43115  0.6746931
## M11    21.98248  0.5413591
## M07    21.71382  0.6950853
## M08    22.14500  0.5654488
## M03    22.15551  0.6960376
## M12    21.96111  0.7747494
## M13    21.22240  1.0738543
## M14    22.83398  0.6460653
## M09    22.68547  0.7960939
## M15    23.17304  0.8683630
## M06    23.92577  0.7433765
## M04    24.51597  0.5943002
## M01    24.81965  0.8758699
## M10    26.38500  0.8713318
## F10    17.75649  0.4095945
## F09    20.00731  0.4421433
## F06    19.92955  0.4736280
## F01    20.13096  0.4819754
## F05    21.21574  0.4922274
## F07    21.30401  0.5913314
## F02    21.10961  0.6700432
## F08    21.89771  0.4857847
## F03    21.67494  0.7108276
## F04    22.87286  0.6303229
## F11    23.92577  0.7433765

Confidence intervals for fixed effects and variance parameters (standard deviations and correlations of random effects and error terms) are calculated by function intervals.

intervals(fit.1)

## Approximate 95% confidence intervals
## 
##  Fixed effects:
##                  lower       est.      upper
## (Intercept) 21.2069492 22.0425926 22.8782360
## I(age - 8)   0.5183866  0.6601852  0.8019837
## attr(,"label")
## [1] "Fixed effects:"
## 
##  Random Effects:
##   Level: Subject 
##                                  lower      est.     upper
## sd((Intercept))              1.3043572 1.8866291 2.7288302
## sd(I(age - 8))               0.1024815 0.2264280 0.5002817
## cor((Intercept),I(age - 8)) -0.6100575 0.2085642 0.8118229
## 
##  Within-group standard error:
##    lower     est.    upper 
## 1.084958 1.310039 1.581815

Model with the mean intercept and slope depending on Sex is obtained as follows (the name of the first argument fixed is usually omitted).

fit.2 <- lme(distance ~ I(age - 8)*Sex, random = ~ I(age - 8) | Subject, data = Ort2)
summary(fit.2)

## Linear mixed-effects model fit by REML
##  Data: Ort2 
##        AIC      BIC    logLik
##   448.5817 469.7368 -216.2908
## 
## Random effects:
##  Formula: ~I(age - 8) | Subject
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 1.7983213 (Intr)
## I(age - 8)  0.1803446 -0.091
## Residual    1.3100400       
## 
## Fixed effects: distance ~ I(age - 8) * Sex 
##                          Value Std.Error DF  t-value p-value
## (Intercept)          22.615625 0.5265040 79 42.95433  0.0000
## I(age - 8)            0.784375 0.0859994 79  9.12070  0.0000
## SexFemale            -1.406534 0.8248732 25 -1.70515  0.1006
## I(age - 8):SexFemale -0.304830 0.1347352 79 -2.26243  0.0264
##  Correlation: 
##                      (Intr) I(g-8) SexFml
## I(age - 8)           -0.396              
## SexFemale            -0.638  0.253       
## I(age - 8):SexFemale  0.253 -0.638 -0.396
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.168084303 -0.385940430  0.007102876  0.445155640  3.849462346 
## 
## Number of Observations: 108
## Number of Groups: 27

Calculating fitted values and residuals

The fitted values can be obtained by calling the function fitted. There are two types of the fitted values, the argument level decides, which of them are calculated. The variant

fitted(fit.1, level = 0)

##      M01      M01      M01      M01      M02      M02      M02      M02 
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370 
##      M03      M03      M03      M03      M04      M04      M04      M04 
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370 
##      M05      M05      M05      M05      M06      M06      M06      M06 
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370 
##      M07      M07      M07      M07      M08      M08      M08      M08 
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370 
##      M09      M09      M09      M09      M10      M10      M10      M10 
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370 
##      M11      M11      M11      M11      M12      M12      M12      M12 
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370 
##      M13      M13      M13      M13      M14      M14      M14      M14 
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370 
##      M15      M15      M15      M15      M16      M16      M16      M16 
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370 
##      F01      F01      F01      F01      F02      F02      F02      F02 
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370 
##      F03      F03      F03      F03      F04      F04      F04      F04 
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370 
##      F05      F05      F05      F05      F06      F06      F06      F06 
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370 
##      F07      F07      F07      F07      F08      F08      F08      F08 
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370 
##      F09      F09      F09      F09      F10      F10      F10      F10 
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370 
##      F11      F11      F11      F11 
## 22.04259 23.36296 24.68333 26.00370 
## attr(,"label")
## [1] "Fitted values [mm]"

generates the population-level fitted values \(\widehat{\beta}_1+(t_j-8)\widehat{\beta}_2\), the variant

fitted(fit.1, level = 1)

##      M01      M01      M01      M01      M02      M02      M02      M02 
## 24.81965 26.57139 28.32313 30.07487 21.43115 22.78054 24.12993 25.47931 
##      M03      M03      M03      M03      M04      M04      M04      M04 
## 22.15551 23.54758 24.93966 26.33173 24.51597 25.70457 26.89317 28.08177 
##      M05      M05      M05      M05      M06      M06      M06      M06 
## 21.07073 22.44230 23.81387 25.18544 23.92577 25.41252 26.89927 28.38602 
##      M07      M07      M07      M07      M08      M08      M08      M08 
## 21.71382 23.10399 24.49416 25.88433 22.14500 23.27590 24.40680 25.53769 
##      M09      M09      M09      M09      M10      M10      M10      M10 
## 22.68547 24.27765 25.86984 27.46203 26.38500 28.12767 29.87033 31.61299 
##      M11      M11      M11      M11      M12      M12      M12      M12 
## 21.98248 23.06519 24.14791 25.23063 21.96111 23.51061 25.06011 26.60961 
##      M13      M13      M13      M13      M14      M14      M14      M14 
## 21.22240 23.37010 25.51781 27.66552 22.83398 24.12611 25.41824 26.71037 
##      M15      M15      M15      M15      M16      M16      M16      M16 
## 23.17304 24.90977 26.64649 28.38322 21.30401 22.48667 23.66933 24.85199 
##      F01      F01      F01      F01      F02      F02      F02      F02 
## 20.13096 21.09491 22.05886 23.02281 21.10961 22.44969 23.78978 25.12987 
##      F03      F03      F03      F03      F04      F04      F04      F04 
## 21.67494 23.09660 24.51825 25.93991 22.87286 24.13350 25.39415 26.65480 
##      F05      F05      F05      F05      F06      F06      F06      F06 
## 21.21574 22.20019 23.18465 24.16910 19.92955 20.87681 21.82406 22.77132 
##      F07      F07      F07      F07      F08      F08      F08      F08 
## 21.30401 22.48667 23.66933 24.85199 21.89771 22.86928 23.84085 24.81242 
##      F09      F09      F09      F09      F10      F10      F10      F10 
## 20.00731 20.89160 21.77588 22.66017 17.75649 18.57567 19.39486 20.21405 
##      F11      F11      F11      F11 
## 23.92577 25.41252 26.89927 28.38602 
## attr(,"label")
## [1] "Fitted values [mm]"

[same as fitted(fit.1)] generates the subject-level fitted values \(\widehat{\beta}_1+(t_j-8)\widehat{\beta}_2+\widehat{b}_{i1}+(t_j-8)\widehat{b}_{i2}\).

The residuals are calculated by the function resid, which also takes the level argument. Standardized residuals are obtained by specifying the argument type = "pearson". For example, this command generates the standardized subject-level residuals:

resid(fit.1, level = 1, type = "pearson")

##          M01          M01          M01          M01          M02 
##  0.901001924 -1.199499401  0.516678092  0.706184057  0.052552954 
##          M02          M02          M02          M03          M03 
## -0.214145956 -0.862512747  0.779127752  0.644630246 -0.799658385 
##          M03          M03          M04          M04          M04 
## -0.717275490  0.891778933  0.751146791  1.370517172 -0.300119738 
##          M04          M05          M05          M05          M05 
## -0.825753002 -0.817324439  0.807381412 -1.002923672  0.621782180 
##          M06          M06          M06          M06          M07 
##  0.438333232  0.066777324  0.076889297  0.087001269  0.218451184 
##          M07          M07          M07          M08          M08 
## -0.842715677  0.004456870  0.469961536  1.415987558 -1.355606487 
##          M08          M08          M09          M09          M09 
##  0.071146168 -0.028772705  0.240095394 -2.883617929  3.916033672 
##          M09          M10          M10          M10          M10 
## -1.116019061  0.851117962 -0.097451617  0.862318213 -0.086251366 
##          M11          M11          M11          M11          M12 
##  0.776712695 -0.049764870 -0.494574553 -0.176048473 -0.351981898 
##          M12          M12          M12          M13          M13 
## -0.008098210 -0.809218167  1.061337048 -3.223105126  0.862490321 
##          M13          M13          M14          M14          M14 
##  0.368071188  1.400323581 -0.254937990  1.048739799  0.062410298 
##          M14          M15          M15          M15          M15 
## -0.542251321 -0.132087308 -0.312788790 -0.493490272  1.234147654 
##          M16          M16          M16          M16          F01 
##  0.531277285 -0.753159410 -0.129256696  0.112978136  0.663372276 
##          F01          F01          F01          F02          F02 
## -0.835781594 -0.426596055 -0.017410517 -0.083666976 -0.724934997 
##          F02          F02          F03          F03          F03 
##  0.160468509  0.282536251 -0.896874160  0.689600732 -0.013931665 
##          F03          F04          F04          F04          F04 
##  0.045871701  0.478719473  0.279759153 -0.300869048 -0.118161486 
##          F05          F05          F05          F05          F06 
##  0.216987553  0.610521663 -0.522615755 -0.510749527  0.053775714 
##          F06          F06          F06          F07          F07 
##  0.094037024 -0.629037430 -0.207108239  0.149609404  0.010176354 
##          F07          F07          F08          F08          F08 
## -0.510924578  0.112978136  0.841416994  0.099783336 -0.260182441 
##          F08          F09          F09          F09          F09 
## -0.620148217 -0.005580887  0.082747259  0.171075404 -0.885600095 
##          F10          F10          F10          F10          F11 
## -0.959120398  0.323902786 -0.301413439 -0.545061782  0.438333232 
##          F11          F11          F11 
## -0.314890558  0.840225060 -0.294666612 
## attr(,"label")
## [1] "Standardized residuals"

Residual plots

The basic plot of the standardized residuals versus fitted values, both evaluated at the innermost level of nesting, is obtained by the function plot:

plot(fit.1)

plot of chunk unnamed-chunk-42

However, the plot function has additional arguments for lme objects that allow considerable flexibility in creating residual plots. For example, a plot of standardized residuals versus fitted values by gender is created by the call

plot(fit.1, resid(., type = "p") ~ fitted(.) | Sex, abline = 0)

plot of chunk unnamed-chunk-43

and box-plots of residuals by subject by the call

plot(fit.1, Subject ~ resid(.))

plot of chunk unnamed-chunk-44

The formula argument that specifies the type of the requested plot is explained in the help to lattice library functions dotplot, xyplot, and bwplot.

Plotting results of fitted models

A plot of observed versus fitted values by subject can be obtained by another application of the plot function:

plot(fit.1, distance ~ fitted(.) | Subject, abline = c(0, 1))

plot of chunk unnamed-chunk-45

A scatter plot of the coefficients \(\widehat{\beta}_1+\widehat{b}_{i1}\) versus \(\widehat{\beta}_2+\widehat{b}_{i2}\) is obtained by a plain call to the pairs function

pairs(fit.1)

plot of chunk unnamed-chunk-46

This is the same as specifying pairs(fit.1, form = ~ coef(.)). Other values of the form argument generate alternative plots such as a scatter plot of coefficients by gender for identifying unusual subjects

pairs(fit.1, ~ coef(., augFrame = TRUE) | Sex, id = 0.1, adj = -0.5)

plot of chunk unnamed-chunk-47

or a scatter plot of estimated random effects

pairs(fit.1, ~ranef(.))

plot of chunk unnamed-chunk-48 or estimated random effects by Sex

pairs(fit.1, ~ranef(.) | Sex)

A plot of random effects by subject is generated by the function plot:

plot(ranef(fit.1))

plot of chunk unnamed-chunk-50

A plot of random effects by gender is another variant:

fit1RE <- ranef(fit.1, aug = TRUE)
print(fit1RE)

##     (Intercept)   I(age - 8) distance age    Sex
## M16  -0.7385868 -0.068853784   23.000  11   Male
## M05  -0.9718653  0.025600396   23.000  11   Male
## M02  -0.6114390  0.014507868   23.375  11   Male
## M11  -0.0601169 -0.118826058   23.625  11   Male
## M07  -0.3287723  0.034900066   23.750  11   Male
## M08   0.1024078 -0.094736351   23.875  11   Male
## M03   0.1129163  0.035852391   24.250  11   Male
## M12  -0.0814824  0.114564208   24.250  11   Male
## M13  -0.8201976  0.413669111   24.250  11   Male
## M14   0.7913862 -0.014119883   24.875  11   Male
## M09   0.6428730  0.135908731   25.125  11   Male
## M15   1.1304470  0.208177854   25.875  11   Male
## M06   1.8831736  0.083191273   26.375  11   Male
## M04   2.4733754 -0.065885016   26.625  11   Male
## M01   2.7770593  0.215684665   27.750  11   Male
## M10   4.3424093  0.211146622   29.500  11   Male
## F10  -4.2861070 -0.250590709   18.500  11 Female
## F09  -2.0352814 -0.218041865   21.125  11 Female
## F06  -2.1130409 -0.186557138   21.125  11 Female
## F01  -1.9116365 -0.178209794   21.375  11 Female
## F05  -0.8268549 -0.167957799   22.625  11 Female
## F07  -0.7385868 -0.068853784   23.000  11 Female
## F02  -0.9329855  0.009858033   23.000  11 Female
## F08  -0.1448821 -0.174400493   23.375  11 Female
## F03  -0.3676520  0.050642429   23.750  11 Female
## F04   0.8302660 -0.029862247   24.875  11 Female
## F11   1.8831736  0.083191273   26.375  11 Female

plot(fit1RE, form = ~ Sex)

plot of chunk unnamed-chunk-51

Histograms of estimated random effects may help in evaluating the normality assumption of the random effects.

par(mfrow = c(1, 2))
hist(fit1RE[, "(Intercept)"], xlab = expression(hat(b)[0]), prob = TRUE, main = "Intercept", col = "salmon")
hist(fit1RE[, "I(age - 8)"], xlab = expression(hat(b)[1]), prob = TRUE, main = "Slope", col = "salmon")

plot of chunk unnamed-chunk-52

par(mfrow = c(1, 1))

Nevertheless, be aware of the shrinkage effect of the empirical Bayes estimates of the random effects. It causes that the empirical distribution of those estimators may look normal even if the true distribution of random effects is not. The same warning holds indeed for interpretation of the QQ plots based on the empirical Bayes estimates of the random effects.

par(mfrow = c(1, 2))
qqnorm(fit1RE[, "(Intercept)"], main = "Intercept", col = "red4", bg = "salmon", pch = 21)
qqline(fit1RE[, "(Intercept)"], col = "darkblue", lwd = 2)
qqnorm(fit1RE[, "I(age - 8)"], main = "Slope", col = "red4", bg = "salmon", pch = 21)
qqline(fit1RE[, "I(age - 8)"], col = "darkblue", lwd = 2)

plot of chunk unnamed-chunk-53

par(mfrow = c(1, 1))

For the next illustration (a boxplot of random slopes by gender), we need to recode the right-hand side of fit.1.

Ort2$age.m.8 <- Ort2$age - 8
fit.1x=lme(fixed= distance ~ age.m.8, random = ~ age.m.8 | Subject, data=Ort2)
fit1xRE <- ranef(fit.1x, aug = TRUE)
print(fit1xRE)

##     (Intercept)      age.m.8 distance age    Sex
## M16  -0.7385868 -0.068853784   23.000  11   Male
## M05  -0.9718653  0.025600396   23.000  11   Male
## M02  -0.6114390  0.014507868   23.375  11   Male
## M11  -0.0601169 -0.118826058   23.625  11   Male
## M07  -0.3287723  0.034900066   23.750  11   Male
## M08   0.1024078 -0.094736351   23.875  11   Male
## M03   0.1129163  0.035852391   24.250  11   Male
## M12  -0.0814824  0.114564208   24.250  11   Male
## M13  -0.8201976  0.413669111   24.250  11   Male
## M14   0.7913862 -0.014119883   24.875  11   Male
## M09   0.6428730  0.135908731   25.125  11   Male
## M15   1.1304470  0.208177854   25.875  11   Male
## M06   1.8831736  0.083191273   26.375  11   Male
## M04   2.4733754 -0.065885016   26.625  11   Male
## M01   2.7770593  0.215684665   27.750  11   Male
## M10   4.3424093  0.211146622   29.500  11   Male
## F10  -4.2861070 -0.250590709   18.500  11 Female
## F09  -2.0352814 -0.218041865   21.125  11 Female
## F06  -2.1130409 -0.186557138   21.125  11 Female
## F01  -1.9116365 -0.178209794   21.375  11 Female
## F05  -0.8268549 -0.167957799   22.625  11 Female
## F07  -0.7385868 -0.068853784   23.000  11 Female
## F02  -0.9329855  0.009858033   23.000  11 Female
## F08  -0.1448821 -0.174400493   23.375  11 Female
## F03  -0.3676520  0.050642429   23.750  11 Female
## F04   0.8302660 -0.029862247   24.875  11 Female
## F11   1.8831736  0.083191273   26.375  11 Female

plot(fit1xRE, form = age.m.8 ~ Sex)

plot of chunk unnamed-chunk-54

We can compare the random intercepts and slopes estimated by the LME model to individual-specific intercepts and slopes estimated by the sequence of linear models. Numerical summaries are created by function compareFits, which has a plotting method to display them graphically:

cmp.1 <- compareFits(coef(fitslm2), coef(fit.1))
plot(cmp.1, mark = fixef(fit.1))

plot of chunk unnamed-chunk-55

Notice that the purple dots (coefficients from the LME model) are all shifted towards the fixed effect estimates shown by vertical dashed lines (shrinkage estimators).

We can also compare graphical representations of the fitted lines by the command comparePred.

cmp.2 <- comparePred(fitslm2, fit.1, length.out = 2)

## Error in sprintf(gettext(fmt, domain = domain), ...): invalid type of argument[1]: 'symbol'

plot(cmp.2, layout = c(8, 4), between = list(y = c(0, 0.5)))

## Error in plot(cmp.2, layout = c(8, 4), between = list(y = c(0, 0.5))): object 'cmp.2' not found

Specifying the variance matrix of random effects

By default, the variance matrix of random effects \(D\) is any positive definite matrix. The user can make the random effects uncorrelated (requiring \(\mathsf{cor}(b_{i1},b_{i2})=0\)) by enclosing the random effect formula in the pdDiag function. This requests a diagonal variance matrix of the random effects. There are several other built-in functions for specification of the structure of this matrix, their names all start with the prefix pd (see help to pdClasses).

fit.1.uncor <- lme(fixed = distance ~ age.m.8, random = pdDiag(~ age.m.8), data = Ort2)
summary(fit.1.uncor)

## Linear mixed-effects model fit by REML
##  Data: Ort2 
##        AIC      BIC    logLik
##   452.8916 466.2088 -221.4458
## 
## Random effects:
##  Formula: ~age.m.8 | Subject
##  Structure: Diagonal
##         (Intercept)   age.m.8 Residual
## StdDev:    1.962957 0.2469767 1.293767
## 
## Fixed effects: distance ~ age.m.8 
##                 Value Std.Error DF  t-value p-value
## (Intercept) 22.042593 0.4314009 80 51.09538       0
## age.m.8      0.660185 0.0732042 80  9.01841       0
##  Correlation: 
##         (Intr)
## age.m.8 -0.294
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.98617508 -0.49418192  0.01618754  0.42884417  3.96519081 
## 
## Number of Observations: 108
## Number of Groups: 27

Notice that the grouping factor | Subject cannot be specified in the random term when the pdDiag function is used. The grouping is extracted from the groupedData object Ort2.

Back to course page