NMSA407 Linear Regression: Tutorial

ANOVA Tables of Type I, II and III

Data Cars2004nh

Introduction

Load the data and calculate basic summaries

``````
data(Cars2004nh, package = "mffSM")
head(Cars2004nh)
``````
``````
##                         vname type drive price.retail price.dealer   price cons.city cons.highway
## 1          Chevrolet.Aveo.4dr    1     1        11690        10965 11327.5       8.4          6.9
## 2 Chevrolet.Aveo.LS.4dr.hatch    1     1        12585        11802 12193.5       8.4          6.9
## 3      Chevrolet.Cavalier.2dr    1     1        14610        13697 14153.5       9.0          6.4
## 4      Chevrolet.Cavalier.4dr    1     1        14810        13884 14347.0       9.0          6.4
## 5   Chevrolet.Cavalier.LS.2dr    1     1        16385        15357 15871.0       9.0          6.4
## 6           Dodge.Neon.SE.4dr    1     1        13670        12849 13259.5       8.1          6.5
##   consumption engine.size ncylinder horsepower weight      iweight  lweight wheel.base length width
## 1        7.65         1.6         4        103   1075 0.0009302326 6.980076        249    424   168
## 2        7.65         1.6         4        103   1065 0.0009389671 6.970730        249    389   168
## 3        7.70         2.2         4        140   1187 0.0008424600 7.079184        264    465   175
## 4        7.70         2.2         4        140   1214 0.0008237232 7.101676        264    465   173
## 5        7.70         2.2         4        140   1187 0.0008424600 7.079184        264    465   175
## 6        7.30         2.0         4        132   1171 0.0008539710 7.065613        267    442   170
##      ftype fdrive
## 1 personal  front
## 2 personal  front
## 3 personal  front
## 4 personal  front
## 5 personal  front
## 6 personal  front
``````
``````
dim(Cars2004nh)
``````
``````
## [1] 425  20
``````
``````
summary(Cars2004nh)
``````
``````
##     vname                type           drive        price.retail     price.dealer
##  Length:425         Min.   :1.000   Min.   :1.000   Min.   : 10280   Min.   :  9875
##  Class :character   1st Qu.:1.000   1st Qu.:1.000   1st Qu.: 20370   1st Qu.: 18973
##  Mode  :character   Median :1.000   Median :1.000   Median : 27905   Median : 25672
##                     Mean   :2.219   Mean   :1.692   Mean   : 32866   Mean   : 30096
##                     3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.: 39235   3rd Qu.: 35777
##                     Max.   :6.000   Max.   :3.000   Max.   :192465   Max.   :173560
##
##      price          cons.city      cons.highway     consumption     engine.size      ncylinder
##  Min.   : 10078   Min.   : 6.20   Min.   : 5.100   Min.   : 5.65   Min.   :1.300   Min.   :-1.000
##  1st Qu.: 19600   1st Qu.:11.20   1st Qu.: 8.100   1st Qu.: 9.65   1st Qu.:2.400   1st Qu.: 4.000
##  Median : 26656   Median :12.40   Median : 9.000   Median :10.70   Median :3.000   Median : 6.000
##  Mean   : 31481   Mean   :12.36   Mean   : 9.142   Mean   :10.75   Mean   :3.208   Mean   : 5.791
##  3rd Qu.: 37514   3rd Qu.:13.80   3rd Qu.: 9.800   3rd Qu.:11.65   3rd Qu.:3.900   3rd Qu.: 6.000
##  Max.   :183012   Max.   :23.50   Max.   :19.600   Max.   :21.55   Max.   :8.300   Max.   :12.000
##                   NA's   :14      NA's   :14       NA's   :14
##    horsepower        weight        iweight             lweight        wheel.base        length
##  Min.   :100.0   Min.   : 923   Min.   :0.0003067   Min.   :6.828   Min.   :226.0   Min.   :363.0
##  1st Qu.:165.0   1st Qu.:1412   1st Qu.:0.0005542   1st Qu.:7.253   1st Qu.:262.0   1st Qu.:450.0
##  Median :210.0   Median :1577   Median :0.0006341   Median :7.363   Median :272.0   Median :472.0
##  Mean   :216.8   Mean   :1626   Mean   :0.0006412   Mean   :7.373   Mean   :274.9   Mean   :470.6
##  3rd Qu.:255.0   3rd Qu.:1804   3rd Qu.:0.0007082   3rd Qu.:7.498   3rd Qu.:284.0   3rd Qu.:490.0
##  Max.   :500.0   Max.   :3261   Max.   :0.0010834   Max.   :8.090   Max.   :366.0   Max.   :577.0
##                  NA's   :2      NA's   :2           NA's   :2       NA's   :2       NA's   :26
##      width            ftype       fdrive
##  Min.   :163.0   personal:242   front:223
##  1st Qu.:175.0   wagon   : 30   rear :110
##  Median :180.0   SUV     : 60   4x4  : 92
##  Mean   :181.1   pickup  : 24
##  3rd Qu.:185.0   sport   : 49
##  Max.   :206.0   minivan : 20
##  NA's   :28
``````

Complete cases subset used here

So that the models fitted here can later be compared with models that include further covariates, we restrict the analysis to the cars for which all of `consumption`, `lweight` and `engine.size` are known.

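``````
## TRUE for cars with consumption, lweight and engine.size all observed
isComplete <- complete.cases(Cars2004nh[, c("consumption", "lweight", "engine.size")])
sum(!isComplete)    # number of excluded cars
``````
``````
## [1] 16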
``````
``````
CarsNow <- subset(Cars2004nh, isComplete,
                  select = c("consumption", "drive", "fdrive", "weight", "lweight", "engine.size"))
dim(CarsNow)
``````
``````
## [1] 409   6
``````
``````
summary(CarsNow)
``````
``````
##   consumption        drive         fdrive        weight        lweight       engine.size
##  Min.   : 5.65   Min.   :1.000   front:212   Min.   : 923   Min.   :6.828   Min.   :1.300
##  1st Qu.: 9.65   1st Qu.:1.000   rear :108   1st Qu.:1415   1st Qu.:7.255   1st Qu.:2.400
##  Median :10.70   Median :1.000   4x4  : 89   Median :1577   Median :7.363   Median :3.000
##  Mean   :10.75   Mean   :1.699               Mean   :1622   Mean   :7.371   Mean   :3.178
##  3rd Qu.:11.65   3rd Qu.:2.000               3rd Qu.:1804   3rd Qu.:7.498   3rd Qu.:3.800
##  Max.   :21.55   Max.   :3.000               Max.   :2903   Max.   :7.973   Max.   :6.000
``````
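`complete.cases()` inspects only the columns it is given, so a car with a missing value in some other variable (such as `width`) is kept. A minimal sketch on a hypothetical four-row data frame (made-up values, not taken from `Cars2004nh`):

``````r
## Hypothetical toy data (not Cars2004nh): NA in consumption (row 2),
## in lweight (row 3) and in width (row 4)
toy <- data.frame(consumption = c(7.65, NA, 9.00, 10.10),
                  lweight     = c(6.98, 7.10, NA, 7.25),
                  engine.size = c(1.6, 2.2, 2.0, 3.0),
                  width       = c(168, 170, 175, NA))
ok <- complete.cases(toy[, c("consumption", "lweight", "engine.size")])
ok                       # TRUE FALSE FALSE TRUE: row 4 is kept despite its NA width
toyNow <- subset(toy, ok, select = c("consumption", "lweight", "engine.size"))
dim(toyNow)              # 2 rows, 3 columns
``````

In `Cars2004nh` the 16 excluded cars are exactly the 14 cars with missing `consumption` plus the 2 with missing `lweight` (14 + 2 = 16, and `engine.size` has no missing values); the 28 missing widths play no role.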

Dependence of `consumption` on `lweight` and `fdrive`

Scatterplots of `consumption` against `lweight`, one panel per level of `fdrive`

``````
BTY <- "n"; PCH <- 21; COL <- "darkblue"; BGC <- "lightblue"   # assumed plot settings (defined elsewhere in the course)
par(mfrow = c(2, 2), bty = BTY, mar = c(5, 4, 3, 1) + 0.1)
for (dr in levels(CarsNow[, "fdrive"])) {
  plot(consumption ~ lweight, data = subset(CarsNow, fdrive == dr), pch = PCH, col = COL, bg = BGC,
       xlab = "Log(weight) [log(kg)]", ylab = "Consumption [l/100 km]", main = dr,
       xlim = range(CarsNow[, "lweight"]), ylim = range(CarsNow[, "consumption"]))   # common axes
}
``````

Scatterplot of `consumption` against `lweight` with all three levels of `fdrive` in one plot

``````
library("colorspace")                        # provides rainbow_hcl()
FCOL  <- rainbow_hcl(3)                      # fill colours
FCOL2 <- c("red3", "darkgreen", "darkblue")  # border colours
FPCH  <- c(21, 23, 24)                       # circle, diamond, triangle
names(FCOL) <- names(FCOL2) <- names(FPCH) <- levels(CarsNow[, "fdrive"])
par(mfrow = c(1, 1), bty = BTY, mar = c(4, 4, 1, 1) + 0.1)
plot(consumption ~ lweight, data = CarsNow, pch = FPCH[fdrive], col = FCOL2[fdrive], bg = FCOL[fdrive],
     xlab = "Log(weight) [log(kg)]", ylab = "Consumption [l/100 km]")
legend(6.9, 21, legend = levels(CarsNow[, "fdrive"]), title = "Drive", pch = FPCH, col = FCOL2, pt.bg = FCOL)
``````

A series of models with `fdrive` and `lweight` as covariates

``````
mInter  <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow)
mAddit  <- lm(consumption ~ fdrive + lweight,                  data = CarsNow)
mDrive  <- lm(consumption ~ fdrive,                            data = CarsNow)
mWeight <- lm(consumption ~ lweight,                           data = CarsNow)
m0      <- lm(consumption ~ 1,                                 data = CarsNow)
``````
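The residual degrees of freedom reported in the summaries below (408, 407, 406, 405 and 403 for 409 cars) follow from the number of columns of each model matrix. A small sketch that counts these columns on a hypothetical six-row data frame with the same three `fdrive` levels, assuming the default treatment contrasts:

``````r
## Hypothetical six-row data frame with the three levels of fdrive
toy <- data.frame(fdrive  = factor(c("front", "rear", "4x4", "front", "rear", "4x4"),
                                   levels = c("front", "rear", "4x4")),
                  lweight = c(6.9, 7.1, 7.4, 7.0, 7.3, 7.6))
forms <- list(m0      = ~ 1,
              mWeight = ~ lweight,
              mDrive  = ~ fdrive,
              mAddit  = ~ fdrive + lweight,
              mInter  = ~ fdrive + lweight + fdrive:lweight)
## number of regression coefficients = number of model-matrix columns
p <- sapply(forms, function(fo) ncol(model.matrix(fo, data = toy)))
p           # 1 2 3 4 6
409 - p     # residual df with n = 409: 408 407 406 405 403
``````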

Interaction model

``````
summary(mInter)
``````
``````
##
## Call:
## lm(formula = consumption ~ fdrive + lweight + fdrive:lweight,
##     data = CarsNow)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4038 -0.6438 -0.1021  0.5672  4.3237
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)        -52.8047     2.5266 -20.900  < 2e-16 ***
## fdriverear          19.8445     5.1297   3.869 0.000128 ***
## fdrive4x4          -12.5366     4.6506  -2.696 0.007319 **
## lweight              8.5716     0.3461  24.763  < 2e-16 ***
## fdriverear:lweight  -2.5890     0.6956  -3.722 0.000226 ***
## fdrive4x4:lweight    1.7837     0.6240   2.858 0.004480 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9404 on 403 degrees of freedom
## Multiple R-squared:  0.8081, Adjusted R-squared:  0.8057
## F-statistic: 339.4 on 5 and 403 DF,  p-value: < 2.2e-16
``````

Additive model

``````
summary(mAddit)
``````
``````
##
## Call:
## lm(formula = consumption ~ fdrive + lweight, data = CarsNow)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4064 -0.6649 -0.1323  0.5747  5.1533
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -52.5605     1.9627 -26.780  < 2e-16 ***
## fdriverear    0.6964     0.1181   5.897 7.83e-09 ***
## fdrive4x4     0.8787     0.1363   6.445 3.29e-10 ***
## lweight       8.5381     0.2688  31.762  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9726 on 405 degrees of freedom
## Multiple R-squared:  0.7937, Adjusted R-squared:  0.7922
## F-statistic: 519.5 on 3 and 405 DF,  p-value: < 2.2e-16
``````

Model with `fdrive` only

``````
summary(mDrive)
``````
``````
##
## Call:
## lm(formula = consumption ~ fdrive, data = CarsNow)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.0913 -1.2489 -0.0440  0.9587  9.0511
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.7413     0.1247  78.149  < 2e-16 ***
## fdriverear    1.5527     0.2146   7.237 2.32e-12 ***
## fdrive4x4     2.7576     0.2292  12.030  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.815 on 406 degrees of freedom
## Multiple R-squared:  0.2799, Adjusted R-squared:  0.2764
## F-statistic: 78.91 on 2 and 406 DF,  p-value: < 2.2e-16
``````

Model with `lweight` only

``````
summary(mWeight)
``````
``````
##
## Call:
## lm(formula = consumption ~ lweight, data = CarsNow)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.6544 -0.7442 -0.1526  0.5160  5.1616
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -58.2480     1.8941  -30.75   <2e-16 ***
## lweight       9.3606     0.2569   36.44   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.035 on 407 degrees of freedom
## Multiple R-squared:  0.7654, Adjusted R-squared:  0.7648
## F-statistic:  1328 on 1 and 407 DF,  p-value: < 2.2e-16
``````

Intercept-only model

``````
summary(m0)
``````
``````
##
## Call:
## lm(formula = consumption ~ 1, data = CarsNow)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.1013 -1.1013 -0.0513  0.8987 10.7987
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  10.7513     0.1055   101.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.134 on 408 degrees of freedom
``````

Explicit comparison of the additive and the interaction model

``````
anova(mAddit, mInter)
``````
``````
## Analysis of Variance Table
##
## Model 1: consumption ~ fdrive + lweight
## Model 2: consumption ~ fdrive + lweight + fdrive:lweight
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)
## 1    405 383.1
## 2    403 356.4  2    26.702 15.097 4.758e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````
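The F statistic in this table can be recomputed by hand from the printed numbers: the drop in the residual sum of squares per added parameter, divided by the residual mean square of the larger model, F = [(RSS_0 - RSS_1)/(df_0 - df_1)] / [RSS_1/df_1], compared with the F distribution on (df_0 - df_1, df_1) degrees of freedom. A sketch using the rounded values copied from the output:

``````r
ssDiff  <- 26.702    # "Sum of Sq" = RSS(mAddit) - RSS(mInter)
dfDiff  <- 405 - 403 # the two interaction coefficients
rssFull <- 356.40    # RSS of mInter
dfFull  <- 403       # residual df of mInter

Fstat <- (ssDiff / dfDiff) / (rssFull / dfFull)
Fstat                                           # about 15.097
pf(Fstat, dfDiff, dfFull, lower.tail = FALSE)   # about 4.76e-07
``````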

Type I ANOVA table

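• Each row gives the reduction in the residual sum of squares when the term is added to the model containing the terms listed above it (sequential sums of squares); all F statistics use the residual mean square of `mInter` (0.88 on 403 df). The last F-test, for `fdrive:lweight`, is therefore the same as the explicit comparison of `mAddit` and `mInter` above.
``````
anova(mInter)
``````
``````
## Analysis of Variance Table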
##
## Response: consumption
##                 Df Sum Sq Mean Sq  F value    Pr(>F)
## fdrive           2 519.89  259.94  293.935 < 2.2e-16 ***
## lweight          1 954.26  954.26 1079.040 < 2.2e-16 ***
## fdrive:lweight   2  26.70   13.35   15.097 4.758e-07 ***
## Residuals      403 356.40    0.88
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````
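The sequential sums of squares are differences of residual sums of squares along the nested sequence `m0`, `mDrive`, `mAddit`, `mInter`. To keep the sketch self-contained it uses simulated data instead of `CarsNow` (the names `d`, `f`, `x`, `y` and the coefficients are hypothetical):

``````r
set.seed(407)
n <- 120
## Simulated stand-in for CarsNow (hypothetical data, arbitrary coefficients)
d <- data.frame(f = factor(rep(c("front", "rear", "4x4"), length.out = n),
                           levels = c("front", "rear", "4x4")),
                x = rnorm(n, mean = 7.4, sd = 0.2))
d$y <- -50 + 8.5 * d$x + c(0, 0.7, 0.9)[as.integer(d$f)] + rnorm(n)

fits <- list(m0     = lm(y ~ 1,           data = d),
             mF     = lm(y ~ f,           data = d),
             mAdd   = lm(y ~ f + x,       data = d),
             mInter = lm(y ~ f + x + f:x, data = d))
rss   <- sapply(fits, deviance)      # residual sums of squares
seqSS <- -diff(rss)                  # drop in RSS as each term is added
tab   <- anova(fits$mInter)
cbind(sequential = seqSS, anova = tab[1:3, "Sum Sq"])
``````

With the models of this tutorial, `-diff(sapply(list(m0, mDrive, mAddit, mInter), deviance))` should reproduce the `Sum Sq` column above.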

Type I ANOVA tables

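• Because the sums of squares are sequential, the results depend on the order in which the covariates enter the formula. `mInter1` is the same model as `mInter`; `mInter2` contains the same terms with `lweight` entered first.
``````
mInter1 <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow)
anova(mInter1)
``````
``````
## Analysis of Variance Table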
##
## Response: consumption
##                 Df Sum Sq Mean Sq  F value    Pr(>F)
## fdrive           2 519.89  259.94  293.935 < 2.2e-16 ***
## lweight          1 954.26  954.26 1079.040 < 2.2e-16 ***
## fdrive:lweight   2  26.70   13.35   15.097 4.758e-07 ***
## Residuals      403 356.40    0.88
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````
``````
mInter2 <- lm(consumption ~ lweight + fdrive + fdrive:lweight, data = CarsNow)
anova(mInter2)
``````
``````
## Analysis of Variance Table
##
## Response: consumption
##                 Df  Sum Sq Mean Sq  F value    Pr(>F)
## lweight          1 1421.57 1421.57 1607.458 < 2.2e-16 ***
## fdrive           2   52.58   26.29   29.726 9.079e-13 ***
## lweight:fdrive   2   26.70   13.35   15.097 4.758e-07 ***
## Residuals      403  356.40    0.88
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````
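The two orderings split the same total sum of squares differently, but the joint contribution of the two main effects and the interaction row do not depend on the order. A quick arithmetic check with the printed (rounded) values:

``````r
## Type I sums of squares copied from the two tables above
ss1 <- c(fdrive = 519.89, lweight = 954.26, inter = 26.70, resid = 356.40)   # mInter1
ss2 <- c(lweight = 1421.57, fdrive = 52.58, inter = 26.70, resid = 356.40)   # mInter2

ss1[["fdrive"]] + ss1[["lweight"]]   # 1474.15: joint SS of the two main effects ...
ss2[["lweight"]] + ss2[["fdrive"]]   # 1474.15: ... is the same in both orders
sum(ss1)                             # 1857.25: total SS, i.e. RSS of m0
sum(ss2)                             # 1857.25
``````

The total agrees, up to rounding, with the residual standard error of `m0`: sqrt(1857.25/408) is about 2.134.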

Type II ANOVA tables

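• Function `Anova` comes from package `car`. Type II sums of squares respect the marginality principle: each term is adjusted for all other terms except those that contain it. Hence `fdrive` is adjusted for `lweight` (but not for `fdrive:lweight`), `lweight` for `fdrive`, and the interaction for both main effects, so the ordering of the terms does not matter.
``````
library("car")
Anova(mInter1, type = "II")
``````
``````
## Anova Table (Type II tests)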
##
## Response: consumption
##                Sum Sq  Df  F value    Pr(>F)
## fdrive          52.58   2   29.726 9.079e-13 ***
## lweight        954.26   1 1079.040 < 2.2e-16 ***
## fdrive:lweight  26.70   2   15.097 4.758e-07 ***
## Residuals      356.40 403
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````
``````
Anova(mInter2, type = "II")     ### the same results, only the rows are reordered
``````
``````
## Anova Table (Type II tests)
##
## Response: consumption
##                Sum Sq  Df  F value    Pr(>F)
## lweight        954.26   1 1079.040 < 2.2e-16 ***
## fdrive          52.58   2   29.726 9.079e-13 ***
## lweight:fdrive  26.70   2   15.097 4.758e-07 ***
## Residuals      356.40 403
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````
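The Type II rows can be read off the two Type I tables: the Type II sum of squares of a main effect equals its Type I sum of squares when it is entered after the other main effect (52.58 for `fdrive` from `anova(mInter2)`, 954.26 for `lweight` from `anova(mInter1)`), and the interaction row is unchanged. A sketch recomputing the two main-effect F statistics from these printed (rounded) values:

``````r
ssDriveLast  <- 52.58      # fdrive row of anova(mInter2): fdrive added after lweight
ssWeightLast <- 954.26     # lweight row of anova(mInter1): lweight added after fdrive
msRes <- 356.40 / 403      # residual mean square of the interaction model

F_drive  <- (ssDriveLast / 2) / msRes    # about 29.73 (printed 29.726)
F_weight <- (ssWeightLast / 1) / msRes   # about 1079.03 (printed 1079.040)
c(F_drive = F_drive, F_weight = F_weight)
``````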

Type III ANOVA tables

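• Type III sums of squares compare the full model with the model in which only the columns of the tested term are removed from the model matrix, while all other columns (including those of interactions containing the term) are kept. The results do not depend on the order of the terms, but for a main effect involved in an interaction they do depend on the parameterization of the factor (see below).
``````
Anova(mInter1, type = "III")
``````
``````
## Anova Table (Type III tests)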
##
## Response: consumption
##                Sum Sq  Df F value    Pr(>F)
## (Intercept)    386.28   1 436.793 < 2.2e-16 ***
## fdrive          26.49   2  14.979 5.310e-07 ***
## lweight        542.30   1 613.216 < 2.2e-16 ***
## fdrive:lweight  26.70   2  15.097 4.758e-07 ***
## Residuals      356.40 403
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````
``````
Anova(mInter2, type = "III")     ### the same results, only the rows are reordered
``````
``````
## Anova Table (Type III tests)
##
## Response: consumption
##                Sum Sq  Df F value    Pr(>F)
## (Intercept)    386.28   1 436.793 < 2.2e-16 ***
## lweight        542.30   1 613.216 < 2.2e-16 ***
## fdrive          26.49   2  14.979 5.310e-07 ***
## lweight:fdrive  26.70   2  15.097 4.758e-07 ***
## Residuals      356.40 403
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````
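A Type III sum of squares can be reproduced without `car` by removing only the columns of one term from the model matrix of the full model and refitting. For a linear model this equals the Wald F-test of those coefficients, and for a 1-df term F is the squared t statistic. A self-contained sketch on simulated data (hypothetical names `d`, `f`, `x`, `y`); it is our reconstruction of what the table contains, not the `car` source code:

``````r
set.seed(407)
n <- 120
## Simulated stand-in for CarsNow (hypothetical data)
d <- data.frame(f = factor(rep(c("front", "rear", "4x4"), length.out = n),
                           levels = c("front", "rear", "4x4")),
                x = rnorm(n, mean = 7.4, sd = 0.2))
d$y <- -50 + 8.5 * d$x + c(0, 0.7, 0.9)[as.integer(d$f)] + rnorm(n)

m <- lm(y ~ f + x + f:x, data = d)        # default contr.treatment
X <- model.matrix(m)
a <- attr(X, "assign")                    # 0 = intercept, 1 = f, 2 = x, 3 = f:x
rssFull <- deviance(m)
sigma2  <- rssFull / df.residual(m)       # residual mean square

## Type III SS: drop the columns of one term only, keep all the others
type3SS <- function(term) {
  fitRed <- lm.fit(X[, a != term, drop = FALSE], d$y)
  sum(fitRed$residuals^2) - rssFull
}
ss3 <- sapply(0:3, type3SS)
names(ss3) <- c("(Intercept)", "f", "x", "f:x")
ss3 / (c(1, 2, 1, 2) * sigma2)            # Type III F statistics

## Wald form of the same test for f: F = b' V^{-1} b / q
idx   <- which(a == 1)
b     <- coef(m)[idx]
Fwald <- drop(t(b) %*% solve(vcov(m)[idx, idx]) %*% b) / length(idx)
``````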

Three parameterizations (contrast codings) of the categorical covariate `fdrive`

``````
## contr.treatment (R default): reference group = first level (front)
mInter <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow)
## contr.SAS: reference group = last level (4x4)
mInterSAS <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow, contrasts = list(fdrive = contr.SAS))
## contr.sum: group effects constrained to sum to zero
mIntersum <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow, contrasts = list(fdrive = contr.sum))
``````
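The three codings differ only in the contrast matrix that maps the three levels of `fdrive` to two model-matrix columns. These matrices can be printed directly with the base R (stats) functions used above:

``````r
lev <- c("front", "rear", "4x4")   # levels of fdrive, in this order
contr.treatment(lev)   # default: front is the reference (row of zeros)
contr.SAS(lev)         # 4x4, the last level, is the reference
contr.sum(lev)         # columns sum to zero; 4x4 gets -1 in both columns
``````

As the summaries below show, with `contr.SAS` and `contr.sum` the columns are labelled `fdrive1` and `fdrive2`; in both cases they correspond to `front` and `rear`.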

Interpretation of the model parameters?

``````
summary(mInter)
``````
``````
##
## Call:
## lm(formula = consumption ~ fdrive + lweight + fdrive:lweight,
##     data = CarsNow)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4038 -0.6438 -0.1021  0.5672  4.3237
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)        -52.8047     2.5266 -20.900  < 2e-16 ***
## fdriverear          19.8445     5.1297   3.869 0.000128 ***
## fdrive4x4          -12.5366     4.6506  -2.696 0.007319 **
## lweight              8.5716     0.3461  24.763  < 2e-16 ***
## fdriverear:lweight  -2.5890     0.6956  -3.722 0.000226 ***
## fdrive4x4:lweight    1.7837     0.6240   2.858 0.004480 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9404 on 403 degrees of freedom
## Multiple R-squared:  0.8081, Adjusted R-squared:  0.8057
## F-statistic: 339.4 on 5 and 403 DF,  p-value: < 2.2e-16
``````
``````
summary(mInterSAS)
``````
``````
##
## Call:
## lm(formula = consumption ~ fdrive + lweight + fdrive:lweight,
##     data = CarsNow, contrasts = list(fdrive = contr.SAS))
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4038 -0.6438 -0.1021  0.5672  4.3237
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)     -65.3414     3.9045 -16.735  < 2e-16 ***
## fdrive1          12.5366     4.6506   2.696  0.00732 **
## fdrive2          32.3811     5.9309   5.460 8.35e-08 ***
## lweight          10.3553     0.5192  19.943  < 2e-16 ***
## fdrive1:lweight  -1.7837     0.6240  -2.858  0.00448 **
## fdrive2:lweight  -4.3727     0.7961  -5.493 7.01e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9404 on 403 degrees of freedom
## Multiple R-squared:  0.8081, Adjusted R-squared:  0.8057
## F-statistic: 339.4 on 5 and 403 DF,  p-value: < 2.2e-16
``````
``````
summary(mIntersum)
``````
``````
##
## Call:
## lm(formula = consumption ~ fdrive + lweight + fdrive:lweight,
##     data = CarsNow, contrasts = list(fdrive = contr.sum))
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4038 -0.6438 -0.1021  0.5672  4.3237
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)     -50.3688     2.1489 -23.440  < 2e-16 ***
## fdrive1          -2.4360     2.5972  -0.938    0.349
## fdrive2          17.4085     3.3558   5.188 3.38e-07 ***
## lweight           8.3031     0.2894  28.696  < 2e-16 ***
## fdrive1:lweight   0.2684     0.3517   0.763    0.446
## fdrive2:lweight  -2.3206     0.4529  -5.124 4.64e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9404 on 403 degrees of freedom
## Multiple R-squared:  0.8081, Adjusted R-squared:  0.8057
## F-statistic: 339.4 on 5 and 403 DF,  p-value: < 2.2e-16
``````
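One way to answer the question above: all three fits describe the same three regression lines (identical residuals and R-squared), and only the parameters differ. Starting from the treatment-contrast estimates copied from `summary(mInter)` (rounded to four decimals), a base R sketch recomputes the group-specific intercepts and slopes and, from them, the `contr.SAS` and `contr.sum` coefficients:

``````r
## Treatment-contrast estimates of mInter, copied from summary(mInter)
b <- c(int = -52.8047, rear = 19.8445, x4x4 = -12.5366,
       lw = 8.5716, rear.lw = -2.5890, x4x4.lw = 1.7837)

## Intercepts and slopes of the three group-specific lines
icpt  <- c(front = b[["int"]], rear = b[["int"]] + b[["rear"]],    "4x4" = b[["int"]] + b[["x4x4"]])
slope <- c(front = b[["lw"]],  rear = b[["lw"]]  + b[["rear.lw"]], "4x4" = b[["lw"]]  + b[["x4x4.lw"]])

## contr.SAS: reference group is the last level (4x4)
sas <- c(icpt[["4x4"]], icpt[c("front", "rear")] - icpt[["4x4"]],
         slope[["4x4"]], slope[c("front", "rear")] - slope[["4x4"]])
## contr.sum: intercept and slope are averages over the groups;
## fdrive1 / fdrive2 are deviations of front / rear from these averages
sum0 <- c(mean(icpt), icpt[c("front", "rear")] - mean(icpt),
          mean(slope), slope[c("front", "rear")] - mean(slope))
round(sas, 4)
round(sum0, 4)
``````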

Type III ANOVA tables

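• The rows `fdrive` and `fdrive:lweight` are identical in all three tables, because under each coding they test the same hypotheses (equal intercepts at `lweight` = 0, and equal slopes). The rows `(Intercept)` and `lweight` differ: the `lweight` row tests a zero slope in the reference group `front` (`contr.treatment`), a zero slope in the reference group `4x4` (`contr.SAS`), or a zero average of the three group slopes (`contr.sum`).
``````
Anova(mInter, type = "III")
``````
``````
## Anova Table (Type III tests)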
##
## Response: consumption
##                Sum Sq  Df F value    Pr(>F)
## (Intercept)    386.28   1 436.793 < 2.2e-16 ***
## fdrive          26.49   2  14.979 5.310e-07 ***
## lweight        542.30   1 613.216 < 2.2e-16 ***
## fdrive:lweight  26.70   2  15.097 4.758e-07 ***
## Residuals      356.40 403
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````
``````
Anova(mInterSAS, type = "III")
``````
``````
## Anova Table (Type III tests)
##
## Response: consumption
##                Sum Sq  Df F value    Pr(>F)
## (Intercept)    247.68   1 280.063 < 2.2e-16 ***
## fdrive          26.49   2  14.979 5.310e-07 ***
## lweight        351.72   1 397.714 < 2.2e-16 ***
## fdrive:lweight  26.70   2  15.097 4.758e-07 ***
## Residuals      356.40 403
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````
``````
Anova(mIntersum, type = "III")
``````
``````
## Anova Table (Type III tests)
##
## Response: consumption
##                Sum Sq  Df F value    Pr(>F)
## (Intercept)    485.88   1 549.416 < 2.2e-16 ***
## fdrive          26.49   2  14.979 5.310e-07 ***
## lweight        728.22   1 823.440 < 2.2e-16 ***
## fdrive:lweight  26.70   2  15.097 4.758e-07 ***
## Residuals      356.40 403
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````
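For a term with one degree of freedom, the Type III F statistic is the square of the t statistic of its coefficient, so each `lweight` row above repeats the t-test of the `lweight` coefficient under the corresponding parameterization. A check with the printed values (small discrepancies come from rounding):

``````r
## lweight rows of the three Type III tables and the t values of the
## lweight coefficient in the matching summaries (printed values)
F_lweight <- c(treatment = 613.216, SAS = 397.714, sum = 823.440)
t_lweight <- c(treatment = 24.763,  SAS = 19.943,  sum = 28.696)
## For a 1-df term the Type III F statistic equals t^2
cbind(F = F_lweight, t2 = t_lweight^2)
``````

The common `fdrive` row compares the three lines at `lweight` = 0, i.e. at a weight of 1 kg, far outside the observed log-weights (about 6.8 to 8.0); with the interaction in the model this main-effect test is arguably of little practical interest.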