# NMSA407 Linear Regression: Tutorial

Two numeric covariates

Data Cars2004nh

## Introduction

### Load used data and calculate basic summaries

``````data(Cars2004nh, package = "mffSM")
``````
``````##                         vname type drive price.retail price.dealer   price cons.city cons.highway
## 1          Chevrolet.Aveo.4dr    1     1        11690        10965 11327.5       8.4          6.9
## 2 Chevrolet.Aveo.LS.4dr.hatch    1     1        12585        11802 12193.5       8.4          6.9
## 3      Chevrolet.Cavalier.2dr    1     1        14610        13697 14153.5       9.0          6.4
## 4      Chevrolet.Cavalier.4dr    1     1        14810        13884 14347.0       9.0          6.4
## 5   Chevrolet.Cavalier.LS.2dr    1     1        16385        15357 15871.0       9.0          6.4
## 6           Dodge.Neon.SE.4dr    1     1        13670        12849 13259.5       8.1          6.5
##   consumption engine.size ncylinder horsepower weight      iweight  lweight wheel.base length width
## 1        7.65         1.6         4        103   1075 0.0009302326 6.980076        249    424   168
## 2        7.65         1.6         4        103   1065 0.0009389671 6.970730        249    389   168
## 3        7.70         2.2         4        140   1187 0.0008424600 7.079184        264    465   175
## 4        7.70         2.2         4        140   1214 0.0008237232 7.101676        264    465   173
## 5        7.70         2.2         4        140   1187 0.0008424600 7.079184        264    465   175
## 6        7.30         2.0         4        132   1171 0.0008539710 7.065613        267    442   170
##      ftype fdrive
## 1 personal  front
## 2 personal  front
## 3 personal  front
## 4 personal  front
## 5 personal  front
## 6 personal  front
``````
``````dim(Cars2004nh)
``````
``````## [1] 425  20
``````
``````summary(Cars2004nh)
``````
``````##     vname                type           drive        price.retail     price.dealer
##  Length:425         Min.   :1.000   Min.   :1.000   Min.   : 10280   Min.   :  9875
##  Class :character   1st Qu.:1.000   1st Qu.:1.000   1st Qu.: 20370   1st Qu.: 18973
##  Mode  :character   Median :1.000   Median :1.000   Median : 27905   Median : 25672
##                     Mean   :2.219   Mean   :1.692   Mean   : 32866   Mean   : 30096
##                     3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.: 39235   3rd Qu.: 35777
##                     Max.   :6.000   Max.   :3.000   Max.   :192465   Max.   :173560
##
##      price          cons.city      cons.highway     consumption     engine.size      ncylinder
##  Min.   : 10078   Min.   : 6.20   Min.   : 5.100   Min.   : 5.65   Min.   :1.300   Min.   :-1.000
##  1st Qu.: 19600   1st Qu.:11.20   1st Qu.: 8.100   1st Qu.: 9.65   1st Qu.:2.400   1st Qu.: 4.000
##  Median : 26656   Median :12.40   Median : 9.000   Median :10.70   Median :3.000   Median : 6.000
##  Mean   : 31481   Mean   :12.36   Mean   : 9.142   Mean   :10.75   Mean   :3.208   Mean   : 5.791
##  3rd Qu.: 37514   3rd Qu.:13.80   3rd Qu.: 9.800   3rd Qu.:11.65   3rd Qu.:3.900   3rd Qu.: 6.000
##  Max.   :183012   Max.   :23.50   Max.   :19.600   Max.   :21.55   Max.   :8.300   Max.   :12.000
##                   NA's   :14      NA's   :14       NA's   :14
##    horsepower        weight        iweight             lweight        wheel.base        length
##  Min.   :100.0   Min.   : 923   Min.   :0.0003067   Min.   :6.828   Min.   :226.0   Min.   :363.0
##  1st Qu.:165.0   1st Qu.:1412   1st Qu.:0.0005542   1st Qu.:7.253   1st Qu.:262.0   1st Qu.:450.0
##  Median :210.0   Median :1577   Median :0.0006341   Median :7.363   Median :272.0   Median :472.0
##  Mean   :216.8   Mean   :1626   Mean   :0.0006412   Mean   :7.373   Mean   :274.9   Mean   :470.6
##  3rd Qu.:255.0   3rd Qu.:1804   3rd Qu.:0.0007082   3rd Qu.:7.498   3rd Qu.:284.0   3rd Qu.:490.0
##  Max.   :500.0   Max.   :3261   Max.   :0.0010834   Max.   :8.090   Max.   :366.0   Max.   :577.0
##                  NA's   :2      NA's   :2           NA's   :2       NA's   :2       NA's   :26
##      width            ftype       fdrive
##  Min.   :163.0   personal:242   front:223
##  1st Qu.:175.0   wagon   : 30   rear :110
##  Median :180.0   SUV     : 60   4x4  : 92
##  Mean   :181.1   pickup  : 24
##  3rd Qu.:185.0   sport   : 49
##  Max.   :206.0   minivan : 20
##  NA's   :28
``````

### Complete cases subset used here

To be able to compare a model fitted here with other models where also other covariates will be included, we restrict ourselves to a subset of the dataset where all variables `consumption`, `lweight` and `engine.size` are known.

``````isComplete <- complete.cases(Cars2004nh[, c("consumption", "lweight", "engine.size")])
sum(!isComplete)
``````
``````## [1] 16
``````
``````CarsNow <- subset(Cars2004nh, isComplete, select = c("consumption", "drive", "fdrive", "weight", "lweight", "engine.size", "horsepower"))
dim(CarsNow)
``````
``````## [1] 409   7
``````
``````summary(CarsNow)
``````
``````##   consumption        drive         fdrive        weight        lweight       engine.size
##  Min.   : 5.65   Min.   :1.000   front:212   Min.   : 923   Min.   :6.828   Min.   :1.300
##  1st Qu.: 9.65   1st Qu.:1.000   rear :108   1st Qu.:1415   1st Qu.:7.255   1st Qu.:2.400
##  Median :10.70   Median :1.000   4x4  : 89   Median :1577   Median :7.363   Median :3.000
##  Mean   :10.75   Mean   :1.699               Mean   :1622   Mean   :7.371   Mean   :3.178
##  3rd Qu.:11.65   3rd Qu.:2.000               3rd Qu.:1804   3rd Qu.:7.498   3rd Qu.:3.800
##  Max.   :21.55   Max.   :3.000               Max.   :2903   Max.   :7.973   Max.   :6.000
##    horsepower
##  Min.   :100.0
##  1st Qu.:165.0
##  Median :210.0
##  Mean   :215.8
##  3rd Qu.:250.0
##  Max.   :493.0
``````

## Dependence of `consumption` on `lweight` and `engine.size`

### Scatterplot matrix

• Function `scatterplotMatrix` is from package `car`.
``````library("car")
scatterplotMatrix(~consumption + lweight + engine.size, reg.line = lm, smooth = FALSE, spread = TRUE, diagonal = "histogram",
data = CarsNow, pch = 16)
``````

## Several linear models

### Intercept only model

``````(summary(m0 <- lm(consumption ~ 1, data = CarsNow)))
``````
``````##
## Call:
## lm(formula = consumption ~ 1, data = CarsNow)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.1013 -1.1013 -0.0513  0.8987 10.7987
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  10.7513     0.1055   101.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.134 on 408 degrees of freedom
``````

### Simple regression lines

``````(summary(m1lw <- lm(consumption ~ lweight, data = CarsNow)))
``````
``````##
## Call:
## lm(formula = consumption ~ lweight, data = CarsNow)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.6544 -0.7442 -0.1526  0.5160  5.1616
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -58.2480     1.8941  -30.75   <2e-16 ***
## lweight       9.3606     0.2569   36.44   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.035 on 407 degrees of freedom
## Multiple R-squared:  0.7654, Adjusted R-squared:  0.7648
## F-statistic:  1328 on 1 and 407 DF,  p-value: < 2.2e-16
``````
``````(summary(m1es <- lm(consumption ~ engine.size, data = CarsNow)))
``````
``````##
## Call:
## lm(formula = consumption ~ engine.size, data = CarsNow)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.6598 -0.8339 -0.0680  0.7671  6.1454
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  5.51028    0.19981   27.58   <2e-16 ***
## engine.size  1.64905    0.05973   27.61   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.26 on 407 degrees of freedom
## Multiple R-squared:  0.6519, Adjusted R-squared:  0.651
## F-statistic: 762.1 on 1 and 407 DF,  p-value: < 2.2e-16
``````

``````(summary(m2 <- lm(consumption ~ lweight + engine.size, data = CarsNow)))
``````
``````##
## Call:
## lm(formula = consumption ~ lweight + engine.size, data = CarsNow)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.3243 -0.6741 -0.1286  0.5270  5.0459
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -42.65641    2.99243 -14.255  < 2e-16 ***
## lweight       7.01155    0.43501  16.118  < 2e-16 ***
## engine.size   0.54231    0.08304   6.531 1.96e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9854 on 406 degrees of freedom
## Multiple R-squared:  0.7877, Adjusted R-squared:  0.7867
## F-statistic: 753.3 on 2 and 406 DF,  p-value: < 2.2e-16
``````

### Some submodel F-tests

``````anova(m1es, m2)
``````
``````## Analysis of Variance Table
##
## Model 1: consumption ~ engine.size
## Model 2: consumption ~ lweight + engine.size
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)
## 1    407 646.54
## 2    406 394.26  1    252.28 259.79 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````
``````anova(m1lw, m2)
``````
``````## Analysis of Variance Table
##
## Model 1: consumption ~ lweight
## Model 2: consumption ~ lweight + engine.size
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)
## 1    407 435.68
## 2    406 394.26  1    41.415 42.648 1.962e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````

### Some estimated model-based means

``````pdata <- data.frame(lweight = rep(c(7, 7.4, 7.8), 3), engine.size = rep(c(2, 4, 6), each = 3))
print(pdata)
``````
``````##   lweight engine.size
## 1     7.0           2
## 2     7.4           2
## 3     7.8           2
## 4     7.0           4
## 5     7.4           4
## 6     7.8           4
## 7     7.0           6
## 8     7.4           6
## 9     7.8           6
``````
``````predict(m2, newdata = pdata, se.fit = TRUE, interval = "confidence", level = 0.95)
``````
``````## \$fit
##         fit       lwr       upr
## 1  7.509080  7.294631  7.723529
## 2 10.313701 10.080069 10.547333
## 3 13.118322 12.573170 13.663474
## 4  8.593698  8.148256  9.039140
## 5 11.398319 11.248898 11.547741
## 6 14.202940 13.919662 14.486218
## 7  9.678316  8.927290 10.429343
## 8 12.482937 12.032093 12.933782
## 9 15.287558 15.010984 15.564133
##
## \$se.fit
##          1          2          3          4          5          6          7          8          9
## 0.10908867 0.11884683 0.27731483 0.22659307 0.07600955 0.14410134 0.38204140 0.22934138 0.14069120
##
## \$df
## [1] 406
##
## \$residual.scale
## [1] 0.9854357
``````

### Possible visualization of the fitted models

#### Categorized lweight and engine.size

``````step.lweight <- 0.4
step.engine.size <- 2
breaks.lweight <- seq(6.8, 8, by = step.lweight)
breaks.engine.size <- seq(1, 7, by = step.engine.size)

CarsNow <- transform(CarsNow, flweight = cut(lweight, breaks = breaks.lweight, include.lowest = TRUE),
fengine.size = cut(engine.size, breaks = breaks.engine.size, include.lowest = TRUE))
with(CarsNow, table(flweight, useNA = "ifany"))
``````
``````## flweight
## [6.8,7.2] (7.2,7.6]   (7.6,8]
##        75       285        49
``````
``````with(CarsNow, table(fengine.size, useNA = "ifany"))
``````
``````## fengine.size
## [1,3] (3,5] (5,7]
##   210   181    18
``````

#### Midpoints of intervals

``````(mid.lweight <- breaks.lweight[-1] - step.lweight/2)
``````
``````## [1] 7.0 7.4 7.8
``````
``````(mid.engine.size <- breaks.engine.size[-1] - step.engine.size/2)
``````
``````## [1] 2 4 6
``````

#### Grids of values for both predictors

``````(grid.lweight <- seq(6.8, 8, length = 10))
``````
``````##  [1] 6.800000 6.933333 7.066667 7.200000 7.333333 7.466667 7.600000 7.733333 7.866667 8.000000
``````
``````(grid.engine.size <- seq(1, 7, length = 10))
``````
``````##  [1] 1.000000 1.666667 2.333333 3.000000 3.666667 4.333333 5.000000 5.666667 6.333333 7.000000
``````

#### Some color palletes

``````PALlw <- diverge_hcl(length(levels(CarsNow[, "flweight"])))
LINlw <- c("blue", "grey50", "red")
names(PALlw) <- names(LINlw) <- levels(CarsNow[, "flweight"])
print(PALlw)
``````
``````## [6.8,7.2] (7.2,7.6]   (7.6,8]
## "#023FA5" "#E2E2E2" "#8E063B"
``````
``````PALes <- diverge_hcl(length(levels(CarsNow[, "fengine.size"])))
LINes <- c("blue", "grey50", "red")
names(PALes) <- names(LINes) <- levels(CarsNow[, "fengine.size"])
print(PALes)
``````
``````##     [1,3]     (3,5]     (5,7]
## "#023FA5" "#E2E2E2" "#8E063B"
``````

#### Fitted lines from the additive model considered as functions of `lweight` with fixed values of `engine.size`

``````par(mfrow = c(1, 1), bty = BTY, mar = c(4, 4, 1, 1) + 0.1)
plot(consumption ~ lweight, data = CarsNow, pch = PCH, col = "black", bg = PALes[fengine.size],
xlab = "Log(weight) [log(kg)]", ylab = "Consumption [l/100 km]")
legend(6.85, 21, legend = levels(CarsNow[, "fengine.size"]), pch = PCH, col = "black", pt.bg = PALes, bty = "n")
legend(6.95, 21, legend = mid.engine.size, col = LINes, lty = 1, lwd = 2, bty = "n")
text(6.85, 21.2, labels = "Engine size [liters]", pos = 4)
for (i in 1:length(mid.engine.size)){
ifit <- predict(m2, newdata = data.frame(lweight = grid.lweight, engine.size = mid.engine.size[i]))
lines(grid.lweight, ifit, col = LINes[i], lwd = 2)
}
``````

#### Fitted lines from the additive model considered as functions of `weight` with fixed values of `engine.size`

``````par(mfrow = c(1, 1), bty = BTY, mar = c(4, 4, 1, 1) + 0.1)
plot(consumption ~ weight, data = CarsNow, pch = PCH, col = "black", bg = PALes[fengine.size],
xlab = "Weight [kg]", ylab = "Consumption [l/100 km]")
legend(940, 21, legend = levels(CarsNow[, "fengine.size"]), pch = PCH, col = "black", pt.bg = PALes, bty = "n")
legend(1100, 21, legend = mid.engine.size, col = LINes, lty = 1, lwd = 2, bty = "n")
text(940, 21.2, labels = "Engine size [liters]", pos = 4)
for (i in 1:length(mid.engine.size)){
ifit <- predict(m2, newdata = data.frame(lweight = grid.lweight, engine.size = mid.engine.size[i]))
lines(exp(grid.lweight), ifit, col = LINes[i], lwd = 2)
}
``````

#### Fitted lines from the additive model considered as functions of `lweight` with fixed values of `engine.size` plus the line from the model with `lweight` only

``````par(mfrow = c(1, 1), bty = BTY, mar = c(4, 4, 1, 1) + 0.1)
plot(consumption ~ lweight, data = CarsNow, pch = PCH, col = "black", bg = PALes[fengine.size],
xlab = "Log(weight) [log(kg)]", ylab = "Consumption [l/100 km]")
legend(6.85, 21, legend = levels(CarsNow[, "fengine.size"]), pch = PCH, col = "black", pt.bg = PALes, bty = "n")
legend(6.95, 21, legend = mid.engine.size, col = LINes, lty = 1, lwd = 2, bty = "n")
text(6.85, 21.2, labels = "Engine size [liters]", pos = 4)
for (i in 1:length(mid.engine.size)){
ifit <- predict(m2, newdata = data.frame(lweight = grid.lweight, engine.size = mid.engine.size[i]))
lines(grid.lweight, ifit, col = LINes[i], lwd = 2)
}
abline(coef(m1lw), col = "orange", lwd = 2)
legend(6.8, 18, legend = "M1(LW)", col = "orange", lty = 1, lwd = 2, bty = "n")
``````

#### Fitted lines from the additive model considered as functions of `engine.size` with fixed values of `lweight`

``````par(mfrow = c(1, 1), bty = BTY, mar = c(4, 4, 1, 1) + 0.1)
plot(consumption ~ engine.size, data = CarsNow, pch = PCH, col = "black", bg = PALlw[flweight],
xlab = "Engine size [liters]", ylab = "Consumption [l/100 km]")
legend(1.5, 21, legend = levels(CarsNow[, "flweight"]), pch = PCH, col = "black", pt.bg = PALes, bty = "n")
legend(2.1, 21, legend = mid.lweight, col = LINlw, lty = 1, lwd = 2, bty = "n")
text(1.5, 21.2, labels = "Log(weight) [log(kg)]", pos = 4)
for (i in 1:length(mid.lweight)){
ifit <- predict(m2, newdata = data.frame(lweight = mid.lweight[i], engine.size = grid.engine.size))
lines(grid.engine.size, ifit, col = LINlw[i], lwd = 2)
}
``````

#### Fitted lines from the additive model considered as functions of `engine.size` with fixed values of `lweight` plus the line from the model with `engine.size` only

``````par(mfrow = c(1, 1), bty = BTY, mar = c(4, 4, 1, 1) + 0.1)
plot(consumption ~ engine.size, data = CarsNow, pch = PCH, col = "black", bg = PALlw[flweight],
xlab = "Engine size [liters]", ylab = "Consumption [l/100 km]")
legend(1.5, 21, legend = levels(CarsNow[, "flweight"]), pch = PCH, col = "black", pt.bg = PALes, bty = "n")
legend(2.1, 21, legend = mid.lweight, col = LINlw, lty = 1, lwd = 2, bty = "n")
text(1.5, 21.2, labels = "Log(weight) [log(kg)]", pos = 4)
for (i in 1:length(mid.lweight)){
ifit <- predict(m2, newdata = data.frame(lweight = mid.lweight[i], engine.size = grid.engine.size))
lines(grid.engine.size, ifit, col = LINlw[i], lwd = 2)
}
abline(coef(m1es), col = "orange", lwd = 2)
legend(1.5, 18, legend = "M1(ES)", col = "orange", lty = 1, lwd = 2, bty = "n")
``````

## Inference based on the additive model

### Confidence intervals for regression coefficiens

``````confint(m2, level = 0.95)
``````
``````##                   2.5 %      97.5 %
## (Intercept) -48.5390004 -36.7738113
## lweight       6.1563998   7.8667052
## engine.size   0.3790641   0.7055541
``````

### Estimates and confidence intervals for the mean differences between the lines on the above plots

• Function `LSest` is from package `mffSM`.
``````coef(m2)
``````
``````## (Intercept)     lweight engine.size
## -42.6564059   7.0115525   0.5423091
``````
``````L2 <- matrix(c(0, 0,            step.engine.size,
0, step.lweight, 0),
ncol = 3, byrow = TRUE)
colnames(L2) <- names(coef(m2))
rownames(L2) <- c("Engine lines diffs.", "Log(weight) lines diffs.")
print(L2)
``````
``````##                          (Intercept) lweight engine.size
## Engine lines diffs.                0     0.0           2
## Log(weight) lines diffs.           0     0.4           0
``````
``````library("mffSM")
LSest(m2, L = L2)
``````
``````##                          Estimate Std. Error   t value    P value     Lower    Upper
## Engine lines diffs.      1.084618  0.1660830  6.530579 1.9617e-10 0.7581282 1.411108
## Log(weight) lines diffs. 2.804621  0.1740039 16.118151 < 2.22e-16 2.4625599 3.146682
``````

### Visualization on the plots

``````par(mfrow = c(1, 1), bty = BTY, mar = c(4, 4, 1, 1) + 0.1)
plot(consumption ~ lweight, data = CarsNow, pch = PCH, col = "black", bg = PALes[fengine.size],
xlab = "Log(weight) [log(kg)]", ylab = "Consumption [l/100 km]")
abline(v = mid.lweight, col = "skyblue", lty = 4, lwd = 2)
legend(6.85, 21, legend = levels(CarsNow[, "fengine.size"]), pch = PCH, col = "black", pt.bg = PALes, bty = "n")
legend(6.95, 21, legend = mid.engine.size, col = LINes, lty = 1, lwd = 2, bty = "n")
text(6.85, 21.2, labels = "Engine size [liters]", pos = 4)
for (i in 1:length(mid.engine.size)){
ifit <- predict(m2, newdata = data.frame(lweight = grid.lweight, engine.size = mid.engine.size[i]))
lines(grid.lweight, ifit, col = LINes[i], lwd = 2)
}
``````

``````par(mfrow = c(1, 1), bty = BTY, mar = c(4, 4, 1, 1) + 0.1)
plot(consumption ~ engine.size, data = CarsNow, pch = PCH, col = "black", bg = PALlw[flweight],
xlab = "Engine size [liters]", ylab = "Consumption [l/100 km]")
abline(v = mid.engine.size, col = "skyblue", lty = 4, lwd = 2)
legend(1.5, 21, legend = levels(CarsNow[, "flweight"]), pch = PCH, col = "black", pt.bg = PALes, bty = "n")
legend(2.1, 21, legend = mid.lweight, col = LINlw, lty = 1, lwd = 2, bty = "n")
text(1.5, 21.2, labels = "Log(weight) [log(kg)]", pos = 4)
for (i in 1:length(mid.lweight)){
ifit <- predict(m2, newdata = data.frame(lweight = mid.lweight[i], engine.size = grid.engine.size))
lines(grid.engine.size, ifit, col = LINlw[i], lwd = 2)
}
``````

## Interaction model

### Estimates and basic inference

``````(summary(m3 <- lm(consumption ~ lweight +  engine.size + lweight:engine.size, data = CarsNow)))
#(summary(m3 <- lm(consumption ~ engine.size + lweight + engine.size:lweight, data = CarsNow)))
``````
``````(summary(m3 <- lm(consumption ~ lweight * engine.size, data = CarsNow)))     ## it is the same as above
``````
``````##
## Call:
## lm(formula = consumption ~ lweight * engine.size, data = CarsNow)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.3999 -0.6538 -0.1407  0.4779  3.9219
##
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)         -25.4574     5.1267  -4.966 1.01e-06 ***
## lweight               4.6877     0.7104   6.599 1.30e-10 ***
## engine.size          -5.3160     1.4338  -3.708 0.000238 ***
## lweight:engine.size   0.7860     0.1921   4.092 5.15e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9669 on 405 degrees of freedom
## Multiple R-squared:  0.7961, Adjusted R-squared:  0.7946
## F-statistic: 527.2 on 3 and 405 DF,  p-value: < 2.2e-16
``````

### Is the interaction model significantly better than the additive model?

``````anova(m2, m3)
``````
``````## Analysis of Variance Table
##
## Model 1: consumption ~ lweight + engine.size
## Model 2: consumption ~ lweight * engine.size
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)
## 1    406 394.26
## 2    405 378.60  1    15.656 16.748 5.154e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````

### Some estimated model-based means

``````pdata <- data.frame(lweight = rep(c(7, 7.4, 7.8), 3), engine.size = rep(c(2, 4, 6), each = 3))
print(pdata)
``````
``````##   lweight engine.size
## 1     7.0           2
## 2     7.4           2
## 3     7.8           2
## 4     7.0           4
## 5     7.4           4
## 6     7.8           4
## 7     7.0           6
## 8     7.4           6
## 9     7.8           6
``````
``````predict(m3, newdata = pdata, se.fit = TRUE, interval = "confidence", level = 0.95)
``````
``````## \$fit
##         fit       lwr       upr
## 1  7.728981  7.493545  7.964417
## 2 10.232884 10.000389 10.465378
## 3 12.736786 12.171377 13.302196
## 4  8.101219  7.604251  8.598188
## 5 11.233934 11.067414 11.400454
## 6 14.366648 14.077796 14.655500
## 7  8.473458  7.536461  9.410455
## 8 12.234984 11.776878 12.693090
## 9 15.996510 15.561061 16.431958
##
## \$se.fit
##          1          2          3          4          5          6          7          8          9
## 0.11976367 0.11826733 0.28761765 0.25280226 0.08470678 0.14693563 0.47663972 0.23303334 0.22150788
##
## \$df
## [1] 405
##
## \$residual.scale
## [1] 0.966863
``````

### Possible visualization

#### Fitted lines from the interaction model considered as functions of `lweight` with fixed values of `engine.size`

``````par(mfrow = c(1, 1), bty = BTY, mar = c(4, 4, 1, 1) + 0.1)
plot(consumption ~ lweight, data = CarsNow, pch = PCH, col = "black", bg = PALes[fengine.size],
xlab = "Log(weight) [log(kg)]", ylab = "Consumption [l/100 km]")
legend(6.85, 21, legend = levels(CarsNow[, "fengine.size"]), pch = PCH, col = "black", pt.bg = PALes, bty = "n")
legend(6.95, 21, legend = mid.engine.size, col = LINes, lty = 1, lwd = 2, bty = "n")
text(6.85, 21.2, labels = "Engine size [liters]", pos = 4)
for (i in 1:length(mid.engine.size)){
ifit <- predict(m3, newdata = data.frame(lweight = grid.lweight, engine.size = mid.engine.size[i]))
lines(grid.lweight, ifit, col = LINes[i], lwd = 2)
}
``````

#### Fitted lines from the interaction model considered as functions of `weight` with fixed values of `engine.size`

``````par(mfrow = c(1, 1), bty = BTY, mar = c(4, 4, 1, 1) + 0.1)
plot(consumption ~ weight, data = CarsNow, pch = PCH, col = "black", bg = PALes[fengine.size],
xlab = "Weight [kg]", ylab = "Consumption [l/100 km]")
legend(940, 21, legend = levels(CarsNow[, "fengine.size"]), pch = PCH, col = "black", pt.bg = PALes, bty = "n")
legend(1100, 21, legend = mid.engine.size, col = LINes, lty = 1, lwd = 2, bty = "n")
text(940, 21.2, labels = "Engine size [liters]", pos = 4)
for (i in 1:length(mid.engine.size)){
ifit <- predict(m3, newdata = data.frame(lweight = grid.lweight, engine.size = mid.engine.size[i]))
lines(exp(grid.lweight), ifit, col = LINes[i], lwd = 2)
}
``````

#### Fitted lines from the interaction model considered as functions of `engine.size` with fixed values of `lweight`

``````par(mfrow = c(1, 1), bty = BTY, mar = c(4, 4, 1, 1) + 0.1)
plot(consumption ~ engine.size, data = CarsNow, pch = PCH, col = "black", bg = PALlw[flweight],
xlab = "Engine size [liters]", ylab = "Consumption [l/100 km]")
legend(1.5, 21, legend = levels(CarsNow[, "flweight"]), pch = PCH, col = "black", pt.bg = PALes, bty = "n")
legend(2.1, 21, legend = mid.lweight, col = LINlw, lty = 1, lwd = 2, bty = "n")
text(1.5, 21.2, labels = "Log(weight) [log(kg)]", pos = 4)
for (i in 1:length(mid.lweight)){
ifit <- predict(m3, newdata = data.frame(lweight = mid.lweight[i], engine.size = grid.engine.size))
lines(grid.engine.size, ifit, col = LINlw[i], lwd = 2)
}
``````

## Inference based on the interaction model

### Confidence intervals for regression coefficiens

``````confint(m3, level = 0.95)
``````
``````##                           2.5 %     97.5 %
## (Intercept)         -35.5355861 -15.379127
## lweight               3.2912774   6.084179
## engine.size          -8.1346471  -2.497315
## lweight:engine.size   0.4084413   1.163587
``````

### Estimates and confidence intervals for the effects of one predictor when the second one is fixed

#### Check the regression coefficients

``````coef(m3)
``````
``````##         (Intercept)             lweight         engine.size lweight:engine.size
##         -25.4573566           4.6877284          -5.3159808           0.7860143
``````

#### Effect of `engine size` for different values of `lweight`

``````L3es <- cbind(0, 0, 1, mid.lweight)
colnames(L3es) <- names(coef(m3))
rownames(L3es) <- paste("Log(weight)=", mid.lweight, sep="")
print(L3es)
``````
``````##                 (Intercept) lweight engine.size lweight:engine.size
## Log(weight)=7             0       0           1                 7.0
## Log(weight)=7.4           0       0           1                 7.4
## Log(weight)=7.8           0       0           1                 7.8
``````
``````LSest(m3, L = L3es)
``````
``````##                  Estimate Std. Error  t value    P value       Lower     Upper
## Log(weight)=7   0.1861193 0.11922185 1.561118    0.11928 -0.04825159 0.4204903
## Log(weight)=7.4 0.5005251 0.08211366 6.095515 2.5440e-09  0.33910286 0.6619473
## Log(weight)=7.8 0.8149308 0.10524346 7.743291 7.8352e-14  0.60803911 1.0218225
``````

#### Differences between the lines on the plot

``````L3es2 <- cbind(0, 0, 2, 2*mid.lweight)
colnames(L3es2) <- names(coef(m3))
rownames(L3es2) <- paste("Log(weight)=", mid.lweight, sep="")
print(L3es2)
``````
``````##                 (Intercept) lweight engine.size lweight:engine.size
## Log(weight)=7             0       0           2                14.0
## Log(weight)=7.4           0       0           2                14.8
## Log(weight)=7.8           0       0           2                15.6
``````
``````LSest(m3, L = L3es2)
``````
``````##                  Estimate Std. Error  t value    P value       Lower     Upper
## Log(weight)=7   0.3722387  0.2384437 1.561118    0.11928 -0.09650318 0.8409805
## Log(weight)=7.4 1.0010501  0.1642273 6.095515 2.5440e-09  0.67820571 1.3238945
## Log(weight)=7.8 1.6298616  0.2104869 7.743291 7.8352e-14  1.21607822 2.0436449
``````

#### Above differences on a plot

``````par(mfrow = c(1, 1), bty = BTY, mar = c(4, 4, 1, 1) + 0.1)
plot(consumption ~ lweight, data = CarsNow, pch = PCH, col = "black", bg = PALes[fengine.size],
xlab = "Log(weight) [log(kg)]", ylab = "Consumption [l/100 km]")
abline(v = mid.lweight, col = "skyblue", lty = 4, lwd = 2)
legend(6.85, 21, legend = levels(CarsNow[, "fengine.size"]), pch = PCH, col = "black", pt.bg = PALes, bty = "n")
legend(6.95, 21, legend = mid.engine.size, col = LINes, lty = 1, lwd = 2, bty = "n")
text(6.85, 21.2, labels = "Engine size [liters]", pos = 4)
for (i in 1:length(mid.engine.size)){
ifit <- predict(m3, newdata = data.frame(lweight = grid.lweight, engine.size = mid.engine.size[i]))
lines(grid.lweight, ifit, col = LINes[i], lwd = 2)
}
``````

#### Effect of `lweight` for different values of `engine size`

``````L3lw <- cbind(0, 1, 0, mid.engine.size)
colnames(L3lw) <- names(coef(m3))
rownames(L3lw) <- paste("Engine size=", mid.engine.size, sep="")
print(L3lw)
``````
``````##               (Intercept) lweight engine.size lweight:engine.size
## Engine size=2           0       1           0                   2
## Engine size=4           0       1           0                   4
## Engine size=6           0       1           0                   6
``````
``````LSest(m3, L = L3lw)
``````
``````##               Estimate Std. Error  t value    P value    Lower     Upper
## Engine size=2 6.259757  0.4646670 13.47149 < 2.22e-16 5.346297  7.173217
## Engine size=4 7.831786  0.4715287 16.60935 < 2.22e-16 6.904836  8.758735
## Engine size=6 9.403814  0.7237966 12.99234 < 2.22e-16 7.980947 10.826682
``````

#### Differences between the lines on the plot

``````L3lw0.4 <- cbind(0, 0.4, 0, 0.4*mid.engine.size)
colnames(L3lw0.4) <- names(coef(m3))
rownames(L3lw0.4) <- paste("Engine size=", mid.engine.size, sep="")
print(L3lw0.4)
``````
``````##               (Intercept) lweight engine.size lweight:engine.size
## Engine size=2           0     0.4           0                 0.8
## Engine size=4           0     0.4           0                 1.6
## Engine size=6           0     0.4           0                 2.4
``````
``````LSest(m3, L = L3lw0.4)
``````
``````##               Estimate Std. Error  t value    P value    Lower    Upper
## Engine size=2 2.503903  0.1858668 13.47149 < 2.22e-16 2.138519 2.869287
## Engine size=4 3.132714  0.1886115 16.60935 < 2.22e-16 2.761934 3.503494
## Engine size=6 3.761526  0.2895186 12.99234 < 2.22e-16 3.192379 4.330673
``````

#### Above differences on a plot

``````par(mfrow = c(1, 1), bty = BTY, mar = c(4, 4, 1, 1) + 0.1)
plot(consumption ~ engine.size, data = CarsNow, pch = PCH, col = "black", bg = PALlw[flweight],
xlab = "Engine size [liters]", ylab = "Consumption [l/100 km]")
abline(v = mid.engine.size, col = "skyblue", lty = 4, lwd = 2)
legend(1.5, 21, legend = levels(CarsNow[, "flweight"]), pch = PCH, col = "black", pt.bg = PALes, bty = "n")
legend(2.1, 21, legend = mid.lweight, col = LINlw, lty = 1, lwd = 2, bty = "n")
text(1.5, 21.2, labels = "Log(weight) [log(kg)]", pos = 4)
for (i in 1:length(mid.lweight)){
ifit <- predict(m3, newdata = data.frame(lweight = mid.lweight[i], engine.size = grid.engine.size))
lines(grid.engine.size, ifit, col = LINlw[i], lwd = 2)
}
``````

## Some more complex model (based on a larger set of covariates)

### Model fit

``````(summary(m4 <- lm(consumption ~ lweight + engine.size + horsepower + lweight:engine.size + lweight:horsepower, data = CarsNow)))
``````
``````##
## Call:
## lm(formula = consumption ~ lweight + engine.size + horsepower +
##     lweight:engine.size + lweight:horsepower, data = CarsNow)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.8693 -0.6931 -0.1064  0.4869  3.5607
##
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)         -30.413021   5.239954  -5.804 1.31e-08 ***
## lweight               5.294284   0.723855   7.314 1.41e-12 ***
## engine.size         -19.779667   2.845077  -6.952 1.46e-11 ***
## horsepower            0.259533   0.049023   5.294 1.97e-07 ***
## lweight:engine.size   2.706148   0.382889   7.068 6.98e-12 ***
## lweight:horsepower   -0.034300   0.006606  -5.192 3.31e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9136 on 403 degrees of freedom
## Multiple R-squared:  0.8189, Adjusted R-squared:  0.8167
## F-statistic: 364.5 on 5 and 403 DF,  p-value: < 2.2e-16
``````
• Can you interprete this model?

### Is model `m4` significantly better than previously considered model `m3`?

``````anova(m3, m4)
``````
``````## Analysis of Variance Table
##
## Model 1: consumption ~ lweight * engine.size
## Model 2: consumption ~ lweight + engine.size + horsepower + lweight:engine.size +
##     lweight:horsepower
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)
## 1    405 378.60
## 2    403 336.34  2    42.261 25.318 4.39e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````