Linear regression (NMSA407)

Arnošt Komárek




Winter semester 2017–18

SIS pages of the course:    ENG    CZE


Lectures: Thursday 9:00 in K1, Thursday 14:00 in K1
Exercise class (MM1): Tuesday 10:40 in K4    (RNDr. Matúš Maciak, Ph.D.)
Exercise class (MM2): Tuesday 12:20 in K4    (RNDr. Matúš Maciak, Ph.D.)
Exercise class (SN): Friday 9:00 in K11    (Mgr. Stanislav Nagy, Ph.D.)
  • The language of both lectures and all exercise classes is English.
  • Personal communication with the lecturer and the exercise class instructors can also be conducted in Czech or Slovak.


As of 20171220, the course notes include proofs or derivations in many cases. Nevertheless, some proofs or derivations are still not included in the lecture notes, and knowledge of them is still expected at the exam.

20/12/2017:   Course notes with proofs
The latest update of the course notes (pdf) also includes proofs of many theorems/lemmas that were previously shown only on the blackboard.
20/12/2017:   Sample exam assignment
A sample assignment of the written part of the exam is here (pdf); an example of a solution that would be awarded 90 points is here (pdf).
06/12/2017:   Exam projects and details on the exam
Details concerning the exam are described in this document.
Assignments of the exam projects were sent to all students via e-mail on 06/12/2017.
A sample assignment of the written part of the exam will be published before Christmas.


The course notes will be updated gradually. They provide a record of the lecture, including notes, comments, etc. that may be mentioned only orally during the lecture. In many cases, the course notes do not include proofs or derivations, especially those that are shown in full on the blackboard during the lecture.

The lecture will follow the notes quite closely and more or less linearly. Students are advised to bring printed course notes to the lecture and supplement them with their own handwritten notes. Not everything that is said will be written on the blackboard (especially various remarks). Statements of definitions and theorems will not be written out in full on the blackboard either.

Notes (pdf); the latest update, published on 20171220, also includes proofs of most theorems/lemmas.
  chapters 1 – 8, appendices published 20170924
  chapters 9 and 10 added 20171019
  chapter 11 added 20171112
  chapter 12 added 20171123
  chapters 13 and 14 added 20171129
  chapter 15 added 20171130

In addition to the gradually updated course notes, the full course notes from the academic year 2016–17 are available here. Nevertheless, they may differ from the 2017–18 version in places. Moreover, errors found in the 2016–17 version are corrected only in the 2017–18 update.


Course slides will be projected during the lecture. They mainly contain

  • the structure of the lecture;
  • statements of definitions and theorems;
  • some illustrative plots/computer output.
On their own, the course slides are rather incomplete as study material, and in principle it is not necessary to print them. The information they contain is just a subset of the information included in the notes, only in a different format (suitable for projection).

Main lecture (pdf), chapters 1 – 8 published 20170924
  chapters 9 and 10 added 20171019
  chapter 11 added 20171112
  chapter 12 added 20171123
  chapters 13 and 14 added 20171129
  chapter 15 added 20171130
Appendices (pdf) published 20170924

The course slides used in the academic year 2016–17 are available here. Note that they may differ in places from the slides used in 2017–18.


The course is supplemented by the R package mffSM, which contains the example datasets used throughout the course and a few additional small functions related to processing a linear model fit. After download (from the link below, not from CRAN), the package can be installed in R in the standard way ("from a local repository"). The Windows binary file is intended for MS Windows users (as the name suggests); the source code is intended for users of other (mostly more reliable) operating systems such as Linux or Mac, where it is standard to compile packages from source. The mffSM package depends on the packages colorspace, lattice and car, which are available from CRAN in the standard way. Because install.packages() does not resolve dependencies when installing from a local file (repos = NULL), it is safest to install these dependencies from CRAN first on an Internet-connected computer and then install mffSM from the R console using the commands below (or their appropriately modified analogues):

install.packages(c("colorspace", "lattice", "car"))   # dependencies from CRAN
install.packages("PATH_WHERE_DOWNLOADED/mffSM_1.1.[tar.gz,zip]", repos = NULL)   # mffSM from the downloaded file

Source code:   mffSM_1.1.tar.gz
Windows binary:


The R tutorials show R analyses based on the theory presented in the lectures. They also provide the code used to prepare the majority of the output/plots shown as illustrations during the lectures. The R tutorials may serve as a reference for the assignments worked on during the exercise classes or required in homework.

The R tutorials will be published gradually during the semester, following the topics covered by the lecture. All tutorials from the academic year 2016–17 are available here.

The R scripts provided below assume that the content of the .Rprofile file is sourced at startup.

1. Linear Model
  1. Simple illustration of a linear model (data Hosi0)    html    R code
2. Least Squares Estimation
  1. Matrix algebra background of linear regression    html    R code
  2. R function lm    html    R code
3. Normal Linear Model
  1. Inference in a model with the regression line (data Cars2004nh)    html    R code
  2. Joint inference on a vector of estimable parameters (data Cars2004nh)    html    R code
  3. Confidence interval for the model based mean, prediction interval (data Hosi0)    html    R code
  4. Confidence interval for the model based mean, prediction interval (data Kojeni)    html    R code
4. Basic Regression Diagnostics
  1. Basic Regression Diagnostics (data Cars2004nh)    html    R code
7. General Linear Model
  1. Weighted least squares (data Kojeni and wKojeni)    html    R code
8. Parameterizations of Covariates
  1. Numeric covariate: simple transformation, polynomial regression, regression splines (data Houses1987)    html    R code
  2. Numeric covariate: regression splines (data Motorcycle)    html    R code
  3. Categorical nominal covariate (data Cars2004nh)    html    R code
  4. Categorical ordinal covariate (data Cars2004nh)    html    R code
9. Additivity and Interactions
  1. Two numeric covariates (data Cars2004nh)    html    R code
  2. Numeric and categorical covariate (data Cars2004nh)    html    R code
  3. ANOVA tables of type I, II and III (data Cars2004nh)    html    R code
10. Analysis of Variance
  1. Two-way Analysis of Variance (data Howells)    html    R code
11. Simultaneous Inference in a Linear Model
  1. Multiple comparison procedures (Tukey, Hothorn–Bretz–Westfall) (data Howells)    html    R code
  2. Multiple comparison procedures (Hothorn–Bretz–Westfall) (data Cars2004nh)    html    R code
  3. Confidence band around and for the regression function (data Kojeni)    html    R code
12. Checking Model Assumptions
  1. Partial residuals, Simpson's paradox (data Policie)    html    R code
  2. Partial residuals (data Cars2004nh)    html    R code
  3. Residual plots and tests on assumptions (data Cars2004nh)    html    R code
  4. Checking homoscedasticity (data Draha)    html    R code
  5. Checking uncorrelated errors (data Olympic)    html    R code
  6. Transformation of response: ANOVA with log-transformed response to get normality and homoscedasticity (data Houses1987)    html    R code
  7. Transformation of response: Regression with log-transformed response to stabilize the variance, Box–Cox transformation (data Cars2004nh)    html    R code
13. Problematic Regression Space
  1. Multicollinearity (data IQ)    html    R code
  2. Multicollinearity (data Cars2004nh)    html    R code
15. Unusual Observations
  1. Unusual observations (data Cars2004)    html    R code


All information related to the exercise classes is available at the central exercise classes webpage.

The requirements for obtaining the course credit (zápočet) are described here (published on 26 September 2017).

The exercise classes are synchronized: the content of classes held in the same week is approximately the same.


  • A course credit (zápočet) is required to take the exam.
  • The exam grade will be based on three parts:
    1. Take-home project (practical analysis), with results delivered in the form of a written report by a prescribed deadline. The assignments were sent to students on December 6, 2017.
    2. Written part composed of theoretical and semi-practical assignments (no computer analysis).
    3. Oral part.
    For details, see this document (pdf).

All exams will take place between January 11 and February 16, 2018. There will be four to five exam dates spread over this period, and there will be no exam dates later on.

