Notifications
Some additional topics will be covered in lectures published around May 26 (or a bit later). They will not be required for the exam.
Exam opportunities can be arranged in May, June, July or September upon request.
Schedule
Lectures  
Tuesday  14:00–15:30  K4  
Exercise Class  
Tuesday  12:20–13:50  Praktikum KPMS 
Course Notes & Contents
The course notes are under development. So far, they include only part of the material covered in this course.
Distance learning
 Lecture March 17: Please study Sec. 3.2.2 (Woolf estimator), Sec. 3.2.3 (Mantel-Haenszel estimator), and Sec. 3.3 (Logistic regression for stratified case-control studies) in the course notes, pp. 30–35. Look at the supplementary reading in Breslow and Day I (Chap. IV, pp. 136–146, Chap. VI, pp. 192–242).
 Lecture March 24: Please study Sec. 4.1 (Principles of matching) and Sec. 4.2 (Classical methods for matched case-control studies) in the course notes, pp. 36–40. Look at the supplementary reading in Breslow and Day I (Chap. V, pp. 162–169).
 Lecture March 31: Please study Sec. 4.3 (Conditional logistic regression for matched case-control studies) in the course notes, pp. 41–43. Look at the supplementary reading in Breslow and Day I (Chap. VII, pp. 248–268).
 Lecture April 7: Please study Sec. 5.1 (Cohort study design) and Sec. 5.2 (Models for ungrouped cohort data) in the course notes, pp. 45–48. Look at the supplementary reading in Breslow and Day II (Chap. 5, pp. 178–197).
 Lecture April 14: Please study Sec. 5.3 (Models for grouped cohort data) in the course notes, pp. 48–52. Look at the supplementary reading in Breslow and Day II (Chap. 3, pp. 82–91, Chap. 4, pp. 120–171).
 Lecture April 21: Please study Sec. 5.4 (Discrete Cox model) in the course notes, pp. 52–55.
 Lecture April 28: Please study Chapter 6 (Diagnostic tests) in the course notes, pp. 57–63.
 Lecture May 5: An audio recording of the lecture can be downloaded from this link. Presentation slides are here.
 Lecture May 12: An audio recording of the lecture can be downloaded from this link. Presentation slides are here. Look at example contents of a clinical trial protocol and an example of a full protocol (published in NEJM, 2017).

 Lecture May 19: Audio recording of the lecture: Part A, Part B. Presentation slides are here. Look at an example of a Statistical Analysis Plan in a Phase III trial.
Textbooks
 [EBR] Esteve J, Benhamou E, Raymond L. Statistical Methods in Cancer Research, Vol. IV: Descriptive Epidemiology. International Agency for Research on Cancer: Lyon, 1994.
 [BD1] Breslow NE, Day NE. Statistical Methods in Cancer Research, Vol. I: The analysis of case-control studies. International Agency for Research on Cancer: Lyon, 1980.
 [BD2] Breslow NE, Day NE. Statistical Methods in Cancer Research, Vol. II: The design and analysis of cohort studies. International Agency for Research on Cancer: Lyon, 1987.
 [FFD] Friedman LM, Furberg CD, DeMets DL. Fundamentals of Clinical Trials. 4th Ed., Springer: New York, 2010.
Exercise Class Assignments
 Assignment 1:
Estimating incidence rates
(due date: March 3)
The dataset dbtr contains the numbers of cases of Type 1 Diabetes Mellitus observed in the Czech Republic, together with the population size, aggregated by sex, age (0 to 14) and calendar year (1989–2009).
Solve the following tasks by writing your own code; do not use any specialized epidemiological packages; do not try to "google" solutions on the web.
 Estimate age-specific incidence rates of Type 1 DM by calendar year for boys, girls and all children. [Decide whether and how age and calendar year should be grouped.] Plot estimated incidence rates against age for (i) different time periods; (ii) different birth cohorts. Do you think the incidence of Type 1 DM changed between 1989 and 2009?
 Estimate the cumulative risks (i.e., probabilities) of developing Type 1 DM before the 15th year of age at different time periods (again for boys, girls, and all together).
 Calculate age-standardized incidence rates for Type 1 DM at different time periods (again for boys, girls, and all together). [Age-standardized rates combine age-specific rates over the same standard age distribution at all periods.]
 Calculate confidence intervals for cumulative risks and age-standardized incidence rates.
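The quantities named in the tasks above can be illustrated with a minimal sketch. It is written in Python with the standard library only and uses made-up counts, not the dbtr data; the function names, the three 5-year age bands and the standard weights are all assumptions chosen for illustration. The confidence interval relies on the usual approximation that the case counts are independent Poisson variables and the person-years are fixed.

```python
import math

def age_specific_rates(cases, person_years):
    """Incidence rate (cases per person-year) in each age band."""
    return [d / n for d, n in zip(cases, person_years)]

def cumulative_risk(rates, band_widths):
    """Risk of disease before the end of the last band: 1 - exp(-cumulative rate)."""
    cum_rate = sum(w * r for w, r in zip(band_widths, rates))
    return 1.0 - math.exp(-cum_rate)

def standardized_rate(cases, person_years, weights, z=1.96):
    """Directly standardized rate with a Wald interval (Poisson counts assumed)."""
    rates = age_specific_rates(cases, person_years)
    asr = sum(w * r for w, r in zip(weights, rates))
    # Var(d/n) is estimated by d/n^2 when d is Poisson and n is fixed.
    var = sum(w ** 2 * d / n ** 2 for w, d, n in zip(weights, cases, person_years))
    half_width = z * math.sqrt(var)
    return asr, asr - half_width, asr + half_width

# Hypothetical counts for one sex and one period, age bands 0-4, 5-9, 10-14.
cases = [30, 60, 90]
person_years = [200_000.0, 250_000.0, 300_000.0]
band_widths = [5, 5, 5]          # years of age covered by each band
std_weights = [0.3, 0.3, 0.4]    # assumed standard age distribution (sums to 1)

rates = age_specific_rates(cases, person_years)
risk = cumulative_risk(rates, band_widths)
asr, lower, upper = standardized_rate(cases, person_years, std_weights)

print("rates per 100,000:", [round(r * 1e5, 1) for r in rates])
print("cumulative risk before age 15:", round(risk, 5))
print("standardized rate per 100,000:", round(asr * 1e5, 2),
      "CI:", round(lower * 1e5, 2), round(upper * 1e5, 2))
```

Because the cumulative rate is also a weighted sum of age-specific rates (with the band widths as weights), one possible route to an interval for the cumulative risk is to compute an interval for the cumulative rate in the same way and transform its endpoints with 1 - exp(-x).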
Here is R code to solve this problem, together with incidence figures by calendar period and by year of birth. Here is an alternative presentation by Martin Otava.

Assignment 2:
Case-control analysis via logistic regression
(due date: March 31)
Ille-et-Vilaine Data contain the results of a case-control study investigating the effect of alcohol and tobacco consumption on the risk of oesophageal cancer. There are 200 male cases and 775 male controls, all of them inhabitants of the French département Ille-et-Vilaine. (See [BD1], Sec. 4.1, pp. 122–124).
 Reproduce the descriptive results in Table 4.1 of [BD1], p. 123.
 Conduct a grouped analysis of alcohol risk adjusted for age (see [BD1], pp. 210–213).
 Conduct a joint grouped analysis of alcohol and tobacco risk adjusted for age (see [BD1], pp. 213–221, esp. Tables 6.5 and 6.6).
 Conduct a joint ungrouped analysis of alcohol and tobacco risk adjusted for age (see [BD1], pp. 227–231, esp. Table 6.12).
 Send a record of your work (R program, results, comments and interpretations) by email before the due date. Any format (processed RMarkdown, commented R code with attached output, processed LaTeX document) is acceptable.
Here is a solution prepared by Martin Otava.

Assignment 3:
Matched case-control analysis via conditional logistic regression
(due date: April 14)
The Los Angeles Study of Endometrial Cancer was a matched case-control study conducted in California in the 1970s (description in [BD1], Chap. 5.1, pp. 162–163, data in [BD1], App. III, pp. 290–296). There are 63 cases of endometrial cancer, all women aged 55 or over, each matched to four controls living in the same retirement community. The primary exposure of interest was estrogen use. The secondary exposure was gallbladder disease.
The Epi library in R includes two versions of the data: the full dataset bdendo and a subset, bdendo11, containing a single control matched to each case.
 Conduct a descriptive analysis similar to Table 5.1 of [BD1], p. 163. Use the bdendo11 data to estimate odds ratios using the method for 1:1 matching and binary exposure.
 Conduct a conditional logistic analysis of the bdendo11 dataset (1:1 matching) using the function glm. See [BD1], Chap. 7.3, pp. 253–259, for inspiration and comparison of results.
 Conduct a conditional logistic analysis of the bdendo dataset (1:4 matching) using one of the conditional logistic regression functions available in R (function clogistic from the Epi library or function clogit from the survival library). See [BD1], Chap. 7.4, pp. 259–268, for inspiration and comparison of results.

Send a record of your work (R program, results, comments and interpretations) by email before the due date. Any format (processed RMarkdown, commented R code with attached output, processed LaTeX document) is acceptable.
You are not asked to reproduce all results from [BD1]. Read the relevant pages, try to understand the strategy and logic of the analysis, choose yourself what you want to try, do it, and check that the results agree with [BD1]. You can also try your own ideas that are not in the book. This is what is meant by "use the book for inspiration".
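As a small illustration of the method for 1:1 matching with a binary exposure mentioned in the first task: the conditional estimate of the odds ratio uses only the discordant pairs, and it equals the number of pairs in which only the case is exposed divided by the number of pairs in which only the control is exposed. The sketch below is in Python with hypothetical pair counts (not the bdendo11 data); the function name is an assumption, and the interval is a standard Wald interval on the log scale.

```python
import math

def matched_pair_odds_ratio(n_case_only, n_control_only, z=1.96):
    """Odds ratio from discordant pairs in a 1:1 matched study.

    n_case_only:    pairs where the case is exposed and the control is not
    n_control_only: pairs where the control is exposed and the case is not
    Returns the estimate and a Wald interval computed on the log scale.
    """
    if n_case_only <= 0 or n_control_only <= 0:
        raise ValueError("both discordant counts must be positive")
    or_hat = n_case_only / n_control_only
    se_log = math.sqrt(1.0 / n_case_only + 1.0 / n_control_only)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper

# Hypothetical discordant pair counts, for illustration only.
or_hat, lower, upper = matched_pair_odds_ratio(n_case_only=20, n_control_only=5)
print(f"OR = {or_hat:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Concordant pairs (both exposed or both unexposed) do not enter the estimate, which is why they are not arguments of the function.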

Assignment 4:
Analysis of cohort follow-up studies
(due date: May 5)
The Cardiovascular Health Study was a prospective cohort study of risk factors for cardiovascular disease among adults aged 65 years and older. The subjects were enrolled in 1989–1990 and followed until 2000.
You will investigate the following questions:
 Is the risk of myocardial infarction (MI) among the elderly the same for men as it is for women? If not, does the relative risk between men and women vary with age?
 Is the carotid artery intima-media wall thickness associated with the risk of future myocardial infarction?
The dataset mi.RData includes information on 3917 subjects, of whom 408 had myocardial infarction during the follow-up. The description of variables is provided in a separate codesheet.
 Conduct a descriptive analysis of MI risk and its association with age, gender, and intima wall thickness (using empirical incidence estimates).
 Build a Cox proportional hazards regression model addressing the two questions of interest. Interpret the results, answer the questions.
 Take the final model from step 2 and refit it using the grouped Poisson approach. Interpret the results. Have the answers to the questions changed substantially?
 Take the final model from step 2 and refit it using logistic regression with MI status as the outcome (ignoring the timing of the events). Interpret the results. Have the answers to the questions changed substantially?
 Send a record of your work (R program, results, comments and interpretations) by email before the due date. Any format (processed RMarkdown, commented R code with attached output, processed LaTeX document) is acceptable.
For steps 1–3, you may find this R code example helpful; it illustrates how to aggregate follow-up time and the number of cases across age/exposure categories and how to include interactions of exposure with time in the Cox model.
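Independently of the linked R example, the idea of aggregating follow-up time and cases can be sketched in a few lines. The Python sketch below assumes age is the time scale, splits each subject's follow-up across age bands, and sums person-years and events within each combination of group and band. The record layout, the age bands and the three subjects are hypothetical and are not taken from mi.RData.

```python
from collections import defaultdict

def split_follow_up(age_entry, age_exit, event, bands):
    """Split one subject's follow-up into (band_index, person_years, events)."""
    pieces = []
    for i, (lo, hi) in enumerate(bands):
        start = max(age_entry, lo)
        stop = min(age_exit, hi)
        if stop > start:
            # The event is counted in the band where follow-up actually ends.
            had_event = int(bool(event) and stop == age_exit)
            pieces.append((i, stop - start, had_event))
    return pieces

def aggregate(subjects, bands):
    """Sum person-years and events by (group, age band)."""
    table = defaultdict(lambda: [0.0, 0])
    for group, age_entry, age_exit, event in subjects:
        for i, years, d in split_follow_up(age_entry, age_exit, event, bands):
            table[(group, i)][0] += years
            table[(group, i)][1] += d
    return dict(table)

# Hypothetical age bands [lo, hi) and subjects: (group, age at entry, age at exit, MI).
bands = [(65, 75), (75, 85), (85, 120)]
subjects = [
    ("M", 66.0, 78.0, 1),   # 9 years in 65-75, 3 years in 75-85, MI at 78
    ("F", 72.0, 74.0, 0),   # 2 years in 65-75, censored
    ("M", 80.0, 90.0, 1),   # 5 years in 75-85, 5 years in 85+, MI at 90
]

table = aggregate(subjects, bands)
for (group, i), (years, events) in sorted(table.items()):
    print(group, bands[i], "person-years:", years, "events:", events,
          "rate:", events / years)
```

A table of this form (events, person-years, covariate categories) is the kind of input a grouped Poisson model works with, the logarithm of person-years entering as an offset.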
Course Plan
We will learn statistical methods used in medicine, especially in epidemiology and clinical trials. Terminology specific to medical applications will be explained and some specialized methods will be covered. We will review study designs used in medical studies (cohort study, case-control study, randomized controlled trial) and explain how to analyze each of them. Ethical and administrative aspects of human experiments and their impact on handling statistical issues will be discussed.
Prerequisites
This course assumes advanced knowledge of statistical theory and practice, especially linear regression, logistic regression, log-linear models, and survival analysis. Master's students of "Probability, statistics and econometrics" must have completed the courses Linear Regression (NMSA407), Advanced Regression Models (NMST432), and Censored Data Analysis (NMST531) before enrolling in this course.