Tuesday: odd weeks at 17.20 in K6, even weeks online at 18.15


For all communication about the course and for excusing absences from classes, use the email otava at karlin.mff.cuni.cz.
The first class will exceptionally take place online on Friday 1.3. at 16.00.

Assignment 1: Estimating incidence rates (due: 12.00 10.03.2024)

The dataset dbtr (here) contains the numbers of cases of Type 1 Diabetes Mellitus observed in the Czech Republic, together with the population size, aggregated by sex, age (0 to 14) and calendar year (1989-2009).

  1. Estimate age-specific incidence rates of Type 1 DM by calendar year for boys, girls and all children. [Decide whether and how age and calendar year should be grouped.] Plot the estimated incidence rates against age (i) for different time periods and (ii) for different birth cohorts. Do you think the incidence of Type 1 DM changed between 1989 and 2009?
  2. Estimate the cumulative risks (i.e., probabilities) of developing Type 1 DM before the age of 15 in different time periods (again for boys, girls, and all children).
  3. Calculate age-standardized incidence rates of Type 1 DM in different time periods (again for boys, girls, and all children). [Age-standardized rates combine the age-specific rates using the same standard age distribution for all periods.]
  4. Calculate confidence intervals for cumulative risks and age-standardized incidence rates.
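The quantities in items 1–4 can be sketched in a few lines. The Python sketch below is illustrative only (the course itself works in R and Quarto) and the function names are hypothetical. It assumes that, within one sex and time period, cases and person-years are available per age band, with the population size of a one-year age group used as its person-years at risk, and that band widths are given in years. Rates get a log-scale Poisson interval, the cumulative risk is computed as 1 - exp(-sum(width * rate)), and the age-standardized rate is a weighted average of the age-specific rates with weights taken from a chosen standard population.

```python
import math

Z = 1.96  # normal quantile for a two-sided 95% interval


def incidence_rate(cases, pyrs):
    """Rate per person-year with a 95% CI, using SE(log rate) = 1/sqrt(cases).

    Assumes cases > 0 (Poisson counts, person-years treated as fixed).
    """
    rate = cases / pyrs
    se_log = 1 / math.sqrt(cases)
    return rate, rate * math.exp(-Z * se_log), rate * math.exp(Z * se_log)


def cumulative_risk(cases, pyrs, widths):
    """Cumulative risk 1 - exp(-sum(width * rate)) over consecutive age bands.

    The CI is built for the cumulative rate on the log scale (Poisson
    variance cases / pyrs**2 per band) and then mapped to the risk scale.
    """
    cum_rate = sum(w * d / y for w, d, y in zip(widths, cases, pyrs))
    var = sum(w**2 * d / y**2 for w, d, y in zip(widths, cases, pyrs))
    se_log = math.sqrt(var) / cum_rate

    def to_risk(cr):
        return 1 - math.exp(-cr)

    return (to_risk(cum_rate),
            to_risk(cum_rate * math.exp(-Z * se_log)),
            to_risk(cum_rate * math.exp(Z * se_log)))


def age_standardized_rate(cases, pyrs, std_pop):
    """Direct standardization: age-specific rates weighted by a standard age
    distribution (std_pop is normalized to weights summing to 1).

    The CI uses the Poisson variance sum(w**2 * cases / pyrs**2) on the log scale.
    """
    total = sum(std_pop)
    weights = [p / total for p in std_pop]
    asr = sum(w * d / y for w, d, y in zip(weights, cases, pyrs))
    var = sum(w**2 * d / y**2 for w, d, y in zip(weights, cases, pyrs))
    se_log = math.sqrt(var) / asr
    return asr, asr * math.exp(-Z * se_log), asr * math.exp(Z * se_log)


if __name__ == "__main__":
    # Made-up counts for three one-year age bands, NOT the dbtr data.
    cases = [10, 20, 30]
    pyrs = [100_000, 100_000, 100_000]
    print(incidence_rate(cases[0], pyrs[0]))
    print(cumulative_risk(cases, pyrs, widths=[1, 1, 1]))
    print(age_standardized_rate(cases, pyrs, std_pop=[1, 1, 1]))
```

Single-year age bands have width 1; if ages are grouped (e.g. 0–4, 5–9, 10–14), a width of 5 applies. Reusing the same `std_pop` (for example the pooled population of all years) for every period is what makes the standardized rates comparable over time.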

Solution (focus on the code, not the formatting): HTML output and underlying Quarto code

Assignment 2: Case-control analysis via logistic regression (due: 12.00 31.03.2024)

The Ille-et-Vilaine data contain the results of a case-control study of the effect of alcohol and tobacco consumption on the risk of oesophageal cancer. There are 200 male cases and 775 male controls, all of them inhabitants of the French département of Ille-et-Vilaine (see [BD1], Sec. 4.1, p. 122–124; reference in course materials).

  1. Reproduce the descriptive results in Table 4.1 of [BD1], p. 123.
  2. Conduct a grouped analysis of alcohol risk adjusted for age (see [BD1], p. 210–213).
  3. Conduct a joint grouped analysis of alcohol and tobacco risk adjusted for age (see [BD1], p. 213–221, esp. Tables 6.5 and 6.6).
  4. Conduct a joint ungrouped analysis of alcohol and tobacco risk adjusted for age (see [BD1], p. 227–231, esp. Table 6.12).
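A classical stratified estimate is a useful cross-check for the age-adjusted models: for a binary exposure, the Mantel–Haenszel odds ratio pooled over age strata should be close to the exposure odds ratio from a logistic model with age-group and exposure terms. The Python sketch below assumes the exposure has been dichotomized and that each age stratum is given as a tuple (a, b, c, d) = (exposed cases, unexposed cases, exposed controls, unexposed controls); the 95% interval uses the Robins–Breslow–Greenland variance. It is not the logistic regression requested in items 2–4, the function name is hypothetical, and the counts in the example are invented.

```python
import math


def mantel_haenszel_or(strata, z=1.96):
    """Mantel-Haenszel common odds ratio over strata with a 95% CI.

    Each stratum is (a, b, c, d) = (exposed cases, unexposed cases,
    exposed controls, unexposed controls), so the stratum OR is a*d / (b*c).
    The CI uses the Robins-Breslow-Greenland variance of log(OR_MH).
    """
    sum_r = sum_s = 0.0
    sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        r, s = a * d / n, b * c / n      # MH numerator and denominator terms
        p, q = (a + d) / n, (b + c) / n  # RGB weighting terms
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    or_mh = sum_r / sum_s
    var_log = (sum_pr / (2 * sum_r**2)
               + sum_ps_qr / (2 * sum_r * sum_s)
               + sum_qs / (2 * sum_s**2))
    se = math.sqrt(var_log)
    return or_mh, or_mh * math.exp(-z * se), or_mh * math.exp(z * se)


if __name__ == "__main__":
    # Invented counts for two age strata, NOT the Ille-et-Vilaine data.
    strata = [(10, 20, 30, 40), (15, 10, 25, 50)]
    print(mantel_haenszel_or(strata))
```

With a single stratum the Robins–Breslow–Greenland variance reduces to Woolf's 1/a + 1/b + 1/c + 1/d, which makes a convenient sanity check.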

Solution: HTML output and underlying Quarto code

Assignment 3: Matched case-control analysis via conditional logistic regression (due: 12.00 21.04.2024)

The Los Angeles Study of Endometrial Cancer was a matched case-control study conducted in California in the 1970s (description in [BD1], Chap. 5.1, p. 162–163; data in [BD1], App. III, p. 290–296). There are 63 cases of endometrial cancer, all women aged 55 or over, each matched to four controls living in the same retirement community. The primary exposure of interest was estrogen use; the secondary exposure was gallbladder disease. The Epi package in R includes two versions of the data: the full dataset bdendo and a subset bdendo11 containing a single control matched to each case.

  1. Conduct a descriptive analysis similar to Table 5.1 of [BD1], p. 163. Use the bdendo11 data to estimate odds ratios using the method for 1:1 matching and binary exposure.
  2. Conduct a conditional logistic analysis of the bdendo11 dataset (1:1 matching) using the function glm. See [BD1], Chap. 7.3, p. 253–259, for inspiration and comparison of results.
  3. Conduct a conditional logistic analysis of the bdendo dataset (1:4 matching) using one of the conditional logistic regression functions available in R (the function clogistic from the Epi package or the function clogit from the survival package). See [BD1], Chap. 7.4, p. 253–268, for inspiration and comparison of results.
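With 1:1 matching and a binary exposure, only the discordant pairs carry information: the conditional maximum-likelihood odds ratio is n10/n01, where n10 counts pairs with an exposed case and an unexposed control and n01 counts the reverse. For a single binary exposure this coincides with the estimate from a no-intercept logistic regression of a constant response on the case-minus-control exposure difference (the usual glm device for 1:1 matching), so it is a handy check for items 1 and 2. The Python sketch below assumes each pair is coded as (case_exposed, control_exposed) with 0/1 values; the function name is hypothetical and the pairs in the example are invented. It returns a log-scale Wald interval and McNemar's chi-square statistic.

```python
import math


def matched_pairs_or(pairs, z=1.96):
    """Conditional ML odds ratio for 1:1 matched pairs with a binary exposure.

    pairs: iterable of (case_exposed, control_exposed) coded 0/1.
    Returns (OR, lower, upper, mcnemar_chi2). Requires n10 > 0 and n01 > 0.
    """
    pairs = list(pairs)  # allow generators, which would be exhausted after one pass
    n10 = sum(1 for case, ctrl in pairs if case and not ctrl)
    n01 = sum(1 for case, ctrl in pairs if ctrl and not case)
    or_hat = n10 / n01
    se_log = math.sqrt(1 / n10 + 1 / n01)
    chi2 = (n10 - n01) ** 2 / (n10 + n01)  # McNemar, no continuity correction
    return (or_hat, or_hat * math.exp(-z * se_log),
            or_hat * math.exp(z * se_log), chi2)


if __name__ == "__main__":
    # Invented pairs: 5 discordant (1, 0), 2 discordant (0, 1), 7 concordant.
    pairs = [(1, 0)] * 5 + [(0, 1)] * 2 + [(1, 1)] * 3 + [(0, 0)] * 4
    print(matched_pairs_or(pairs))
```

For the 1:4 design of item 3 the conditional MLE generally has no closed form and requires iterative fitting, which is what clogit or clogistic provide.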

Assignment 4: Analysis of cohort follow-up studies (due: 12.00 12.05.2024)

The Cardiovascular Health Study was a prospective cohort study of risk factors for cardiovascular disease among adults aged 65 years and older. The subjects were enrolled in 1989–1990 and followed until 2000. We will investigate the following questions:

  • (1) Is the risk of myocardial infarction (MI) among the elderly different for men than for women? If it is different, does the difference vary with age?
  • (2) Is the carotid artery intima-media wall thickness associated with future risk of myocardial infarction?
The dataset mi.RData includes information on 3917 subjects, of whom 402 had a myocardial infarction during follow-up. The variables are described in a separate codesheet.

Conduct a descriptive analysis of MI risk and its association with age, gender, and intima wall thickness. Build regression models addressing the questions of interest using three different approaches and compare the results:
  1. MI risk analysis by the Cox model.
  2. MI risk analysis by the grouped Poisson model.
  3. Analysis of MI as a binary outcome (ignoring the timing of the MI event).
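The grouped Poisson approach requires splitting each subject's follow-up into age bands and tabulating events and person-years per cell; that table is then the input for a Poisson model with log person-years as an offset. The Python sketch below shows the splitting step and a crude rate ratio for men versus women. The record fields (entry_age, exit_age, mi, sex), the function names and the 5-year cut points are hypothetical placeholders, not the variable names from the mi.RData codesheet; ages are assumed to be in years, and the cut points must cover every subject's entire follow-up.

```python
import math
from collections import defaultdict


def split_followup(entry_age, exit_age, event, cuts):
    """Split follow-up [entry_age, exit_age) into age bands [cuts[i], cuts[i+1]).

    Returns a list of (band, person_years, events); an event is counted only
    in the band in which follow-up ends.
    """
    pieces = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        start, stop = max(entry_age, lo), min(exit_age, hi)
        if stop > start:
            ends_here = exit_age <= hi
            pieces.append(((lo, hi), stop - start, int(bool(event) and ends_here)))
    return pieces


def tabulate(subjects, cuts):
    """Aggregate [events, person_years] by (sex, age band)."""
    table = defaultdict(lambda: [0, 0.0])
    for s in subjects:
        pieces = split_followup(s["entry_age"], s["exit_age"], s["mi"], cuts)
        for band, pyrs, d in pieces:
            cell = table[(s["sex"], band)]
            cell[0] += d
            cell[1] += pyrs
    return dict(table)


def crude_rate_ratio(d1, y1, d0, y0, z=1.96):
    """Rate ratio (d1/y1) / (d0/y0) with a log-scale Wald 95% CI."""
    rr = (d1 / y1) / (d0 / y0)
    se_log = math.sqrt(1 / d1 + 1 / d0)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


if __name__ == "__main__":
    # Two invented subjects, NOT records from mi.RData.
    subjects = [
        {"entry_age": 67, "exit_age": 72, "mi": 1, "sex": "M"},
        {"entry_age": 66, "exit_age": 69, "mi": 0, "sex": "F"},
    ]
    for key, (events, pyrs) in sorted(tabulate(subjects, [65, 70, 75, 80]).items()):
        print(key, events, pyrs)
```

In R, such a table would be fitted with something like glm(events ~ sex * ageband + offset(log(pyrs)), family = poisson), with events, sex, ageband and pyrs as hypothetical column names; the sex-by-age interaction addresses the second part of question (1).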

Extra materials

Spiegelhalter on communicating statistics

Spiegelhalter on odds ratio

ISL and ESL books

ISCB ČR

Tidyverse book

Dplyr & tidyverse examples

Epidemiology in R Handbook

R Inferno

R Inferno set to music

Prediction models in healthcare: a playground for researchers

Discussion on ICs

Causal inference introduction

What If (causal inference)