Exercise 6: Building survival models


Back to course page


Obtaining the dataset

We will work with the dataset pbc contained in the survival library.

library(survival)
data(pbd)
str(pbc)
help(pbc)

Investigate the structure of the dataset and read the description included in the R help page for this data.

Comparing multiple survival curves

Task 1

Plot survival curves and calculate the logrank test to compare time to death in subgroups defined by the following variables [continuous variables should be cut in sample quartiles to obtain four groups of approximately equal size]. Interpret the results.

  1. treatment
  2. bilirubin
  3. age
  4. sex
  5. albumin
  6. prothrombin time
  7. alkaline phosphatase
  8. platelet count
  9. presence of ascites
  10. edema
  11. hepatomegaly
  12. spiders

Hint: write a function, call it repeatedly for the 12 variables

Fitting Cox proportional hazards model

The function to fit the proportional hazards model is called coxph. It’s syntax is [example]

fit=coxph(Surv(time,delta)~age+bili+hepato,data=pbc)

The logic of specifying the model formula is the same as with functions lm or glm. Printing the results and performing tests of hypotheses for model building is also very similar to other regression functions in R:

summary(fit)
anova(fit)
anova(fit1,fit2)
drop1(fit,test="Chisq")

The parameter estimation is based on the maximum partial likelihood method but the asymptotic properties, tests etc. are all the same as with ordinary likelihood methods, as applied, for example, in generalized linear models. In particular, a submodel is tested against a wider model by likelihood ratio tests performed by functions anova or drop1. The difference of log-likelihoods multiplied by 2 has asymptotically \(\chi^2_d\) distribution if the submodel is true, where \(d\) is the difference in the number of parameters. The tests shown in the parameter table of summary function are Wald tests for zero values of the individual parameters.

Task 2

Fit a proportional hazards model with grouped bilirubin as the single covariate. Interpret the estimated parameters. Test the hypothesis that bilirubin does not affect survival of primary biliary cirrhosis patients. Compare the results with the logrank test from Task 1.

Task 3

Build a proportional hazards model for survival of primary biliary cirrhosis patients. Consider the variables listed in Task 1 as potential covariates. Continuous variables can be included in the linear form, or after transformation, or in the grouped form. Do not consider interactions. Keep only the covariates that have a significant effect on survival.

Interpret the effect of covariates that have been kept in the final model and test their effects on survival.