# Exercise 6: Building survival models

Back to course page

## Obtaining the dataset

We will work with the dataset pbc contained in the survival library.

library(survival)
data(pbd)
str(pbc)
help(pbc)

Investigate the structure of the dataset and read the description included in the R help page for this data.

## Comparing multiple survival curves

Plot survival curves and calculate the logrank test to compare time to death in subgroups defined by the following variables [continuous variables should be cut in sample quartiles to obtain four groups of approximately equal size]. Interpret the results.

1. treatment
2. bilirubin
3. age
4. sex
5. albumin
6. prothrombin time
7. alkaline phosphatase
8. platelet count
9. presence of ascites
10. edema
11. hepatomegaly
12. spiders

Hint: write a function, call it repeatedly for the 12 variables

## Fitting Cox proportional hazards model

The function to fit the proportional hazards model is called coxph. Itâ€™s syntax is [example]

fit=coxph(Surv(time,delta)~age+bili+hepato,data=pbc)

The logic of specifying the model formula is the same as with functions lm or glm. Printing the results and performing tests of hypotheses for model building is also very similar to other regression functions in R:

summary(fit)
anova(fit)
anova(fit1,fit2)
drop1(fit,test="Chisq")

The parameter estimation is based on the maximum partial likelihood method but the asymptotic properties, tests etc. are all the same as with ordinary likelihood methods, as applied, for example, in generalized linear models. In particular, a submodel is tested against a wider model by likelihood ratio tests performed by functions anova or drop1. The difference of log-likelihoods multiplied by 2 has asymptotically $$\chi^2_d$$ distribution if the submodel is true, where $$d$$ is the difference in the number of parameters. The tests shown in the parameter table of summary function are Wald tests for zero values of the individual parameters.