We will work with the dataset `pbc` contained in the `survival` library.

```
library(survival)
data(pbc)
str(pbc)
help(pbc)
```

Investigate the structure of the dataset and read the description included in the R help page for this data.

Plot survival curves and perform the logrank test to compare time to death in subgroups defined by the following variables (continuous variables should be cut at the sample quartiles to obtain four groups of approximately equal size). Interpret the results.

- treatment
- bilirubin
- age
- sex
- albumin
- prothrombin time
- alkaline phosphatase
- platelet count
- presence of ascites
- edema
- hepatomegaly
- spiders

Hint: write a function and call it repeatedly for the 12 variables.
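One possible shape for such a function is sketched below. It is an illustration, not a reference solution. The helper name `compare_groups` is hypothetical, and the sketch assumes that death is coded as `status == 2` in `pbc` (see the help page) and stores the resulting event indicator in a new column `delta`. Variables with more than four distinct numeric values are treated as continuous and cut at the sample quartiles; this threshold is also an assumption.

```r
library(survival)

# Assumption: death (status == 2) is the event; transplanted and
# censored patients are both treated as censored.
pbc$delta <- as.numeric(pbc$status == 2)

# Hypothetical helper: Kaplan-Meier curves and logrank test for one variable.
compare_groups <- function(varname, data = pbc) {
  x <- data[[varname]]
  # Treat numeric variables with more than 4 distinct values as continuous
  # and cut them at the sample quartiles.
  if (is.numeric(x) && length(unique(na.omit(x))) > 4) {
    breaks <- unique(quantile(x, probs = 0:4 / 4, na.rm = TRUE))
    x <- cut(x, breaks = breaks, include.lowest = TRUE)
  }
  d <- data.frame(time = data$time, delta = data$delta, group = factor(x))
  d <- d[!is.na(d$group), ]  # drop patients with a missing group

  # Kaplan-Meier curves, one per group
  km <- survfit(Surv(time, delta) ~ group, data = d)
  cols <- seq_along(levels(d$group))
  plot(km, col = cols, xlab = "Days", ylab = "Survival probability",
       main = varname)
  legend("bottomleft", legend = levels(d$group), col = cols, lty = 1,
         bty = "n")

  # Logrank test; degrees of freedom = number of groups - 1
  lr <- survdiff(Surv(time, delta) ~ group, data = d)
  p <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
  invisible(list(fit = km, test = lr, p.value = p))
}

# Column names in pbc corresponding to the 12 variables listed above
vars <- c("trt", "bili", "age", "sex", "albumin", "protime",
          "alk.phos", "platelet", "ascites", "edema", "hepato", "spiders")
results <- lapply(vars, compare_groups)
names(results) <- vars
p_values <- sapply(results, function(r) r$p.value)
```

Printing `results$bili$test` shows the observed and expected numbers of deaths per bilirubin quartile, and `p_values` collects the logrank p-values for all 12 variables.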

The function to fit the proportional hazards model is called `coxph`. Its syntax is, for example:

`fit=coxph(Surv(time,delta)~age+bili+hepato,data=pbc)`

The logic of specifying the model formula is the same as with the functions `lm` or `glm`. Printing the results and performing hypothesis tests for model building are also very similar to other regression functions in R:

```
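summary(fit)              # coefficient table with Wald tests
anova(fit)                # sequential likelihood ratio tests, term by term
anova(fit1, fit2)         # compare two nested models (fit1 is a submodel of fit2)
drop1(fit, test="Chisq")  # likelihood ratio test for dropping each term in turn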
```

Parameter estimation is based on the maximum partial likelihood method, but the asymptotic properties, tests, etc. are the same as with ordinary likelihood methods, as applied, for example, in generalized linear models. In particular, a submodel is tested against a wider model by likelihood ratio tests performed by the functions `anova` or `drop1`. Twice the difference of the maximized log-likelihoods asymptotically has a \(\chi^2_d\) distribution if the submodel is true, where \(d\) is the difference in the number of parameters. The tests shown in the parameter table of the `summary` function are Wald tests for zero values of the individual parameters.
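To make the likelihood ratio test concrete, the sketch below computes the statistic by hand and compares it with `anova`. It assumes that death is coded as `status == 2` and that the event indicator `delta` is derived from it; the object names `d`, `fit1`, and `fit2` and the choice of covariates are only illustrative. Both models are fitted to the same rows (patients with non-missing `hepato`), because a likelihood ratio test is valid only when the two fits use identical data.

```r
library(survival)

# Assumption: death (status == 2) is the event of interest.
pbc$delta <- as.numeric(pbc$status == 2)

# Restrict to patients with observed hepato so both models use the same rows.
d <- pbc[!is.na(pbc$hepato), ]

fit1 <- coxph(Surv(time, delta) ~ age + bili, data = d)           # submodel
fit2 <- coxph(Surv(time, delta) ~ age + bili + hepato, data = d)  # wider model

# fit$loglik holds the log partial likelihood of the null model [1]
# and of the fitted model [2].
lrt <- 2 * (fit2$loglik[2] - fit1$loglik[2])
df  <- length(coef(fit2)) - length(coef(fit1))
p   <- pchisq(lrt, df = df, lower.tail = FALSE)
c(statistic = lrt, df = df, p.value = p)

# The same test as reported by anova
anova(fit1, fit2)

# Wald tests for the individual coefficients of the wider model
summary(fit2)$coefficients
```

The Wald test for `hepato` in the `summary` table and the likelihood ratio test above address the same hypothesis; they usually agree closely but are not numerically identical.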

Fit a proportional hazards model with grouped bilirubin as the single covariate. Interpret the estimated parameters. Test the hypothesis that bilirubin does not affect survival of primary biliary cirrhosis patients. Compare the results with the logrank test from Task 1.

Build a proportional hazards model for survival of primary biliary cirrhosis patients. Consider the variables listed in Task 1 as potential covariates. Continuous variables can be included in the linear form, or after transformation, or in the grouped form. Do not consider interactions. Keep only the covariates that have a significant effect on survival.

Interpret the effect of covariates that have been kept in the final model and test their effects on survival.