Proportional Hazards

We will explore the relationship between survival and explanatory variables by modeling. In this class, we consider a broad family of regression models: Proportional Hazards (PH) models. \[ \log\lambda(t;{\bf Z})=\log\lambda_0(t)+{\boldsymbol\beta}{\bf Z} \] Suppose \(Z = 1\) for treated subjects and \(Z = 0\) for untreated subjects. Then this model says that the hazard for treated subjects is the hazard for untreated subjects multiplied by \(e^{\beta}\) (a protective effect if \(e^{\beta} < 1\)).

This family of PH models divides further into semiparametric models, where the baseline hazard \(\lambda_0(t)\) is left unspecified (the Cox model below), and fully parametric models, where \(\lambda_0(t)\) is given a specific form such as exponential or Weibull.

library(survival)
library(KMsurv)

Cox Proportional Hazards model

\[ \lambda(t;{\bf Z})=\lambda_0(t)\exp\{{\boldsymbol\beta}{\bf Z}\} \]

Why do we call it proportional hazards? Think of the first example, where \(Z = 1\) for treated and \(Z = 0\) for control. Then if we think of \(\lambda_1(t)\) as the hazard rate for the treated group, and \(\lambda_0(t)\) as the hazard for control, then we can write: \[ \lambda_1(t) = \lambda(t;Z=1)=\lambda_0(t)\exp\{\beta Z\} = \lambda_0(t)\exp\{\beta\} \] This implies that the ratio of the two hazards is a constant, \(\phi\), which does NOT depend on time, t. In other words, the hazards of the two groups remain proportional over time. \[ \phi = \frac{\lambda_1(t)}{\lambda_0(t)} = e^{\beta} \] \(\phi\) is referred to as the hazard ratio.
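To see the proportionality concretely, here is a small numeric sketch (the Weibull-type baseline hazard and the value of \(\beta\) are arbitrary choices for illustration):

t = seq(0.1, 10, by=0.1)
lam0 = 0.5 * t^(-0.5)     # hypothetical baseline hazard (Weibull-type)
beta = -1.5               # hypothetical treatment effect
lam1 = lam0 * exp(beta)   # treated-group hazard under PH
range(lam1 / lam0)        # constant ratio exp(-1.5) = 0.223 at every t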

Baseline Hazard Function

In the example of comparing two treatment groups, \(\lambda_0(t)\) is the hazard rate for the control group.

In general, \(\lambda_0(t)\) is called the baseline hazard function, and reflects the underlying hazard for subjects with all covariates \(Z_1,\ldots,Z_p\) equal to 0 (i.e., the 'reference group').

The general form is: \[ \lambda(t;{\bf Z})=\lambda_0(t)\exp\{\beta_1Z_1+\ldots+\beta_pZ_p\} \] So when we substitute all of the \(Z_j\) equal to 0, we get \(\lambda(t;{\bf Z})=\lambda_0(t)\).

In the general case, we think of the i-th individual having a set of covariates \({\bf Z}_i = (Z_{1i}, Z_{2i},\ldots, Z_{pi})\), and we model their hazard rate as some multiple of the baseline hazard rate: \[ \lambda(t;{\bf Z}_i)=\lambda_0(t)\exp\{\beta_1Z_{1i}+\ldots+\beta_pZ_{pi}\} \] This means we can write the log of the hazard ratio for the i-th individual to the reference group as: \[ \log\frac{\lambda(t;{\bf Z}_i)}{\lambda_0(t)}=\beta_1Z_{1i}+\ldots+\beta_pZ_{pi} \] The Cox Proportional Hazards model is a linear model for the log of the hazard ratio.

One of the biggest advantages of the framework of the Cox PH model is that we can estimate the parameters \({\boldsymbol\beta}\) which reflect the effects of treatment and other covariates without having to make any assumptions about the form of \(\lambda_0(t)\).

In other words, we don't have to assume that \(\lambda_0(t)\) follows an exponential model, or a Weibull model, or any other particular parametric model. That's what makes the model semi-parametric.

Questions:

  1. Why don't we just model the hazard ratio, \(\phi = \frac{\lambda_i(t)}{\lambda_0(t)}\), directly as a linear function of the covariates \({\bf Z}\)?
  2. Why doesn't the model have an intercept?

Parameters of interest, \({\boldsymbol\beta}\), are estimated via partial likelihood. \(\lambda_0(\cdot)\) is treated as a nuisance parameter.
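A minimal sketch of that idea for a single covariate, assuming no tied event times (the toy data and the simple random censoring below are illustrative assumptions, not part of the example data): each term of the partial likelihood is a ratio of hazards over the risk set, so \(\lambda_0(t)\) cancels.

coxpl = function(beta, time, status, z)
{
    # Cox partial log-likelihood: sum over events of
    # log[ exp(beta*z_i) / sum_{j in risk set} exp(beta*z_j) ]
    eta = beta * z
    sum(sapply(which(status == 1), function(i)
        eta[i] - log(sum(exp(eta[time >= time[i]])))))
}
set.seed(1)
zz = rep(0:1, each=25)             # binary covariate
tt = rexp(50, rate=exp(0.7 * zz))  # true log hazard ratio = 0.7
dd = rbinom(50, 1, 0.8)            # simple random event indicator
optimize(coxpl, c(-5, 5), time=tt, status=dd, z=zz, maximum=TRUE)$maximum
# should be close to coxph(Surv(tt, dd) ~ zz)$coef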

leuk = read.table("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/leuk.dat", sep=" ", head=T, skip=7) # read data
head(leuk)
##   weeks remiss trtmt
## 1     6      0     1
## 2     6      1     1
## 3     6      1     1
## 4     6      1     1
## 5     7      1     1
## 6     9      0     1
# Cox PH Model
leuk.ph = coxph(Surv(weeks,remiss) ~ trtmt, data=leuk)
# Note: default = Efron method for handling ties
summary(leuk.ph)
## Call:
## coxph(formula = Surv(weeks, remiss) ~ trtmt, data = leuk)
## 
##   n= 42, number of events= 30 
## 
##          coef exp(coef) se(coef)      z Pr(>|z|)    
## trtmt -1.5721    0.2076   0.4124 -3.812 0.000138 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##       exp(coef) exp(-coef) lower .95 upper .95
## trtmt    0.2076      4.817   0.09251    0.4659
## 
## Concordance= 0.69  (se = 0.053 )
## Rsquare= 0.322   (max possible= 0.988 )
## Likelihood ratio test= 16.35  on 1 df,   p=5.261e-05
## Wald test            = 14.53  on 1 df,   p=0.0001378
## Score (logrank) test = 17.25  on 1 df,   p=3.283e-05
# Breslow handling of ties
leuk.phb = coxph(Surv(weeks, remiss) ~ trtmt, data=leuk, method="breslow")
summary(leuk.phb)
## Call:
## coxph(formula = Surv(weeks, remiss) ~ trtmt, data = leuk, method = "breslow")
## 
##   n= 42, number of events= 30 
## 
##          coef exp(coef) se(coef)      z Pr(>|z|)    
## trtmt -1.5092    0.2211   0.4096 -3.685 0.000229 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##       exp(coef) exp(-coef) lower .95 upper .95
## trtmt    0.2211      4.523   0.09907    0.4934
## 
## Concordance= 0.69  (se = 0.053 )
## Rsquare= 0.304   (max possible= 0.989 )
## Likelihood ratio test= 15.21  on 1 df,   p=9.615e-05
## Wald test            = 13.58  on 1 df,   p=0.0002288
## Score (logrank) test = 15.93  on 1 df,   p=6.571e-05
# Exact handling of ties
leuk.phe = coxph(Surv(weeks, remiss) ~ trtmt, data=leuk, method="exact")
summary(leuk.phe)
## Call:
## coxph(formula = Surv(weeks, remiss) ~ trtmt, data = leuk, method = "exact")
## 
##   n= 42, number of events= 30 
## 
##          coef exp(coef) se(coef)      z Pr(>|z|)    
## trtmt -1.6282    0.1963   0.4331 -3.759  0.00017 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##       exp(coef) exp(-coef) lower .95 upper .95
## trtmt    0.1963      5.095   0.08398    0.4587
## 
## Rsquare= 0.321   (max possible= 0.98 )
## Likelihood ratio test= 16.25  on 1 df,   p=5.544e-05
## Wald test            = 14.13  on 1 df,   p=0.0001704
## Score (logrank) test = 16.79  on 1 df,   p=4.169e-05

Adjustments for ties

The proportional hazards model assumes a continuous hazard, so ties should be impossible in theory, yet they occur in practice. There are four proposed modifications to the likelihood to adjust for ties.

Bottom Line: Implications of Ties (See Allison (1995), p.127-137)

Confidence intervals and hypothesis tests

Many software packages provide estimates of \({\boldsymbol\beta}\), but the hazard ratio \(HR=\exp\{{\boldsymbol\beta}\}\) is usually the parameter of interest.

For each covariate of interest, the null hypothesis is \[ H_0:\,HR_j=1\,\Leftrightarrow\,\beta_j=0 \] A Wald test is used for each individual coefficient; for nested models, a likelihood ratio test is constructed.
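As a hedged sketch (using the leuk.ph fit from the code above), the Wald z-statistic and the 95% confidence interval for the HR can be reproduced by hand from the coefficient and its standard error:

b = coef(leuk.ph)["trtmt"]
se = sqrt(vcov(leuk.ph)["trtmt", "trtmt"])
b / se                                 # Wald z-statistic, as in summary()
exp(b)                                 # estimated hazard ratio
exp(b + c(-1, 1) * qnorm(0.975) * se)  # 95% CI, matches lower/upper .95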

MAC prevention clinical trial data

ACTG 196 was a randomized clinical trial to study the effects of combination regimens on prevention of MAC (mycobacterium avium complex), one of the most common opportunistic infections in AIDS patients.

The treatment regimens were rifabutin (N=391), clarithromycin (N=397), and the rif+clari combination (N=389).

mac = read.csv("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/mac.csv", head=F, col.names =
    c("patid", "age", "agecat", "sex", "karnof", "ivdrug",
    "antiret", "cd4", "cd4cat", "ctg", "dthstat", "dthtime",
    "macstat", "mactime", "swdrstat", "swdrtime", "rif",
    "clari", "toxstat", "toxtime"))
# mactime = time to mac
# macstat = event indicator
head(mac)     # Check out a few data rows:
##   patid age agecat sex karnof ivdrug antiret cd4 cd4cat ctg dthstat
## 1     1  42      1   0     90      0       1   8      0   1       0
## 2     2  33      0   0     90      0       0  30      1   1       0
## 3     3  39      1   0    100      0       1  80      1   1       1
## 4     4  35      0   0     80      0       0  58      1   1       0
## 5     5  42      1   0     90      0       0  59      1   1       0
## 6     6  45      1   0     90      0       0  18      0   1       1
##   dthtime macstat mactime swdrstat swdrtime rif clari toxstat toxtime
## 1     623       1     560        1      106   1     0       0     623
## 2     651       0     651        1       86   1     0       0     651
## 3     464       0      26        0       26   1     0       0      26
## 4     622       0     622        1      111   0     1       0     501
## 5     643       0     643        1      119   0     1       0     643
## 6     216       0     171        0      171   0     1       1     171
# Treatment arms:
#  rif + clari (N=389)
#  clari (N=397)
#  rif (N=391)
table(mac$clari, mac$rif)
##    
##       0   1
##   0 389 391
##   1 397   0
# Fit and plot KM survival curves
fit.km = survfit(Surv(mactime, macstat) ~ clari + rif, data=mac)

plot(fit.km, mark.time=F, lty=1:3, xlab="Days", ylab="MAC-Free Survival")
legend(lty=1:3, x=100, y=.4, legend=c("Rif + Clari", "Rifabutin", "Clarithromycin"), cex=0.8)

plot of chunk ph-mac-km

Model 1

# Fit Cox PH model
fit.mac1 = coxph(Surv(mactime,macstat) ~ karnof + clari + rif, data=mac)
summary(fit.mac1)
## Call:
## coxph(formula = Surv(mactime, macstat) ~ karnof + clari + rif, 
##     data = mac)
## 
##   n= 1177, number of events= 121 
## 
##            coef exp(coef) se(coef)      z Pr(>|z|)    
## karnof -0.04485   0.95614  0.01064 -4.217 2.48e-05 ***
## clari   0.27557   1.31728  0.25801  1.068 0.285509    
## rif     0.87197   2.39161  0.23694  3.680 0.000233 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##        exp(coef) exp(-coef) lower .95 upper .95
## karnof    0.9561     1.0459    0.9364    0.9763
## clari     1.3173     0.7591    0.7944    2.1842
## rif       2.3916     0.4181    1.5032    3.8051
## 
## Concordance= 0.649  (se = 0.028 )
## Rsquare= 0.027   (max possible= 0.73 )
## Likelihood ratio test= 32.02  on 3 df,   p=5.193e-07
## Wald test            = 32.29  on 3 df,   p=4.548e-07
## Score (logrank) test = 33.16  on 3 df,   p=2.977e-07

Model 2

# Fit Cox PH model adding baseline CD4 variable
fit.mac2 = coxph(Surv(mactime,macstat) ~ karnof + clari + rif + cd4, data=mac)
summary(fit.mac2)
## Call:
## coxph(formula = Surv(mactime, macstat) ~ karnof + clari + rif + 
##     cd4, data = mac)
## 
##   n= 1177, number of events= 121 
## 
##             coef exp(coef)  se(coef)      z Pr(>|z|)    
## karnof -0.036874  0.963798  0.010665 -3.457 0.000546 ***
## clari   0.252345  1.287041  0.258337  0.977 0.328664    
## rif     0.879749  2.410294  0.237092  3.711 0.000207 ***
## cd4    -0.018360  0.981807  0.003684 -4.984 6.23e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##        exp(coef) exp(-coef) lower .95 upper .95
## karnof    0.9638     1.0376    0.9439    0.9842
## clari     1.2870     0.7770    0.7757    2.1354
## rif       2.4103     0.4149    1.5145    3.8360
## cd4       0.9818     1.0185    0.9747    0.9889
## 
## Concordance= 0.716  (se = 0.028 )
## Rsquare= 0.053   (max possible= 0.73 )
## Likelihood ratio test= 63.77  on 4 df,   p=4.682e-13
## Wald test            = 55.59  on 4 df,   p=2.449e-11
## Score (logrank) test = 56.22  on 4 df,   p=1.806e-11

We can also compute the hazard ratio ourselves, by exponentiating the \(\beta\) coefficients: \[ HR_{cd4} = \exp\{-0.01836\} = 0.98 \] Why is this HR so close to 1, and yet still highly significant? What is the interpretation of this HR?
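One way to see the answer is to put the coefficient on a more natural scale; the sketch below uses an arbitrary 25-cell increment purely for illustration:

b_cd4 = fit.mac2$coef["cd4"]
exp(b_cd4)        # ~0.982: HR per 1-cell increase in baseline CD4
exp(25 * b_cd4)   # HR per 25-cell increase, much further from 1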

The likelihood ratio test for the effect of CD4 is twice the difference in maximized log-likelihoods between the two models (on 1 df). Each 'Likelihood ratio test' statistic reported by summary() is already twice the log-likelihood gain over the null model, so their difference gives \[ \chi^2_{LR} = 63.8-32.0=31.8 \] How does this test statistic compare to the Wald \(\chi^2\) test?
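The same statistic can be computed directly from the stored partial log-likelihoods of the two fits (a sketch; loglik[2] is the value at the estimated \({\boldsymbol\beta}\)):

LR = 2 * (fit.mac2$loglik[2] - fit.mac1$loglik[2])
LR                     # ~31.8, on 1 df
1 - pchisq(LR, df=1)   # p-value for adding cd4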

In the mac study, there were three treatment arms: rif, clari, and the rif+clari combination (coded rif = clari = 0). The combination therapy is the reference group, so the coefficients of rif and clari are comparisons with the combination arm.

How do we compare the rif arm against the clari arm? That is equivalent to testing \[ H_0:\,\beta_{clari}=\beta_{rif} \] For that, we need the estimated covariance matrix \(\widehat{\mathsf{var}}(\widehat{\boldsymbol\beta})\).

names(fit.mac2)   # check components of fit object
##  [1] "coefficients"      "var"               "loglik"           
##  [4] "score"             "iter"              "linear.predictors"
##  [7] "residuals"         "means"             "concordance"      
## [10] "method"            "n"                 "nevent"           
## [13] "terms"             "assign"            "wald.test"        
## [16] "y"                 "formula"           "call"
beta_hat = fit.mac2$coef
Var_beta = fit.mac2$var   # Var(beta_hat)
A = c(0,1,-1,0)   # contrast vector
beta_dif = t(A) %*% beta_hat     # beta_clari - beta_rif
var_dif = t(A) %*% Var_beta %*% A   # Var(beta_clari-beta_rif)
se_dif = sqrt(var_dif)        # SE(beta_clari-beta_rif)
t_dif = beta_dif/se_dif
t_dif     # Wald test t-statistic, df=n-p
##           [,1]
## [1,] -2.959871
2* pt( -abs(t_dif), df=1177-4) # p-value for difference in arms
##             [,1]
## [1,] 0.003139506
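As a hedged follow-up, the same contrast yields the point estimate and a 95% confidence interval for the clari-versus-rif hazard ratio:

exp(c(beta_dif))                                        # HR(clari vs rif)
exp(c(beta_dif) + c(-1, 1) * qnorm(0.975) * c(se_dif))  # 95% CI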

Testing Model 2 for overall treatment effect

Two ways:

  1. Wald test of \(H_0:\,\beta_{rif} = \beta_{clari} = 0\). The null distribution is Chi-square with df=2 (why?).
    A = matrix(0, ncol=4, nrow=2)
    A[1,2] = A[2,3] = 1
    beta_trt = A %*% beta_hat                  # (beta_clari, beta_rif)
    var_trt = A %*% Var_beta %*% t(A)          # their 2x2 covariance matrix
    (Wald_trt = t(beta_trt) %*% solve(var_trt) %*% beta_trt)
    
    ##          [,1]
    ## [1,] 17.00252
    
    1-pchisq(Wald_trt, df=2)  # p-value for Wald test for treatment
    
    ##              [,1]
    ## [1,] 0.0002032125
    
  2. Likelihood ratio test. Fit the model with and without rif and clari, and refer twice the difference in log-likelihoods to a Chi-square with df=2.
    fit.mac3 = update(fit.mac2, .~. -rif -clari)
    anova(fit.mac3, fit.mac2, test="Chisq")
    
    ## Analysis of Deviance Table
    ##  Cox model: response is  Surv(mactime, macstat)
    ##  Model 1: ~ karnof + cd4
    ##  Model 2: ~ karnof + clari + rif + cd4
    ##    loglik  Chisq Df P(>|Chi|)
    ## 1 -747.07
    ## 2 -738.62 16.914  2 0.0002125 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Predicted survival

For the i-th patient with covariates \({\bf Z}_i\), we have: \[ S_i(t)=[S_0(t)]^{\exp\{{\boldsymbol\beta}{\bf Z}_i\}} \] Say we are interested in the survival pattern of single (unmarried) residents in the nursing home study. Based on this formula, if we had an estimate of the survival function in the reference group, i.e., \(\widehat{S}_0(t)\), we could obtain estimates of the survival function for any set of covariates \({\bf Z}_i\).

How can we estimate the baseline survival function \(S_0(t)\)? We could use the KM estimator within each covariate stratum, but that approach has a few disadvantages: it requires a separate curve for every covariate pattern, it cannot handle continuous covariates without categorizing them, and the stratum-specific estimates become unstable when strata are small.

Instead, we will use a baseline hazard estimator which takes advantage of the proportional hazards assumption to get a smoother estimate. \[ \widehat{S}_i(t)=[\widehat{S}_0(t)]^{\exp\{\widehat{\boldsymbol\beta}{\bf Z}_i\}} \] Using the above formula, we substitute \(\widehat{\boldsymbol\beta}\) from the fitted Cox PH model, and calculate \(\widehat{S}_0(t)\) with a standard baseline estimator, e.g., the Breslow estimator \(\widehat{S}_0(t)=\exp\{-\widehat{\Lambda}_0(t)\}\), which is what the R code below computes.

Consider the Nursing Home study. We fit the model \[ \lambda(t;{\bf Z}) = \lambda_0(t) \exp\{\beta_1[\mbox{married}] + \beta_2[\mbox{health status}]\} \] Let's predict the survival for single individuals with health status = 5 (worst), and for those with health status = 4 (second worst).

nurshome = read.table("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/NursingHome.dat", head=F, skip=14, col.names = c("los", "age", "rx", "gender", "married", "health", "censor"))
nurshome$discharged = 1 - nurshome$censor # event indicator
head(nurshome)
##    los age rx gender married health censor discharged
## 1   37  86  1      0       0      2      0          1
## 2   61  77  1      0       0      4      0          1
## 3 1084  75  1      0       0      3      1          0
## 4 1092  77  1      0       1      2      1          0
## 5   23  86  1      0       0      4      0          1
## 6 1091  71  1      1       0      3      1          0
table(nurshome$health)
## 
##   2   3   4   5 
## 343 576 513 169
# Fit Cox PH model
fit.nurs = coxph(Surv(los,discharged) ~ married + health, data=nurshome)
# Predict survival for single patients with health = 5
newdat = data.frame(married=0, health=5)
newdat    # data frame with same predictors as fit.nurs
##   married health
## 1       0      5
ps5 = survfit(fit.nurs, newdata=newdat)    # predicted survival
# Compare with Kaplan-Meier
nursh5 = nurshome[nurshome$health==5 & nurshome$married==0,]
fit.km5 = survfit(Surv(los, discharged) ~ 1, data = nursh5)
# Predict survival for single patients, health =4
newdat[1,] = c(0,4)
ps4 =  survfit(fit.nurs, newdata=newdat)
# Compare with Kaplan-Meier
nursh4 = nurshome[nurshome$health==4 & nurshome$married==0,]
fit.km4 = survfit(Surv(los, discharged) ~ 1, data = nursh4)

plot(ps5, xlab= "Length of stay in nursing home (days)", ylab = "Proportion not discharged", col=2, conf.int=F)
lines(fit.km5, mark.time=F, col=2)     # add line to existing plot
lines(ps4, col=4, conf.int=F)          # add predicted curve for health=4
lines(fit.km4, mark.time=F, col=4)     # add line to existing plot

plot of chunk ph-nurs-predict

Continuing the Nursing Home example, let's get and plot the baseline survival \(S_0(t)\) and cumulative hazard \(\Lambda_0(t)\). They correspond to married=0, health=0 (note that health=0 lies outside the observed range 2-5, so this baseline is an extrapolation to the coded reference level).

newdat[1,] = c(0,0)
ps0 = survfit(fit.nurs, newdata=newdat)   # S_0 = ps0$surv
bh = basehaz(fit.nurs, centered=F)
Lambda0 = bh$hazard   # note: this is the *cumulative* hazard!
# same as Lambda0 = -log(ps0$surv)

plot(bh$time, Lambda0, type="l", xlab="Length of stay in nursing home (days)", ylab="Cumulative hazard", col=4)

plot of chunk ph-nurs-baseline
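Two quick consistency checks (a sketch; small numerical discrepancies can arise from how the step functions are evaluated): basehaz() should agree with \(-\log\widehat{S}_0(t)\), and the predicted curve for single patients with health=5 should obey the power relationship \(\widehat{S}_i(t)=[\widehat{S}_0(t)]^{\exp\{\widehat{\boldsymbol\beta}{\bf Z}_i\}}\).

max(abs(Lambda0 - (-log(ps0$surv))))      # should be ~0
eta5 = sum(fit.nurs$coef * c(0, 5))       # linear predictor for married=0, health=5
max(abs(ps5$surv - ps0$surv^exp(eta5)))   # should be ~0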

Model selection

Collett (Section 3.6) has an excellent discussion of various approaches for model selection. In practice, model selection proceeds through a combination of substantive (clinical) judgment and formal criteria: significance tests for individual coefficients, likelihood ratio tests between nested models, and penalized criteria such as AIC.

Many advocate the approach of first doing a univariate analysis to 'screen' out potentially significant variables for consideration in the multivariate model (see Collett).

Moreover, a univariate analysis is typically part of a larger analysis, used to gauge the importance of each predictor taken by itself.

Atlantic halibut survival times data

One conservation measure suggested for trawl fishing is a minimum size limit for halibut (32 inches). However, this size limit would only be effective if captured fish below the limit survived until the time of their release. An experiment was conducted to evaluate the survival rates of halibut caught by trawls or longlines, and to assess other factors which might contribute to survival (duration of trawling, maximum depth fished, size of fish, and handling time).

halibut = read.csv("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/halibut.csv", head=T)
dim(halibut)
## [1] 294   8
head(halibut)
##   No Time Death Towd Deldepth Length Handtime Logcat
## 1  1  209     1   30       13     41        8  6.992
## 2  2  209     1   30       13     44        8  6.992
## 3  3  209     1   30       13     47       10  6.992
## 4  4  209     1   30       13     34       10  6.992
## 5  5   38     1   30       13     40       11  6.992
## 6  6  209     1   30       13     42       11  6.992
# Time = Survival time, in minutes
# Death = Event indicator
# Length =  length of fish (cm)
# Handtime = handling time (min)
# Logcat = log total catch

### Univariate analysis
# Towing duration:
table(halibut$Towd)
## 
##  30 100 
## 107 187
kmtow = survfit(Surv(Time, Death) ~ Towd, data=halibut)
# Length of fish:
summary(halibut$Length)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   29.00   38.00   43.00   42.51   46.75   57.00
halibut$Lencat = (halibut$Length > 42) + 0
table(halibut$Lencat)
## 
##   0   1 
## 134 160
kmlen = survfit(Surv(Time, Death) ~ Lencat, data=halibut)
# Depth range of towing:
table(halibut$Deldepth)
## 
##  1  2  3  4  5  6  8 10 13 15 23 49 58 
##  6 17  8  6 45 46 40 68  8 12 32  3  3
summary(halibut$Deldepth)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   5.000   8.000   9.966  10.000  58.000
halibut$Depthcat = cut(halibut$Deldepth, breaks=c(0,5,9,11,60))
kmdep = survfit(Surv(Time, Death) ~ Depthcat, data=halibut)
# Handling time:
summary(halibut$Handtime)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    6.00   10.00   12.51   17.00   38.00
halibut$Handcat = cut(halibut$Handtime, breaks = c(0,6,10,17,40))
kmhan = survfit(Surv(Time, Death) ~ Handcat, data=halibut)
# Log total catch:
summary(halibut$Logcat)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.904   4.059   4.203   4.886   5.323   8.690
halibut$Catchcat = cut(halibut$Logcat, breaks=c(0,4.06,4.89,5.33,9))
kmlgc = survfit(Surv(Time, Death) ~ Catchcat, data = halibut)

par(mfrow=c(3,2))
plot(kmtow, lty=1:2, col=2:3, xlab="Minutes", ylab="Survival", main="Towing Duration")
plot(kmlen, lty=1:2, col=2:3, xlab="Minutes", ylab="Survival", main="Fish Length")
plot(kmdep, lty=1:4, col=1:4, xlab="Minutes", ylab="Survival", main="Depth of Towing")
plot(kmhan, lty=1:4, col=1:4, xlab="Minutes", ylab="Survival", main="Handling Time")
plot(kmlgc, lty=1:4, col=1:4, xlab="Minutes", ylab="Survival", main="Total Catch")
par(mfrow=c(1,1))

plot of chunk ph-halibut

Model selection approach

  1. Fit a univariate model for each covariate, and identify the predictors significant at some level \(p_1\), say 0.20 (Hosmer and Lemeshow recommend \(p_1 = 0.25\)).
  2. Fit a multivariate model with all significant univariate predictors, and use backward selection to eliminate nonsignificant variables at some level \(p_2\), say 0.10.
  3. Starting with the final model from step 2, consider each of the variables that were non-significant in step 1, using forward selection with significance level \(p_3\), say 0.10.
  4. Do final pruning of main-effects model (omit variables that are non-significant, add any that are significant), using stepwise regression with significance level \(p_4\). At this stage, you may also consider adding interactions between any of the main effects currently in the model, under the hierarchical principle.

Collett recommends using a likelihood ratio test for all variable inclusion/exclusion decisions.
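For instance, here is a sketch of one such likelihood-ratio decision for Deldepth in the halibut data (fit_full and fit_red are names introduced here for illustration):

fit_full = coxph(Surv(Time, Death) ~ Towd + Length + Deldepth + Handtime + Logcat, data=halibut)
fit_red = update(fit_full, . ~ . - Deldepth)
anova(fit_red, fit_full, test="Chisq")   # LRT on 1 df for Deldepth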

Each step uses a 'greedy' approach: it adds or drops the single variable that most improves the current criterion, without looking ahead to later steps.

library(MASS)
args(stepAIC)
## function (object, scope, scale = 0, direction = c("both", "backward", 
##     "forward"), trace = 1, keep = NULL, steps = 1000, use.start = FALSE, 
##     k = 2, ...) 
## NULL
# Model selection for Halibut data
fit = coxph(Surv(Time, Death) ~ Towd + Length + Deldepth + Handtime + Logcat, data=halibut)
# Backward selection using AIC
fitb = stepAIC(fit, direction="backward", k=2)
## Start:  AIC=2520.15
## Surv(Time, Death) ~ Towd + Length + Deldepth + Handtime + Logcat
## 
##            Df    AIC
## - Deldepth  1 2519.7
## <none>        2520.2
## - Towd      1 2531.7
## - Logcat    1 2532.8
## - Length    1 2533.0
## - Handtime  1 2544.9
## 
## Step:  AIC=2519.7
## Surv(Time, Death) ~ Towd + Length + Handtime + Logcat
## 
##            Df    AIC
## <none>        2519.7
## - Logcat    1 2530.8
## - Length    1 2531.2
## - Towd      1 2532.7
## - Handtime  1 2547.7
summary(fitb)
## Call:
## coxph(formula = Surv(Time, Death) ~ Towd + Length + Handtime + 
##     Logcat, data = halibut)
## 
##   n= 294, number of events= 273 
## 
##               coef exp(coef)  se(coef)      z Pr(>|z|)    
## Towd      0.007744  1.007774  0.002017  3.839 0.000123 ***
## Length   -0.036960  0.963715  0.010028 -3.686 0.000228 ***
## Handtime  0.054947  1.056485  0.009870  5.567 2.59e-08 ***
## Logcat   -0.183373  0.832458  0.050982 -3.597 0.000322 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##          exp(coef) exp(-coef) lower .95 upper .95
## Towd        1.0078     0.9923    1.0038    1.0118
## Length      0.9637     1.0377    0.9450    0.9828
## Handtime    1.0565     0.9465    1.0362    1.0771
## Logcat      0.8325     1.2013    0.7533    0.9199
## 
## Concordance= 0.683  (se = 0.02 )
## Rsquare= 0.25   (max possible= 1 )
## Likelihood ratio test= 84.5  on 4 df,   p=0
## Wald test            = 90.71  on 4 df,   p=0
## Score (logrank) test = 94.51  on 4 df,   p=0
# Forward selection using AIC
fit0 = coxph(Surv(Time, Death) ~ 1, data=halibut)
  # (starting model)
fitf = stepAIC(fit0, scope=formula(fit), direction="forward", k=2)
## Warning in is.na(fit$coefficients): is.na() applied to non-(list or
## vector) of type 'NULL'
## Start:  AIC=2596.2
## Surv(Time, Death) ~ 1
## Warning in is.na(fit$coefficients): is.na() applied to non-(list or
## vector) of type 'NULL'
## Warning in is.na(fit$coefficients): is.na() applied to non-(list or
## vector) of type 'NULL'
##            Df    AIC
## + Handtime  1 2556.8
## + Towd      1 2568.0
## + Length    1 2586.1
## + Deldepth  1 2594.5
## <none>        2596.2
## + Logcat    1 2596.8
## 
## Step:  AIC=2556.85
## Surv(Time, Death) ~ Handtime
## 
##            Df    AIC
## + Logcat    1 2540.3
## + Towd      1 2543.6
## + Length    1 2549.9
## <none>        2556.8
## + Deldepth  1 2558.8
## 
## Step:  AIC=2540.3
## Surv(Time, Death) ~ Handtime + Logcat
## 
##            Df    AIC
## + Towd      1 2531.2
## + Length    1 2532.7
## <none>        2540.3
## + Deldepth  1 2541.4
## 
## Step:  AIC=2531.23
## Surv(Time, Death) ~ Handtime + Logcat + Towd
## 
##            Df    AIC
## + Length    1 2519.7
## <none>        2531.2
## + Deldepth  1 2533.0
## 
## Step:  AIC=2519.7
## Surv(Time, Death) ~ Handtime + Logcat + Towd + Length
## 
##            Df    AIC
## <none>        2519.7
## + Deldepth  1 2520.2
summary(fitf)
## Call:
## coxph(formula = Surv(Time, Death) ~ Handtime + Logcat + Towd + 
##     Length, data = halibut)
## 
##   n= 294, number of events= 273 
## 
##               coef exp(coef)  se(coef)      z Pr(>|z|)    
## Handtime  0.054947  1.056485  0.009870  5.567 2.59e-08 ***
## Logcat   -0.183373  0.832458  0.050982 -3.597 0.000322 ***
## Towd      0.007744  1.007774  0.002017  3.839 0.000123 ***
## Length   -0.036960  0.963715  0.010028 -3.686 0.000228 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##          exp(coef) exp(-coef) lower .95 upper .95
## Handtime    1.0565     0.9465    1.0362    1.0771
## Logcat      0.8325     1.2013    0.7533    0.9199
## Towd        1.0078     0.9923    1.0038    1.0118
## Length      0.9637     1.0377    0.9450    0.9828
## 
## Concordance= 0.683  (se = 0.02 )
## Rsquare= 0.25   (max possible= 1 )
## Likelihood ratio test= 84.5  on 4 df,   p=0
## Wald test            = 90.71  on 4 df,   p=0
## Score (logrank) test = 94.51  on 4 df,   p=0
# same final model as backward selection!
# Stepwise model selection (backward and forward)
fits = stepAIC(fit, direction="both", k=2)
## Start:  AIC=2520.15
## Surv(Time, Death) ~ Towd + Length + Deldepth + Handtime + Logcat
## 
##            Df    AIC
## - Deldepth  1 2519.7
## <none>        2520.2
## - Towd      1 2531.7
## - Logcat    1 2532.8
## - Length    1 2533.0
## - Handtime  1 2544.9
## 
## Step:  AIC=2519.7
## Surv(Time, Death) ~ Towd + Length + Handtime + Logcat
## 
##            Df    AIC
## <none>        2519.7
## + Deldepth  1 2520.2
## - Logcat    1 2530.8
## - Length    1 2531.2
## - Towd      1 2532.7
## - Handtime  1 2547.7
summary(fits)
## Call:
## coxph(formula = Surv(Time, Death) ~ Towd + Length + Handtime + 
##     Logcat, data = halibut)
## 
##   n= 294, number of events= 273 
## 
##               coef exp(coef)  se(coef)      z Pr(>|z|)    
## Towd      0.007744  1.007774  0.002017  3.839 0.000123 ***
## Length   -0.036960  0.963715  0.010028 -3.686 0.000228 ***
## Handtime  0.054947  1.056485  0.009870  5.567 2.59e-08 ***
## Logcat   -0.183373  0.832458  0.050982 -3.597 0.000322 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##          exp(coef) exp(-coef) lower .95 upper .95
## Towd        1.0078     0.9923    1.0038    1.0118
## Length      0.9637     1.0377    0.9450    0.9828
## Handtime    1.0565     0.9465    1.0362    1.0771
## Logcat      0.8325     1.2013    0.7533    0.9199
## 
## Concordance= 0.683  (se = 0.02 )
## Rsquare= 0.25   (max possible= 1 )
## Likelihood ratio test= 84.5  on 4 df,   p=0
## Wald test            = 90.71  on 4 df,   p=0
## Score (logrank) test = 94.51  on 4 df,   p=0
# Stepwise selection with penalty k=3 (stricter than the AIC default k=2)
fitk3 = stepAIC(fit, direction="both", k=3)
## Start:  AIC=2525.15
## Surv(Time, Death) ~ Towd + Length + Deldepth + Handtime + Logcat
## 
##            Df    AIC
## - Deldepth  1 2523.7
## <none>        2525.2
## - Towd      1 2535.7
## - Logcat    1 2536.8
## - Length    1 2537.0
## - Handtime  1 2548.9
## 
## Step:  AIC=2523.7
## Surv(Time, Death) ~ Towd + Length + Handtime + Logcat
## 
##            Df    AIC
## <none>        2523.7
## + Deldepth  1 2525.2
## - Logcat    1 2533.8
## - Length    1 2534.2
## - Towd      1 2535.7
## - Handtime  1 2550.7
summary(fitk3)
## Call:
## coxph(formula = Surv(Time, Death) ~ Towd + Length + Handtime + 
##     Logcat, data = halibut)
## 
##   n= 294, number of events= 273 
## 
##               coef exp(coef)  se(coef)      z Pr(>|z|)    
## Towd      0.007744  1.007774  0.002017  3.839 0.000123 ***
## Length   -0.036960  0.963715  0.010028 -3.686 0.000228 ***
## Handtime  0.054947  1.056485  0.009870  5.567 2.59e-08 ***
## Logcat   -0.183373  0.832458  0.050982 -3.597 0.000322 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##          exp(coef) exp(-coef) lower .95 upper .95
## Towd        1.0078     0.9923    1.0038    1.0118
## Length      0.9637     1.0377    0.9450    0.9828
## Handtime    1.0565     0.9465    1.0362    1.0771
## Logcat      0.8325     1.2013    0.7533    0.9199
## 
## Concordance= 0.683  (se = 0.02 )
## Rsquare= 0.25   (max possible= 1 )
## Likelihood ratio test= 84.5  on 4 df,   p=0
## Wald test            = 90.71  on 4 df,   p=0
## Score (logrank) test = 94.51  on 4 df,   p=0


Model diagnostics - Assessing overall model fit

How do we know if the model fits well?

Generalized (Cox-Snell) residuals

If the model fits, the residuals \(e_i=\widehat{\Lambda}_0(X_i)\exp\{\widehat{\boldsymbol\beta}{\bf Z}_i\}\) behave like a censored sample from a unit exponential distribution, so the log of their KM survival curve plotted against \(e\) should be approximately a line with slope \(-1\) through the origin.

# Analysis of halibut data, continued

# 1. Compute cumulative hazard Lambda_0:
Lambda0 = basehaz(fitb, centered=F)
Lambda0[1:3,]
##        hazard time
## 1 0.009323417  1.1
## 2 0.018739005  1.2
## 3 0.057162782  1.3
# 2. Compute Lambda_0(X_i) for all i:
predLambda0 = function(Lambda0, t)
{
    u = length(Lambda0$time[Lambda0$time <= t])    # index of last event time <= t
    return( ifelse(u > 0, Lambda0$hazard[u], 0) )  # step-function value of Lambda_0(t)
}
n = length(halibut$Time)
L0 = rep(NA, n)   # for storing Lambda_0(X_i)
for(i in 1:n) L0[i] = predLambda0(Lambda0, halibut$Time[i])
# 3. Compute exp(beta Z_i)
ebetaZ =  exp(as.matrix(halibut[,c(4,6,7,8)]) %*% fitb$coef)
# 4. Compute e_i for all i:
e = L0 * ebetaZ
# 5. Run KM for (e, delta)
resfit = survfit(Surv(e,halibut$Death) ~ 1)

# 6.1 Plot log survival of the residuals against the residual values
plot(resfit$time, log(resfit$surv), type="l", xlab="Cox-Snell residual", ylab="log Survival")

plot of chunk ph-generalized-res

# 6.2 Plot log-log surv against log time
plot(log(resfit$time), log(-log(resfit$surv)), type="l", xlab = "log time", ylab = "log-log Survival", main = "Halibut: Cox-Snell residuals")

plot of chunk ph-generalized-res
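There is also a shortcut based on the identity \(e_i=\delta_i-M_i\), where \(M_i\) is the martingale residual (see the next subsection); the sketch below checks it against the hand computation, expecting agreement only up to small step-function evaluation differences:

cs = halibut$Death - residuals(fitb, type="martingale")   # Cox-Snell via identity
max(abs(cs - e))   # should be small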

Martingale residuals

The martingale residual \(M_i=\delta_i-e_i\) contrasts the observed event indicator with the model-based expected number of events for subject i; plotted against a covariate, it helps assess that covariate's functional form.

# Halibut data, continued

# Calculates martingale, deviance, score or Schoenfeld residuals for a Cox proportional hazards model.
martres = residuals(fitb, type="martingale")  # Martingale residuals
martres[1:3]
##          1          2          3 
## -0.2384461 -0.1084664 -0.1073743
boxplot(martres ~ halibut$Towd, xlab="Towing duration", ylab="Martingale Residuals", boxwex=0.4)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-16

plot(halibut$Length, martres, xlab="Halibut Length", ylab="Martingale Residuals", col=2)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-16

plot(halibut$Handtime, martres, xlab="Handling Time", ylab="Martingale Residuals", col=3)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-16

plot(halibut$Logcat, martres, xlab="Log Catch Size", ylab="Martingale Residuals", col=4)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-16

plot(fitb$linear.predictors, martres, xlab="Linear Predictor", ylab="Martingale Residuals", col=6)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-16

Deviance residuals

Deviance residuals are a normalizing transformation of the martingale residuals, more symmetric about zero; large absolute values flag poorly fitted subjects.

# Halibut data, continued
devres = residuals(fitb, type="deviance")  # Deviance residuals
devres[1:3]
##          1          2          3 
## -0.2217595 -0.1047756 -0.1037553
boxplot(devres ~ halibut$Towd, xlab="Towing duration", ylab="Deviance Residuals", boxwex=0.4)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-17

plot(halibut$Length, devres, xlab="Halibut Length", ylab="Deviance Residuals", col=2)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-17

plot(halibut$Handtime, devres, xlab="Handling Time", ylab="Deviance Residuals", col=3)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-17

plot(halibut$Logcat, devres, xlab="Log Catch Size", ylab="Deviance Residuals", col=4)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-17

plot(fitb$linear.predictors, devres, xlab="Linear Predictor", ylab="Deviance Residuals", col=6)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-17

Schoenfeld residuals

The Schoenfeld residual at each event time is the covariate value of the subject who fails minus a risk-set weighted average of the covariate; under proportional hazards, these residuals should show no trend in time.

# Halibut data, continued
schoeres = residuals(fitb, type="schoenfeld")  # Schoenfeld residuals
schoeres[1:3,]
##         Towd     Length Handtime     Logcat
## 1.1 13.59661 -10.584914 4.165367 -0.7233896
## 1.2 13.73103  -6.689555 3.206545 -0.7305410
## 1.3 14.00863   1.193347 2.279487 -0.7453105
evtime = sort(halibut$Time[halibut$Death==1]) # event times: 1.1,1.2,1.3 etc
plot(log(evtime), schoeres[,"Towd"], xlab="Log Survival time", ylab="Schoenfeld Residuals, Towing Duration")
abline(h=0, lty=2)

plot of chunk unnamed-chunk-18

plot(log(evtime), schoeres[,"Length"], xlab="Log Survival time", ylab="Schoenfeld Residuals, Fish Length", col=2)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-18

plot(log(evtime), schoeres[,"Handtime"], xlab="Log Survival time", ylab="Schoenfeld Res, Handling Time", col=3)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-18

plot(log(evtime), schoeres[,"Logcat"], xlab="Log Survival time", ylab="Schoenfeld Res, Log Catch Size", col=4)
abline(h=0, lty=2)
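A standard companion diagnostic, sketched below, is cox.zph() from the survival package, which formalizes these plots: it tests the PH assumption by correlating the scaled Schoenfeld residuals with (a transform of) time.

zph = cox.zph(fitb)
zph          # per-covariate and global chi-square tests of PH
plot(zph)    # smoothed scaled Schoenfeld residuals vs time, one panel per covariate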