Proportional Hazards

We will explore the relationship between survival and explanatory variables by modeling. In this class, we consider a broad class of regression models: Proportional Hazards (PH) models. \[ \log\lambda(t;{\bf Z})=\log\lambda_0(t)+{\boldsymbol\beta}{\bf Z} \] Suppose \(Z = 1\) for treated subjects and \(Z = 0\) for untreated subjects. Then this model says that the hazard is increased by a factor of \(e^{\beta}\) for treated subjects versus untreated subjects (\(e^{\beta}\) might be < 1).

This group of PH models divides further into:

library(survival)
library(KMsurv)

Cox Proportional Hazards model

\[ \lambda(t;{\bf Z})=\lambda_0(t)\exp\{{\boldsymbol\beta}{\bf Z}\} \]

Why do we call it proportional hazards? Think of the first example, where \(Z = 1\) for treated and \(Z = 0\) for control. Then if we think of \(\lambda_1(t)\) as the hazard rate for the treated group, and \(\lambda_0(t)\) as the hazard for control, then we can write: \[ \lambda_1(t) = \lambda(t;Z=1)=\lambda_0(t)\exp\{\beta Z\} = \lambda_0(t)\exp\{\beta\} \] This implies that the ratio of the two hazards is a constant, \(\phi\), which does NOT depend on time, t. In other words, the hazards of the two groups remain proportional over time. \[ \phi = \frac{\lambda_1(t)}{\lambda_0(t)} = e^{\beta} \] \(\phi\) is referred to as the hazard ratio.

Baseline Hazard Function

In the example of comparing two treatment groups, \(\lambda_0(t)\) is the hazard rate for the control group.

In general, \(\lambda_0(t)\) is called the baseline hazard function, and reflects the underlying hazard for subjects with all covariates \(Z_1,\ldots,Z_p\) equal to 0 (i.e., the 'reference group').

The general form is: \[ \lambda(t;{\bf Z})=\lambda_0(t)\exp\{\beta_1Z_1+\ldots+\beta_pZ_p\} \] So when we substitute all of the \(Z_j\) equal to 0, we get \(\lambda(t;{\bf Z})=\lambda_0(t)\).

In the general case, we think of the i-th individual having a set of covariates \({\bf Z}_i = (Z_{1i}, Z_{2i},\ldots, Z_{pi})\), and we model their hazard rate as some multiple of the baseline hazard rate: \[ \lambda(t;{\bf Z}_i)=\lambda_0(t)\exp\{\beta_1Z_{1i}+\ldots+\beta_pZ_{pi}\} \] This means we can write the log of the hazard ratio for the i-th individual to the reference group as: \[ \log\frac{\lambda(t;{\bf Z}_i)}{\lambda_0(t)}=\beta_1Z_{1i}+\ldots+\beta_pZ_{pi} \] The Cox Proportional Hazards model is a linear model for the log of the hazard ratio.

One of the biggest advantages of the framework of the Cox PH model is that we can estimate the parameters \({\boldsymbol\beta}\) which reflect the effects of treatment and other covariates without having to make any assumptions about the form of \(\lambda_0(t)\).

In other words, we don???t have to assume that \(\lambda_0(t)\) follows an exponential model, or a Weibull model, or any other particular parametric model. That???s what makes the model semi-parametric.

Questions:

  1. Why don't we just model the hazard ratio, \(\phi = \frac{\lambda_i(t)}{\lambda_0(t)}\), directly as a linear function of the covariates \({\bf Z}\)?
  2. Why doesn't the model have an intercept?

Parameters of interest, \({\boldsymbol\beta}\), are estimated via partial likelihood. \(\lambda_0(\cdot)\) is treated as a nuisance parameter.

leuk = read.table("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/leuk.dat", sep=" ", head=T, skip=7) # read data
head(leuk)
##   weeks remiss trtmt
## 1     6      0     1
## 2     6      1     1
## 3     6      1     1
## 4     6      1     1
## 5     7      1     1
## 6     9      0     1
# Cox PH Model
leuk.ph = coxph(Surv(weeks,remiss) ~ trtmt, data=leuk)
# Note: default = Efron method for handling ties
summary(leuk.ph)
## Call:
## coxph(formula = Surv(weeks, remiss) ~ trtmt, data = leuk)
## 
##   n= 42, number of events= 30 
## 
##          coef exp(coef) se(coef)      z Pr(>|z|)    
## trtmt -1.5721    0.2076   0.4124 -3.812 0.000138 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##       exp(coef) exp(-coef) lower .95 upper .95
## trtmt    0.2076      4.817   0.09251    0.4659
## 
## Concordance= 0.69  (se = 0.053 )
## Rsquare= 0.322   (max possible= 0.988 )
## Likelihood ratio test= 16.35  on 1 df,   p=5.261e-05
## Wald test            = 14.53  on 1 df,   p=0.0001378
## Score (logrank) test = 17.25  on 1 df,   p=3.283e-05
# Breslow handling of ties
leuk.phb = coxph(Surv(weeks, remiss) ~ trtmt, data=leuk, method="breslow")
summary(leuk.phb)
## Call:
## coxph(formula = Surv(weeks, remiss) ~ trtmt, data = leuk, method = "breslow")
## 
##   n= 42, number of events= 30 
## 
##          coef exp(coef) se(coef)      z Pr(>|z|)    
## trtmt -1.5092    0.2211   0.4096 -3.685 0.000229 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##       exp(coef) exp(-coef) lower .95 upper .95
## trtmt    0.2211      4.523   0.09907    0.4934
## 
## Concordance= 0.69  (se = 0.053 )
## Rsquare= 0.304   (max possible= 0.989 )
## Likelihood ratio test= 15.21  on 1 df,   p=9.615e-05
## Wald test            = 13.58  on 1 df,   p=0.0002288
## Score (logrank) test = 15.93  on 1 df,   p=6.571e-05
# Exact handling of ties
leuk.phe = coxph(Surv(weeks, remiss) ~ trtmt, data=leuk, method="exact")
summary(leuk.phe)
## Call:
## coxph(formula = Surv(weeks, remiss) ~ trtmt, data = leuk, method = "exact")
## 
##   n= 42, number of events= 30 
## 
##          coef exp(coef) se(coef)      z Pr(>|z|)    
## trtmt -1.6282    0.1963   0.4331 -3.759  0.00017 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##       exp(coef) exp(-coef) lower .95 upper .95
## trtmt    0.1963      5.095   0.08398    0.4587
## 
## Rsquare= 0.321   (max possible= 0.98 )
## Likelihood ratio test= 16.25  on 1 df,   p=5.544e-05
## Wald test            = 14.13  on 1 df,   p=0.0001704
## Score (logrank) test = 16.79  on 1 df,   p=4.169e-05

Adjustments for ties

The proportional hazards model assumes a continuous hazard ??? ties are not possible. There are four proposed modifications to the likelihood to adjust for ties.

Bottom Line: Implications of Ties (See Allison (1995), p.127-137)

Confidence intervals and hypothesis tests

Many software packages provide estimates of \({\boldsymbol\beta}\), but the hazard ratio \(HR=\exp\{{\boldsymbol\beta}\}\) is usually the parameter of interest.

For each covariate of interest, the null hypothesis is \[ H_0:\,HR_j=1\,\Leftrightarrow\,\beta_j=0 \] Wald test is used. For nested models, a likelihood ratio test is constructed.

MAC prevention clinical trial data

ACTG 196 was a randomized clinical trial to study the effects of combination regimens on prevention of MAC (mycobacterium avium complex), one of the most common opportunistic infections in AIDS patients.

The treatment regimens were:

mac = read.csv("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/mac.csv", head=F, col.names =
    c("patid", "age", "agecat", "sex", "karnof", "ivdrug",
    "antiret", "cd4", "cd4cat", "ctg", "dthstat", "dthtime",
    "macstat", "mactime", "swdrstat", "swdrtime", "rif",
    "clari", "toxstat", "toxtime"))
# mactime = time to mac
# macstat = event indicator
head(mac)     # Check out a few data rows:
##   patid age agecat sex karnof ivdrug antiret cd4 cd4cat ctg dthstat
## 1     1  42      1   0     90      0       1   8      0   1       0
## 2     2  33      0   0     90      0       0  30      1   1       0
## 3     3  39      1   0    100      0       1  80      1   1       1
## 4     4  35      0   0     80      0       0  58      1   1       0
## 5     5  42      1   0     90      0       0  59      1   1       0
## 6     6  45      1   0     90      0       0  18      0   1       1
##   dthtime macstat mactime swdrstat swdrtime rif clari toxstat toxtime
## 1     623       1     560        1      106   1     0       0     623
## 2     651       0     651        1       86   1     0       0     651
## 3     464       0      26        0       26   1     0       0      26
## 4     622       0     622        1      111   0     1       0     501
## 5     643       0     643        1      119   0     1       0     643
## 6     216       0     171        0      171   0     1       1     171
# Treatment arms:
#  rif + clari (N=389)
#  clari (N=397)
#  rif (N=391)
table(mac$clari, mac$rif)
##    
##       0   1
##   0 389 391
##   1 397   0
# Fit and plot KM survival curves
fit.km = survfit(Surv(mactime, macstat) ~ clari + rif, data=mac)

plot(fit.km, mark.time=F, lty=1:3, xlab="Days", ylab="MAC-Free Survival")
legend(lty=1:3, x=100, y=.4, legend=c("Rif + Clari", "Rifabutin", "Clarithromycin"), cex=0.8)

plot of chunk ph-mac-km

Model 1

# Fit Cox PH model
fit.mac1 = coxph(Surv(mactime,macstat) ~ karnof + clari + rif, data=mac)
summary(fit.mac1)
## Call:
## coxph(formula = Surv(mactime, macstat) ~ karnof + clari + rif, 
##     data = mac)
## 
##   n= 1177, number of events= 121 
## 
##            coef exp(coef) se(coef)      z Pr(>|z|)    
## karnof -0.04485   0.95614  0.01064 -4.217 2.48e-05 ***
## clari   0.27557   1.31728  0.25801  1.068 0.285509    
## rif     0.87197   2.39161  0.23694  3.680 0.000233 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##        exp(coef) exp(-coef) lower .95 upper .95
## karnof    0.9561     1.0459    0.9364    0.9763
## clari     1.3173     0.7591    0.7944    2.1842
## rif       2.3916     0.4181    1.5032    3.8051
## 
## Concordance= 0.649  (se = 0.028 )
## Rsquare= 0.027   (max possible= 0.73 )
## Likelihood ratio test= 32.02  on 3 df,   p=5.193e-07
## Wald test            = 32.29  on 3 df,   p=4.548e-07
## Score (logrank) test = 33.16  on 3 df,   p=2.977e-07

Model 2

# Fit Cox PH model adding baseline CD4 variable
fit.mac2 = coxph(Surv(mactime,macstat) ~ karnof + clari + rif + cd4, data=mac)
summary(fit.mac2)
## Call:
## coxph(formula = Surv(mactime, macstat) ~ karnof + clari + rif + 
##     cd4, data = mac)
## 
##   n= 1177, number of events= 121 
## 
##             coef exp(coef)  se(coef)      z Pr(>|z|)    
## karnof -0.036874  0.963798  0.010665 -3.457 0.000546 ***
## clari   0.252345  1.287041  0.258337  0.977 0.328664    
## rif     0.879749  2.410294  0.237092  3.711 0.000207 ***
## cd4    -0.018360  0.981807  0.003684 -4.984 6.23e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##        exp(coef) exp(-coef) lower .95 upper .95
## karnof    0.9638     1.0376    0.9439    0.9842
## clari     1.2870     0.7770    0.7757    2.1354
## rif       2.4103     0.4149    1.5145    3.8360
## cd4       0.9818     1.0185    0.9747    0.9889
## 
## Concordance= 0.716  (se = 0.028 )
## Rsquare= 0.053   (max possible= 0.73 )
## Likelihood ratio test= 63.77  on 4 df,   p=4.682e-13
## Wald test            = 55.59  on 4 df,   p=2.449e-11
## Score (logrank) test = 56.22  on 4 df,   p=1.806e-11

We can also compute the hazard ratio ourselves, by exponentiating the \(\beta\) coefficients: \[ HR_{cd4} = \exp\{???0.01835\} = 0.98 \] Why is this HR so close to 1, and yet still highly significant? What is the interpretation of this HR?

The likelihood ratio test for the effect of CD4 is twice the difference in minus log-likelihoods between the two models (on 1 df): \[ ??^2{LR} = 2???(63.8???32.0)=31.8 \] How does this test statistic compare to the Wald \(\chi^2\) test?

In the mac study, there were three treatment arms (rif, clari, and the rif+clari combination (for rif=clari=0). The combination therapy is the reference group, so the coefficients of rif and clari are in comparison to the combination group.

How do we compare the Rifa arm against the Clari arm? That is equivalent to testing \[ H_0:\,\beta_{clari}=\beta_{rifa} \] For that, we need the matrix \(\mathsf{var}\widehat{\boldsymbol\beta}\).

names(fit.mac2)   # check components of fit object
##  [1] "coefficients"      "var"               "loglik"           
##  [4] "score"             "iter"              "linear.predictors"
##  [7] "residuals"         "means"             "concordance"      
## [10] "method"            "n"                 "nevent"           
## [13] "terms"             "assign"            "wald.test"        
## [16] "y"                 "formula"           "call"
beta_hat = fit.mac2$coef
Var_beta = fit.mac2$var   # Var(beta_hat)
A = c(0,1,-1,0)   # contrast vector
beta_dif = t(A) %*% beta_hat     # beta_clari - beta_rif
var_dif = t(A) %*% Var_beta %*% A   # Var(beta_clari-beta_rif)
se_dif = sqrt(var_dif)        # SE(beta_clari-beta_rif)
t_dif = beta_dif/se_dif
t_dif     # Wald test t-statistic, df=n-p
##           [,1]
## [1,] -2.959871
2* pt( -abs(t_dif), df=1177-4) # p-value for difference in arms
##             [,1]
## [1,] 0.003139506

Testing Model 2 for overall treatment effect Two ways:

  1. Wald test. \(\beta_{rif} = 0\,\&\,\beta_{clari} = 0\). Null distribution is Chi-square with df=2 (why?).
    A = matrix(0, ncol=4, nrow=2)
    A[1,2] = A[2,3] = 1
    beta_trt = A %% beta_hat
    var_trt = A %% Var_beta %% t(A)
    (Wald_trt = t(beta_trt) %% solve(var_trt) %*% beta_trt)
    
    ##          [,1]
    ## [1,] 17.00252
    
    1-pchisq(Wald_trt, df=2)  # p-value for Wald test for treatment
    
    ##              [,1]
    ## [1,] 0.0002032125
    
  2. Likelihood ratio test. Fit model with and without rif, clari and compare log-likelihood, against Chi-square with df=2.
    fit.mac3 = update(fit.mac2, .~. -rif -clari)
    anova(fit.mac3, fit.mac2, test="Chisq")
    
    ## Analysis of Deviance Table
    ##  Cox model: response is  Surv(mactime, macstat)
    ##  Model 1: ~ karnof + cd4
    ##  Model 2: ~ karnof + clari + rif + cd4
    ##    loglik  Chisq Df P(>|Chi|)
    ## 1 -747.07
    ## 2 -738.62 16.914 2 0.0002125 *** ## --- ## Signif. codes: 0 '**' 0.001 '' 0.01 '' 0.05 '.' 0.1 ' ' 1

Predicted survival

For the i-th patient with covariates \({\bf Z}_i\), we have: \[ S_i(t)=[S_0(t)]^{\exp\{{\boldsymbol\beta}{\bf Z}_i\}} \] Say we are interested in the survival pattern for single males in the nursing home study. Based on the previous formula, if we had an estimate for the survival function in the reference group, i.e., \(\widehat{S}_0(t)\), we could get estimates of the survival function for any set of covariates \({\bf Z}_i\).

How can we estimate the survival function, \(S_0(t)\)? We could use the KM estimator, but there are a few disadvantages of that approach:

Instead, we will use a baseline hazard estimator which takes advantage of the proportional hazards assumption to get a smoother estimate. \[ \widehat{S}_i(t)=[\widehat{S}_0(t)]^{\exp\{\widehat{\boldsymbol\beta}{\bf Z}_i\}} \] Using the above formula, we substitute \(\widehat{\boldsymbol\beta}\) based on fitting the Cox PH model, and calculate \(\widehat{S}_0(t)\) by one of the following approaches:

Consider the Nursing Home study. We fit the model \[ \lambda_(t) = \lambda_0(t) \exp\{\beta_1[\mbox{married}] + \beta_2[\mbox{health status}]\} \] Let's predict the survival for single individuals with health status = 5 (worst), and for those with health status = 4 (second worst).

nurshome = read.table("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/NursingHome.dat", head=F, skip=14, col.names = c("los", "age", "rx", "gender", "married", "health", "censor"))
nurshome$discharged = 1 - nurshome$censor # event indicator
head(nurshome)
##    los age rx gender married health censor discharged
## 1   37  86  1      0       0      2      0          1
## 2   61  77  1      0       0      4      0          1
## 3 1084  75  1      0       0      3      1          0
## 4 1092  77  1      0       1      2      1          0
## 5   23  86  1      0       0      4      0          1
## 6 1091  71  1      1       0      3      1          0
table(nurshome$health)
## 
##   2   3   4   5 
## 343 576 513 169
# Fit Cox PH model
fit.nurs = coxph(Surv(los,discharged) ~ married + health, data=nurshome)
# Predict survival for single patients with health = 5
newdat = data.frame(married=0, health=5)
newdat    # data frame with same predictors as fit.nurs
##   married health
## 1       0      5
ps5 = survfit(fit.nurs, newdata=newdat)    # predicted survival
# Compare with Kaplan-Meier
nursh5 = nurshome[nurshome$health==5 & nurshome$married==0,]
fit.km5 = survfit(Surv(los, discharged) ~ 1, data = nursh5)
# Predict survival for single patients, health =4
newdat[1,] = c(0,4)
ps4 =  survfit(fit.nurs, newdata=newdat)
# Compare with Kaplan-Meier
nursh4 = nurshome[nurshome$health==4 & nurshome$married==0,]
fit.km4 = survfit(Surv(los, discharged) ~ 1, data = nursh4)

plot(ps5, xlab= "Length of stay in nursing home (days)", ylab = "Proportion not discharged", col=2, conf.int=F)
lines(fit.km5, mark.time=F, col=2)     # add line to existing plot
lines(ps4, xlab= "Length of stay in nursing home (days)", ylab = "Proportion not discharged", col=4, conf.int=F)
lines(fit.km4, mark.time=F, col=4)     # add line to existing plot

plot of chunk ph-nurs-predict

Continuing the Nursing Home example, let's get and plot the baseline survival \(S_0(t)\) and cumulative hazard \(\Lambda_0(t)\). They correspond to married=0, health=0.

newdat[1,] = c(0,0)
ps0 = survfit(fit.nurs, newdata=newdat)   # S_0 = ps0$surv
bh = basehaz(fit.nurs, centered=F)
Lambda0 = bh$hazard   # it???s *cummulative* hazard!
# same as Lambda0 = -log(ps0$surv)

plot(bh$time, Lambda0, type="l", xlab="Length of stay in nursing home (days)", ylab="Cummulative hazard", col=4)

plot of chunk ph-nurs-baseline

Model selection

Collett (Section 3.6) has an excellent discussion of various approaches for model selection. In practice, model selection proceeds through a combination of

Many advocate the approach of first doing a univariate analysis to 'screen' out potentially significant variables for consideration in the multivariate model (see Collett).

Moreover, typically univariate analysis is a part of a larger analysis, in order to identify the importance of each predictor taken in itself.

Atlantic halibut survival times data

One conservation measure suggested for trawl fishing is a minimum size limit for halibut (32 inches). However, this size limit would only be effective if captured fish below the limit survived until the time of their release. An experiment was conducted to evaluate the survival rates of halibut caught by trawls or longlines, and to assess other factors which might contribute to survival (duration of trawling, maximum depth fished, size of fish, and handling time).

halibut = read.csv("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/halibut.csv", head=T)
dim(halibut)
## [1] 294   8
head(halibut)
##   No Time Death Towd Deldepth Length Handtime Logcat
## 1  1  209     1   30       13     41        8  6.992
## 2  2  209     1   30       13     44        8  6.992
## 3  3  209     1   30       13     47       10  6.992
## 4  4  209     1   30       13     34       10  6.992
## 5  5   38     1   30       13     40       11  6.992
## 6  6  209     1   30       13     42       11  6.992
# Time = Survival time, in minutes
# Death = Event indicator
# Length =  length of fish (cm)
# Handtime = handling time (min)
# Logcat = log total catch

### Univariate analysis
# Towing duration:
table(halibut$Towd)
## 
##  30 100 
## 107 187
kmtow = survfit(Surv(Time, Death) ~ Towd, data=halibut)
# Length of fish:
summary(halibut$Length)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   29.00   38.00   43.00   42.51   46.75   57.00
halibut$Lencat = (halibut$Length > 42) + 0
table(halibut$Lencat)
## 
##   0   1 
## 134 160
kmlen = survfit(Surv(Time, Death) ~ Lencat, data=halibut)
# Depth range of towing:
table(halibut$Deldepth)
## 
##  1  2  3  4  5  6  8 10 13 15 23 49 58 
##  6 17  8  6 45 46 40 68  8 12 32  3  3
summary(halibut$Deldepth)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   5.000   8.000   9.966  10.000  58.000
halibut$Depthcat = cut(halibut$Deldepth, breaks=c(0,5,9,11,60))
kmdep = survfit(Surv(Time, Death) ~ Depthcat, data=halibut)
# Handling time:
summary(halibut$Handtime)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    6.00   10.00   12.51   17.00   38.00
halibut$Handcat = cut(halibut$Handtime, breaks = c(0,6,10,17,40))
kmhan = survfit(Surv(Time, Death) ~ Handcat, data=halibut)
# Log total catch:
summary(halibut$Logcat)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.904   4.059   4.203   4.886   5.323   8.690
halibut$Catchcat = cut(halibut$Logcat, breaks=c(0,4.06,4.89,5.33,9))
kmlgc = survfit(Surv(Time, Death) ~ Catchcat, data = halibut)

par(mfrow=c(3,2))
plot(kmtow, lty=1:2, col=2:3, xlab="Minutes", ylab="Survival", main="Towing Duration")
plot(kmlen, lty=1:2, col=2:3, xlab="Minutes", ylab="Survival", main="Fish Length")
plot(kmdep, lty=1:4, col=1:4, xlab="Minutes", ylab="Survival", main="Depth of Towing")
plot(kmhan, lty=1:4, col=1:4, xlab="Minutes", ylab="Survival", main="Handling Time")
plot(kmlgc, lty=1:4, col=1:4, xlab="Minutes", ylab="Survival", main="Total Catch")
par(mfrow=c(1,1))

plot of chunk ph-halibut

Model selection approach

  1. Fit a univariate model for each covariate, and identify the predictors significant at some level \(p_1\), say 0.20 (Hosmer and Lemeshow recommend \(p_1 = 0.25\)).
  2. Fit a multivariate model with all significant univariate predictors, and use backward selection to eliminate nonsignificant variables at some level \(p_2\), say 0.10.
  3. Starting with final step 2. model, consider each of the non-significant variables from step 1. using forward selection, with significance level \(p_3\), say 0.10.
  4. Do final pruning of main-effects model (omit variables that are non-significant, add any that are significant), using stepwise regression with significance level \(p_4\). At this stage, you may also consider adding interactions between any of the main effects currently in the model, under the hierarchical principle.

Collett recommends using a likelihood ratio test for all variable inclusion/exclusion decisions.

Each step uses a 'greedy' approach:

library(MASS)
args(stepAIC)
## function (object, scope, scale = 0, direction = c("both", "backward", 
##     "forward"), trace = 1, keep = NULL, steps = 1000, use.start = FALSE, 
##     k = 2, ...) 
## NULL
# Model selection for Halibut data
fit = coxph(Surv(Time, Death) ~ Towd + Length + Deldepth + Handtime + Logcat, data=halibut)
# Backward selection using AIC
fitb = stepAIC(fit, direction="backward", k=2)
## Start:  AIC=2520.15
## Surv(Time, Death) ~ Towd + Length + Deldepth + Handtime + Logcat
## 
##            Df    AIC
## - Deldepth  1 2519.7
## <none>        2520.2
## - Towd      1 2531.7
## - Logcat    1 2532.8
## - Length    1 2533.0
## - Handtime  1 2544.9
## 
## Step:  AIC=2519.7
## Surv(Time, Death) ~ Towd + Length + Handtime + Logcat
## 
##            Df    AIC
## <none>        2519.7
## - Logcat    1 2530.8
## - Length    1 2531.2
## - Towd      1 2532.7
## - Handtime  1 2547.7
summary(fitb)
## Call:
## coxph(formula = Surv(Time, Death) ~ Towd + Length + Handtime + 
##     Logcat, data = halibut)
## 
##   n= 294, number of events= 273 
## 
##               coef exp(coef)  se(coef)      z Pr(>|z|)    
## Towd      0.007744  1.007774  0.002017  3.839 0.000123 ***
## Length   -0.036960  0.963715  0.010028 -3.686 0.000228 ***
## Handtime  0.054947  1.056485  0.009870  5.567 2.59e-08 ***
## Logcat   -0.183373  0.832458  0.050982 -3.597 0.000322 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##          exp(coef) exp(-coef) lower .95 upper .95
## Towd        1.0078     0.9923    1.0038    1.0118
## Length      0.9637     1.0377    0.9450    0.9828
## Handtime    1.0565     0.9465    1.0362    1.0771
## Logcat      0.8325     1.2013    0.7533    0.9199
## 
## Concordance= 0.683  (se = 0.02 )
## Rsquare= 0.25   (max possible= 1 )
## Likelihood ratio test= 84.5  on 4 df,   p=0
## Wald test            = 90.71  on 4 df,   p=0
## Score (logrank) test = 94.51  on 4 df,   p=0
# Forward selection using AIC
fit0 = coxph(Surv(Time, Death) ~ 1, data=halibut)
  # (starting model)
fitf = stepAIC(fit0, scope=formula(fit), direction="forward", k=2)
## Warning in is.na(fit$coefficients): is.na() applied to non-(list or
## vector) of type 'NULL'
## Start:  AIC=2596.2
## Surv(Time, Death) ~ 1
## Warning in is.na(fit$coefficients): is.na() applied to non-(list or
## vector) of type 'NULL'
## Warning in is.na(fit$coefficients): is.na() applied to non-(list or
## vector) of type 'NULL'
##            Df    AIC
## + Handtime  1 2556.8
## + Towd      1 2568.0
## + Length    1 2586.1
## + Deldepth  1 2594.5
## <none>        2596.2
## + Logcat    1 2596.8
## 
## Step:  AIC=2556.85
## Surv(Time, Death) ~ Handtime
## 
##            Df    AIC
## + Logcat    1 2540.3
## + Towd      1 2543.6
## + Length    1 2549.9
## <none>        2556.8
## + Deldepth  1 2558.8
## 
## Step:  AIC=2540.3
## Surv(Time, Death) ~ Handtime + Logcat
## 
##            Df    AIC
## + Towd      1 2531.2
## + Length    1 2532.7
## <none>        2540.3
## + Deldepth  1 2541.4
## 
## Step:  AIC=2531.23
## Surv(Time, Death) ~ Handtime + Logcat + Towd
## 
##            Df    AIC
## + Length    1 2519.7
## <none>        2531.2
## + Deldepth  1 2533.0
## 
## Step:  AIC=2519.7
## Surv(Time, Death) ~ Handtime + Logcat + Towd + Length
## 
##            Df    AIC
## <none>        2519.7
## + Deldepth  1 2520.2
summary(fitf)
## Call:
## coxph(formula = Surv(Time, Death) ~ Handtime + Logcat + Towd + 
##     Length, data = halibut)
## 
##   n= 294, number of events= 273 
## 
##               coef exp(coef)  se(coef)      z Pr(>|z|)    
## Handtime  0.054947  1.056485  0.009870  5.567 2.59e-08 ***
## Logcat   -0.183373  0.832458  0.050982 -3.597 0.000322 ***
## Towd      0.007744  1.007774  0.002017  3.839 0.000123 ***
## Length   -0.036960  0.963715  0.010028 -3.686 0.000228 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##          exp(coef) exp(-coef) lower .95 upper .95
## Handtime    1.0565     0.9465    1.0362    1.0771
## Logcat      0.8325     1.2013    0.7533    0.9199
## Towd        1.0078     0.9923    1.0038    1.0118
## Length      0.9637     1.0377    0.9450    0.9828
## 
## Concordance= 0.683  (se = 0.02 )
## Rsquare= 0.25   (max possible= 1 )
## Likelihood ratio test= 84.5  on 4 df,   p=0
## Wald test            = 90.71  on 4 df,   p=0
## Score (logrank) test = 94.51  on 4 df,   p=0
# same final model as backward selection!
# Stepwise model selection (backward and forward)
fits = stepAIC(fit, direction="both", k=2)
## Start:  AIC=2520.15
## Surv(Time, Death) ~ Towd + Length + Deldepth + Handtime + Logcat
## 
##            Df    AIC
## - Deldepth  1 2519.7
## <none>        2520.2
## - Towd      1 2531.7
## - Logcat    1 2532.8
## - Length    1 2533.0
## - Handtime  1 2544.9
## 
## Step:  AIC=2519.7
## Surv(Time, Death) ~ Towd + Length + Handtime + Logcat
## 
##            Df    AIC
## <none>        2519.7
## + Deldepth  1 2520.2
## - Logcat    1 2530.8
## - Length    1 2531.2
## - Towd      1 2532.7
## - Handtime  1 2547.7
summary(fits)
## Call:
## coxph(formula = Surv(Time, Death) ~ Towd + Length + Handtime + 
##     Logcat, data = halibut)
## 
##   n= 294, number of events= 273 
## 
##               coef exp(coef)  se(coef)      z Pr(>|z|)    
## Towd      0.007744  1.007774  0.002017  3.839 0.000123 ***
## Length   -0.036960  0.963715  0.010028 -3.686 0.000228 ***
## Handtime  0.054947  1.056485  0.009870  5.567 2.59e-08 ***
## Logcat   -0.183373  0.832458  0.050982 -3.597 0.000322 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##          exp(coef) exp(-coef) lower .95 upper .95
## Towd        1.0078     0.9923    1.0038    1.0118
## Length      0.9637     1.0377    0.9450    0.9828
## Handtime    1.0565     0.9465    1.0362    1.0771
## Logcat      0.8325     1.2013    0.7533    0.9199
## 
## Concordance= 0.683  (se = 0.02 )
## Rsquare= 0.25   (max possible= 1 )
## Likelihood ratio test= 84.5  on 4 df,   p=0
## Wald test            = 90.71  on 4 df,   p=0
## Score (logrank) test = 94.51  on 4 df,   p=0
# Stepwise selection with alpha=k=3
fitk3 = stepAIC(fit, direction="both", k=3)
## Start:  AIC=2525.15
## Surv(Time, Death) ~ Towd + Length + Deldepth + Handtime + Logcat
## 
##            Df    AIC
## - Deldepth  1 2523.7
## <none>        2525.2
## - Towd      1 2535.7
## - Logcat    1 2536.8
## - Length    1 2537.0
## - Handtime  1 2548.9
## 
## Step:  AIC=2523.7
## Surv(Time, Death) ~ Towd + Length + Handtime + Logcat
## 
##            Df    AIC
## <none>        2523.7
## + Deldepth  1 2525.2
## - Logcat    1 2533.8
## - Length    1 2534.2
## - Towd      1 2535.7
## - Handtime  1 2550.7
summary(fitk3)
## Call:
## coxph(formula = Surv(Time, Death) ~ Towd + Length + Handtime + 
##     Logcat, data = halibut)
## 
##   n= 294, number of events= 273 
## 
##               coef exp(coef)  se(coef)      z Pr(>|z|)    
## Towd      0.007744  1.007774  0.002017  3.839 0.000123 ***
## Length   -0.036960  0.963715  0.010028 -3.686 0.000228 ***
## Handtime  0.054947  1.056485  0.009870  5.567 2.59e-08 ***
## Logcat   -0.183373  0.832458  0.050982 -3.597 0.000322 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##          exp(coef) exp(-coef) lower .95 upper .95
## Towd        1.0078     0.9923    1.0038    1.0118
## Length      0.9637     1.0377    0.9450    0.9828
## Handtime    1.0565     0.9465    1.0362    1.0771
## Logcat      0.8325     1.2013    0.7533    0.9199
## 
## Concordance= 0.683  (se = 0.02 )
## Rsquare= 0.25   (max possible= 1 )
## Likelihood ratio test= 84.5  on 4 df,   p=0
## Wald test            = 90.71  on 4 df,   p=0
## Score (logrank) test = 94.51  on 4 df,   p=0

Notes:

Model diagnostics - Assessing overall model fit

How do we know if the model fits well?

Generalized residuals

# Analysis of halibut data, continued

# 1. Compute cummulative hazard Lambda_0:
Lambda0 = basehaz(fitb, centered=F)
Lambda0[1:3,]
##        hazard time
## 1 0.009323417  1.1
## 2 0.018739005  1.2
## 3 0.057162782  1.3
# 2. Compute Lambda_0(X_i) for all i:
predLambda0 = function(Lambda0, t)
{
    u = length(Lambda0$time[Lambda0$time0, Lambda0$hazard[u], 0) )
}
n = length(halibut$Time)
L0 = rep(NA, n)   # for storing Lambda_0(X_i)
for(i in 1:n) L0[i] = predLambda0(Lambda0, halibut$Time[i])
# 3. Compute exp(beta Z_i)
ebetaZ =  exp(as.matrix(halibut[,c(4,6,7,8)]) %*% fitb$coef)
# 4. Compute e_i for all i:
e = L0 * ebetaZ
# 5. Run KM for (e, delta)
resfit = survfit(Surv(e,halibut$Death) ~ 1)

# 6.1 Plot log survival function against time
plot(resfit$time, log(resfit$surv), type="l")

plot of chunk ph-generalized-res

# 6.2 Plot log-log surv against log time
plot(log(resfit$time), log(-log(resfit$surv)), type="l", xlab = "log time", ylab = "log-log Survival", main = "Halibut: Cox-Snell residuals")

plot of chunk ph-generalized-res

Martingale residuals

# Halibut data, continued

# Calculates martingale, deviance, score or Schoenfeld residuals for a Cox proportional hazards model.
martres = residuals(fitb, type="martingale")  # Martingale residuals
martres[1:3]
##          1          2          3 
## -0.2384461 -0.1084664 -0.1073743
boxplot(martres ~ halibut$Towd, xlab="Towing duration", ylab="Martingale Residuals", boxwex=0.4)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-16

plot(halibut$Length, martres, xlab="Halibut Length", ylab="Martingale Residuals", col=2)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-16

plot(halibut$Handtime, martres, xlab="Handling Time", ylab="Martingale Residuals", col=3)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-16

plot(halibut$Logcat, martres, xlab="Log Catch Size", ylab="Martingale Residuals", col=4)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-16

plot(fitb$linear.predictors, martres, xlab="Linear Predictor", ylab="Martingale Residuals", col=6)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-16

Deviance residuals

# Halibut data, continued
devres = residuals(fitb, type="deviance")  # Deviance residuals
devres[1:3]
##          1          2          3 
## -0.2217595 -0.1047756 -0.1037553
boxplot(devres ~ halibut$Towd, xlab="Towing duration", ylab="Deviance Residuals", boxwex=0.4)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-17

plot(halibut$Length, devres, xlab="Halibut Length", ylab="Deviance Residuals", col=2)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-17

plot(halibut$Handtime, devres, xlab="Handling Time", ylab="Deviance Residuals", col=3)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-17

plot(halibut$Logcat, devres, xlab="Log Catch Size", ylab="Deviance Residuals", col=4)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-17

plot(fitb$linear.predictors, devres, xlab="Linear Predictor", ylab="Deviance Residuals", col=6)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-17

Schoenfeld residuals

# Halibut data, continued
schoeres = residuals(fitb, type="schoenfeld")  # Schoenfeld residuals
schoeres[1:3,]
##         Towd     Length Handtime     Logcat
## 1.1 13.59661 -10.584914 4.165367 -0.7233896
## 1.2 13.73103  -6.689555 3.206545 -0.7305410
## 1.3 14.00863   1.193347 2.279487 -0.7453105
evtime = sort(halibut$Time[halibut$Death==1]) # event times: 1.1,1.2,1.3 etc
plot(log(evtime), schoeres[,"Towd"], xlab="Log Survival time", ylab="Schoenfeld Residuals, Towing Duration")
abline(h=0, lty=2)

plot of chunk unnamed-chunk-18

plot(log(evtime), schoeres[,"Length"], xlab="Log Survival time", ylab="Schoenfeld Residuals, Fish Length", col=2)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-18

plot(log(evtime), schoeres[,"Handtime"], xlab="Log Survival time", ylab="Schoenfeld Res, Handling Time", col=3)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-18

plot(log(evtime), schoeres[,"Logcat"], xlab="Log Survival time", ylab="Schoenfeld Res, Log Catch Size", col=4)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-18

Scaled (weighted) Schoenfeld residuals

# Halibut data, continued
scschres = residuals(fitb, type="scaledsch")  # Schoenfeld residuals
scschres[1:3,]
##           [,1]          [,2]      [,3]       [,4]
## 1.1 0.02160255 -0.3136602583 0.1220311 -0.8067589
## 1.2 0.02019807 -0.2117188889 0.1166763 -0.7854618
## 1.3 0.01532368 -0.0002137067 0.1327947 -0.7959172
evtime = sort(halibut$Time[halibut$Death==1]) # event times: 1.1,1.2,1.3 etc
plot(log(evtime), scschres[,1], xlab="Log Survival time", ylab="Scaled Sch Residuals, Towing Duration")
abline(h=0, lty=2)

plot of chunk unnamed-chunk-19

plot(log(evtime), scschres[,2], xlab="Log Survival time", ylab="Scaled Sch Residuals, Fish Length", col=2)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-19

plot(log(evtime), scschres[,3], xlab="Log Survival time", ylab="Scaled Sch Res, Handling Time", col=3)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-19

plot(log(evtime), scschres[,4], xlab="Log Survival time", ylab="Scaled Sch Res, Log Catch Size", col=4)
abline(h=0, lty=2)

plot of chunk unnamed-chunk-19

Using Residual plots to explore relationships

If you calculate martingale or deviance residuals without any covariates in the model and then plot them against covariates, you obtain a graphical impression of the relationship between the covariate and the hazard.

fit = coxph(Surv(Time, Death) ~ 1, data=halibut)
martres = residuals(fit, type="martingale")
plot(halibut$Deldepth, martres, xlab="Depth Range", ylab="Martingale Residuals")
lines(lowess(halibut$Deldepth, martres), col=2)

plot of chunk unnamed-chunk-20

plot(halibut$Length, martres, xlab="Fish Length", ylab= "Martingale Residuals")
lines(lowess(halibut$Length, martres), col=2)

plot of chunk unnamed-chunk-20

plot(halibut$Handtime, martres, xlab="Handling Time", ylab= "Martingale Residuals")
lines(lowess(halibut$Handtime, martres), col=2)

plot of chunk unnamed-chunk-20

plot(halibut$Logcat, martres, xlab="Log Catch Size", ylab= "Martingale Residuals")
lines(lowess(halibut$Logcat, martres), col=2)

plot of chunk unnamed-chunk-20

Assessing the PH assumption

There are several options for checking the assumption of pro- portional hazards:

  1. Graphical
  2. Tests of time-varying coefficient \(\beta=\beta(t)\)
  3. Overall goodness of fit tests I(b), II(b), III available in R via cox.zph.

How do we interpret the above? Kleinbaum (and other texts) suggest a strategy of assuming that PH holds unless there is very strong evidence to counter this assumption:

If PH doesn't exactly hold for a particular covariate but we fit the PH model anyway, then what we are getting is sort of an average HR, averaged over the event times.

In most cases, this is not such a bad estimate. Allison claims that too much emphasis is put on testing the PH assumption, and not enough to other important aspects of the model.

Example: Nursing Home - gender and marital status

# Fit KM curves for men and women separately
fitgen = survfit(Surv(los, discharged) ~ gender, data=nurshome)
names(fitgen)
##  [1] "n"         "time"      "n.risk"    "n.event"   "n.censor" 
##  [6] "surv"      "type"      "strata"    "std.err"   "upper"    
## [11] "lower"     "conf.type" "conf.int"  "call"
fitgen$time
##   [1]    0    1    2    3    4    5    6    7    8    9   10   11   12   13
##  [15]   14   15   16   17   18   19   20   21   22   23   24   25   26   27
##  [29]   28   29   30   31   32   33   34   35   36   37   38   39   40   41
##  [43]   42   43   44   45   46   47   48   49   50   51   52   53   54   55
##  [57]   56   57   59   60   61   62   63   64   66   67   68   69   70   71
##  [71]   72   74   75   76   77   78   80   81   82   83   84   85   86   88
##  [85]   89   90   91   92   93   94   95   96   97   98   99  100  101  102
##  [99]  104  105  106  107  108  109  110  111  113  114  115  116  118  119
## [113]  120  121  122  123  125  126  127  128  129  130  131  132  133  134
## [127]  135  137  139  140  141  142  143  144  146  147  148  149  150  152
## [141]  153  155  156  159  160  161  162  163  164  165  166  168  169  170
## [155]  172  174  175  176  178  180  182  184  185  187  189  190  191  192
## [169]  193  194  195  196  197  198  199  201  202  203  204  205  207  208
## [183]  209  211  214  215  217  218  221  222  223  224  225  226  227  230
## [197]  231  232  233  234  237  238  240  241  243  244  250  252  253  256
## [211]  258  259  261  262  263  265  268  269  270  271  273  274  276  277
## [225]  279  281  282  283  285  288  290  291  293  295  296  297  298  301
## [239]  302  305  306  307  309  310  311  312  314  315  317  318  322  326
## [253]  328  330  332  337  340  342  344  350  351  352  355  358  360  362
## [267]  363  364  365  366  367  369  370  372  373  374  375  376  377  378
## [281]  379  380  381  382  383  384  386  387  388  389  390  393  394  395
## [295]  396  397  401  402  403  405  407  408  409  410  411  413  415  416
## [309]  417  418  420  421  422  424  425  428  429  430  433  434  435  437
## [323]  439  442  444  446  449  450  451  452  453  456  457  461  462  463
## [337]  465  466  467  470  471  472  474  475  476  477  478  480  481  483
## [351]  487  488  489  492  494  495  497  498  500  505  506  507  508  512
## [365]  514  519  521  522  525  526  527  530  533  537  540  545  547  549
## [379]  553  554  556  561  562  563  569  570  572  576  579  584  585  589
## [393]  592  597  598  599  605  611  613  617  618  619  620  624  625  626
## [407]  627  628  634  635  639  642  645  648  653  654  660  661  664  665
## [421]  669  670  673  676  679  680  681  683  684  686  687  690  691  695
## [435]  696  697  698  702  707  708  709  710  711  717  719  721  722  724
## [449]  726  732  733  736  741  747  749  754  756  757  758  759  770  774
## [463]  775  780  783  788  790  792  803  810  811  812  817  818  823  832
## [477]  833  838  851  852  853  858  864  865  866  870  875  879  881  895
## [491]  896  900  903  906  909  911  913  921  924  929  936  941  942  945
## [505]  962  963  964  965  970  982  986  992  993  994  999 1005 1006 1008
## [519] 1010 1012 1022 1026 1028 1033 1034 1041 1043 1047 1053 1054 1055 1057
## [533] 1062 1074 1084 1088 1092    0    1    2    3    4    5    6    7    8
## [547]    9   10   11   12   13   14   15   16   17   18   19   20   21   22
## [561]   23   24   25   26   27   29   30   31   32   33   34   35   37   38
## [575]   39   40   41   42   43   44   45   46   47   48   49   50   51   54
## [589]   55   57   58   60   61   62   63   65   66   68   69   71   72   73
## [603]   76   77   80   81   82   83   84   85   86   88   89   90   91   92
## [617]   95   96   97  100  103  105  106  108  109  112  116  118  119  120
## [631]  121  122  123  126  131  133  138  143  144  146  147  148  151  152
## [645]  153  156  159  160  162  164  165  167  169  170  171  174  176  178
## [659]  182  185  189  190  193  195  202  205  206  207  212  216  227  234
## [673]  237  241  250  260  262  267  269  270  273  274  275  276  277  278
## [687]  279  290  294  297  302  303  306  310  326  335  365  370  374  375
## [701]  381  389  393  396  397  399  400  404  408  412  416  448  449  456
## [715]  463  465  467  476  487  492  495  501  541  548  561  565  568  574
## [729]  578  582  584  586  597  599  602  605  607  620  624  638  646  649
## [743]  651  663  669  674  709  725  738  740  743  752  756  773  797  816
## [757]  864  881  899  934  938  948  971  973  985 1060 1074 1091
fitgen$surv
##   [1] 0.99575552 0.97962649 0.97113752 0.96859083 0.95925297 0.94906621
##   [7] 0.93972835 0.92954160 0.92359932 0.91341256 0.90747029 0.89983022
##  [13] 0.89303905 0.88285229 0.87351443 0.86332767 0.85483871 0.85059423
##  [19] 0.84295416 0.83786078 0.83276740 0.82258065 0.81578947 0.80730051
##  [25] 0.80050934 0.79286927 0.79032258 0.78438031 0.77843803 0.77079796
##  [31] 0.76740238 0.76570458 0.75976231 0.75551783 0.75127334 0.74702886
##  [37] 0.74617997 0.74023769 0.73769100 0.73344652 0.73005093 0.72495756
##  [43] 0.72241087 0.71816638 0.71222411 0.70797963 0.70373514 0.69779287
##  [49] 0.69185059 0.69100170 0.69015280 0.68675722 0.68505942 0.68251273
##  [55] 0.67741935 0.67487267 0.66977929 0.66638370 0.66298812 0.66044143
##  [61] 0.65619694 0.65025467 0.64601019 0.64346350 0.64091681 0.64006791
##  [67] 0.63837012 0.63497453 0.63327674 0.62818336 0.62563667 0.62139219
##  [73] 0.61884550 0.61460102 0.61290323 0.61035654 0.60950764 0.60696095
##  [79] 0.60441426 0.60271647 0.59847199 0.59592530 0.59422750 0.59337861
##  [85] 0.58998302 0.58913413 0.58488964 0.58064516 0.57979626 0.57640068
##  [91] 0.57470289 0.56876061 0.56791171 0.56536503 0.56451613 0.56112054
##  [97] 0.56027165 0.55772496 0.55687606 0.55432937 0.55348048 0.55263158
## [103] 0.54838710 0.54753820 0.54584041 0.54414261 0.54244482 0.54074703
## [109] 0.53735144 0.53650255 0.53480475 0.53395586 0.53225806 0.53140917
## [115] 0.52886248 0.52546689 0.52376910 0.52292020 0.52122241 0.51867572
## [121] 0.51697793 0.51443124 0.51358234 0.51188455 0.51018676 0.50764007
## [127] 0.50679117 0.50509338 0.50254669 0.50169779 0.50084890 0.50000000
## [133] 0.49830221 0.49405772 0.49235993 0.49151104 0.48811545 0.48556876
## [139] 0.48471986 0.48387097 0.48217317 0.47962649 0.47707980 0.47538200
## [145] 0.47453311 0.47198642 0.46943973 0.46859083 0.46519525 0.46434635
## [151] 0.46349745 0.45925297 0.45840407 0.45755518 0.45415959 0.45246180
## [157] 0.45161290 0.45076401 0.44991511 0.44821732 0.44736842 0.44651952
## [163] 0.44567063 0.44397284 0.44312394 0.44142615 0.44057725 0.43972835
## [169] 0.43887946 0.43803056 0.43633277 0.43548387 0.43463497 0.43208829
## [175] 0.43123939 0.42954160 0.42784380 0.42614601 0.42359932 0.42275042
## [181] 0.42020374 0.41850594 0.41595925 0.41341256 0.41171477 0.41001698
## [187] 0.40916808 0.40831919 0.40747029 0.40577250 0.40322581 0.40067912
## [193] 0.39983022 0.39898132 0.39813243 0.39728353 0.39643463 0.39473684
## [199] 0.39388795 0.39219015 0.39134126 0.39049236 0.38879457 0.38794567
## [205] 0.38709677 0.38624788 0.38539898 0.38370119 0.38285229 0.38200340
## [211] 0.38115450 0.38030560 0.37775891 0.37691002 0.37606112 0.37521222
## [217] 0.37436333 0.37351443 0.37266553 0.37181664 0.37096774 0.36842105
## [223] 0.36757216 0.36502547 0.36417657 0.36332767 0.36247878 0.36078098
## [229] 0.35993209 0.35908319 0.35823430 0.35568761 0.35314092 0.35229202
## [235] 0.35059423 0.34974533 0.34804754 0.34719864 0.34634975 0.34550085
## [241] 0.34465195 0.34380306 0.34295416 0.34210526 0.34125637 0.34040747
## [247] 0.33955857 0.33786078 0.33531409 0.33361630 0.33276740 0.33191851
## [253] 0.33022071 0.32937182 0.32852292 0.32767402 0.32682513 0.32597623
## [259] 0.32427844 0.32342954 0.32258065 0.32173175 0.32088285 0.31918506
## [265] 0.31833616 0.31748727 0.31663837 0.31578947 0.31578947 0.31493368
## [271] 0.31493368 0.31407084 0.31320563 0.31234043 0.31234043 0.31060520
## [277] 0.30973759 0.30886754 0.30712253 0.30712253 0.30537254 0.30449755
## [283] 0.30449755 0.30449755 0.30449755 0.30360980 0.30272205 0.30183430
## [289] 0.30183430 0.30183430 0.30093865 0.30093865 0.29913662 0.29913662
## [295] 0.29823289 0.29732640 0.29732640 0.29641436 0.29549382 0.29457040
## [301] 0.29457040 0.29457040 0.29364116 0.29270896 0.29177379 0.29083862
## [307] 0.29083862 0.29083862 0.29083862 0.28989434 0.28895006 0.28895006
## [313] 0.28800268 0.28800268 0.28800268 0.28703946 0.28703946 0.28703946
## [319] 0.28606973 0.28510000 0.28413027 0.28413027 0.28023808 0.28023808
## [325] 0.27827837 0.27827837 0.27728806 0.27629774 0.27530387 0.27430639
## [331] 0.27430639 0.27430639 0.27430639 0.27329791 0.27329791 0.27329791
## [337] 0.27227814 0.27125454 0.27125454 0.26919177 0.26919177 0.26814839
## [343] 0.26814839 0.26709683 0.26604526 0.26604526 0.26393379 0.26393379
## [349] 0.26179668 0.26072812 0.26072812 0.26072812 0.26072812 0.25963721
## [355] 0.25854629 0.25854629 0.25631745 0.25631745 0.25631745 0.25519325
## [361] 0.25519325 0.25519325 0.25519325 0.25519325 0.25519325 0.25402798
## [367] 0.25402798 0.25402798 0.25285193 0.25167587 0.25049982 0.24932376
## [373] 0.24932376 0.24932376 0.24932376 0.24812509 0.24692642 0.24452907
## [379] 0.24333040 0.24333040 0.24333040 0.24333040 0.24333040 0.24333040
## [385] 0.24209522 0.24086004 0.24086004 0.24086004 0.24086004 0.23959899
## [391] 0.23833127 0.23833127 0.23833127 0.23833127 0.23833127 0.23702176
## [397] 0.23702176 0.23702176 0.23702176 0.23702176 0.23702176 0.23702176
## [403] 0.23702176 0.23702176 0.23563567 0.23563567 0.23563567 0.23423308
## [409] 0.23423308 0.23423308 0.23423308 0.23423308 0.23423308 0.23423308
## [415] 0.23275059 0.23125860 0.23125860 0.23125860 0.22974710 0.22823561
## [421] 0.22823561 0.22823561 0.22823561 0.22823561 0.22668299 0.22668299
## [427] 0.22668299 0.22668299 0.22668299 0.22507531 0.22507531 0.22507531
## [433] 0.22507531 0.22507531 0.22507531 0.22338301 0.22338301 0.22338301
## [439] 0.22338301 0.22158153 0.21978006 0.21978006 0.21978006 0.21978006
## [445] 0.21978006 0.21978006 0.21978006 0.21978006 0.21778206 0.21778206
## [451] 0.21778206 0.21570794 0.21363383 0.21363383 0.21363383 0.21363383
## [457] 0.21363383 0.21136113 0.20908843 0.20908843 0.20908843 0.20908843
## [463] 0.20908843 0.20908843 0.20908843 0.20908843 0.20908843 0.20650709
## [469] 0.20650709 0.20650709 0.20382518 0.20382518 0.20382518 0.20382518
## [475] 0.20382518 0.20382518 0.20382518 0.20382518 0.20382518 0.20382518
## [481] 0.20382518 0.20382518 0.20382518 0.20382518 0.20382518 0.20042809
## [487] 0.20042809 0.20042809 0.20042809 0.20042809 0.20042809 0.19657371
## [493] 0.19657371 0.19657371 0.19657371 0.19239129 0.19239129 0.19239129
## [499] 0.19239129 0.19239129 0.19239129 0.18769882 0.18288603 0.18288603
## [505] 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603
## [511] 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603
## [517] 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603
## [523] 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603
## [529] 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603
## [535] 0.18288603 0.18288603 0.18288603 0.98817967 0.98108747 0.96453901
## [541] 0.95035461 0.93617021 0.91252955 0.90307329 0.88888889 0.87234043
## [547] 0.86288416 0.85106383 0.83451537 0.82269504 0.81087470 0.79432624
## [553] 0.78486998 0.78014184 0.76122931 0.75177305 0.73995272 0.72813239
## [559] 0.72576832 0.71631206 0.70921986 0.70212766 0.68321513 0.67375887
## [565] 0.65957447 0.65011820 0.64302600 0.63593381 0.62884161 0.62174941
## [571] 0.61938534 0.61465721 0.60992908 0.60283688 0.59338061 0.58628842
## [577] 0.58392435 0.58156028 0.57919622 0.57210402 0.56973995 0.56501182
## [583] 0.56028369 0.55791962 0.55082742 0.54846336 0.54137116 0.53900709
## [589] 0.53664303 0.53427896 0.53191489 0.52955083 0.52009456 0.51773050
## [595] 0.51536643 0.50591017 0.50118203 0.49881797 0.49408983 0.48699764
## [601] 0.48226950 0.47517730 0.47281324 0.47044917 0.46335697 0.46099291
## [607] 0.45862884 0.45626478 0.45153664 0.44917258 0.44444444 0.44208038
## [613] 0.43498818 0.43262411 0.42553191 0.42316785 0.41843972 0.41607565
## [619] 0.41371158 0.40661939 0.39952719 0.39716312 0.39479905 0.39243499
## [625] 0.39007092 0.38770686 0.38534279 0.38297872 0.37825059 0.37352246
## [631] 0.37115839 0.36643026 0.36406619 0.35697400 0.35224586 0.34988180
## [637] 0.34751773 0.34278960 0.34042553 0.33806147 0.33333333 0.33096927
## [643] 0.32624113 0.32387707 0.32151300 0.31914894 0.31678487 0.31442080
## [649] 0.31205674 0.30969267 0.30732861 0.30260047 0.29787234 0.29550827
## [655] 0.29314421 0.29078014 0.28841608 0.28605201 0.28368794 0.28132388
## [661] 0.27659574 0.27423168 0.27186761 0.26950355 0.26713948 0.26477541
## [667] 0.26241135 0.26004728 0.25768322 0.25531915 0.25295508 0.25059102
## [673] 0.24822695 0.24586288 0.24349882 0.23877069 0.23640662 0.23404255
## [679] 0.23167849 0.22931442 0.22695035 0.22458629 0.22222222 0.21985816
## [685] 0.21749409 0.21513002 0.21276596 0.21040189 0.20803783 0.20567376
## [691] 0.20330969 0.20094563 0.19621749 0.19385343 0.19148936 0.18912530
## [697] 0.18676123 0.18439716 0.18439716 0.18200239 0.17960763 0.17721286
## [703] 0.17481809 0.17481809 0.16996203 0.16753400 0.16510597 0.16510597
## [709] 0.16510597 0.16260437 0.16010276 0.15760116 0.15760116 0.15760116
## [715] 0.15760116 0.15760116 0.15760116 0.15483622 0.15483622 0.15202102
## [721] 0.15202102 0.15202102 0.15202102 0.15202102 0.15202102 0.15202102
## [727] 0.15202102 0.14871622 0.14541141 0.14210661 0.14210661 0.13872312
## [733] 0.13525504 0.13178696 0.12831888 0.12831888 0.12831888 0.12831888
## [739] 0.12831888 0.12831888 0.12831888 0.12831888 0.12404159 0.11976429
## [745] 0.11976429 0.11532857 0.11089286 0.11089286 0.11089286 0.11089286
## [751] 0.10534822 0.10534822 0.10534822 0.10534822 0.10534822 0.10534822
## [757] 0.10534822 0.08779018 0.08779018 0.08779018 0.08779018 0.08779018
## [763] 0.08779018 0.08779018 0.08779018 0.08779018 0.08779018 0.08779018
(n = fitgen$strata)  # nr fail times for each level of "gender"
## gender=0 gender=1 
##      537      231

# Compute and plot log cumulative hazard for each stratum
Lambda = -log(fitgen$surv)
plot( log(fitgen$time), log(Lambda), type="n", xlab = "Log Length of Stay", ylab = "Log Cum Hazard")
lines( log(fitgen$time)[1:n[1]], log(Lambda)[1:n[1]], col=2 )
lines( log(fitgen$time)[(n[1]+1):sum(n)], log(Lambda)[(n[1]+1):sum(n)], col=3, lty=2)
legend(4,-3, legend=c("Females", "Males"), lty=1:2, col=2:3)

plot of chunk ph-ph-assump

# Fit KM curves for single and married separately
fitmar = survfit(Surv(los, discharged) ~ married, data=nurshome)
names(fitmar)
##  [1] "n"         "time"      "n.risk"    "n.event"   "n.censor" 
##  [6] "surv"      "type"      "strata"    "std.err"   "upper"    
## [11] "lower"     "conf.type" "conf.int"  "call"
(n = fitmar$strata)  # nr fail times for each level of marital status
## married=0 married=1 
##       563       174

# Compute and plot log cumulative hazard for each stratum
Lambda = -log(fitmar$surv)
plot( log(fitmar$time), log(Lambda), type="n", xlab = "Log Length of Stay", ylab = "Log Cum Hazard")
lines( log(fitmar$time)[1:n[1]], log(Lambda)[1:n[1]], col=2 )
lines( log(fitmar$time)[(n[1]+1):sum(n)], log(Lambda)[(n[1]+1):sum(n)], col=3, lty=2)
legend(4,-3, legend=c("Not Married", "Married"), lty=1:2, col=2:3)

plot of chunk ph-ph-assump2

# Check PH for model including both gender and marital status
fitboth = survfit(Surv(los, discharged) ~ gender + married, data=nurshome)
fitboth
## Call: survfit(formula = Surv(los, discharged) ~ gender + married, data = nurshome)
## 
##                     records n.max n.start events median 0.95LCL 0.95UCL
## gender=0, married=0    1065  1065    1065    811  148.0     128     168
## gender=0, married=1     113   113     113     96   96.0      63     149
## gender=1, married=0     261   261     261    228   66.0      49     100
## gender=1, married=1     162   162     162    144   68.5      39      89

Testing PH using cox.zph

# Nursing Home example, continued
fitnh = coxph(Surv(los, discharged) ~ married + gender + age, data=nurshome)
(testnh = cox.zph(fitnh))
##              rho  chisq     p
## married  0.00476 0.0299 0.863
## gender  -0.03721 1.8414 0.175
## age      0.00281 0.0104 0.919
## GLOBAL        NA 2.0614 0.560

par(mfrow=c(2,2))
plot(testnh)  # Schoenfeld residuals - see plot

plot of chunk ph-nurs-assess

# Halibut data, revisited
fithal = coxph(Surv(Time, Death) ~ Towd + Length + Deldepth + Handtime + Logcat, data=halibut)
(testhal = cox.zph(fithal))
##               rho   chisq        p
## Towd     -0.12120  3.6198 0.057096
## Length    0.00719  0.0123 0.911639
## Deldepth -0.06451  0.7290 0.393202
## Handtime -0.03716  0.3819 0.536596
## Logcat   -0.16300  6.4651 0.011002
## GLOBAL         NA 20.8640 0.000859

par(mfrow=c(3,2))
plot(testhal) # see plot

plot of chunk ph-halibut-assess

What if proportional hazards fails?

The second two options relate to time-dependent covariates.

We will focus on the first alternative, and then the second two options will be briefly described.

Example: PH assumption for gender (nursing home data):

In the above example, we make the PH assumption for married and health, but not for gender.

This is like getting a KM survival estimate for each gen- der without assuming PH, but is more flexible since we can control for other covariates.

We would repeat the stratification for each variable for which we wanted to check the PH assumption.

# Nursing Home analysis, continued
fitnh = coxph(Surv(los, discharged) ~ married + health + gender, data=nurshome)
fitnh
## Call:
## coxph(formula = Surv(los, discharged) ~ married + health + gender, 
##     data = nurshome)
## 
## 
##          coef exp(coef) se(coef)    z       p
## married 0.161      1.17   0.0767 2.10 3.6e-02
## health  0.169      1.18   0.0311 5.42 6.0e-08
## gender  0.355      1.43   0.0660 5.37 7.7e-08
## 
## Likelihood ratio test=74.1  on 3 df, p=5.55e-16  n= 1601, number of events= 1279
fitnhs = coxph(Surv(los, discharged) ~ married + health + strata(gender), data = nurshome)
fitnhs
## Call:
## coxph(formula = Surv(los, discharged) ~ married + health + strata(gender), 
##     data = nurshome)
## 
## 
##          coef exp(coef) se(coef)    z       p
## married 0.163      1.18   0.0768 2.12 3.4e-02
## health  0.169      1.18   0.0312 5.43 5.6e-08
## 
## Likelihood ratio test=34.5  on 2 df, p=3.3e-08  n= 1601, number of events= 1279

# Plot log cum. base. haz. for strata
bh = basehaz(fitnhs)  # cum. base. hazards
plot(log(bh$time), log(bh$hazard), type="n", xlab = "log Time", ylab="log Cumulative Hazard")
lines(log(bh$time)[bh$strata=="gender=0"], log(bh$hazard)[bh$strata=="gender=0"], lty=2, col=2)
lines(log(bh$time)[bh$strata=="gender=1"], log(bh$hazard)[bh$strata=="gender=1"], lty=3, col=3)
legend(4,-3, legend=c("Females", "Males"), lty=2:3, col=2:3)

plot of chunk ph-stratify

References