Exercise 1: Parametric models with covariates


Back to course page


Parametric regression models

Consider an ordinary regression model for log survival time of the form \[ Y_i=\log T_i=-{\bf X}_i{\boldsymbol\beta}+\sigma{\bf W}_i \] where the error term \(W_i\) has a suitable distribution, e.g., extreme value, generalized extreme value, normal, or logistic. This leads to Weibull, generalized gamma, log-normal or log-logistic models for \(T_i\).

Using glm function from R and ingoring censoring

Atlantic halibut survival times data

One conservation measure suggested for trawl fishing is a minimum size limit for halibut (32 inches). However, this size limit would only be effective if captured fish below the limit survived until the time of their release. An experiment was conducted to evaluate the survival rates of halibut caught by trawls or longlines, and to assess other factors which might contribute to survival (duration of trawling, maximum depth fished, size of fish, and handling time).

halibut = read.csv("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/halibut.csv", head=T)
dim(halibut)
## [1] 294   8
head(halibut)
##   No Time Death Towd Deldepth Length Handtime Logcat
## 1  1  209     1   30       13     41        8  6.992
## 2  2  209     1   30       13     44        8  6.992
## 3  3  209     1   30       13     47       10  6.992
## 4  4  209     1   30       13     34       10  6.992
## 5  5   38     1   30       13     40       11  6.992
## 6  6  209     1   30       13     42       11  6.992
summary(halibut)
##        No              Time             Death             Towd       
##  Min.   :  1.00   Min.   :   1.10   Min.   :0.0000   Min.   : 30.00  
##  1st Qu.: 74.25   1st Qu.:  13.20   1st Qu.:1.0000   1st Qu.: 30.00  
##  Median :147.50   Median :  29.80   Median :1.0000   Median :100.00  
##  Mean   :147.50   Mean   :  88.91   Mean   :0.9286   Mean   : 74.52  
##  3rd Qu.:220.75   3rd Qu.:  82.28   3rd Qu.:1.0000   3rd Qu.:100.00  
##  Max.   :294.00   Max.   :1200.00   Max.   :1.0000   Max.   :100.00  
##     Deldepth          Length         Handtime         Logcat     
##  Min.   : 1.000   Min.   :29.00   Min.   : 1.00   Min.   :2.904  
##  1st Qu.: 5.000   1st Qu.:38.00   1st Qu.: 6.00   1st Qu.:4.059  
##  Median : 8.000   Median :43.00   Median :10.00   Median :4.203  
##  Mean   : 9.966   Mean   :42.51   Mean   :12.51   Mean   :4.886  
##  3rd Qu.:10.000   3rd Qu.:46.75   3rd Qu.:17.00   3rd Qu.:5.323  
##  Max.   :58.000   Max.   :57.00   Max.   :38.00   Max.   :8.690
# Time = survival time (minutes)
# Death = event indicator
# Towd = towing duration
# Deldepth = depth range of towing
# Length =  length of fish (cm)
# Handtime = handling time (min)
# Logcat = log total catch

Gamma distribution of log survival time

summary(halibut.gamma <- glm(log(Time) ~ Towd + Deldepth + Length + Handtime + Logcat, family = Gamma(identity), data = halibut))
## Warning: glm.fit: algorithm did not converge
## 
## Call:
## glm(formula = log(Time) ~ Towd + Deldepth + Length + Handtime + 
##     Logcat, family = Gamma(identity), data = halibut)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.02542  -0.30063   0.00607   0.22418   1.08881  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.926200   0.617453   4.739 3.38e-06 ***
## Towd        -0.014068   0.003175  -4.431 1.33e-05 ***
## Deldepth    -0.016685   0.009073  -1.839 0.066952 .  
## Length       0.048097   0.012928   3.720 0.000239 ***
## Handtime    -0.055115   0.011497  -4.794 2.63e-06 ***
## Logcat       0.066403   0.071847   0.924 0.356143    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 0.1874759)
## 
##     Null deviance: 101.027  on 293  degrees of freedom
## Residual deviance:  79.962  on 288  degrees of freedom
## AIC: 1095
## 
## Number of Fisher Scoring iterations: 25
### Remark: canonical link = inverse, not identity or log
names(halibut.gamma)
##  [1] "coefficients"      "residuals"         "fitted.values"    
##  [4] "effects"           "R"                 "rank"             
##  [7] "qr"                "family"            "linear.predictors"
## [10] "deviance"          "aic"               "null.deviance"    
## [13] "iter"              "weights"           "prior.weights"    
## [16] "df.residual"       "df.null"           "y"                
## [19] "converged"         "boundary"          "model"            
## [22] "call"              "formula"           "terms"            
## [25] "data"              "offset"            "control"          
## [28] "method"            "contrasts"         "xlevels"
halibut.gamma$deviance
## [1] 79.96172
halibut.gamma$null.deviance
## [1] 101.0268
### summary(halibut.gamma <- glm(Time ~ Towd + Deldepth + Length + Handtime + Logcat, family = Gamma(log), data = halibut))
### Why not?

Exponential distribution of log survival time

halibut.exp <- glm(log(Time) ~ Towd + Deldepth + Length + Handtime + Logcat, family = Gamma(identity), data = halibut, control = list(maxit = 500))
summary(halibut.exp, dispersion = 1)
## 
## Call:
## glm(formula = log(Time) ~ Towd + Deldepth + Length + Handtime + 
##     Logcat, family = Gamma(identity), data = halibut, control = list(maxit = 500))
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.02519  -0.30160   0.00546   0.22557   1.08931  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  2.927170   1.428074   2.050   0.0404 *
## Towd        -0.014133   0.007334  -1.927   0.0540 .
## Deldepth    -0.015612   0.020705  -0.754   0.4508  
## Length       0.048473   0.029843   1.624   0.1043  
## Handtime    -0.054857   0.026564  -2.065   0.0389 *
## Logcat       0.060973   0.165771   0.368   0.7130  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 1)
## 
##     Null deviance: 101.027  on 293  degrees of freedom
## Residual deviance:  79.954  on 288  degrees of freedom
## AIC: 1095
## 
## Number of Fisher Scoring iterations: 61

Normal distribution of log survival time

summary(halibut.lm <- lm(log(Time) ~ Towd + Deldepth + Length + Handtime + Logcat, data = halibut))
## 
## Call:
## lm(formula = log(Time) ~ Towd + Deldepth + Length + Handtime + 
##     Logcat, data = halibut)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9516 -0.9400  0.0826  0.8058  3.9494 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.338289   0.602193   3.883 0.000128 ***
## Towd        -0.013225   0.002587  -5.113 5.81e-07 ***
## Deldepth    -0.031566   0.011064  -2.853 0.004643 ** 
## Length       0.042868   0.012805   3.348 0.000923 ***
## Handtime    -0.059059   0.011797  -5.006 9.69e-07 ***
## Logcat       0.258838   0.070134   3.691 0.000267 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.275 on 288 degrees of freedom
## Multiple R-squared:  0.3155, Adjusted R-squared:  0.3036 
## F-statistic: 26.55 on 5 and 288 DF,  p-value: < 2.2e-16
### OR summary(halibut.normal <- glm(log(Time) ~ Towd + Deldepth + Length + Handtime + Logcat, family = gaussian, data = halibut))
### WHY NOT ? summary(halibut.normal <- glm(Time ~ Towd + Deldepth + Length + Handtime + Logcat, family = gaussian(log), data = halibut))

Using glm function from R and taking into account censoring

Exponential regression with arbitrary random censoring

Let \(T_1,\ldots,T_n\) be distributed according to \(\mathsf{Exp}(\lambda_i)\) and let \(C_1,\ldots,C_n\) be independent of each other and independent of \(T_1,\ldots,T_n\), with arbitrary distributions. Introduce covariate vectors \({\bf Z}_i\) of dimension \(p\) representing a random sample from an arbitrary distribution. We observe independent triplets \[ (X_1, {\bf Z}_1, \delta_1), \ldots, (X_n, {\bf Z}_n, \delta_n), \] where \(X_i =\min\{T_i,C_i\}\) and \(\delta_i =\mathcal{I}(T_i\leq C_i\}\). Suppose there exists a \(p\)-vector \({\boldsymbol\beta}\) of regression parameters such that \[ \lambda_i=\exp\left\{{\bf Z}_i{\boldsymbol\beta}\right\}. \] The score statistic of the above defined model is equivalent to the score statistic of a Poisson loglinear model with \(\delta_i\) as the response and \(\log X_i\) as the offset. Algorithms for finding the maximum likelihood estimator of \({\boldsymbol\beta}\), calculating the observed information matrix, approximating the distribution of the estimated \({\boldsymbol\beta}\), performing tests about \({\boldsymbol\beta}\) and building confidence intervals are all taken from the Poisson loglinear model.

### Death is censoring indicator !!!
summary(halibut.cens <- glm(Death ~ Towd + Deldepth + Length + Handtime + Logcat + offset(log(Time)), family=poisson, data = halibut))
## 
## Call:
## glm(formula = Death ~ Towd + Deldepth + Length + Handtime + Logcat + 
##     offset(log(Time)), family = poisson, data = halibut)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -5.4391  -0.1859   0.6323   1.3659   2.4642  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.581185   0.519335  -4.970 6.69e-07 ***
## Towd         0.008305   0.002023   4.105 4.05e-05 ***
## Deldepth     0.016099   0.007578   2.124   0.0336 *  
## Length      -0.046115   0.010234  -4.506 6.60e-06 ***
## Handtime     0.072703   0.010106   7.194 6.29e-13 ***
## Logcat      -0.292355   0.052443  -5.575 2.48e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 683.77  on 293  degrees of freedom
## Residual deviance: 516.20  on 288  degrees of freedom
## AIC: 1074.2
## 
## Number of Fisher Scoring iterations: 7

Task 1

Build up a suitable regression model for the halibut survival times within the exponential family framework and taking into account censoring.

Due date: Nov. 15