Probit regression, also called a probit model, is used to model dichotomous or binary outcome variables. In the probit model, the inverse standard normal distribution of the probability is modeled as a linear combination of the predictors.
library(aod) library(ggplot2)
Example 1. Suppose that we are interested in the factors that influence whether a political candidate wins an election. The outcome (response) variable is binary (0/1); win or lose. The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively and whether or not the candidate is an incumbent.
Example 2. A researcher is interested in how variables, such as GRE (Graduate Record Exam scores), GPA (grade point average) and prestige of the undergraduate institution, effect admission into graduate school. The response variable, admit/don't admit, is a binary variable.
Probit regression is a special type of the Generalized Linear Models (GLM; will be explained later). Here, the bivariate outcome \(Y\) has a Bernoulli distribution with parameter \(p\) (success probability \(p\in(0,1)\)). Recall that \(\mathsf{E}Y=p\). The probit link function \[ probit(\mathsf{E}Y)=\Phi^{-1}(p)=\Phi^{-1}(\mathsf{P}[Y=1]) \] is used to transform the expectation of this 0/1 dependent variable. Then, the probit of the mean is modelled as a linear combination of the covariates (regressors) \({\boldsymbol X}\), i.e., we have a linear predictor \[ probit(\mathsf{E}{\bf Y})={\boldsymbol X}{\boldsymbol\beta}, \] where \({\boldsymbol\beta}\) is a vector of unknown parameters. The maximum likelihood based approach is used for the parameter estimation, where a version of the IRLS algorithm is applied (Newton???Raphson method, Fisher's scoring method, ...).
Predicted probability can be obtained by the inverse probit (i.e., standard normal cdf) transformation \[ \widehat{\mathsf{P}}[Y_i=1]=\Phi({\boldsymbol X}_{i,\bullet}\widehat{\boldsymbol\beta}). \]
For our data analysis below, we are going to expand on Example 2 about getting into graduate school. We have generated hypothetical data.
mydata <- read.csv("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/binary.csv") ## convert rank to a factor (categorical variable) mydata$rank <- factor(mydata$rank) ## view the first few rows of the data head(mydata)
## admit gre gpa rank ## 1 0 380 3.61 3 ## 2 1 660 3.67 3 ## 3 1 800 4.00 1 ## 4 1 640 3.19 4 ## 5 0 520 2.93 4 ## 6 1 760 3.00 2
This dataset has a binary response (outcome, dependent) variable called admit
. There are three predictor variables: gre
, gpa
and rank
. We will treat the variables gre
and gpa
as continuous. The variable rank
takes on the values 1 through 4. Institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest. We can get basic descriptives for the entire data set by using summary
. To get the standard deviations, we use sapply
to apply the sd
function to each variable in the dataset.
summary(mydata)
## admit gre gpa rank ## Min. :0.0000 Min. :220.0 Min. :2.260 1: 61 ## 1st Qu.:0.0000 1st Qu.:520.0 1st Qu.:3.130 2:151 ## Median :0.0000 Median :580.0 Median :3.395 3:121 ## Mean :0.3175 Mean :587.7 Mean :3.390 4: 67 ## 3rd Qu.:1.0000 3rd Qu.:660.0 3rd Qu.:3.670 ## Max. :1.0000 Max. :800.0 Max. :4.000
sapply(mydata, sd)
## admit gre gpa rank ## 0.4660867 115.5165364 0.3805668 0.9444602
## two-way contingency table of categorical outcome and predictors we want ## to make sure there are not 0 cells xtabs(~admit + rank, data = mydata)
## rank ## admit 1 2 3 4 ## 0 28 97 93 55 ## 1 33 54 28 12
Below is a list of some analysis methods you may have encountered. Some of the methods listed are quite reasonable while others have either fallen out of favor or have limitations.
The code below estimates a probit regression model using the glm
(generalized linear model) function. Since we stored our model output in the object myprobit
, R will not print anything to the console. We can use the summary
function to get a summary of the model and all the estimates.
myprobit <- glm(admit ~ gre + gpa + rank, family = binomial(link = "probit"), data = mydata) ## model summary summary(myprobit)
## ## Call: ## glm(formula = admit ~ gre + gpa + rank, family = binomial(link = "probit"), ## data = mydata) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -1.6163 -0.8710 -0.6389 1.1560 2.1035 ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -2.386836 0.673946 -3.542 0.000398 *** ## gre 0.001376 0.000650 2.116 0.034329 * ## gpa 0.477730 0.197197 2.423 0.015410 * ## rank2 -0.415399 0.194977 -2.131 0.033130 * ## rank3 -0.812138 0.208358 -3.898 9.71e-05 *** ## rank4 -0.935899 0.245272 -3.816 0.000136 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: 499.98 on 399 degrees of freedom ## Residual deviance: 458.41 on 394 degrees of freedom ## AIC: 470.41 ## ## Number of Fisher Scoring iterations: 4
gre
, gpa
, and the three terms for rank
are statistically significant. The probit regression coefficients give the change in the z-score or probit index for a one unit change in the predictor.gre
, the z-score increases by 0.001.gpa
, the z-score increases by 0.478.rank
have a slightly different interpretation. For example, having attended an undergraduate institution of rank
of 2, versus an institution with a rank
of 1 (the reference group), decreases the z-score by 0.415.We can use the confint function to obtain confidence intervals for the coefficient estimates. These will be profiled confidence intervals by default, created by profiling the likelihood function. As such, they are not necessarily symmetric.
confint(myprobit)
## 2.5 % 97.5 % ## (Intercept) -3.7201050682 -1.076327713 ## gre 0.0001104101 0.002655157 ## gpa 0.0960654793 0.862610221 ## rank2 -0.7992113929 -0.032995019 ## rank3 -1.2230955861 -0.405008112 ## rank4 -1.4234218227 -0.459538829
wald.test
function of the aod
library. The order in which the coefficients are given in the table of coefficients is the same as the order of the terms in the model. This is important because the wald.test
function refers to the coefficients by their order in the model. We use the wald.test
function. b
supplies the coefficients, while Sigma
supplies the variance covariance matrix of the error terms, finally Terms
tells R which terms in the model are to be tested, in this case, terms 4, 5, and 6, are the three terms for the levels of rank
.
wald.test(b = coef(myprobit), Sigma = vcov(myprobit), Terms = 4:6)
## Wald test: ## ---------- ## ## Chi-squared test: ## X2 = 21.4, df = 3, P(> X2) = 8.9e-05
rank
is statistically significant.
We can also test additional hypotheses about the differences in the coefficients for different levels of rank
. Below we test that the coefficient for rank
=2 is equal to the coefficient for rank
=3. The first line of code below creates a vector l
that defines the test we want to perform. In this case, we want to test the difference (subtraction) of the terms for rank
=2 and rank
=3 (i.e. the 4th and 5th terms in the model). To contrast these two terms, we multiply one of them by 1, and the other by -1. The other terms in the model are not involved in the test, so they are multiplied by 0. The second line of code below uses L=l
to tell R that we wish to base the test on the vector l
(rather than using the Terms
option as we did above).
l <- cbind(0, 0, 0, 1, -1, 0) wald.test(b = coef(myprobit), Sigma = vcov(myprobit), L = l)
## Wald test: ## ---------- ## ## Chi-squared test: ## X2 = 5.6, df = 1, P(> X2) = 0.018
rank
=2 and the coefficient for rank
=3 is statistically significant.
You can also use predicted probabilities to help you understand the model. To do this, we first create a data frame containing the values we want for the independent variables.
newdata <- data.frame(gre = rep(seq(from = 200, to = 800, length.out = 100), 4 * 4), gpa = rep(c(2.5, 3, 3.5, 4), each = 100 * 4), rank = factor(rep(rep(1:4, each = 100), 4))) head(newdata)
## gre gpa rank ## 1 200.0000 2.5 1 ## 2 206.0606 2.5 1 ## 3 212.1212 2.5 1 ## 4 218.1818 2.5 1 ## 5 224.2424 2.5 1 ## 6 230.3030 2.5 1
gre
scores. We create four plots, one for each level of gpa
we used (2.5, 3, 3.5, 4) with the colour of the lines indicating the rank
the predicted probabilities were for.
newdata[, c("p", "se")] <- predict(myprobit, newdata, type = "response", se.fit = TRUE)[-3] ggplot(newdata, aes(x = gre, y = p, colour = rank)) + geom_line() + facet_wrap(~gpa)
We may also wish to see measures of how well our model fits. This can be particularly useful when comparing competing models. The output produced by summary(mylogit)
included indices of fit (shown below the coefficients), including the null and deviance residuals and the AIC. One measure of model fit is the significance of the overall model. This test asks whether the model with predictors fits significantly better than a model with just an intercept (i.e. a null model). The test statistic is the difference between the residual deviance for the model with predictors and the null model. The test statistic is distributed chi-squared with degrees of freedom equal to the differences in degrees of freedom between the current and the null model (i.e. the number of predictor variables in the model). To find the difference in deviance for the two models (i.e. the test statistic) we can compute the change in deviance, and test it using a chi square test---the change in deviance distributed as chi square on the change in degrees of freedom.
## change in deviance with(myprobit, null.deviance - deviance)
## [1] 41.56335
## change in degrees of freedom with(myprobit, df.null - df.residual)
## [1] 5
## chi square test p-value with(myprobit, pchisq(null.deviance - deviance, df.null - df.residual, lower.tail = FALSE))
## [1] 7.218932e-08
logLik(myprobit)
## 'log Lik.' -229.2066 (df=6)