The enrichwith
package provides the enrich
method to enrich list-like R
objects with new, relevant components. The resulting objects preserve
their class, so all methods associated with them still apply.
This vignette is a short case study demonstrating how enriched
glm
objects can be used to implement a quasi Fisher
scoring procedure for computing reduced-bias estimates in generalized
linear models (GLMs). Kosmidis and Firth
(2010) describe a parallel quasi Newton-Raphson
procedure.
Heinze and Schemper (2002) used a logistic regression model to analyse data from a study on endometrial cancer. Agresti (2015, sec. 5.7) provide details on the data set. Below, we fit a probit regression model with the same linear predictor as the logistic regression model in Heinze and Schemper (2002).
# Get the data from the online supplmementary material of Agresti (2015)
data("endometrial", package = "enrichwith")
modML <- glm(HG ~ NV + PI + EH, family = binomial("probit"), data = endometrial)
theta_mle <- coef(modML)
summary(modML)
##
## Call:
## glm(formula = HG ~ NV + PI + EH, family = binomial("probit"),
## data = endometrial)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.18093 0.85732 2.544 0.010963 *
## NV 5.80468 402.23641 0.014 0.988486
## PI -0.01886 0.02360 -0.799 0.424066
## EH -1.52576 0.43308 -3.523 0.000427 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 104.90 on 78 degrees of freedom
## Residual deviance: 56.47 on 75 degrees of freedom
## AIC: 64.47
##
## Number of Fisher Scoring iterations: 17
As is the case for the logistic regression in Heinze and Schemper (2002), the maximum
likelihood (ML) estimate of the parameter for NV
is
actually infinite. The reported, apparently finite value is merely due
to false convergence of the iterative estimation procedure. The same is
true for the estimated standard error, and, hence the value
r round(coef(summary(modML))["NV", "z value"], 3)
for the
z-statistic cannot be trusted
for inference on the size of the effect for NV
.
In categorical-response models like the above, the bias reduction method in Firth (1993) has been found to result in finite estimates even when the ML ones are infinite (see, Heinze and Schemper 2002, for logistic regressions; Kosmidis and Firth 2011, for multinomial regressions; Kosmidis 2014, for cumulative link models).
One of the variants of that bias reduction method is implemented in the brglm R package, which estimates binomial-reponse GLMs using iterative ML fits on binomial pseudo-data (see, Kosmidis 2007, chap. 5, for details). The reduced-bias estimates for the probit regression on the endometrial data can be computed as follows.
## Loading required package: profileModel
## 'brglm' will gradually be superseded by the 'brglm2' R package (https://cran.r-project.org/package=brglm2), which provides utilities for mean and median bias reduction for all GLMs.
## Methods for the detection of separation and infinite estimates in binomial-response models are provided by the 'detectseparation' R package (https://cran.r-project.org/package=detectseparation).
modBR <- brglm(HG ~ NV + PI + EH, family = binomial("probit"), data = endometrial)
theta_brglm <- coef(modBR)
summary(modBR)
##
## Call:
## brglm(formula = HG ~ NV + PI + EH, family = binomial("probit"),
## data = endometrial)
##
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.91460 0.78877 2.427 0.015210 *
## NV 1.65892 0.74730 2.220 0.026427 *
## PI -0.01520 0.02089 -0.728 0.466793
## EH -1.37988 0.40329 -3.422 0.000623 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 93.983 on 78 degrees of freedom
## Residual deviance: 57.587 on 75 degrees of freedom
## AIC: 65.587
The z-statistic for
NV
has value 2.22 when based on the reduced-bias estimates,
providing some evidence for the existence of an effect.
In the following, we use enrichwith to implement two variants of the bias reduction method via a unifying quasi Fisher scoring estimation procedure.
Consider a parametric statistical model 𝒫θ with unknown parameter θ ∈ ℜp and the iteration θ(k + 1) := θ(k) + {i(θ(k))}−1s(θ(k)) − c(θ(k))b(θ(k)) where θ(k) is the value of θ at the kth iteration, s(θ) is the gradient of the log-likelihood for 𝒫θ, i(θ) is the expected information matrix, and b(θ) is the O(n−1) term in the expansion of the bias of the ML estimator of θ (see, for example, Cox and Snell 1968).
The above iteration defines a quasi Fisher scoring estimation procedure in general, and reduces to exact Fisher scoring for ML estimation when c(θ(k)) = 0p, where 0p is a vector of p zeros.
For either c(θ) = Ip or c(θ) = {i(θ)}−1j(θ), where Ip is the p × p identity matrix and j(θ) is the observed information, θ(∞) (if it exists) is a reduced-bias estimate, in the sense that it corresponds an estimator with bias of smaller asymptotic order than that of the ML estimator (see, Firth 1993; Kosmidis and Firth 2010). The brglm estimates correspond to c(θ) = Ip.
The asymptotic distribution of the reduced-bias estimators is the same to that of the ML estimator (see, Firth 1993 for details). So, the reduced-bias estimates can be readily used to calculate z-statistics.
For implementing the iteration for bias reduction, we need functions that can compute the gradient of the log-likelihood, the observed and expected information matrix, and b(θ) at arbitrary values of θ.
The enrichwith package can produce those functions
for any glm
object through the
auxiliary_functions
enrichment option (to see all available
enrichment options for glm
objects run
get_enrichment_options(modML)
.
Let’s extract the functions from the enriched_modML
object.
# Extract the ingredients for the quasi Fisher scoring iteration from the enriched glm object
gradient <- enriched_modML$auxiliary_functions$score # gradient of the log-likelihood
information <- enriched_modML$auxiliary_functions$information # information matrix
bias <- enriched_modML$auxiliary_functions$bias # first-order bias
For the more technically minded, note here that the above functions
are specific to modML
in the sense that they look into a
special environment for necessary objects like the model matrix, the
model weights, the response vector, and so on.
This stems from the way enrichwith has been
implemented. In particular, if create_enrichwith_skeleton
is used, the user/developer can directly implement enrichment options to
enrich objects with functions that directly depend on other components
in the object to be enriched.
The following code chunk uses enriched_modML
to
implement the quasi Fisher scoring procedure for the analysis of the
endometrial cancer data. For p <- length(theta_mle)
,
the starting value for the parameter vector is set to
theta_current <- rep(0, p)
, and the maximum number of
iterations to maxit <- 100
. As stopping criterion, we
use the absolute change in each parameter value with tolerance
epsilon <- 1e-06
# The quasi Fisher scoring iteration using c(theta) = identity
for (k in seq.int(maxit)) {
s_vector <- gradient(theta_current)
i_matrix <- information(theta_current, type = "expected")
b_vector <- bias(theta_current)
step <- solve(i_matrix) %*% s_vector - b_vector
theta_current <- theta_current + step
# A stopping criterion
if (all(abs(step) < epsilon)) {
break
}
}
(theta_e <- drop(theta_current))
## (Intercept) NV PI EH
## 1.91460348 1.65892018 -0.01520487 -1.37987837
## attr(,"coefficients")
## [1] 0 0 0 0
## attr(,"dispersion")
## [1] 1
The estimation procedure took 8 iterations to converge, and, as expected, the estimates are numerically the same to the ones that brglm returned.
## [1] TRUE
A set of alternative reduced-bias estimates can be obtained using
c(θ) = {i(θ)}−1j(θ).
Starting again at theta_current <- rep(0, p)
# The quasi Fisher scoring iteration using c(theta) = solve(i(theta)) %*% j(theta)
for (k in seq.int(maxit)) {
s_vector <- gradient(theta_current)
i_matrix <- information(theta_current, type = "expected")
j_matrix <- information(theta_current, type = "observed")
b_vector <- bias(theta_current)
step <- solve(i_matrix) %*% (s_vector - j_matrix %*% b_vector)
theta_current <- theta_current + step
# A stopping criterion
if (all(abs(step) < epsilon)) {
break
}
}
(theta_o <- drop(theta_current))
## (Intercept) NV PI EH
## 1.89707065 1.72815655 -0.01471219 -1.37254188
The estimation procedure took 9 iterations to converge.
The ML estimates and the estimates from the two variants of the bias reduction method are
## theta_mle theta_e theta_o
## (Intercept) 2.181 1.915 1.897
## NV 5.805 1.659 1.728
## PI -0.019 -0.015 -0.015
## EH -1.526 -1.380 -1.373
Note that the reduced-bias estimates have shrunk towards zero. This is typical for reduced-bias estimation in binomial-response GLMs (see, for example, Cordeiro and McCullagh 1991, sec. 8; Kosmidis 2007, sec. 5.2; Kosmidis 2014 for shrinkage in cumulative link models).
The corresponding z-statistics are
se_theta_mle <- sqrt(diag(solve(information(theta_mle, type = "expected"))))
se_theta_e <- sqrt(diag(solve(information(theta_e, type = "expected"))))
se_theta_o <- sqrt(diag(solve(information(theta_o, type = "expected"))))
round(data.frame(z_mle = theta_mle/se_theta_mle,
z_br_e = theta_e/se_theta_e,
z_br_o = theta_o/se_theta_o), 3)
## z_mle z_br_e z_br_o
## (Intercept) 2.544 2.427 2.407
## NV 0.009 2.220 2.215
## PI -0.799 -0.728 -0.701
## EH -3.523 -3.422 -3.411
The two variants for bias reduction result in slightly different
reduced-bias estimates and z-statistics, though the z-statistics from both variants
provide some evidence for the existence of an effect for
NV
.
A general family of bias reduction methods is described in Kosmidis and Firth (2009).
The quasi Fisher scoring iteration that has been described here is at the core of the brglm2 R package, which provides various bias reduction methods for GLMs.