The brglm2 package

brglm2 provides tools for the estimation and inference from generalized linear models using various methods for bias reduction or maximum penalized likelihood with powers of the Jeffreys prior as penalty. Reduction of estimation bias is achieved either through the mean-bias reducing adjusted score equations in Firth (1993) and I. Kosmidis and Firth (2009), or through the direct subtraction of an estimate of the bias of the maximum likelihood estimator from the maximum likelihood estimates as prescribed in Cordeiro and McCullagh (1991), or through the median-bias reducing adjusted score equations in Kenne Pagui, Salvan, and Sartori (2017).

In the special case of generalized linear models for binomial, Poisson and multinomial responses (both nominal and ordinal), mean and median bias reduction and maximum penalized likelihood return estimates with improved frequentist properties that are also always finite, even when the maximum likelihood estimates are infinite, as happens under complete and quasi-complete separation as defined in Albert and Anderson (1984).

The workhorse function is brglmFit(), which can be passed directly to the method argument of the glm function. brglmFit implements a quasi-Fisher scoring procedure, whose special cases result in various explicit and implicit bias reduction methods for generalized linear models (the classification of bias reduction methods into explicit and implicit is given in I. Kosmidis 2014).
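As a minimal sketch of this usage, the call below fits a logistic regression to completely separated data, first by maximum likelihood and then with brglmFit() supplied through the method argument. The toy data frame and the object names are made up for illustration, and the brglm2 package is assumed to be installed.

```r
# Assumes the brglm2 package is installed; the toy data are hypothetical
library("brglm2")

# Completely separated data: y is 0 for negative x and 1 for positive x
toy <- data.frame(x = c(-3, -2, -1, 1, 2, 3),
                  y = c(0, 0, 0, 1, 1, 1))

# Maximum likelihood: the slope estimate diverges, and glm() warns about it
fit_ml <- suppressWarnings(glm(y ~ x, family = binomial("logit"), data = toy))

# Same model, with brglmFit() passed to the method argument of glm()
fit_br <- glm(y ~ x, family = binomial("logit"), data = toy, method = "brglmFit")

coef(fit_ml)  # very large slope, an artefact of stopping a diverging iteration
coef(fit_br)  # finite estimates
```

The only change between the two calls is the method argument, so existing glm() code can typically be switched to a reduced-bias fit without other modifications.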

This vignette

- presents the supported bias-reducing adjustments to the score functions for generalized linear models
- describes the fitting algorithm at the core of brglm2

Other resources

The bias-reducing quasi Fisher scoring iteration is also described in detail in the bias vignette of the enrichwith R package. I. Kosmidis and Firth (2010) describe a parallel quasi Newton-Raphson procedure.

Most of the material in this vignette comes from a presentation by Ioannis Kosmidis at the useR! 2016 international conference at Stanford University on 16 June 2016. The presentation was titled “Reduced-bias inference in generalized linear models” and can be watched online at this link.

Generalized linear models

Model

Suppose that y₁, …, y_n are observations on independent random variables Y₁, …, Y_n, each with probability density/mass function of the form $$ f_{Y_i}(y) = \exp\left\{\frac{y \theta_i - b(\theta_i) - c_1(y)}{\phi/m_i} - \frac{1}{2}a\left(-\frac{m_i}{\phi}\right) + c_2(y) \right\} $$ for some sufficiently smooth functions b(.), c₁(.), a(.) and c₂(.), and fixed observation weights m₁, …, m_n. The expected value and the variance of Y_i are then $$ E(Y_i) = \mu_i = b'(\theta_i) \quad \text{and} \quad {\rm var}(Y_i) = \frac{\phi}{m_i}b''(\theta_i) = \frac{\phi}{m_i}V(\mu_i)\,. $$ Hence, in this parameterization, ϕ is a dispersion parameter.
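As a worked instance (a standard fact that is not part of the original text), the normal distribution with mean μ_i and variance ϕ/m_i is recovered by taking θ_i = μ_i, b(θ) = θ²/2, c₁(y) = y²/2, a(ζ) = −log(−ζ) and c₂(y) = −log(2π)/2: $$ f_{Y_i}(y) = \exp\left\{\frac{y \mu_i - \mu_i^2/2 - y^2/2}{\phi/m_i} + \frac{1}{2}\log\frac{m_i}{\phi} - \frac{1}{2}\log(2\pi) \right\} = \sqrt{\frac{m_i}{2\pi\phi}}\exp\left\{-\frac{m_i(y - \mu_i)^2}{2\phi}\right\}\,. $$ Here b′(θ_i) = μ_i and b″(θ_i) = 1, so the variance function is constant and the variance of Y_i is ϕ/m_i.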

A generalized linear model links the mean μ_i to a linear predictor η_i as $$ g(\mu_i) = \eta_i = \sum_{t=1}^p \beta_t x_{it} $$ where g(.) is a monotone, sufficiently smooth link function, taking values on ℜ, x_it is the (i, t)th component of a model matrix X, and β = (β₁, …, β_p)^⊤.
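The ingredients of this definition are available in base R through family objects. The sketch below evaluates the linear predictor, the mean, the derivative dμ_i/dη_i and the variance function for a Poisson model with log link. The model matrix and the coefficient values are hypothetical.

```r
# Base R only: a family object bundles the link g, its inverse,
# the derivative d mu / d eta and the variance function V(mu)
fam <- poisson(link = "log")

# Hypothetical model matrix (intercept and one covariate) and coefficients
X <- cbind(1, c(-1, 0, 1, 2))
beta <- c(0.5, 0.3)

eta <- drop(X %*% beta)   # linear predictor eta_i
mu <- fam$linkinv(eta)    # mean mu_i = g^{-1}(eta_i)
d <- fam$mu.eta(eta)      # d mu_i / d eta_i
v <- fam$variance(mu)     # V(mu_i)

# The link function maps the mean back to the linear predictor
all.equal(fam$linkfun(mu), eta)
```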

Score functions and information matrix

Suppressing the dependence of the various quantities on the model parameters and the data, the derivatives of the log-likelihood about β and ϕ (score functions) are $$ s_\beta = \frac{1}{\phi}X^\top W D^{-1}(y - \mu) \quad \text{and} \quad s_\phi = \frac{1}{2\phi^2}\sum_{i = 1}^n (q_i - \rho_i)\,, $$ with y = (y₁, …, y_n)^⊤, μ = (μ₁, …, μ_n)^⊤, $W = {\rm diag}\left\{w_1, \ldots, w_n\right\}$ and $D = {\rm diag}\left\{d_1, \ldots, d_n\right\}$, where w_i = m_id_i²/v_i is the ith working weight, with d_i = dμ_i/dη_i and v_i = V(μ_i). Furthermore, q_i = −2m_i{y_iθ_i − b(θ_i) − c₁(y_i)} and ρ_i = m_ia′_i with a′_i = a′(−m_i/ϕ). The expected information matrix about β and ϕ is $$ i = \left[ \begin{array}{cc} i_{\beta\beta} & 0_p \\ 0_p^\top & i_{\phi\phi} \end{array} \right] = \left[ \begin{array}{cc} \frac{1}{\phi} X^\top W X & 0_p \\ 0_p^\top & \frac{1}{2\phi^4}\sum_{i = 1}^n m_i^2 a''_i \end{array} \right]\,, $$ where 0_p is a p-vector of zeros, and a″_i = a″(−m_i/ϕ).
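The following base R sketch evaluates the score for β and the block i_ββ for a Poisson regression, where ϕ = 1 and m_i = 1. It assumes the usual form of the score for a generalized linear model, s_β = X^⊤WD⁻¹(y − μ)/ϕ. The data are simulated and the helper score_info() is hypothetical.

```r
# Base R only; simulated data, so the numbers carry no meaning
set.seed(1)
n <- 50
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.2 + 0.5 * x))
X <- cbind(1, x)
fam <- poisson("log")

# Score s_beta and expected information i_betabeta at a given beta
score_info <- function(beta) {
  eta <- drop(X %*% beta)
  mu <- fam$linkinv(eta)
  d <- fam$mu.eta(eta)
  v <- fam$variance(mu)
  w <- d^2 / v                                        # working weights w_i
  list(score = drop(crossprod(X, w * (y - mu) / d)),  # X' W D^{-1} (y - mu)
       info = crossprod(X, w * X))                    # X' W X
}

fit <- glm(y ~ x, family = fam)
at_ml <- score_info(coef(fit))
at_ml$score        # numerically zero at the maximum likelihood estimates
solve(at_ml$info)  # close to vcov(fit)
```

Evaluating the score at the coefficients returned by glm() is a quick way to check that the expressions have been coded consistently.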

Maximum likelihood estimation

The maximum likelihood estimators β̂ and ϕ̂ of β and ϕ, respectively, can be found by the solution of the score equations s_β = 0_p and s_ϕ = 0.

Mean bias-reducing adjusted score functions

Let A_β = −i_ββb_β and A_ϕ = −i_ϕϕb_ϕ, where b_β and b_ϕ are the first terms in the expansion of the mean bias of the maximum likelihood estimator of the regression parameters β and dispersion ϕ, respectively. The results in Firth (1993) can be used to show that the solution of the adjusted score equations $$ s_\beta + A_\beta = 0_p \quad \text{and} \quad s_\phi + A_\phi = 0 $$ results in estimators β̃ and ϕ̃ with bias of smaller asymptotic order than the maximum likelihood estimator.

The results in either I. Kosmidis and Firth (2009) or Cordeiro and McCullagh (1991) can then be used to re-express the adjustments in forms that are convenient for implementation. In particular, and after some algebra, the bias-reducing adjustments for generalized linear models are $$ A_\beta = X^\top W \xi \quad \text{and} \quad A_\phi = \frac{p - 2}{2\phi} + \frac{\sum_{i = 1}^n m_i^3 a'''_i}{2\phi^2\sum_{i = 1}^n m_i^2 a''_i}\,, $$ where ξ = (ξ₁, …, ξ_n)^⊤ with ξ_i = h_id_i′/(2d_iw_i), d_i′ = d²μ_i/dη_i², a″_i = a″(−m_i/ϕ), a‴_i = a‴(−m_i/ϕ), and h_i is the “hat” value for the ith observation (see, e.g. ?hatvalues).
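As an illustration in base R, the sketch below computes ξ and the adjustment for β in a logistic regression, taking the adjustment to have the form A_β = X^⊤Wξ (an assumption of this sketch), and then uses A_β = −i_ββb_β to recover the first-order bias estimate and the bias-corrected estimates. The data are simulated and all object names are hypothetical.

```r
# Base R only; simulated data for illustration
set.seed(2)
n <- 40
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + x))
X <- cbind(1, x)
fit <- glm(y ~ x, family = binomial("logit"))

mu <- fitted(fit)
d <- mu * (1 - mu)            # d_i = d mu_i / d eta_i for the logit link
d2 <- d * (1 - 2 * mu)        # d_i' = d^2 mu_i / d eta_i^2
w <- d^2 / (mu * (1 - mu))    # working weights, with m_i = 1 and v_i = mu_i (1 - mu_i)
h <- hatvalues(fit)           # "hat" values h_i

xi <- h * d2 / (2 * d * w)             # xi_i = h_i d_i' / (2 d_i w_i)
A_beta <- drop(crossprod(X, w * xi))   # assumed form A_beta = X' W xi
info <- crossprod(X, w * X)            # i_betabeta with phi = 1

# First-order bias estimate b_beta = -i_betabeta^{-1} A_beta
b_beta <- -drop(solve(info, A_beta))
coef(fit) - b_beta                     # bias-corrected estimates
```

For the logit link the product w_iξ_i simplifies to h_i(1/2 − μ_i), which is the familiar form of the adjustment in Firth (1993) for logistic regression.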

Median bias-reducing adjusted score functions

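The results in Kenne Pagui, Salvan, and Sartori (2017) can be used to show that if $$ A_\beta = X^\top W (\xi + X u) \quad \text{and} \quad A_\phi = \frac{p}{2\phi} + \frac{\sum_{i = 1}^n m_i^3 a'''_i}{6\phi^2\sum_{i = 1}^n m_i^2 a''_i}\,, $$ then the solution of the adjusted score equations s_β + A_β = 0_p and s_ϕ + A_ϕ = 0 results in estimators β̃ and ϕ̃ with median bias of smaller asymptotic order than the maximum likelihood estimator. In the above expression, u = (u₁, …, u_p)^⊤ with $$ u_j = [(X^\top W X)^{-1}]_j^\top X^\top \left[ \begin{array}{c} \tilde{h}_{j,1}\left\{d_1 v'_1/(6 v_1) - d'_1/(2 d_1)\right\} \\ \vdots \\ \tilde{h}_{j,n}\left\{d_n v'_n/(6 v_n) - d'_n/(2 d_n)\right\} \end{array} \right]\,, $$ where [A]_j denotes the jth row of matrix A as a column vector, v′_i = V′(μ_i), and h̃_j,i is the ith diagonal element of XK_jX^⊤W, with K_j = [(X^⊤WX)⁻¹]_j[(X^⊤WX)⁻¹]_j^⊤/[(X^⊤WX)⁻¹]_jj.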

Mixed adjustments

The results in Ioannis Kosmidis, Kenne Pagui, and Sartori (2020) can be used to show that if $$ A_\beta = X^\top W \xi \quad \text{and} \quad A_\phi = \frac{p}{2\phi} + \frac{\sum_{i = 1}^n m_i^3 a'''_i}{6\phi^2\sum_{i = 1}^n m_i^2 a''_i}\,, $$ then the solution of the adjusted score equations s_β + A_β = 0_p and s_ϕ + A_ϕ = 0 results in estimators β̃ with mean bias of smaller asymptotic order than the maximum likelihood estimator and ϕ̃ with median bias of smaller asymptotic order than the maximum likelihood estimator.
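As a hedged worked check (not part of the original text), consider the normal linear model with m_i = 1, for which a(ζ) = −log(−ζ), so that a″_i = ϕ², a‴_i = 2ϕ³, and s_ϕ = (RSS − nϕ)/(2ϕ²) with RSS = Σ(y_i − μ_i)². The check takes the dispersion adjustments to be A_ϕ = (p − 2)/(2ϕ) + Σm_i³a‴_i/(2ϕ²Σm_i²a″_i) for mean bias reduction and A_ϕ = p/(2ϕ) + Σm_i³a‴_i/(6ϕ²Σm_i²a″_i) for median bias reduction; these forms are assumptions of this sketch. Multiplying s_ϕ + A_ϕ = 0 by 2ϕ² then gives $$ \text{mean:} \quad {\rm RSS} - n\phi + p\phi = 0 \;\Rightarrow\; \tilde\phi = \frac{{\rm RSS}}{n - p}\,, \qquad \text{median:} \quad {\rm RSS} - n\phi + \left(p + \frac{2}{3}\right)\phi = 0 \;\Rightarrow\; \tilde\phi = \frac{{\rm RSS}}{n - p - 2/3}\,. $$ The first is the familiar unbiased estimator of the error variance. Under mixed adjustments the dispersion estimator is the second one, while β̃ coincides with the least squares estimator because d_i′ = 0 for the identity link.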

Maximum penalized likelihood with powers of Jeffreys prior as penalty

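The likelihood penalized by a power of the Jeffreys prior $$ |i_{\beta\beta}|^a |i_{\phi\phi}|^a\,, \quad a > 0\,, $$ can be maximized by solving the adjusted score equations s_β + A_β = 0_p and s_ϕ + A_ϕ = 0 with $$ A_\beta = a X^\top W \rho \quad \text{and} \quad A_\phi = -\frac{a(p + 4)}{\phi} + \frac{a\sum_{i = 1}^n m_i^3 a'''_i}{\phi^2\sum_{i = 1}^n m_i^2 a''_i}\,, $$ where ρ = (ρ₁, …, ρ_n)^⊤ with ρ_i = h_i{2d_i′/(d_iw_i) − v_i′d_i/(v_iw_i)}.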
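The adjustment for β is the gradient of a log|X^⊤WX| with respect to β, which can be checked numerically. The base R sketch below does this for a logistic regression (ϕ = 1, m_i = 1), taking the closed form to be A_β = aX^⊤Wρ with ρ as defined above. The design, the coefficient values and the function names are hypothetical.

```r
# Base R only; hypothetical design and coefficients for a logistic model
set.seed(3)
X <- cbind(1, rnorm(20))
beta <- c(-0.3, 0.8)
a <- 0.5   # power of the Jeffreys prior

# Penalty a * log|X' W X| as a function of beta (terms free of beta are dropped)
penalty <- function(beta) {
  mu <- plogis(drop(X %*% beta))
  w <- mu * (1 - mu)
  a * as.numeric(determinant(crossprod(X, w * X), logarithm = TRUE)$modulus)
}

# Closed form a * X' W rho
mu <- plogis(drop(X %*% beta))
d <- mu * (1 - mu)                # d mu / d eta
d2 <- d * (1 - 2 * mu)            # d^2 mu / d eta^2
v <- mu * (1 - mu)                # V(mu)
v1 <- 1 - 2 * mu                  # V'(mu)
w <- d^2 / v                      # working weights
XW <- sqrt(w) * X
h <- rowSums((XW %*% solve(crossprod(XW))) * XW)   # hat values
rho <- h * (2 * d2 / (d * w) - v1 * d / (v * w))
A_beta <- a * drop(crossprod(X, w * rho))

# Central finite differences of the penalty
eps <- 1e-6
num_grad <- sapply(seq_along(beta), function(t) {
  e <- replace(numeric(length(beta)), t, eps)
  (penalty(beta + e) - penalty(beta - e)) / (2 * eps)
})
all.equal(A_beta, num_grad, tolerance = 1e-6)
```

For the logit link and a = 1/2, w_iρ_i/2 reduces to h_i(1/2 − μ_i), so in that case the penalized likelihood adjustment coincides with the mean bias-reducing one.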

Fitting algorithm in `brglmFit`

brglmFit() implements a quasi Fisher scoring procedure for solving the adjusted score equations s_β + A_β = 0_p and s_ϕ + A_ϕ = 0. The iteration consists of an outer loop and an inner loop that implements step-halving. The algorithm is as follows:

Input

- s_β, i_ββ, A_β
- s_ϕ, i_ϕϕ, A_ϕ
- Starting values β⁽⁰⁾ and ϕ⁽⁰⁾
- ϵ > 0: tolerance for the L^∞ norm of the search direction before reporting convergence
- K: maximum number of outer iterations
- M: maximum number of halving steps that can be taken

Output

- β̃, ϕ̃

Iteration

Initialize outer loop

1. k ← 0
2. υ_β⁽⁰⁾ ← {i_ββ(β⁽⁰⁾, ϕ⁽⁰⁾)}⁻¹{s_β(β⁽⁰⁾, ϕ⁽⁰⁾) + A_β(β⁽⁰⁾, ϕ⁽⁰⁾)}
3. υ_ϕ⁽⁰⁾ ← {i_ϕϕ(β⁽⁰⁾, ϕ⁽⁰⁾)}⁻¹{s_ϕ(β⁽⁰⁾, ϕ⁽⁰⁾) + A_ϕ(β⁽⁰⁾, ϕ⁽⁰⁾)}

Initialize inner loop

4. m ← 0
5. b^(m) ← β^(k)
6. f^(m) ← ϕ^(k)
7. v_β^(m) ← υ_β^(k)
8. v_ϕ^(m) ← υ_ϕ^(k)
9. d ← ||(v_β^(m), v_ϕ^(m))||_∞

Update parameters

10. b^(m + 1) ← b^(m) + 2^(−m) v_β^(m)
11. f^(m + 1) ← f^(m) + 2^(−m) v_ϕ^(m)

Update direction

12. v_β^(m + 1) ← {i_ββ(b^(m + 1), f^(m + 1))}⁻¹{s_β(b^(m + 1), f^(m + 1)) + A_β(b^(m + 1), f^(m + 1))}
13. v_ϕ^(m + 1) ← {i_ϕϕ(b^(m + 1), f^(m + 1))}⁻¹{s_ϕ(b^(m + 1), f^(m + 1)) + A_ϕ(b^(m + 1), f^(m + 1))}

Continue or break halving within inner loop

14. if m + 1 < M and ||(v_β^(m + 1), v_ϕ^(m + 1))||_∞ > d

    14.1. m ← m + 1

    14.2. GO TO 10

15. else

    15.1. β^(k + 1) ← b^(m + 1)

    15.2. ϕ^(k + 1) ← f^(m + 1)

    15.3. υ_β^(k + 1) ← v_β^(m + 1)

    15.4. υ_ϕ^(k + 1) ← v_ϕ^(m + 1)

Continue or break outer loop

16. if k + 1 < K and ||(υ_β^(k + 1), υ_ϕ^(k + 1))||_∞ > ϵ

    16.1. k ← k + 1

    16.2. GO TO 4

17. else

    17.1. β̃ ← β^(k + 1)

    17.2. ϕ̃ ← ϕ^(k + 1)

    17.3. STOP
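The iteration above can be sketched in base R for a logistic regression with the mean bias-reducing adjustment. This is an illustrative re-implementation under stated assumptions, not the code of brglmFit(): the dispersion is known, so the updates for ϕ are skipped, the adjusted score is taken to be X^⊤{y − μ + h(1/2 − μ)}, and the function and argument names are hypothetical.

```r
# Base R only; a sketch of the iteration for logistic regression (phi = 1)
quasi_fisher <- function(X, y, beta0, epsilon = 1e-8, K = 100, M = 12) {
  # Direction i_betabeta^{-1} (s_beta + A_beta) at a given beta
  direction <- function(beta) {
    mu <- plogis(drop(X %*% beta))
    w <- mu * (1 - mu)
    XW <- sqrt(w) * X
    info <- crossprod(XW)                       # X' W X
    h <- rowSums((XW %*% solve(info)) * XW)     # hat values
    adjusted_score <- crossprod(X, y - mu + h * (0.5 - mu))
    drop(solve(info, adjusted_score))
  }
  beta <- beta0
  v_out <- direction(beta)                 # initialize outer loop
  for (k in seq_len(K)) {                  # at most K outer iterations
    b <- beta                              # initialize inner loop
    v <- v_out
    d <- max(abs(v))                       # L-infinity norm of current direction
    for (m in 0:(M - 1)) {                 # at most M halving steps
      b <- b + 2^(-m) * v                  # update parameters
      v <- direction(b)                    # update direction
      if (max(abs(v)) <= d) break          # stop halving once the norm does not increase
    }
    beta <- b
    v_out <- v
    if (max(abs(v_out)) <= epsilon) break  # convergence of the outer loop
  }
  beta
}

# Completely separated toy data: the reduced-bias estimates are finite
X <- cbind(1, c(-3, -2, -1, 1, 2, 3))
y <- c(0, 0, 0, 1, 1, 1)
est <- quasi_fisher(X, y, beta0 = c(0, 0))
est
```

The two nested for loops replace the GO TO statements: the inner loop covers the halving steps and the outer loop the quasi-Fisher scoring updates.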

Notes

- For K = M = 1, β⁽⁰⁾ = β̂ and ϕ⁽⁰⁾ = ϕ̂, the above iteration computes the bias-corrected estimates proposed in Cordeiro and McCullagh (1991). This is achieved when the brglmFit() function is called with type = "correction" (see ?brglmFit).
- The mean-bias reducing adjusted score functions are solved when the brglmFit() function is called with type = "AS_mean", and the median-bias reducing adjusted score functions with type = "AS_median" (see ?brglmFit). Estimation using mixed adjustments is through type = "AS_mixed". type = "MPL_Jeffreys" does maximum penalized likelihood with a power of the Jeffreys prior as penalty.
- The steps where ϕ and the ϕ direction are updated are ignored for generalized linear models with known dispersion parameter, like in models with binomial and Poisson responses. Also, in that case, v_ϕ^(.) and υ_ϕ^(.) in steps 9, 14 and 16 are set to zero.
- The implementation of the adjusted score functions requires ready implementations of d²μ_i/dη_i², a′(.), a″(.) and a‴(.). The enrichwith R package is used internally to enrich the base family and link-glm objects with implementations of those functions (see ?enrich.family and ?enrich.link-glm).
- The above iteration can be used to implement a variety of additive adjustments to the score function, by supplying the algorithm with appropriate adjustment functions A_β and A_ϕ.
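The type values mentioned in these notes can be compared on a single data set, as in the sketch below. It assumes that brglm2 is installed and relies on glm() forwarding the type argument to brglmFit(). The toy data are made up and chosen so that the maximum likelihood estimates are finite, which type = "correction" needs.

```r
# Assumes the brglm2 package is installed; the toy data are hypothetical
library("brglm2")

# Overlapping responses, so the maximum likelihood estimates exist
toy <- data.frame(x = c(-3, -2, -1, 1, 2, 3),
                  y = c(0, 0, 1, 0, 1, 1))

types <- c("AS_mean", "AS_median", "AS_mixed", "correction", "MPL_Jeffreys")
fits <- lapply(types, function(type) {
  glm(y ~ x, family = binomial("logit"), data = toy,
      method = "brglmFit", type = type)
})
names(fits) <- types

# One column of coefficient estimates per type
sapply(fits, coef)
```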

Contributions to this vignette

The first version of the vignette was written by Ioannis Kosmidis. Euloge Clovis Kenne Pagui and Nicola Sartori contributed the first version of the section “Median bias-reducing adjusted score functions”, and Ioannis Kosmidis brought the expressions for the median bias-reducing adjustments in the reduced form that is shown above and is implemented in brglmFit().

Ioannis Kosmidis, Kenne Pagui, and Sartori (2020) provide more details about mean and median bias reduction in generalized linear models.

Citation

If you found this vignette or brglm2, in general, useful, please consider citing brglm2 and the associated paper. You can find information on how to do this by typing citation("brglm2").

References

Albert, A., and J. A. Anderson. 1984. “On the Existence of Maximum Likelihood Estimates in Logistic Regression Models.” Biometrika 71 (1): 1–10.

Cordeiro, G. M., and P. McCullagh. 1991. “Bias Correction in Generalized Linear Models.” Journal of the Royal Statistical Society, Series B: Methodological 53 (3): 629–43.

Firth, D. 1993. “Bias Reduction of Maximum Likelihood Estimates.” Biometrika 80 (1): 27–38.

Kenne Pagui, E. C., A. Salvan, and N. Sartori. 2017. “Median Bias Reduction of Maximum Likelihood Estimates.” Biometrika 104: 923–38. https://doi.org/10.1093/biomet/asx046.

Kosmidis, I. 2014. “Bias in Parametric Estimation: Reduction and Useful Side-Effects.” Wiley Interdisciplinary Reviews: Computational Statistics 6 (3): 185–96. https://doi.org/10.1002/wics.1296.

Kosmidis, I., and D. Firth. 2009. “Bias Reduction in Exponential Family Nonlinear Models.” Biometrika 96 (4): 793–804. https://doi.org/10.1093/biomet/asp055.

———. 2010. “A Generic Algorithm for Reducing Bias in Parametric Estimation.” Electronic Journal of Statistics 4: 1097–1112. https://doi.org/10.1214/10-EJS579.

Kosmidis, Ioannis, Euloge Clovis Kenne Pagui, and Nicola Sartori. 2020. “Mean and Median Bias Reduction in Generalized Linear Models.” Statistics and Computing 30: 43–59. https://doi.org/10.1007/s11222-019-09860-6.
