--- title: "Enriching `family` objects: exponential family of distirbutions" author: "[Ioannis Kosmidis](http://www.ikosmidis.com)" date: "`r Sys.Date()`" output: rmarkdown::html_vignette bibliography: enrichwith.bib vignette: > %\VignetteIndexEntry{enriching family objects} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- # Introduction The [**enrichwith**](https://github.com/ikosmidis/enrichwith) R package provides the `enrich` method to enrich list-like R objects with new, relevant components. The resulting objects preserve their class, so all methods associated with them still apply. This vignette is a demo of the available enrichment options for `family` objects, focusing on objects that correspond to members of the exponential family of distributions. # Exponential family and `family` objects `family` objects specify characteristics of the models used by functions such as `glm`. The families implemented in the `stats` package include `binomial`, `gaussian`, `Gamma`, `inverse.gaussian`, and `poisson`, which obvious corresponding distributions. These distributions are all special cases of the exponential family of distributions with probability mass or density function of the form $$ f_{Y}(y) = \exp\left\{\frac{y \theta - b(\theta) - c_1(y)}{\phi/m} - \frac{1}{2}a\left(-\frac{m}{\phi}\right) + c_2(y) \right\} $$ for some sufficiently smooth functions $b(.)$, $c_1(.)$, $a(.)$ and $c_2(.)$, and a fixed weight $m$. The expected value and the variance of $Y$ are then \begin{align*} E(Y) & = \mu = b'(\theta) \\ Var(Y) & = \frac{\phi}{m}b''(\theta) = \frac{\phi}{m}V(\mu) \end{align*} where $V(\mu)$ and $\phi$ are the variance function and the dispersion parameter, respectively. Below we list the characteristics of the distributions supported by `family` objects. #### [Normal](https://en.wikipedia.org/wiki/Normal_distribution) with mean $\mu$ and variance $\phi/m$ $\theta = \mu$, $\displaystyle b(\theta) = \frac{\theta^2}{2}$, $\displaystyle c_1(y) = \frac{y^2}{2}$, $\displaystyle a(\zeta) = -\log(-\zeta)$, $\displaystyle c_2(y) = -\frac{1}{2}\log(2\pi)$ #### [Binomial](https://en.wikipedia.org/wiki/Binomial_distribution) with index $m$ and probability $\mu$ $\displaystyle \theta = \log\frac{\mu}{1- \mu}$, $\displaystyle b(\theta) = \log(1 + e^\theta)$, $\displaystyle \phi = 1$, $\displaystyle c_1(y) = 0$, $\displaystyle a(\zeta) = 0$, $\displaystyle c_2(y) = \log{m\choose{my}}$ #### [Poisson](https://en.wikipedia.org/wiki/Poisson_distribution) with mean $\mu$ $\displaystyle \theta = \log\mu$, $\displaystyle b(\theta) = e^\theta$, $\displaystyle \phi = 1$, $\displaystyle c_1(y) = 0$, $\displaystyle a(\zeta) = 0$, $\displaystyle c_2(y) = -\log\Gamma(y + 1)$ #### [Gamma](https://en.wikipedia.org/wiki/Gamma_distribution) with mean $\mu$ and shape $1/\phi$ $\displaystyle \theta = -\frac{1}{\mu}$, $\displaystyle b(\theta) = -\log(-\theta)$, $\displaystyle c_1(y) = -\log y$, $\displaystyle a(\zeta) = 2 \log \Gamma(-\zeta) + 2 \zeta \log\left(-\zeta\right)$, $\displaystyle c_2(y) = -\log y$ #### [Inverse Gaussian](https://en.wikipedia.org/wiki/Inverse_Gaussian_distribution) with mean $\mu$ and variance $\phi\mu^3$ $\displaystyle \theta = -\frac{1}{2\mu^2}$, $\displaystyle b(\theta) = -\sqrt{-2\theta}$, $\displaystyle c_1(y) = \frac{1}{2y}$, $\displaystyle a(\zeta) = -\log(-\zeta)$, $\displaystyle c_2(y) = -\frac{1}{2}\log\left(\pi y^3\right)$ # Components in `family` objects `family` objects provide functions for the variance function (`variance`), a speficiation of deviance residuals (`dev.resids`) and the Akaike information criterion (`aic`). For example ```{r, echo = TRUE, eval = TRUE} inverse.gaussian()$dev.resids inverse.gaussian()$variance inverse.gaussian()$aic ``` # Enrichment options for `family` objects The **enrichwith** R package provides methods for the enrichment of `family` objects with a function that links the natural parameter $\theta$ with $\mu$, the function $b(\theta)$, the first two derivatives of $V(\mu)$, $a(\zeta)$ and its first four derivatives, and $c_1(y)$ and $c_2(y)$. To illustrate, let's write a function that reconstructs the densitities and probability mass functions from the components that result from enrichment ```{r, echo = TRUE, eval = TRUE} library("enrichwith") dens <- function(y, m = 1, mu, phi, family) { object <- enrich(family) with(object, { c2 <- if (family == "binomial") c2fun(y, m) else c2fun(y) exp(m * (y * theta(mu) - bfun(theta(mu)) - c1fun(y))/phi - 0.5 * afun(-m/phi) + c2) }) } ``` The following chunks test `dens` for a few distributions against the standard `d*` functions ```{r, echo = TRUE, eval = TRUE} ## Normal all.equal(dens(y = 0.2, m = 3, mu = 1, phi = 3.22, gaussian()), dnorm(x = 0.2, mean = 1, sd = sqrt(3.22/3))) ## Gamma all.equal(dens(y = 3, m = 1.44, mu = 2.3, phi = 1.3, Gamma()), dgamma(x = 3, shape = 1.44/1.3, 1.44/(1.3 * 2.3))) ## Inverse gaussian all.equal(dens(y = 0.2, m = 7.23, mu = 1, phi = 3.22, inverse.gaussian()), SuppDists::dinvGauss(0.2, nu = 1, lambda = 7.23/3.22)) ## Binomial all.equal(dens(y = 0.34, m = 100, mu = 0.32, phi = 1, binomial()), dbinom(x = 34, size = 100, prob = 0.32)) ```