Numerical Methods

Localized Metropolis Algorithm

The efficiency of a Markov chain Monte Carlo (MCMC) algorithm depends on two main factors: the cost of evaluating the objective function, i.e., the logarithm of the joint likelihood, and the quality of the proposal distribution. The localized Metropolis algorithm (LMA) improves the quality of the proposal distribution by locating the local maximum of the full conditional distribution. Most developments of the Metropolis-Hastings algorithm focus on constructing an efficient normal proposal distribution. If the support of the parameter of interest is not $\mathbb{R}$, a suitable transformation to $\mathbb{R}$ is always available. The LMA likewise assumes that $\mathrm{supp}(\theta)=\mathbb{R}$.
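For instance (an illustration, not a metapack-specific construction), a positive parameter $\theta>0$ can be mapped onto $\mathbb{R}$ through $\eta=\log\theta$; by the change-of-variables formula, the full conditional of $\eta$ is \[ p_\eta(\eta) = p_\theta(e^\eta)\,e^\eta, \] so the LMA can be applied to $\eta$ and the draws transformed back via $\theta = e^\eta$. A parameter restricted to $(0,1)$ can likewise be handled with the logit transformation.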

To simplify the notation, let $\theta$ be the parameter of interest and let $p(\theta)$ be the density of the full conditional distribution. The LMA proceeds as follows:

  1. Let $\theta^{(t)}$ be the value of $\theta$ at the $t$th iteration.
  2. Compute $\widehat{\theta}=\arg\max_\theta \log p(\theta)$.
  3. Compute the curvature: \[ \widehat{s}^{-2} = -\left.\dfrac{\mathrm{d}^2 \log p(\theta)}{\mathrm{d}\theta^2}\right|_{\theta=\widehat{\theta}}. \]
  4. Draw $\theta^* \sim \mathcal{N}(\widehat{\theta},\widehat{s}^2)$.
  5. Accept $\theta^*$ with probability \[ \alpha(\theta^{(t)}\to \theta^*) = \min\left\{1, \dfrac{p(\theta^*)\phi((\theta^{(t)}-\widehat{\theta})/\widehat{s})}{p(\theta^{(t)})\phi((\theta^*-\widehat{\theta})/\widehat{s})} \right\}, \] where $\phi(\cdot)$ is the density of the standard normal distribution.

The second derivative in Step 3 is generally analytically intractable. As a workaround, we fit a quadratic curve around the mode $\widehat{\theta}$ of $\log p(\theta)$. First, select $S$ evaluation points $(\theta_1,\theta_2,\ldots,\theta_S)$ around $\widehat{\theta}$ and evaluate the corresponding function values $l_s = \log p(\theta_s)$ for $s=1,\ldots,S$. The evaluation points are usually determined by fixing a step size (or perturbation) $\epsilon$ and taking steps back and forth from $\widehat{\theta}$. That is, \[ \theta_s = \widehat{\theta} + (\lfloor S/2\rfloor - s)\epsilon,\;s=1,\ldots,S. \] With $\{(\theta_1,l_1),\ldots,(\theta_S,l_S)\}$, a second-degree polynomial regression model is fitted by solving the minimization problem \[ \min_{\beta_0,\beta_1,\beta_2} \sum_{s=1}^S (l_s - \beta_0 - \beta_1 \theta_s - \beta_2 \theta_s^2)^2. \] Since the fitted quadratic has second derivative $2\widehat{\beta}_2$, the variance of the normal proposal density can be approximated by $\widehat{s}^2 \approx -1/(2\widehat{\beta}_2)$.
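The following is a minimal sketch of these steps in Python (NumPy/SciPy). The function names lma_proposal_params and lma_step, the numerical mode search via scipy.optimize.minimize, and the default values of S and eps are illustrative assumptions, not metapack's implementation.

    import numpy as np
    from scipy import optimize

    def lma_proposal_params(log_p, theta_init, S=5, eps=1e-3):
        # Step 2: locate the mode of log p(theta) numerically.
        res = optimize.minimize(lambda th: -log_p(th[0]), x0=np.array([theta_init]))
        theta_hat = float(res.x[0])
        # Evaluation points around the mode, using step size eps.
        s = np.arange(1, S + 1)
        thetas = theta_hat + (S // 2 - s) * eps
        ls = np.array([log_p(th) for th in thetas])
        # Step 3 workaround: least-squares fit of l = b0 + b1*theta + b2*theta^2.
        X = np.column_stack([np.ones(S), thetas, thetas ** 2])
        beta = np.linalg.lstsq(X, ls, rcond=None)[0]
        s2_hat = -1.0 / (2.0 * beta[2])        # proposal variance; beta[2] < 0 near a mode
        return theta_hat, s2_hat

    def lma_step(theta_t, log_p, rng):
        # Steps 4-5: draw from N(theta_hat, s2_hat) and accept or reject.
        theta_hat, s2_hat = lma_proposal_params(log_p, theta_t)
        theta_star = rng.normal(theta_hat, np.sqrt(s2_hat))
        log_alpha = (log_p(theta_star) - log_p(theta_t)
                     - 0.5 * ((theta_t - theta_hat) ** 2
                              - (theta_star - theta_hat) ** 2) / s2_hat)
        return theta_star if np.log(rng.uniform()) < min(0.0, log_alpha) else theta_t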

Robust Adaptive Metropolis Algorithm

With the current value $\theta^{(t)}$ fixed as the mean of the normal proposal density, the size and orientation of the jump are determined by the variance, or the covariance matrix in the multivariate case, reducing the problem of finding an effective proposal distribution to that of constructing a desirable covariance matrix.

The robust adaptive Metropolis (RAM) algorithm is adaptive by nature: its covariance matrix is designed to evolve toward an optimal one by “learning the properties of the target distribution on the fly” (Vihola (2012)). The RAM algorithm is defined iteratively as follows:

  1. Draw a candidate parameter $\theta^* \gets \theta^{(t)} + S^{(t)}u^{(t)}$, where $u^{(t)} \sim q$ is a random vector and $q$ is a spherically symmetric standard distribution.
  2. Accept $\theta^*$ with probability \[ \alpha(\theta^{(t)}\to\theta^*) = \min\left\{1, \dfrac{p(\theta^*)}{p(\theta^{(t)})} \right\}, \] where $p$ denotes the density of the full conditional distribution.
  3. Update the lower-triangular matrix $S^{(t)}$ by solving the equation \[ (S^{(t+1)})(S^{(t+1)})^\prime = S^{(t)}\left(I + \eta^{(t)}\left[\alpha(\theta^{(t)}\to \theta^*) - \alpha_* \right] \dfrac{(u^{(t)})(u^{(t)})^\prime}{\|u^{(t)}\|^{2}} \right)(S^{(t)})^\prime, \] where $I$ is the identity matrix, $\eta^{(t)}$ is a step size sequence, and $\alpha_*$ is the target acceptance rate. In metapack, $\eta^{(t)}\gets t^{-\gamma}$ with $\gamma \in (1/2,1]$ as advised in Vihola (2012), and the target acceptance rate $\alpha_*$ is set to 0.234 following Roberts, Gelman, and Gilks (1997). A sketch of one RAM iteration is given after this list.
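The sketch below is written in Python; the Gaussian choice for the spherically symmetric increment, the function names ram_step and log_p, and the default $\gamma = 0.66$ are illustrative assumptions rather than metapack's implementation.

    import numpy as np

    def ram_step(theta, S, log_p, t, rng, gamma=0.66, target=0.234):
        # One RAM iteration; call with t = 1, 2, ... and a lower-triangular S.
        d = theta.size
        u = rng.standard_normal(d)            # spherically symmetric increment u ~ q
        theta_star = theta + S @ u            # Step 1: candidate
        log_alpha = min(0.0, log_p(theta_star) - log_p(theta))
        alpha = np.exp(log_alpha)             # Step 2: acceptance probability
        if np.log(rng.uniform()) < log_alpha:
            theta = theta_star
        eta = t ** (-gamma)                   # step size eta_t = t^(-gamma), gamma in (1/2, 1]
        # Step 3: rank-one update of S S' and re-factorization by Cholesky.
        M = S @ (np.eye(d) + eta * (alpha - target) * np.outer(u, u) / (u @ u)) @ S.T
        return theta, np.linalg.cholesky(M)

Starting from, say, $S^{(0)} = I$ and iterating this step, the adaptation is designed to drive the empirical acceptance rate toward $\alpha_* = 0.234$.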

Delayed Rejection

The Metropolis-Hastings algorithm is a powerful tool but can be problematic if too many candidates are rejected. It is tempting to keep generating candidates until one is accepted in order to increase the acceptance probability. Although this intuition behind the so-called delayed rejection (DR) is correct, it must be implemented carefully so that the detailed balance condition is satisfied (see Tierney and Mira (1999); Mira et al.; Green and Mira (2001)). The general DR algorithm withholds the rejection up to $m$ times so that the MCMC algorithm can make a move with higher probability than without delayed rejection. For this section only, the parameter is denoted by Latin letters, with $x$ denoting the current value of the parameter.

An appealing special case of the DR algorithm is introduced in Green and Mira (2001): the two-stage random walk delayed rejection algorithm. This method proposes a candidate $y_1$ in the first stage from an isotropic density $q_1(x, y_1)=q_1(y_1-x)$ with $q_1(z)=q_1(-z)$. The candidate is accepted with probability \[ \alpha_1(x\to y_1) = \min\left(1, \pi(y_1)/\pi(x) \right), \] where $\pi(\cdot)$ is the target function, proportional to the density of the target distribution up to a multiplicative constant. If $y_1$ is rejected, a second candidate $y_2$ is drawn from a different (or possibly the same) isotropic density $q_2(x,y_1,y_2)=q_2(y_2-x)$ with $q_2(z)=q_2(-z)$. The second candidate $y_2$ is accepted with probability \[ \alpha_2((x,y_1)\to y_2) = \min\left\{1, \dfrac{\pi(y_2)q_1(y_1-y_2)[1-\pi(y_1)/\pi(y_2)]_+}{\pi(x)q_1(y_1-x)[1-\pi(y_1)/\pi(x)]_+} \right\}, \] where $a_+ = \max(a,0)$.
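The two-stage scheme can be sketched in Python as follows. The Gaussian random-walk densities, the scales sigma1 and sigma2, and the function names dr2_step and log_pi are illustrative assumptions; the acceptance probabilities follow $\alpha_1$ and $\alpha_2$ above (with the common factors $\pi(x)$ and $q_1(y_1-x)$ divided out for numerical convenience).

    import numpy as np

    def dr2_step(x, log_pi, rng, sigma1=1.0, sigma2=0.5):
        d = x.size
        lp_x = log_pi(x)
        # Stage 1: symmetric random-walk proposal from q1, accepted with alpha_1.
        y1 = x + sigma1 * rng.standard_normal(d)
        lp_y1 = log_pi(y1)
        if np.log(rng.uniform()) < min(0.0, lp_y1 - lp_x):
            return y1
        # Stage 2: second proposal from q2 after the first rejection.
        y2 = x + sigma2 * rng.standard_normal(d)
        lp_y2 = log_pi(y2)

        def log_q1(z):                        # log q1 up to an additive constant
            return -0.5 * (z @ z) / sigma1 ** 2

        num = (np.exp(lp_y2 - lp_x + log_q1(y1 - y2) - log_q1(y1 - x))
               * max(1.0 - np.exp(lp_y1 - lp_y2), 0.0))
        den = max(1.0 - np.exp(lp_y1 - lp_x), 0.0)   # > 0 whenever stage 2 is reached
        alpha2 = min(1.0, num / den) if den > 0.0 else 1.0
        return y2 if rng.uniform() < alpha2 else x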

This special case of the DR algorithm blends well with the robust adaptive Metropolis (RAM) algorithm since the RAM algorithm uses a symmetric, isotropic proposal density and is itself a random-walk Metropolis-Hastings algorithm.

Double Exponential (DE) Quadrature

The formulation of the DE quadrature starts with integrals of the form \[ \mathcal{I} = \int_{-1}^1 f(x)\,\mathrm{d}x. \] The DE quadrature then substitutes $x = \phi(t)$, mapping $x \in (-1,1)$ onto the entire real line $t \in (-\infty,\infty)$, where $\phi$ is analytic over $t\in(-\infty, \infty)$, $\phi(-\infty)=-1$, and $\phi(\infty)=1$: \[ \mathcal{I} = \int_{-\infty}^\infty g(t)\,\mathrm{d}t,\; g(t)=f(\phi(t))\phi^\prime(t). \] It is known that integrals over the whole real line whose integrands decay exponentially as $|t|\to\infty$ are most efficiently approximated by the (equidistant) trapezoidal rule, also known as Sinc quadrature (Okayama et al. (2013); Trefethen et al. (2014)). The trapezoidal approximation of $\mathcal{I}$ with step size $h$ is \[ \widehat{\mathcal{I}}_h^n = \sum_{i=-n}^{n}hf(\phi(t_i))\phi^\prime(t_i),\; t_i = ih, \] which involves $2n+1$ function evaluations. For the integral $\mathcal{I}$ over $(-1,1)$, Takahasi and Mori (1973) proposed \[ \phi(t) = \tanh\left(\dfrac{\pi}{2}\sinh t \right). \] The rationale behind this transformation lies in the fact that \[ \phi^\prime(t) = \dfrac{\frac{\pi}{2}\cosh t}{\cosh^2(\frac{\pi}{2}\sinh t)} \in O\left(\exp\left(-\dfrac{\pi}{2}\exp|t| \right) \right)\quad \text{as } |t|\to\infty. \] The fast decay of the derivative suppresses the endpoint singularities and makes the transformed integrand well suited to the trapezoidal rule, as suggested by the Euler-Maclaurin summation formula. The name Double Exponential quadrature originates from this double exponential decay of the derivative $\phi^\prime$. If the integral is instead over $(0,\infty)$, i.e., \[ \mathcal{I} = \int_0^\infty f(x)\,\mathrm{d}x, \] the DE rule uses $\phi(t) = \exp(\pi \sinh t)$, which maps $t\in(-\infty,\infty)$ onto $(0,\infty)$. For more details on the discovery of the quadrature, refer to Mori (2005).
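A minimal sketch of the resulting tanh-sinh rule over $(-1,1)$ in Python follows; the truncation level n, the step size h, the function name de_quad, and the test integrand are illustrative choices.

    import numpy as np

    def de_quad(f, n=40, h=0.1):
        # Truncated trapezoidal sum over t_i = i*h, i = -n, ..., n, with nodes
        # x = phi(t) = tanh((pi/2) sinh t) and weights phi'(t).
        t = h * np.arange(-n, n + 1)
        x = np.tanh(0.5 * np.pi * np.sinh(t))
        w = 0.5 * np.pi * np.cosh(t) / np.cosh(0.5 * np.pi * np.sinh(t)) ** 2
        return h * np.sum(w * f(x))

    # Example: integral of sqrt(1 - x^2) over (-1, 1), whose exact value is pi/2;
    # the square-root behaviour at the endpoints poses no difficulty for the DE rule.
    print(de_quad(lambda x: np.sqrt(1.0 - x ** 2)), np.pi / 2)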

References