bayes.parobs is the function is based on Yao, H., Kim, S., Chen, M. H., Ibrahim, J. G., Shah, A. K., & Lin, J. (2015). Bayesian inference for multivariate meta-regression with a partially observed within-study sample covariance matrix. Journal of the American Statistical Association, 110(510), 528-544., but provides more flexibility in the modeling as well as in the MCMC specification. This page serves as a self-contained tutorial for bayes.parobs.


We relay the models introduced in Bayesian inference for multivariate meta-regression with a partially observed within-study sample covariance matrix. The following expository paragraphs will proceed without defining notations for standard symbols such as $\boldsymbol{X}$ being the design matrix for fixed effects and $\boldsymbol{\gamma}_k$ being the random effects for the $k$th trial.

The function bayes.parobs assumes $K$ randomized controlled trials, all of which include $T$ treatment arms. The model starts from assuming $\boldsymbol{y}_{itk} = \boldsymbol{X}_{tk}\boldsymbol{\beta} + \boldsymbol{W}_{tk}\boldsymbol{\gamma}_k + \boldsymbol{\epsilon}_{itk}$ for the $i$th participant and $t$th treatment of the $k$th trial. $\boldsymbol{\epsilon}_{itk} = (\epsilon_{itk1},\ldots,\epsilon_{itkJ})^\prime \sim \mathcal{N}(\boldsymbol{0},\Sigma_{tk})$ where $\Sigma_{tk} \in \mathbf{R}^{J\times J}$, $\boldsymbol{X}_{tk} = \mathrm{blockdiag}(\boldsymbol{x}_{tk1}^\prime,\boldsymbol{x}_{tk2}^\prime,\ldots,\boldsymbol{x}_{tkJ}^\prime)$, $\boldsymbol{\beta}=(\boldsymbol{\beta}_1^\prime,\ldots,\boldsymbol{\beta}_{J}^\prime)^\prime$, $\boldsymbol{W}_{tk}=\mathrm{blockdiag}(\boldsymbol{w}_{tk1}^\prime, \boldsymbol{w}_{tk2}^\prime,\ldots,\boldsymbol{w}_{tkJ}^\prime)$, and $\boldsymbol{\gamma}_k = (\boldsymbol{\gamma}_{k1}^\prime, \boldsymbol{\gamma}_{k2}^\prime, \ldots,\boldsymbol{\gamma}_{kJ}^\prime)^\prime$. Taking the sample mean across $i=1,\ldots,n_{tk}$ for every $(t,k)$ will result in the following aggregate meta-analysis model: \[ \overline{\boldsymbol{y}}_{\cdot tk} = \boldsymbol{X}_{tk}\boldsymbol{\beta} + \boldsymbol{W}_{tk}\boldsymbol{\gamma}_{k}+\overline{\boldsymbol{\epsilon}}_{\cdot tk}, \] where $\overline{\boldsymbol{\epsilon}}_{\cdot tk} \sim \mathcal{N}(\boldsymbol{0},\Sigma_{tk}/n_{tk})$. We also have the model for the sample covariance matrix, i.e., $(n_{tk}-1)S_{tk} \sim \mathcal{W}_{n_{tk}-1}(\Sigma_{tk})$ where $\mathcal{W}_{\nu}(\Sigma)$ denotes the Wishart distribution with $\nu$ degrees of freedom and a scale matrix $\Sigma$. By the well-known Basu’s theorem, $\overline{\boldsymbol{y}}_{\cdot tk}$ and $S_{tk}$ are statistically independent.

To accommodate the possible distinction between a first-line and a second-line therapy, a binary grouping variable $u_{tk}$ is introduced: \[ \overline{y}_{\cdot tkj} = \boldsymbol{x}_{tkj}^\prime\boldsymbol{\beta} + (1-u_{tk})\boldsymbol{z}_{tkj}^\prime\boldsymbol{\gamma}_{kj}^0 + u_{tk}\boldsymbol{z}_{tk}^\prime\boldsymbol{\gamma}_{kj}^1 + \overline{\epsilon}_{\cdot tkj}. \] The random effects assume a normal distribution, i.e., $\boldsymbol{\gamma}_{kj}^l \sim \mathcal{N}(\boldsymbol{\gamma}_j^{l*},\Omega_j^l)$ and $(\Omega_j^l)^{-1} \sim \mathcal{W}_{d_{0j}}(\Omega_{0j})$ for $l\in{0,1}$, where $\boldsymbol{\gamma}_{kj}^0$ and $\boldsymbol{\gamma}_{kj}^1$ are independent. By stacking the vectors, $\boldsymbol{\gamma}_k^l = (\boldsymbol{\gamma}_{k1}^l,\ldots,\boldsymbol{\gamma}_{kJ}^l)^\prime \sim \mathcal{N}(\boldsymbol{\gamma}^{l*},\Omega^l)$ for which $\boldsymbol{\gamma}^{l*}=(\boldsymbol{\gamma}_1^{l*},\ldots,\boldsymbol{\gamma}_{J}^{l*})^\prime$. With the noncentered parametrization, $\boldsymbol{\gamma}_{k,o}^l := \boldsymbol{\gamma}_k^l - \boldsymbol{\gamma}^{l*}$. Now, denote $\boldsymbol{W}_{tk}^* = [(1-u_{tk})\boldsymbol{W}_{tk}, u_{tk}\boldsymbol{W}_{tk}]$, $\boldsymbol{X}_{tk}^* = [\boldsymbol{X}_{tk}, \boldsymbol{W}_{tk}^]$ and $\boldsymbol{\theta} = (\boldsymbol{\beta}^\prime,{{\boldsymbol{\gamma}^0}^}^\prime, {{\boldsymbol{\gamma}^1}^*}^\prime)^\prime$,

\[ \overline{\boldsymbol{y}}_{\cdot tk} = \boldsymbol{X}_{tk}^*\boldsymbol{\theta} + \boldsymbol{W}_{tk}^* \boldsymbol{\gamma}_{k,o} + \overline{\boldsymbol{\epsilon}}_{\cdot tk}, \] where $\boldsymbol{\gamma}_{k,o} = ((\boldsymbol{\gamma}_{k,o}^0)^\prime, (\boldsymbol{\gamma}_{k,o}^1)^\prime)^\prime$. If there is no grouping of first-line and second-line therapies, $u_{tk}=0$ for all $(t,k)$ will eliminate the distinction.

The prior specification of the covariance matrix $\Sigma_{tk}$ is key to bayes.parobs. The main objective is to recover the unreported correlation information in $S_{tk}$. bayes.parobs accepts integers from one to five as the value for the argument fmodel. Each number indicates a different prior for $\Sigma_{tk}$. The objects enclosed in parentheses at the end of every bullet point are the hyperparameters associated with each model.

  • fmodel=1 - $\Sigma_{tk} = \mathrm{diag}(\sigma_{tk,11}^2,\ldots,\sigma_{tk,JJ}^2)$ where $\sigma_{tk,jj}^2 \sim \mathcal{IG}(a_0,b_0)$ and $\mathcal{IG}(a,b)$ is the inverse-gamma distribution. This specification is useful if the user does not care about the correlation recovery. (c0, dj0, a0, b0, Omega0)
  • fmodel=2 - $\Sigma_{tk}=\Sigma$ for every combination of $(t,k)$ and $\Sigma^{-1}\sim\mathcal{W}_{s_0}(\Sigma_0)$. This specification assumes that the user has prior knowledge that the correlation structure does not change across the arms included. (c0, dj0, s0, Omega0, Sigma0)
  • fmodel=3 - $\Sigma_{tk}=\Sigma_t$ and $\Sigma_t^{-1}\sim \mathcal{W}_{s_0}(\Sigma_0)$. This is a relaxed version of fmodel=2, allowing the correlation structure to differ across trials but forcing it to stay identical within a trial. (c0, dj0, s0, Omega0, Sigma0)
  • fmodel=4 - $\Sigma_{tk}=\boldsymbol{\delta}_{tk}\boldsymbol{\rho}\boldsymbol{\delta}_{tk}$ where $\boldsymbol{\delta}_{tk}=\mathrm{diag}(\Sigma_{tk,11}^{1/2},\ldots,\Sigma_{tk,JJ}^{1/2})$, and $\boldsymbol{\rho}$ is the correlation matrix. This specification allows the variances to vary across arms but requires that the correlations be the same. This is due to the lack of correlation information in the data, which would in turn lead to the nonidentifiability of the correlations if they were allowed to vary. However, this still is an ambitious model which permits maximal degrees of freedom in terms of variance and correlation estimation. (c0, dj0, a0, b0, Omega0)
  • fmodel=5 - The fifth model is hierarchical and thus may require more data than the others: $(\Sigma_{tk}^{-1}\mid \Sigma)\sim \mathcal{W}_{\nu_0}((\nu_0-J-1)^{-1}\Sigma^{-1})$ and $\Sigma \sim \mathcal{W}_{d_0}(\Sigma_0)$. $\Sigma_{tk}$ encodes the within-treatment-arm variation while $\Sigma$ captures the between-treatment-arm variation. The hierarchical structure allows the “borrowing of strength” across treatment arms. (c0, dj0, d0, nu0, Sigma0, Omega0)

The following prior distribution completes the model specification: $\boldsymbol{\theta} \sim \mathcal{N}(\boldsymbol{0},c_{0}\boldsymbol{I})$ where $\boldsymbol{I}$ indicates the identity matrix of a suitable dimension.


It is helpful to map each of the math symbols to its representation in the package. The following table gives a good example of the required input arguments as well as the formatting of the data:

Outcome SD XCovariate WCovariate Trial Treat Npt
$\overline{\boldsymbol{y}}_{\cdot 13}^\prime$ $\sqrt{\mathrm{diag}(S_{13})}$ $\boldsymbol{x}_{13}^\prime$ $\boldsymbol{w}_{13}^\prime$ 1 3 1000
$\overline{\boldsymbol{y}}_{\cdot 10}^\prime$ $\sqrt{\mathrm{diag}(S_{10})}$ $\boldsymbol{x}_{10}^\prime$ $\boldsymbol{w}_{20}^\prime$ 1 0 1000
$\overline{\boldsymbol{y}}_{\cdot 20}^\prime$ $\sqrt{\mathrm{diag}(S_{20})}$ $\boldsymbol{x}_{20}^\prime$ $\boldsymbol{w}_{20}^\prime$ 2 0 1000
$\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$

Notice that the $J$-dimensional aggregate response vector $\overline{\boldsymbol{y}}_{\cdot tk}$ is not unpacked, which would have otherwise added an additional column containing the subscript $j$ and bloated the data set even further. Instead, the package requires the vectors to be packaged into row vectors appropriately. The same principle applies to the standard deviations (SD), and covariate vectors (XCovariate, WCovariate). Npt stands for “number of participants per trial” but it contains the number of participants for the unique combination of $(t,k)$, i.e., each arm.

The remaining optional arguments are as follows:

  • mcmc - a list for MCMC specification. ndiscard is the number of burn-in iterations. nskip configures the thinning of the MCMC. For instance, if nskip=5, bayes.parobs will save the posterior every 5 iterations. nkeep is the size of the posterior sample. The total number of iterations will be ndiscard + nskip * nkeep.
  • group - a vector containing binary variables for $u_{tk}$. If not provided, bayes.parobs will assume that there is no grouping and set $u_{tk}=0$ for all $(t,k)$.
  • group_order - a vector of unique group labels. The first element will be assigned zero. If not provided, the numbering will default to an alphabetical/numerical order. group_order will take effect only if group is provided by the user.
  • prior - a list of hyperparameters. Despite $\boldsymbol{\theta}$ in every model, each fmodel, along with the group argument, requires a different set of hyperparameters. Refer to the itemized model specifications.
  • init - a list of initial values for the parameters to be sampled: theta, gamR, Omega, and Rho. The initial value for Rho will be effective only if fmodel=4.
  • control - a list of tuning parameters for the Metropolis-Hastings algorithm. Rho, R, and delta are sampled through either localized Metropolis algorithm or delayed rejection robust adaptive Metropolis algorithm. *_stepsize with the asterisk replaced with one of the names above specifies the stepsize for determining the sample evaluation points in the localized Metropolis algorithm. sample_Rho can be set to FALSE to suppress the sampling of Rho for fmodel=4. When sample_Rho is FALSE, $\boldsymbol{\rho}$ will be fixed using the value given by the init argument, which defaults to $\boldsymbol{\rho} = 0.5\boldsymbol{I}+0.5\boldsymbol{1}\boldsymbol{1}^\prime$ where $\boldsymbol{1}$ is the vector of ones.
  • scale_x - a Boolean indicating whether XCovariate should be scaled/standardized. The effect of setting this to TRUE is not limited to merely standardizing XCovariate. The following generic functions will scale the posterior sample of theta back to its original unit: plot, fitted, summary, and print.
  • Treat_order - a vector of unique treatments to be used for renumbering the Treat vector. The first element will be assigned treatment zero, potentially indicating placebo. If not provided, the numbering will default to an alphabetical/numerical order.
  • Trial_order - a vector of unique trials. The first element will be assigned zero. If not provided, the numbering will default to an alphabetical/numerical order.
  • verbose - a Boolean indicating whether to print the progress bar during the MCMC sampling.


metapack ships with a data set, cholesterol, consisting of 26 Merck-sponsored double-blind, randomized, active, or placebo-controlled clinical trials on patients with primary hypercholesterolemia.

  • Trial - trial identifiers
  • onstat - indicators for first-line and second-line therapies. onstat=1 implies that the patients in the corresponding arm were on statin prior to the trial. onstat=0 indicates the patients had not been on statin.
  • trt - a binary variable indicating the administration of statin or statin+ezetimibe.
  • bldlc - baseline low-density lipoprotein cholesterol (LDL-C).
  • bhdlc - baseline high-density lipoprotein cholesterol (LDL-C).
  • btg - baseline triglycerides (TG)
  • age - age in years.
  • white - the proportion of white participants.
  • male - the proportion of male participants.
  • dm - the proportion of participants with diabetes mellitus.
  • durat - the duration in weeks.
  • pldlc - mean percent difference in the low-density lipoprotein cholesterol
  • phdlc - mean percent difference in the high-density lipoprotein cholesterol
  • ptg - mean percent difference in the triglycerides
  • sdldl - the standard deviation of the percent difference in the low-density lipoprotein cholesterol
  • sdhdl - the standard deviation of the percent difference in the high-density lipoprotein cholesterol
  • sdtg - the standard deviation of the percent difference in the triglycerides
  • Npt - the number of participants for the arm of a trial

The cholesterol data can be preprocessed as follows prior to passing onto bayes.parobs:

Outcome <- model.matrix(~ pldlc + phdlc + ptg, data = cholesterol)
SD <- model.matrix(~ sdldl + sdhdl + sdtg, data = cholesterol)
Trial <- cholesterol$Trial
Treat <- cholesterol$trt
Npt <- cholesterol$Npt
XCovariate <- model.matrix(~ 0 + bldlc + bhdlc + btg + age + durat +
                         white + male + dm, data = cholesterol)
WCovariate <- model.matrix(~ trt, data = cholesterol)

Use set.seed to reproduce the output, and run the following code to fit the model. The user may specify fmodel.

fit <- bayes.parobs(Outcome, SD, XCovariate, WCovariate,
            Treat, Trial, Npt, fmodel, group = cholesterol$onstat,
            mcmc = list(ndiscard=20000, nkeep=20000),
            scale_x = TRUE, verbose = TRUE)