Select parameters from a single (sampling) iteration of the Markov chain

Using the selected parameters and the model, simulate a data set with the same size (number of observations/variables)

From the simulated data set, calculate selected summary statistics (e.g. mean)

Repeat steps 1-3 for a fixed number of iterations (perhaps across the whole chain)

When done, compare the observed summary statistics to the distribution of summary statistics from the simulated data sets (the posterior predictive distribution)
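The steps above can be sketched in R. This is a minimal, self-contained illustration assuming a simple normal model for \(y\) with hypothetical posterior draws of mu and sigma standing in for MCMC samples; the observed mean is compared to its posterior predictive distribution:

```r
set.seed(1)

# observed data (simulated here just for illustration)
y <- rnorm(30, mean = 5, sd = 2)
N <- length(y)

# hypothetical posterior draws of mu and sigma (stand-ins for MCMC samples)
S <- 1000
mu_draws    <- rnorm(S, mean = mean(y), sd = sd(y) / sqrt(N))
sigma_draws <- sd(y) * sqrt((N - 1) / rchisq(S, df = N - 1))

# steps 1-3, repeated: for each draw, simulate a data set of the same size
# and compute the selected summary statistic (here, the mean)
sim_means <- numeric(S)
for (s in 1:S) {
  y_rep <- rnorm(N, mean = mu_draws[s], sd = sigma_draws[s])
  sim_means[s] <- mean(y_rep)
}

# final step: compare the observed statistic to the predictive distribution
ppp <- mean(sim_means >= mean(y))  # posterior predictive p-value
```

A posterior predictive p-value near 0 or 1 flags misfit; values near 0.5 indicate the observed statistic is typical of data generated by the model.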

Example PPMC

For our model, we have one observed variable that is in the model (\(Y_p\))

Note, the observations in \(\textbf{X}\) are not modeled, so we do not examine these

First, let’s assemble the posterior draws:

# here, we use format = "draws_matrix" to remove the draws from the array format they default to
posteriorSample = model06_Samples$draws(variables = c("beta", "sigma"), format = "draws_matrix")
posteriorSample

Relative model fit: Used to compare two (or more) competing models

In non-Bayesian models, Information Criteria are often used to make comparisons

AIC, BIC, etc.

The model with the lowest index is the model that fits best

Bayesian model fit is similar

Uses an index value

The model with the lowest index is the model that fits best

Relative Model Fit

Recent advances in Bayesian model fit use indices that are tied to making cross-validation predictions:

Fit model leaving one observation out

Calculate statistics related to prediction (for instance, log-likelihood of that observation conditional on model parameters)

Do for all observations

Newer Bayesian indices try to mirror these leave-one-out predictions (but approximate these due to time constraints)
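The leave-one-out procedure above can be sketched with a brute-force (non-approximate) loop. This hypothetical example uses a plug-in normal regression fit via lm() rather than a full Bayesian refit, just to show the mechanics: drop one observation, refit, then score the held-out point's log predictive density:

```r
set.seed(2)

# simulated regression data
N <- 30
x <- rnorm(N)
y <- 1 + 2 * x + rnorm(N, sd = 1)

# exact leave-one-out: refit without observation i, then score it
loo_loglik <- numeric(N)
for (i in 1:N) {
  fit_i   <- lm(y[-i] ~ x[-i])
  pred_i  <- coef(fit_i)[1] + coef(fit_i)[2] * x[i]
  sigma_i <- summary(fit_i)$sigma
  loo_loglik[i] <- dnorm(y[i], mean = pred_i, sd = sigma_i, log = TRUE)
}

# sum of held-out log predictive densities (the quantity LOO approximates)
elpd_loo_exact <- sum(loo_loglik)
```

With a Bayesian model, each refit would be a full MCMC run, which is why the approximate indices below are so useful.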

Bayesian Model Fit Indices

Way back when (late 1990s and early 2000s), the Deviance Information Criterion was used for relative Bayesian model fit comparisons

\[\text{DIC} = p_D + \overline{D\left(\theta\right)},\] where the estimated number of parameters is:

\[p_D = \overline{D\left(\theta\right)} - D\left(\bar{\theta}\right), \] and where

\[ D\left( \theta\right) = -2 \log \left(p\left(y|\theta\right)\right)+C,\] where \(C\) is a constant that cancels out when model comparisons are made

Bayesian Model Fit Indices

Here,

\(\overline{D\left(\theta\right)}\) is the deviance (\(-2\) times the log likelihood) of the data (\(y\)) given the parameters (\(\theta\)), averaged across all posterior samples

\(D\left(\bar{\theta}\right)\) is the deviance of the data (\(y\)) computed at the average of the parameters (\(\bar{\theta}\)) across all posterior samples
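Putting the pieces together, DIC can be computed directly from posterior draws. A minimal sketch, assuming a normal model and hypothetical draws of mu and sigma standing in for MCMC output (with the constant \(C\) set to zero):

```r
set.seed(3)

y <- rnorm(30, mean = 5, sd = 2)
N <- length(y)

# hypothetical posterior draws (stand-ins for MCMC output)
S <- 2000
mu_draws    <- rnorm(S, mean(y), sd(y) / sqrt(N))
sigma_draws <- sd(y) * sqrt((N - 1) / rchisq(S, df = N - 1))

# deviance D(theta) = -2 log p(y | theta) for each draw
deviance <- sapply(1:S, function(s) {
  -2 * sum(dnorm(y, mu_draws[s], sigma_draws[s], log = TRUE))
})

Dbar <- mean(deviance)                                   # average deviance
Dhat <- -2 * sum(dnorm(y, mean(mu_draws), mean(sigma_draws), log = TRUE))
p_D  <- Dbar - Dhat                                      # estimated number of parameters
DIC  <- p_D + Dbar
```

Note how \(D\left(\bar{\theta}\right)\) uses a point estimate (the average of the draws), which is one reason DIC is described as not fully Bayesian.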

Newer Methods

The DIC has fallen out of favor recently

Has issues when parameters are discrete

Not fully Bayesian (point estimate of average of parameter values)

Can give negative values for estimated numbers of parameters in a model

WAIC (Widely applicable or Watanabe-Akaike information criterion; Watanabe, 2010) corrects some of the problems with DIC

LOO: Leave One Out via Pareto Smoothed Importance Sampling

More recently, approximations to LOO have gained popularity

LOO via Pareto Smoothed Importance Sampling attempts to approximate the process of leave-one-out cross-validation using a sampling-based approach

Gives a finite-sample approximation

Implemented in Stan

Can quickly compare models

Gives warnings when it may be less reliable to use

The details are very technical, but are nicely compiled in Vehtari, Gelman, and Gabry (2017) (see preprint on arXiv at https://arxiv.org/pdf/1507.04544.pdf)
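To give a flavor of the sampling-based approximation (without the Pareto smoothing, which is what stabilizes it in practice), plain importance sampling estimates each leave-one-out predictive density from the full-data posterior draws. A self-contained sketch using a simulated S-by-N log-likelihood matrix:

```r
set.seed(4)

# simulated log-likelihood matrix: S posterior draws by N observations
S <- 2000; N <- 30
y <- rnorm(N, 5, 2)
mu_draws    <- rnorm(S, mean(y), sd(y) / sqrt(N))
sigma_draws <- sd(y) * sqrt((N - 1) / rchisq(S, df = N - 1))
log_lik <- sapply(1:N, function(i) dnorm(y[i], mu_draws, sigma_draws, log = TRUE))

# plain (unsmoothed) importance sampling: the LOO predictive density for
# observation i is 1 / E[ 1 / p(y_i | theta) ] over the posterior draws
elpd_loo_is <- sum(-log(colMeans(exp(-log_lik))))
```

PSIS-LOO replaces the raw importance weights with Pareto-smoothed versions; the Pareto k diagnostic flags observations where even the smoothed weights are too heavy-tailed to trust.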

Big picture:

Can compute WAIC and/or LOO via the loo package in R

LOO also gives variability for model comparisons

Model Comparisons via Stan

The core of WAIC and LOO is the log-likelihood of the data conditional on the model parameters, as calculated for each observation in the sample of the model:

We can calculate this using the generated quantities block in Stan:

generated quantities{
  // general quantities used below:
  vector[N] y_pred;
  y_pred = X*beta; // predicted value (conditional mean)
  // WAIC and LOO for model comparison
  array[N] real log_lik;
  for (person in 1:N){
    log_lik[person] = normal_lpdf(y[person] | y_pred[person], sigma);
  }
}

WAIC in Stan

Using the loo package, we can calculate WAIC with the waic() function

waic(model06_Samples$draws("log_lik"))

Computed from 8000 by 30 log-likelihood matrix.
Estimate SE
elpd_waic -109.4 6.7
p_waic 6.6 2.6
waic 218.9 13.5
4 (13.3%) p_waic estimates greater than 0.4. We recommend trying loo instead.

Here:

elpd_waic is the expected log pointwise predictive density for WAIC

p_waic is the WAIC estimate of the effective number of model parameters (a penalty to the likelihood for model complexity)

waic is the WAIC index used for model comparisons (lowest value is best fitting; -2*elpd_waic)
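These quantities can be computed by hand from the log-likelihood matrix, which helps show what waic() is doing. A sketch using a simulated S-by-N matrix; the pointwise penalty is the posterior variance of the log-likelihood, and elpd_waic is the pointwise log predictive density minus that penalty:

```r
set.seed(5)

# simulated log-likelihood matrix: S posterior draws by N observations
S <- 2000; N <- 30
y <- rnorm(N, 5, 2)
mu_draws    <- rnorm(S, mean(y), sd(y) / sqrt(N))
sigma_draws <- sd(y) * sqrt((N - 1) / rchisq(S, df = N - 1))
log_lik <- sapply(1:N, function(i) dnorm(y[i], mu_draws, sigma_draws, log = TRUE))

# pointwise log predictive density and pointwise penalty
lpd    <- log(colMeans(exp(log_lik)))   # log of the mean likelihood per observation
p_waic <- apply(log_lik, 2, var)        # posterior variance of the log-likelihood

elpd_waic <- sum(lpd - p_waic)
waic      <- -2 * elpd_waic
```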

WAIC in Stan

Note: WAIC needs a “log_lik” variable in the model analysis to be calculated correctly

That is up to you to provide via generated quantities

LOO in Stan

Using the loo package, we can calculate the PSIS-LOO using cmdstanr objects with the $loo() function:

model06_Samples$loo()

Computed from 8000 by 30 log-likelihood matrix.
Estimate SE
elpd_loo -110.0 7.0
p_loo 7.2 2.8
looic 220.0 14.0
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 0.8]).
Pareto k diagnostic values:
Count Pct. Min. ESS
(-Inf, 0.7] (good) 29 96.7% 314
(0.7, 1] (bad) 1 3.3% <NA>
(1, Inf) (very bad) 0 0.0% <NA>
See help('pareto-k-diagnostic') for details.

LOO in Stan

Note: LOO needs a “log_lik” variable in the model analysis to be calculated correctly

That is up to you to provide!!

If it is named “log_lik”, you don’t need to pass the function an argument

Notes about LOO

Computed from 8000 by 30 log-likelihood matrix.
Estimate SE
elpd_loo -110.0 7.0
p_loo 7.2 2.8
looic 220.0 14.0
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 0.8]).
Pareto k diagnostic values:
Count Pct. Min. ESS
(-Inf, 0.7] (good) 29 96.7% 314
(0.7, 1] (bad) 1 3.3% <NA>
(1, Inf) (very bad) 0 0.0% <NA>
See help('pareto-k-diagnostic') for details.

Here:

elpd_loo is the expected log pointwise predictive density for LOO

p_loo is the LOO estimate of the effective number of model parameters (a penalty to the likelihood for model complexity)

looic is the LOO index used for model comparisons (lowest value is best fitting; -2*elpd_loo)

Also, you get a warning if too many of your sampled values have bad Pareto k diagnostic values

To compare models with WAIC, take the one with the lowest value:

waic(model06_Samples$draws("log_lik"))

Computed from 8000 by 30 log-likelihood matrix.
Estimate SE
elpd_waic -109.4 6.7
p_waic 6.6 2.6
waic 218.9 13.5
4 (13.3%) p_waic estimates greater than 0.4. We recommend trying loo instead.

waic(model06b_Samples$draws("log_lik"))

Computed from 8000 by 30 log-likelihood matrix.
Estimate SE
elpd_waic -370.3 30.9
p_waic 17.5 4.2
waic 740.5 61.9
12 (40.0%) p_waic estimates greater than 0.4. We recommend trying loo instead.

Comparing Models with LOO

To compare models with LOO, the loo package has a built-in comparison function:

# comparing two models with loo:
loo_compare(list(uniformative = model06_Samples$loo(), informative = model06b_Samples$loo()))
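loo_compare reports the difference in elpd between each model and the best one (elpd_diff), with a standard error (se_diff) computed from the pointwise differences. Conceptually, with two hypothetical pointwise elpd vectors:

```r
set.seed(6)

# hypothetical pointwise elpd contributions for two models on N observations
N <- 30
elpd_pointwise_A <- rnorm(N, mean = -3.6, sd = 0.8)   # better-fitting model
elpd_pointwise_B <- rnorm(N, mean = -12.3, sd = 2.0)  # worse-fitting model

diff_pointwise <- elpd_pointwise_B - elpd_pointwise_A
elpd_diff <- sum(diff_pointwise)          # negative favors model A
se_diff   <- sd(diff_pointwise) * sqrt(N) # SE of the summed differences
```

A common rule of thumb is that an elpd difference within roughly two standard errors of zero is not a decisive preference for either model.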