1. Select parameters from a single (sampling) iteration of the Markov chain
2. Using the selected parameters and the model, simulate a data set of the same size (number of observations/variables)
3. From the simulated data set, calculate selected summary statistics (e.g., the mean)
4. Repeat steps 1-3 for a fixed number of iterations (perhaps across the whole chain)
5. When done, compare the position of the observed summary statistics to the distribution of summary statistics from the simulated data sets (the predictive distribution)
Example PPMC
For our model, we have one observed variable (\(Y_p\))
Note, the observations in \(\textbf{X}\) are not modeled, so we do not examine these
First, let’s assemble the posterior draws:
# here, we use format = "draws_matrix" to remove the draws from
# the array format they default to
posteriorSample = model06_Samples$draws(
  variables = c("beta", "sigma"),
  format = "draws_matrix"
)
posteriorSample
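With the draws assembled, a minimal sketch of steps 1-5 follows, using the sample mean as the summary statistic (this assumes X is the design matrix and Y the observed outcome vector from the running example; both names are assumptions here):
# PPMC sketch: simulate one data set per posterior draw and save its mean
simMean = rep(NA, nrow(posteriorSample))
for (draw in 1:nrow(posteriorSample)){
  betaDraw  = as.numeric(posteriorSample[draw, grep("beta", colnames(posteriorSample))])
  sigmaDraw = as.numeric(posteriorSample[draw, "sigma"])
  ySim = rnorm(n = length(Y), mean = as.numeric(X %*% betaDraw), sd = sigmaDraw)
  simMean[draw] = mean(ySim)
}
# compare the observed mean to the predictive distribution of simulated means
mean(simMean >= mean(Y)) # posterior predictive p-value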
Relative model fit: Used to compare two (or more) competing models
In non-Bayesian models, Information Criteria are often used to make comparisons
AIC, BIC, etc. (formulas shown below)
The model with the lowest index is the model that fits best
Bayesian model fit is similar
Uses an index value
The model with the lowest index is the model that fits best
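For reference, the non-Bayesian AIC and BIC penalize the maximized log likelihood \(\hat{L}\) by the number of parameters \(p\) (and, for the BIC, the sample size \(N\)):
\[\text{AIC} = -2 \log \hat{L} + 2p, \qquad \text{BIC} = -2 \log \hat{L} + p \log N\]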
Relative Model Fit
Recent advances in Bayesian model fit use indices that are tied to making cross-validation predictions:
Fit model leaving one observation out
Calculate statistics related to prediction (for instance, log-likelihood of that observation conditional on model parameters)
Do for all observations
Newer Bayesian indices try to mirror these leave-one-out predictions, but approximate them rather than refitting the model once per observation, which would be computationally expensive (a brute-force version is sketched below)
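As a minimal illustration of the brute-force process, here is a leave-one-out loop using lm() on simulated data as a stand-in model (all names and data here are illustrative, not from the running example):
# brute-force leave-one-out cross-validation with a simple linear model
set.seed(1)
x = rnorm(30)
y = 1 + 2*x + rnorm(30)
looLogLik = rep(NA, 30)
for (i in 1:30){
  fit = lm(y[-i] ~ x[-i])                  # fit model leaving observation i out
  pred = coef(fit)[1] + coef(fit)[2]*x[i]  # predict the held-out observation
  # log likelihood of the held-out observation given the fitted model
  looLogLik[i] = dnorm(y[i], mean = pred, sd = summary(fit)$sigma, log = TRUE)
}
sum(looLogLik) # analogous to elpd_loo (here from exact refits, not PSIS)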
Bayesian Model Fit Indices
Way back when (the late 1990s and early 2000s), the Deviance Information Criterion (DIC) was used for relative Bayesian model fit comparisons
\[\text{DIC} = p_D + \overline{D\left(\theta\right)},\] where the estimated number of parameters is:
\[p_D = \overline{D\left(\theta\right)} - D\left(\bar{\theta}\right), \] and where
\[ D\left( \theta\right) = -2 \log \left(p\left(y|\theta\right)\right)+C,\] where \(C\) is a constant that cancels out when model comparisons are made
Bayesian Model Fit Indices
Here,
\(\overline{D\left(\theta\right)}\) is the deviance (\(-2\) times the log likelihood) of the data (\(y\)) given the parameters (\(\theta\)), averaged across all posterior samples
\(D\left(\bar{\theta}\right)\) is the deviance of the data (\(y\)) computed at the average of the parameters (\(\bar{\theta}\)) across all samples
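A hand computation of the DIC from the posterior draws, continuing the running example (X, Y, and posteriorSample are assumed from earlier; this is a sketch, not a packaged routine):
# deviance of the data for one set of parameter values
devianceFun = function(beta, sigma, X, Y){
  -2 * sum(dnorm(Y, mean = as.numeric(X %*% beta), sd = sigma, log = TRUE))
}
betaCols = grep("beta", colnames(posteriorSample))
# average deviance across all posterior samples
Dbar = mean(sapply(1:nrow(posteriorSample), function(draw){
  devianceFun(beta  = as.numeric(posteriorSample[draw, betaCols]),
              sigma = as.numeric(posteriorSample[draw, "sigma"]), X = X, Y = Y)
}))
# deviance at the posterior means of the parameters
Dhat = devianceFun(beta  = colMeans(posteriorSample[, betaCols, drop = FALSE]),
                   sigma = mean(posteriorSample[, "sigma"]), X = X, Y = Y)
pD  = Dbar - Dhat # estimated number of parameters
DIC = pD + Dbar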
Newer Methods
The DIC has fallen out of favor recently
Has issues when parameters are discrete
Not fully Bayesian (relies on a point estimate: the average of the parameter values)
Can give negative values for estimated numbers of parameters in a model
The WAIC (widely applicable or Watanabe-Akaike information criterion; Watanabe, 2010) corrects some of the problems with the DIC
LOO: Leave One Out via Pareto Smoothed Importance Sampling
More recently, approximations to LOO have gained popularity
LOO via Pareto Smoothed Importance Sampling attempts to approximate the process of leave-one-out cross-validation using a sampling-based approach
Gives a finite-sample approximation
Implemented in Stan
Can quickly compare models
Gives warnings when it may be less reliable to use
The details are very technical, but are nicely compiled in Vehtari, Gelman, and Gabry (2017) (see preprint on arXiv at https://arxiv.org/pdf/1507.04544.pdf)
Big picture:
Can compute WAIC and/or LOO via the Stan team's loo package in R
LOO also gives an estimate of the variability (standard error) of model comparisons
Model Comparisons via Stan
The core of WAIC and LOO is the log likelihood of the data conditional on the model parameters, calculated separately for each observation in the sample:
We can calculate this using the generated quantities block in Stan:
generated quantities{
  // general quantities used below:
  vector[N] y_pred;
  y_pred = X*beta; // predicted value (conditional mean)

  // WAIC and LOO for model comparison
  array[N] real log_lik;
  for (person in 1:N){
    log_lik[person] = normal_lpdf(y[person] | y_pred[person], sigma);
  }
}
WAIC in Stan
Using the loo package, we can calculate WAIC with the waic() function
waic(model06_Samples$draws("log_lik"))
Computed from 8000 by 30 log-likelihood matrix.

          Estimate   SE
elpd_waic   -109.4  6.7
p_waic         6.6  2.6
waic         218.9 13.5

4 (13.3%) p_waic estimates greater than 0.4. We recommend trying loo instead.
Here:
elpd_waic is the expected log pointwise predictive density for WAIC
p_waic is the WAIC calculation of number of model parameters (a penalty to the likelihood for more parameters)
waic is the WAIC index used for model comparisons (lowest value is best fitting; -2*elpd_waic)
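To make these pieces concrete, here is a hand computation of the same quantities from the log-likelihood matrix (a sketch of the underlying formulas; waic() additionally provides standard errors and diagnostics):
logLik = model06_Samples$draws("log_lik", format = "draws_matrix")
# lppd: log pointwise predictive density (log of the mean likelihood per observation)
lppd = sum(log(colMeans(exp(logLik))))
# p_waic: sum of the posterior variances of the pointwise log likelihood
pWAIC = sum(apply(logLik, 2, var))
elpdWAIC = lppd - pWAIC # matches elpd_waic
WAIC = -2 * elpdWAIC    # matches waic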
WAIC in Stan
Note: WAIC needs a “log_lik” variable in the model analysis to be calculated correctly
That is up to you to provide via generated quantities
LOO in Stan
Using the loo package, we can calculate the PSIS-LOO using cmdstanr objects with the $loo() function:
model06_Samples$loo()
Computed from 8000 by 30 log-likelihood matrix.

         Estimate   SE
elpd_loo   -110.0  7.0
p_loo         7.2  2.8
looic       220.0 14.0
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 0.8]).

Pareto k diagnostic values:
                         Count Pct.  Min. ESS
(-Inf, 0.7]   (good)     29    96.7% 314
   (0.7, 1]   (bad)       1     3.3% <NA>
   (1, Inf)   (very bad)  0     0.0% <NA>

See help('pareto-k-diagnostic') for details.
LOO in Stan
Note: LOO needs a “log_lik” variable in the model analysis to be calculated correctly
That is up to you to provide (via generated quantities)!
If the variable is named "log_lik", you do not need to pass an argument to the function
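If you used a different name for the generated quantity, pass it via the variables argument (the name logLikAlt below is purely hypothetical):
# the default is variables = "log_lik"; logLikAlt is a hypothetical name
model06_Samples$loo(variables = "logLikAlt")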
Notes about LOO
In the LOO output shown above:
elpd_loo is the expected log pointwise predictive density for LOO
p_loo is the LOO calculation of number of model parameters (a penalty to the likelihood for more parameters)
looic is the LOO index used for model comparisons (lowest value is best fitting; -2*elpd_loo)
Also, you get a warning if too many of your sampled values have bad diagnostic values
Comparing Models with WAIC
To compare models with WAIC, take the model with the lowest WAIC value:
waic(model06_Samples$draws("log_lik"))
Computed from 8000 by 30 log-likelihood matrix.

          Estimate   SE
elpd_waic   -109.4  6.7
p_waic         6.6  2.6
waic         218.9 13.5

4 (13.3%) p_waic estimates greater than 0.4. We recommend trying loo instead.
waic(model06b_Samples$draws("log_lik"))
Computed from 8000 by 30 log-likelihood matrix.

          Estimate   SE
elpd_waic   -370.3 30.9
p_waic        17.5  4.2
waic         740.5 61.9

12 (40.0%) p_waic estimates greater than 0.4. We recommend trying loo instead.
Comparing Models with LOO
To compare models with LOO, the loo package has a built-in comparison function:
# comparing two models with loo:
loo_compare(
  list(
    uninformative = model06_Samples$loo(),
    informative = model06b_Samples$loo()
  )
)
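loo_compare() sorts the models by elpd_loo, placing the best-fitting model in the first row, and reports elpd_diff and se_diff (the difference in elpd relative to the best model, and its standard error) for each other model; a difference of several standard errors suggests a clear preference for the better-fitting model.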