Bayesian analysis is all about estimating the posterior distribution
Example Data: https://stats.idre.ucla.edu/spss/library/spss-libraryhow-do-i-handle-interactions-of-continuous-andcategorical-variables/
The file DietData.csv contains data from 30 respondents who participated in a study regarding the effectiveness of three types of diets.
Variables in the data set are:
Now, your turn to answer questions:
Is the outcome variable (WeightLB) appropriate as-is for such an analysis, or does it need to be transformed?
Let’s play with models for data…
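# read in the example data (file name from the description above)
DietData = read.csv(file = "DietData.csv")
# center height at 60 inches so the intercept is interpretable
DietData$HeightIN60 = DietData$HeightIN-60
# empty (intercept-only) model: no predictors yet
FullModel = lm(formula = WeightLB ~ 1, data = DietData)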
# examining assumptions and leverage of fit
# plot(FullModel)
# looking at ANOVA table
# anova(FullModel)
# looking at parameter summary
summary(FullModel)
Call:
lm(formula = WeightLB ~ 1, data = DietData)

Residuals:
   Min     1Q Median     3Q    Max 
-62.00 -36.75 -24.00  49.00  98.00 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  171.000      9.041   18.91   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 49.52 on 29 degrees of freedom
\[\text{WeightLB}_p = \beta_0 + e_p,\] Where: \(e_p \sim N(0, \sigma^2_e)\)
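As a quick check on the output above: in the empty model the intercept is the sample mean, and its standard error is the residual standard deviation divided by the square root of N. The sketch below verifies this in base R, first from the reported numbers and then on simulated data. The simulated `y_sim` is a hypothetical stand-in for WeightLB, not the DietData values.

```r
# Arithmetic check using the values reported in the summary output
sigma_e <- 49.52   # residual standard error
n_obs   <- 30      # number of respondents
beta0   <- 171     # estimated intercept

se_beta0 <- sigma_e / sqrt(n_obs)   # standard error of the intercept
t_beta0  <- beta0 / se_beta0        # t value for the intercept
round(se_beta0, 3)  # 9.041
round(t_beta0, 2)   # 18.91

# Same idea on simulated data (hypothetical stand-in for WeightLB)
set.seed(4)
y_sim   <- rnorm(n_obs, mean = 171, sd = 50)
fit_sim <- lm(y_sim ~ 1)

# Intercept equals the sample mean; residual SE equals the sample SD
all.equal(unname(coef(fit_sim)), mean(y_sim))
all.equal(sigma(fit_sim), sd(y_sim))
```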
Questions:
Like many compiled languages, Stan expects you to declare what type of data/parameters you are defining:
int: Integer values (no decimals)
real: Floating point numbers
vector: A one-dimensional set of real-valued numbers
Sometimes, additional definitions are provided giving the range of the variable (or restricting the set of starting values):
real<lower=0> sigma;
See: https://mc-stan.org/docs/reference-manual/data-types.html for more information
y ~ normal(beta0, sigma); // model for observed data
sigma ~ uniform(0, 100000); // prior for sigma
See: https://mc-stan.org/docs/functions-reference/index.html for more information
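Each `~` statement adds a term to the log posterior density that Stan explores. The base R sketch below illustrates that idea, assuming the model consists of only the two statements shown above (so beta0 has no explicit prior, which Stan treats as flat). The function name `log_posterior` and the simulated `y` are hypothetical.

```r
# Unnormalized log posterior for the empty model (up to an additive constant)
log_posterior <- function(beta0, sigma, y) {
  # outside the support of the uniform prior the density is zero
  if (sigma <= 0 || sigma >= 100000) return(-Inf)
  # contribution of: y ~ normal(beta0, sigma)
  log_lik <- sum(dnorm(y, mean = beta0, sd = sigma, log = TRUE))
  # contribution of: sigma ~ uniform(0, 100000)
  log_prior <- dunif(sigma, min = 0, max = 100000, log = TRUE)
  log_lik + log_prior
}

set.seed(4)
y <- rnorm(30, mean = 171, sd = 50)   # hypothetical data, not DietData

# An intercept near the sample mean is more plausible than one far away
lp_near <- log_posterior(beta0 = mean(y), sigma = sd(y), y = y)
lp_far  <- log_posterior(beta0 = 100,     sigma = sd(y), y = y)
lp_near > lp_far
```

The sampler never needs the normalizing constant; it only compares values of this function at different parameter values.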
cmdstanr and rstan differ in how they compile and run a Stan program:
cmdstanr wants you to compile first, then run the Markov chain
rstan conducts compilation (if needed) then runs the Markov chain
The data passed to Stan must include N and a vector named y
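Putting the pieces together, a minimal sketch of the cmdstanr workflow might look like the following. The Stan program is assembled from the statements shown earlier. The `data` and `parameters` blocks, the file name, the simulated `y`, and the sampler settings are assumptions for illustration, not the lecture's own syntax. The cmdstanr calls only run if the package and CmdStan are installed.

```r
# Stan program for the empty model
# (everything other than the two ~ statements is assumed)
stan_code <- "
data {
  int<lower=0> N;        // number of observations
  vector[N] y;           // outcome
}
parameters {
  real beta0;            // intercept
  real<lower=0> sigma;   // residual standard deviation
}
model {
  sigma ~ uniform(0, 100000); // prior for sigma
  y ~ normal(beta0, sigma);   // model for observed data
}
"
stan_file <- file.path(tempdir(), "EmptyModel.stan")  # hypothetical file name
writeLines(stan_code, stan_file)

# Data list: names must match the data block (N and y)
set.seed(4)
y <- rnorm(30, mean = 171, sd = 50)   # hypothetical stand-in for DietData$WeightLB
stan_data <- list(N = length(y), y = y)

# Compile first, then sample (skipped when cmdstanr or CmdStan is unavailable)
cmdstan_ready <- requireNamespace("cmdstanr", quietly = TRUE) &&
  !is.null(tryCatch(cmdstanr::cmdstan_version(), error = function(e) NULL))
if (cmdstan_ready) {
  model <- cmdstanr::cmdstan_model(stan_file = stan_file)  # compilation step
  fit <- model$sample(data = stan_data, seed = 2025, chains = 4,
                      parallel_chains = 4, iter_warmup = 1000,
                      iter_sampling = 1000)                # run the Markov chains
  print(fit$summary())
}
```

With rstan, the analogous single call would be `rstan::stan(file = stan_file, data = stan_data)`, which compiles if needed and then samples.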
In cmdstanr, running the chain comes from the $sample() function that is a part of the compiled program object
Next, we must determine if the chains converged to their posterior distribution
Two most common methods: visual inspection and the Gelman-Rubin Potential Scale Reduction Factor (PSRF)
Visual inspection
Gelman-Rubin PSRF (denoted with \(\hat{R}\))
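To make the PSRF concrete, here is a base R sketch of the classic Gelman-Rubin calculation, which compares between-chain and within-chain variance. The function name `psrf` and the simulated chains are hypothetical. The `rhat` values reported by cmdstanr come from a refined rank-normalized split version of this statistic, so they will not match this calculation exactly.

```r
# Classic Gelman-Rubin PSRF for one parameter
# draws: matrix with one column per chain, one row per iteration
psrf <- function(draws) {
  n <- nrow(draws)                      # iterations per chain
  W <- mean(apply(draws, 2, var))       # average within-chain variance
  B <- n * var(colMeans(draws))         # between-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled estimate of the posterior variance
  sqrt(var_plus / W)
}

set.seed(4)
# Four chains sampling the same distribution: PSRF should be close to 1
converged <- matrix(rnorm(4000, mean = 171, sd = 9.5), ncol = 4)
rhat_good <- psrf(converged)

# Same draws, but two chains shifted far away: PSRF should be well above 1
stuck <- sweep(converged, 2, c(0, 0, 30, 30), "+")
rhat_bad <- psrf(stuck)

c(converged = rhat_good, stuck = rhat_bad)
```

When the chains agree, the pooled variance and the within-chain variance are nearly equal and the ratio is near 1. When chains sit in different places, the between-chain term inflates the ratio.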
# A tibble: 3 × 10
  variable   mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
  <chr>     <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 lp__     -129.  -128.   1.03 0.737 -131. -128.  1.00   17204.   21369.
2 beta0     171.   171.   9.52 9.40   155.  187.  1.00   26136.   23619.
3 sigma      51.8   51.0  7.25 6.89    41.5  64.9 1.00   25343.   23597.
lp__ is the log posterior density (up to an additive constant); it does not necessarily need to be examined
ess_ columns show the effective sample size for the chain (factoring in autocorrelation between iterations)
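To see why autocorrelation reduces the effective sample size, consider a chain whose draws follow a first-order autoregressive process with lag-1 correlation rho. In that special case the effective sample size is approximately N(1 - rho)/(1 + rho). The sketch below uses this simplified formula on a simulated chain. It is not the estimator behind the `ess_bulk` and `ess_tail` columns, and all names and values are hypothetical.

```r
set.seed(2025)
n_draws <- 10000
rho     <- 0.5    # assumed lag-1 autocorrelation of the simulated chain

# Simulate an autocorrelated chain (AR(1) process)
chain <- as.numeric(arima.sim(model = list(ar = rho), n = n_draws))

# Estimate the lag-1 autocorrelation from the draws
rho_hat <- acf(chain, lag.max = 1, plot = FALSE)$acf[2]

# AR(1) approximation to the effective sample size
ess_approx <- n_draws * (1 - rho_hat) / (1 + rho_hat)
ess_approx   # roughly one third of n_draws when rho is 0.5
```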
  lower   upper 
154.917 186.013 
attr(,"credMass")
[1] 0.9
  lower   upper 
40.3082 63.1433 
attr(,"credMass")
[1] 0.9
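The two intervals above are 90% highest density intervals (the `credMass` attribute is 0.9). Judging by their values, the first appears to be for beta0 and the second for sigma. The call that produced them is not shown, so the following is only a base R sketch of the idea: among all intervals containing 90% of the posterior draws, pick the narrowest. The function name `hdi_from_draws` and the simulated draws are hypothetical.

```r
# Shortest interval containing a given share of the draws
hdi_from_draws <- function(draws, cred_mass = 0.9) {
  sorted <- sort(draws)
  n      <- length(sorted)
  n_in   <- ceiling(cred_mass * n)     # number of draws inside the interval
  n_cand <- n - n_in + 1               # number of candidate intervals
  widths <- sorted[n_in:n] - sorted[1:n_cand]
  best   <- which.min(widths)          # narrowest candidate
  c(lower = sorted[best], upper = sorted[best + n_in - 1])
}

set.seed(4)
# Hypothetical posterior draws, loosely resembling the beta0 summary
draws <- rnorm(20000, mean = 171, sd = 9.5)

interval <- hdi_from_draws(draws, cred_mass = 0.9)
coverage <- mean(draws >= interval["lower"] & draws <= interval["upper"])
interval
coverage
```

For a symmetric posterior the result is close to the 5% and 95% quantiles. For a skewed posterior, such as the one for sigma, the highest density interval shifts toward the mode, which is why it differs from the q5 and q95 columns.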
SMIP Summer School 2025, Multilevel Measurement Models, Lecture 04