Today’s example is from a bootstrap resample of 177 undergraduate students at a large state university in the Midwest. The survey was a measure of 10 questions about their beliefs in various conspiracy theories that were being passed around the internet in the early 2010s. Additionally, gender was included in the survey. All items responses were on a 5- point Likert scale with:
Our purpose in using this instrument is to provide a context that we all may find relevant as many of these conspiracy theories are still prevalent today.
Questions:
Questions:
Today, we will revert back to the graded response model assumptions to discuss how to estimate the latent variable standard deviation
\[P\left(Y_{ic } = c \mid \theta_p \right) = \left\{ \begin{array}{lr} 1-P\left(Y_{i1} \gt 1 \mid \theta_p \right) & \text{if } c=1 \\ P\left(Y_{i{c-1}} \gt c-1 \mid \theta_p \right) - P\left(Y_{i{c}} \gt c \mid \theta_p \right) & \text{if } 1<c<C_i \\ P\left(Y_{i{C_i -1} } \gt C_i-1 \mid \theta_p \right) & \text{if } c=C_i \\ \end{array} \right.\]
Where:
\[ P\left(Y_{i{c}} > c \mid \theta \right) = \frac{\exp(-\tau_{ic}+\lambda_i\theta_p)}{1+\exp(-\tau_{ic}+\lambda_i\theta_p)}\]
With:
We can convert thresholds to intercepts by multiplying by negative one: \(\mu_c = -\tau_c\)
We will fit two models to the data:
We know from previous lectures that the two-dimensional model had a correlation very close to one
Posterior Predictive Model Checking for Absolute Fit in Bayesian Psychometric Models
Psychometric models can use posterior predictive model checking (PPMC) to assess how well they fit the data in an absolute sense
Problems with PPMC include
Notes:
ordered_logistic_rng
goes with ordered_logistic
Notes:
ordered_logistic_rng
functionStan generated a lot of data–but now we must take it from the format of Stan and process it:
# setting up PPMC
simData = modelOrderedLogit_samples$draws(variables = "simY", format = "draws_matrix")
colnames(simData)
dim(simData)
# set up object for storing each iteration's PPMC data
nPairs = choose(10, 2)
pairNames = NULL
for (col in 1:(nItems-1)){
for (row in (col+1):nItems){
pairNames = c(pairNames, paste0("item", row, "_item", col))
}
}
PPMCsamples = list()
PPMCsamples$correlation = NULL
PPMCsamples$mean = NULL
# loop over each posterior sample's simulated data
for (sample in 1:nrow(simData)){
# create data frame that has observations (rows) by items (columns)
sampleData = data.frame(matrix(data = NA, nrow = nObs, ncol = nItems))
for (item in 1:nItems){
itemColumns = colnames(simData)[grep(pattern = paste0("simY\\[", item, "\\,"), x = colnames(simData))]
sampleData[,item] = t(simData[sample, itemColumns])
}
# with data frame created, apply functions of the data:
# calculate item means
PPMCsamples$mean = rbind(PPMCsamples$mean, apply(X = sampleData, MARGIN = 2, FUN = mean))
# calculate pearson correlations
temp=cor(sampleData)
PPMCsamples$correlation = rbind(PPMCsamples$correlation, temp[lower.tri(temp)])
}
# next, build distributions for each type of statistic
meanSummary = NULL
# for means
for (item in 1:nItems){
tempDist = ecdf(PPMCsamples$mean[,item])
ppmcMean = mean(PPMCsamples$mean[,item])
observedMean = mean(conspiracyItems[,item])
meanSummary = rbind(
meanSummary,
data.frame(
item = paste0("Item", item),
ppmcMean = ppmcMean,
observedMean = observedMean,
residual = observedMean - ppmcMean,
observedMeanPCT = tempDist(observedMean)
)
)
}
# for pearson correlations
corrSummary = NULL
# for means
for (column in 1:ncol(PPMCsamples$correlation)){
# get item numbers from items
items = unlist(strsplit(x = colnames(PPMCsamples$correlation)[column], split = "_"))
item1num = as.numeric(substr(x = items[1], start = 5, stop = nchar(items[1])))
item2num = as.numeric(substr(x = items[2], start = 5, stop = nchar(items[2])))
tempDist = ecdf(PPMCsamples$correlation[,column])
ppmcCorr = mean(PPMCsamples$correlation[,column])
observedCorr = cor(conspiracyItems[,c(item1num, item2num)])[1,2]
pct = tempDist(observedCorr)
if (pct > .975 | pct < .025){
inTail = TRUE
} else {
inTail = FALSE
}
corrSummary = rbind(
corrSummary,
data.frame(
item1 = paste0("Item", item1num),
item2 = paste0("Item", item2num),
ppmcCorr = ppmcCorr,
observedCorr = observedCorr,
residual = observedCorr - ppmcCorr,
observedCorrPCT = pct,
inTail = inTail
)
)
}
View(corrSummary)
One \(\theta\)
item ppmcMean observedMean residual observedMeanPCT
1 Item1 2.352560 2.367232 0.014671186 0.58340
2 Item2 1.958342 1.954802 -0.003539831 0.47495
3 Item3 1.862719 1.875706 0.012987571 0.58610
4 Item4 2.008068 2.011299 0.003231638 0.53915
5 Item5 1.978207 1.983051 0.004843503 0.53005
6 Item6 1.880974 1.892655 0.011681356 0.61725
7 Item7 1.748335 1.723164 -0.025171469 0.37290
8 Item8 1.840580 1.841808 0.001228249 0.51550
9 Item9 1.825317 1.807910 -0.017407062 0.43230
10 Item10 1.535198 1.519774 -0.015424294 0.45225
Two \(\theta\)s
item ppmcMean observedMean residual observedMeanPCT
1 Item1 2.357626 2.367232 0.0096059322 0.559500
2 Item2 1.969614 1.954802 -0.0148114407 0.418500
3 Item3 1.874785 1.875706 0.0009216102 0.528000
4 Item4 2.016655 2.011299 -0.0053552260 0.493000
5 Item5 1.988220 1.983051 -0.0051687853 0.465375
6 Item6 1.895842 1.892655 -0.0031864407 0.500375
7 Item7 1.758951 1.723164 -0.0357874294 0.326125
8 Item8 1.853244 1.841808 -0.0114357345 0.424500
9 Item9 1.835992 1.807910 -0.0280819209 0.373625
10 Item10 1.553063 1.519774 -0.0332888418 0.376125
item1 item2 observedCorr ppmcCorr.x residual.x observedCorrPCT.x inTail.x
1 Item10 Item1 0.2568334 0.4131131 -0.156279686 0.01655 TRUE
2 Item10 Item2 0.4121264 0.4917030 -0.079576626 0.14375 FALSE
3 Item10 Item3 0.5670156 0.4522000 0.114815615 0.93835 FALSE
4 Item10 Item4 0.4516124 0.4890239 -0.037411565 0.28790 FALSE
5 Item10 Item5 0.5822203 0.5567640 0.025456304 0.62005 FALSE
6 Item10 Item6 0.5092003 0.5501939 -0.040993565 0.25890 FALSE
7 Item10 Item7 0.4420205 0.4992130 -0.057192487 0.22315 FALSE
8 Item10 Item8 0.5457127 0.5434373 0.002275470 0.48760 FALSE
9 Item10 Item9 0.4244916 0.4940797 -0.069588089 0.18525 FALSE
10 Item2 Item1 0.6024186 0.5357365 0.066682114 0.85585 FALSE
11 Item3 Item1 0.3473072 0.4931202 -0.145812943 0.01745 TRUE
12 Item3 Item2 0.5176819 0.5569093 -0.039227387 0.26420 FALSE
13 Item4 Item1 0.5700901 0.5439901 0.026099947 0.64290 FALSE
14 Item4 Item2 0.5815808 0.6090530 -0.027472228 0.30375 FALSE
15 Item4 Item3 0.4973987 0.5615192 -0.064120482 0.15380 FALSE
16 Item5 Item1 0.5526073 0.6008370 -0.048229709 0.19820 FALSE
17 Item5 Item2 0.6512768 0.6835423 -0.032265485 0.25645 FALSE
18 Item5 Item3 0.5942163 0.6288536 -0.034637387 0.26495 FALSE
19 Item5 Item4 0.6267160 0.6877014 -0.060985378 0.11495 FALSE
20 Item6 Item1 0.5435771 0.5996223 -0.056045208 0.16460 FALSE
21 Item6 Item2 0.6835282 0.6802492 0.003279028 0.49990 FALSE
22 Item6 Item3 0.6601292 0.6276426 0.032486611 0.68855 FALSE
23 Item6 Item4 0.6251455 0.6853371 -0.060191526 0.11720 FALSE
24 Item6 Item5 0.7366890 0.7696183 -0.032929309 0.18445 FALSE
25 Item7 Item1 0.4084389 0.5196236 -0.111184764 0.04695 FALSE
26 Item7 Item2 0.4721628 0.5988845 -0.126721659 0.02785 FALSE
27 Item7 Item3 0.6193269 0.5520486 0.067278349 0.83690 FALSE
28 Item7 Item4 0.4676834 0.6002957 -0.132612336 0.02010 TRUE
29 Item7 Item5 0.6204485 0.6779012 -0.057452711 0.14370 FALSE
30 Item7 Item6 0.6870303 0.6740562 0.012974158 0.57020 FALSE
31 Item8 Item1 0.5425166 0.5990322 -0.056515657 0.16485 FALSE
32 Item8 Item2 0.6317705 0.6774062 -0.045635698 0.17995 FALSE
33 Item8 Item3 0.6182736 0.6266373 -0.008363667 0.42020 FALSE
34 Item8 Item4 0.6373299 0.6835823 -0.046252488 0.17135 FALSE
35 Item8 Item5 0.7641703 0.7672351 -0.003064875 0.44260 FALSE
36 Item8 Item6 0.7557355 0.7697220 -0.013986520 0.33825 FALSE
37 Item8 Item7 0.6611475 0.6720005 -0.010853042 0.39345 FALSE
38 Item9 Item1 0.4420860 0.5114050 -0.069318985 0.14180 FALSE
39 Item9 Item2 0.5998366 0.5909206 0.008915976 0.53460 FALSE
40 Item9 Item3 0.3912734 0.5436816 -0.152408206 0.01670 TRUE
41 Item9 Item4 0.6108052 0.5917868 0.019018393 0.60460 FALSE
42 Item9 Item5 0.7119365 0.6693743 0.042562230 0.76780 FALSE
43 Item9 Item6 0.5849045 0.6643001 -0.079395555 0.07880 FALSE
44 Item9 Item7 0.5653926 0.5905836 -0.025191060 0.33045 FALSE
45 Item9 Item8 0.6068774 0.6608282 -0.053950800 0.15570 FALSE
ppmcCorr.y residual.y observedCorrPCT.y inTail.y
1 0.4062453 -0.149411838 0.021375 TRUE
2 0.4741188 -0.061992441 0.208000 FALSE
3 0.4396909 0.127324771 0.954625 FALSE
4 0.4789946 -0.027382240 0.337625 FALSE
5 0.5336977 0.048522599 0.736750 FALSE
6 0.5316488 -0.022448494 0.350250 FALSE
7 0.4725387 -0.030518156 0.338000 FALSE
8 0.5166837 0.029028985 0.646500 FALSE
9 0.4749625 -0.050470922 0.258750 FALSE
10 0.5225588 0.079859795 0.897625 FALSE
11 0.4845009 -0.137193683 0.025250 FALSE
12 0.5407355 -0.023053619 0.348375 FALSE
13 0.5350439 0.035046168 0.693625 FALSE
14 0.5943265 -0.012745647 0.396500 FALSE
15 0.5518686 -0.054469852 0.197875 FALSE
16 0.5823830 -0.029775722 0.296250 FALSE
17 0.6655675 -0.014290652 0.375000 FALSE
18 0.6060217 -0.011805479 0.396875 FALSE
19 0.6653200 -0.038603963 0.219375 FALSE
20 0.5827271 -0.039150054 0.242750 FALSE
21 0.6534385 0.030089655 0.701875 FALSE
22 0.6077661 0.052363175 0.802875 FALSE
23 0.6669047 -0.041759184 0.201375 FALSE
24 0.7328579 0.003831089 0.509500 FALSE
25 0.5004824 -0.092043558 0.087625 FALSE
26 0.5787187 -0.106555900 0.051250 FALSE
27 0.5271264 0.092200572 0.912500 FALSE
28 0.5763177 -0.108634364 0.048625 FALSE
29 0.6497289 -0.029280447 0.293375 FALSE
30 0.6379156 0.049114716 0.802500 FALSE
31 0.5730756 -0.030559026 0.295375 FALSE
32 0.6518795 -0.020108970 0.331750 FALSE
33 0.5950893 0.023184359 0.626625 FALSE
34 0.6538394 -0.016509531 0.355250 FALSE
35 0.7305302 0.033640008 0.771125 FALSE
36 0.7215552 0.034180334 0.770375 FALSE
37 0.6364697 0.024677809 0.647000 FALSE
38 0.5001129 -0.058026879 0.189375 FALSE
39 0.5797217 0.020114922 0.602250 FALSE
40 0.5287277 -0.137454306 0.028625 FALSE
41 0.5774416 0.033363627 0.688875 FALSE
42 0.6522837 0.059652847 0.848125 FALSE
43 0.6383018 -0.053397216 0.170875 FALSE
44 0.5700175 -0.004624969 0.456375 FALSE
45 0.6362707 -0.029393327 0.285125 FALSE
One \(\theta\)
badItems
Item1 Item3 Item10 Item4 Item7 Item9
2 2 1 1 1 1
Two \(\theta\)s
badItems2
Item1 Item10
1 1
Relative Model Fit in Bayesian Psychometric Models
As with other Bayesian models, we can use WAIC and LOO to compare the model fit of two Bayesian models
generated quantities{
// for LOO/WAIC:
vector[nObs] personLike = rep_vector(0.0, nObs);
for (item in 1:nItems){
for (obs in 1:nObs){
// calculate conditional data likelihood for LOO/WAIC
personLike[obs] = personLike[obs] + ordered_logistic_lpmf(Y[item, obs] | lambda[item]*theta[obs], thr[item]);
}
}
}
Notes:
ordered_logistic_lpmf
needs the observed data to work (first argument)vector[nObs] personLike = rep_vector(0.0, nObs);
is needed to set the values to zero at each iterationgenerated quantities{
// for LOO/WAIC:
vector[nObs] personLike = rep_vector(0.0, nObs);
for (item in 1:nItems){
for (obs in 1:nObs){
// calculate conditional data likelihood for LOO/WAIC
personLike[obs] = personLike[obs] + ordered_logistic_lpmf(Y[item, obs] | thetaMatrix[obs,]*lambdaMatrix[item,1:nFactors]', thr[item]);
}
}
}
Notes:
ordered_logistic_lpmf
needs the observed data to work (first argument)vector[nObs] personLike = rep_vector(0.0, nObs);
is needed to set the values to zero at each iteration
Computed from 20000 by 177 log-likelihood matrix
Estimate SE
elpd_waic -1482.6 70.4
p_waic 137.7 5.1
waic 2965.2 140.8
173 (97.7%) p_waic estimates greater than 0.4. We recommend trying loo instead.
Computed from 8000 by 177 log-likelihood matrix
Estimate SE
elpd_waic -1490.6 68.0
p_waic 142.6 4.6
waic 2981.1 136.1
174 (98.3%) p_waic estimates greater than 0.4. We recommend trying loo instead.
Smaller is better, so the unidimensional model wins (but very high SE for both)
Computed from 20000 by 177 log-likelihood matrix
Estimate SE
elpd_loo -1542.0 71.0
p_loo 197.1 5.8
looic 3084.0 141.9
------
Monte Carlo SE of elpd_loo is NA.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 0 0.0% <NA>
(0.5, 0.7] (ok) 4 2.3% 426
(0.7, 1] (bad) 156 88.1% 31
(1, Inf) (very bad) 17 9.6% 10
See help('pareto-k-diagnostic') for details.
Computed from 8000 by 177 log-likelihood matrix
Estimate SE
elpd_loo -1541.3 68.1
p_loo 193.2 5.1
looic 3082.5 136.2
------
Monte Carlo SE of elpd_loo is NA.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 1 0.6% 743
(0.5, 0.7] (ok) 19 10.7% 42
(0.7, 1] (bad) 135 76.3% 4
(1, Inf) (very bad) 22 12.4% 4
See help('pareto-k-diagnostic') for details.
LOO seems to (slightly) prefer the two-dimensional model (but by warnings about bad PSIS)
loo_compare(list(unidimensional = modelOrderedLogit_samples$loo(variables = "personLike"),
twodimensional = modelOrderedLogit2D_samples$loo(variables = "personLike")))
elpd_diff se_diff
twodimensional 0.0 0.0
unidimensional -0.7 4.9
Again, both models are very close in fit
Wrapping Up
Model fit is complicated for psychometric models
Thank you for a great semester