Today’s example is from a bootstrap resample of 177 undergraduate students at a large state university in the Midwest. The survey was a measure of 10 questions about their beliefs in various conspiracy theories that were being passed around the internet in the early 2010s. Additionally, gender was included in the survey. All item responses were on a 5-point Likert scale.
Our purpose in using this instrument is to provide a context that we all may find relevant, as many of these conspiracy theories are still prevalent today.
Missing Data in Stan
If you have ever attempted to analyze missing data in Stan, you likely received an error message like:
Error: Variable 'Y' has NA values.
That is because, by default, Stan does not model missing data
To make things a bit easier, I’m turning only one value into missing data (the first person’s response to the first item), as shown in the sketch below
Note: All code will work with as much missing data as you have
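A minimal sketch of this setup in R, assuming the responses are stored in the conspiracyItems data frame used throughout the code below:

# make the first person's response to the first item missing
conspiracyItems[1, 1] = NA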
We will use the syntax from the last lecture for multidimensional measurement models with ordinal logit (graded response) items
The Q-matrix this time will be a single column vector (one dimension):
[,1]
[1,] 1
[2,] 1
[3,] 1
[4,] 1
[5,] 1
[6,] 1
[7,] 1
[8,] 1
[9,] 1
[10,] 1
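For reference, a sketch of how this one-column Q-matrix could be built in R (again assuming the conspiracyItems data frame):

# unidimensional model: every item measures the single factor
Qmatrix = matrix(data = 1, nrow = ncol(conspiracyItems), ncol = 1)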
model {
  lambda ~ multi_normal(meanLambda, covLambda);
  thetaCorrL ~ lkj_corr_cholesky(1.0);
  theta ~ multi_normal_cholesky(meanTheta, thetaCorrL);

  for (item in 1:nItems){
    thr[item] ~ multi_normal(meanThr[item], covThr[item]);
    Y[item, observed[item, 1:nObserved[item]]] ~ ordered_logistic(thetaMatrix[observed[item, 1:nObserved[item]],]*lambdaMatrix[item,1:nFactors]', thr[item]);
  }
}
Notes:
Y is now declared with items as rows, so Y[item] refers to all persons’ responses to a single item
Y[item, observed[item, 1:nObserved[item]]] then selects only the non-missing responses to that item
thetaMatrix[observed[item, 1:nObserved[item]],] selects the matching rows of thetaMatrix, so each selected response is paired with that person’s latent variable scores
data {
  // data specifications  =============================================================
  int<lower=0> nObs;                 // number of observations
  int<lower=0> nItems;               // number of items
  int<lower=0> maxCategory;          // number of categories for each item
  array[nItems] int nObserved;       // number of non-missing responses for each item
  array[nItems, nObs] int observed;  // which observations are non-missing for each item

  // input data  =============================================================
  array[nItems, nObs] int<lower=-1, upper=5> Y; // item responses in an array (missing coded as -1)

  // loading specifications  =============================================================
  int<lower=1> nFactors;                                  // number of loadings in the model
  array[nItems, nFactors] int<lower=0, upper=1> Qmatrix;  // Q-matrix of item-to-factor assignments

  // prior specifications =============================================================
  array[nItems] vector[maxCategory-1] meanThr;               // prior mean vector for intercept parameters
  array[nItems] matrix[maxCategory-1, maxCategory-1] covThr; // prior covariance matrix for intercept parameters

  vector[nItems] meanLambda;         // prior mean vector for discrimination parameters
  matrix[nItems, nItems] covLambda;  // prior covariance matrix for discrimination parameters

  vector[nFactors] meanTheta;        // prior mean vector for latent variables
}
array[nItems] int nObserved;
: The number of observations with non-missing data for each item
array[nItems, nObs] int observed;
: A listing of which observations have non-missing data for each item
To build these arrays, we can use a loop in R:
# for each item, count the non-missing responses and record which observations they are
observed = matrix(data = -1, nrow = nrow(conspiracyItems), ncol = ncol(conspiracyItems))
nObserved = NULL
for (variable in 1:ncol(conspiracyItems)){
  nObserved = c(nObserved, length(which(!is.na(conspiracyItems[, variable]))))
  observed[1:nObserved[variable], variable] = which(!is.na(conspiracyItems[, variable]))
}
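To inspect the result for the first item, we can print the first column of observed (a sketch; unused slots keep their -1 placeholder):

# indices of non-missing observations for item 1
observed[, 1]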
For the item that has the first case missing, this gives:
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
[19] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
[37] 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
[55] 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
[73] 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91
[91] 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109
[109] 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
[127] 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145
[145] 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163
[163] 164 165 166 167 168 169 170 171 172 173 174 175 176 177 -1
The item has 176 observed responses and one missing. The values of observed[,1] list who has non-missing data for item 1; the final value is -1 (a placeholder that won’t be used in Stan). We can use the values of observed[,1] to have Stan select only the corresponding data points that are non-missing.
To demonstrate, here is all of the data for the first item in R.
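Printed with a call like the following (a sketch, assuming the conspiracyItems data frame):

# all responses to the first item, including the NA for person 1
conspiracyItems[, 1]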
[1] NA 3 4 2 2 1 4 2 3 2 1 3 2 1 2 2 2 2 3 2 1 1 1 1 4
[26] 4 4 3 4 3 3 1 2 1 2 3 1 2 1 2 1 1 2 2 4 3 3 1 1 4
[51] 1 2 1 3 1 1 2 1 4 2 2 2 1 5 3 2 3 3 1 3 2 1 2 1 1
[76] 2 3 4 3 3 2 2 1 3 3 1 2 3 1 4 2 1 2 5 5 2 3 1 3 2
[101] 3 5 2 4 1 3 3 4 3 2 2 4 3 3 4 3 2 2 1 1 3 4 3 2 1
[126] 4 3 2 2 3 4 5 1 5 3 1 3 3 3 2 2 1 1 3 1 3 3 4 1 1
[151] 4 1 4 2 1 1 1 2 2 1 4 2 1 2 4 1 2 5 3 2 1 3 3 3 2
[176] 3 3
And here, we select the non-missing values using the index values in observed.
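A sketch of the selection call, using the objects built by the loop above:

# keep only the rows flagged as observed for item 1
conspiracyItems[observed[1:nObserved[1], 1], 1]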
[1] 3 4 2 2 1 4 2 3 2 1 3 2 1 2 2 2 2 3 2 1 1 1 1 4 4 4 3 4 3 3 1 2 1 2 3 1 2
[38] 1 2 1 1 2 2 4 3 3 1 1 4 1 2 1 3 1 1 2 1 4 2 2 2 1 5 3 2 3 3 1 3 2 1 2 1 1
[75] 2 3 4 3 3 2 2 1 3 3 1 2 3 1 4 2 1 2 5 5 2 3 1 3 2 3 5 2 4 1 3 3 4 3 2 2 4
[112] 3 3 4 3 2 2 1 1 3 4 3 2 1 4 3 2 2 3 4 5 1 5 3 1 3 3 3 2 2 1 1 3 1 3 3 4 1
[149] 1 4 1 4 2 1 1 1 2 2 1 4 2 1 2 4 1 2 5 3 2 1 3 3 3 2 3 3
The values of observed[1:nObserved[1], 1] lead to only the non-missing data being used.
Finally, we must ensure that all data passed into Stan contain no NA values; a sketch of one way to recode them follows.
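Here, each NA is recoded to -1, matching the declared lower bound for Y in the Stan data block:

# replace NA with -1 so Stan accepts the integer array;
# these entries are never used because observed[] skips them
Y = as.matrix(conspiracyItems)
Y[is.na(Y)] = -1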
With our missing values denoted, we then run Stan as we have previously:
modelOrderedLogit_data = list(
  nObs = nObs,
  nItems = nItems,
  maxCategory = maxCategory,
  nObserved = nObserved,
  observed = t(observed),  # transposed so items are rows, matching array[nItems, nObs]
  Y = t(Y),                # transposed so items are rows, matching array[nItems, nObs]
  nFactors = ncol(Qmatrix),
  Qmatrix = Qmatrix,
  meanThr = thrMeanMatrix,
  covThr = thrCovArray,
  meanLambda = lambdaMeanVecHP,
  covLambda = lambdaCovarianceMatrixHP,
  meanTheta = thetaMean
)
modelOrderedLogit_samples = modelOrderedLogit_stan$sample(
  data = modelOrderedLogit_data,
  seed = 191120221,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 2000,
  iter_sampling = 2000,
  init = function() list(lambda = rnorm(nItems, mean = 5, sd = 1),
                         theta = thetaInit)
)
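As a convergence check, the maximum R-hat across all parameters can be computed; a sketch using cmdstanr’s summary method:

# largest R-hat across all parameters (values near 1 indicate convergence)
max(modelOrderedLogit_samples$summary()$rhat, na.rm = TRUE)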
[1] 1.004752
# A tibble: 50 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 lambda[1] 2.02 2.00 0.269 0.262 1.60 2.49 1.00 2413. 4538.
2 lambda[2] 2.88 2.85 0.390 0.389 2.27 3.55 1.00 1666. 3991.
3 lambda[3] 2.40 2.38 0.344 0.342 1.87 3.00 1.00 2683. 4438.
4 lambda[4] 2.91 2.89 0.390 0.382 2.32 3.59 1.00 1781. 3763.
5 lambda[5] 4.32 4.28 0.602 0.595 3.40 5.38 1.00 1843. 3279.
6 lambda[6] 4.19 4.17 0.562 0.561 3.32 5.16 1.00 2002. 3757.
7 lambda[7] 2.90 2.87 0.425 0.413 2.25 3.66 1.00 2476. 3430.
8 lambda[8] 4.16 4.12 0.562 0.548 3.31 5.15 1.00 1545. 3167.
9 lambda[9] 2.90 2.88 0.424 0.422 2.26 3.64 1.00 2401. 3674.
10 lambda[1… 2.43 2.40 0.415 0.409 1.80 3.15 1.00 2844. 4579.
11 mu[1,1] 1.87 1.86 0.295 0.289 1.41 2.39 1.00 1110. 3809.
12 mu[2,1] 0.817 0.807 0.304 0.301 0.340 1.34 1.00 648. 1974.
13 mu[3,1] 0.102 0.101 0.264 0.259 -0.328 0.548 1.00 742. 1829.
14 mu[4,1] 0.992 0.978 0.317 0.315 0.486 1.52 1.00 610. 1473.
15 mu[5,1] 1.30 1.29 0.424 0.419 0.635 2.01 1.00 554. 1716.
16 mu[6,1] 0.954 0.948 0.403 0.401 0.309 1.63 1.00 533. 1327.
17 mu[7,1] -0.105 -0.108 0.298 0.296 -0.590 0.389 1.00 638. 1678.
18 mu[8,1] 0.691 0.679 0.399 0.391 0.0550 1.37 1.00 496. 1106.
19 mu[9,1] -0.0571 -0.0552 0.300 0.291 -0.540 0.429 1.00 634. 1829.
20 mu[10,1] -1.35 -1.34 0.308 0.303 -1.87 -0.862 1.00 857. 2389.
21 mu[1,2] -0.190 -0.190 0.238 0.238 -0.576 0.205 1.00 834. 1955.
22 mu[2,2] -1.53 -1.52 0.324 0.320 -2.08 -1.02 1.00 815. 2413.
23 mu[3,2] -1.10 -1.09 0.278 0.273 -1.56 -0.646 1.00 793. 2016.
24 mu[4,2] -1.12 -1.11 0.310 0.310 -1.64 -0.623 1.00 686. 1905.
25 mu[5,2] -1.93 -1.91 0.446 0.438 -2.70 -1.23 1.00 717. 1667.
26 mu[6,2] -1.96 -1.94 0.433 0.427 -2.70 -1.27 1.00 656. 1692.
27 mu[7,2] -1.95 -1.94 0.339 0.333 -2.53 -1.42 1.00 905. 2075.
28 mu[8,2] -1.82 -1.81 0.411 0.402 -2.54 -1.17 1.00 737. 1988.
29 mu[9,2] -1.93 -1.93 0.345 0.347 -2.51 -1.38 1.00 1003. 2673.
30 mu[10,2] -2.58 -2.56 0.365 0.368 -3.21 -2.01 1.00 1267. 3353.
31 mu[1,3] -2.02 -2.01 0.284 0.281 -2.50 -1.56 1.00 1165. 3489.
32 mu[2,3] -3.39 -3.37 0.420 0.415 -4.11 -2.72 1.00 1364. 3729.
33 mu[3,3] -3.67 -3.65 0.438 0.437 -4.41 -2.98 1.00 1845. 4387.
34 mu[4,3] -3.82 -3.80 0.447 0.445 -4.59 -3.11 1.00 1640. 4128.
35 mu[5,3] -4.51 -4.48 0.603 0.601 -5.53 -3.57 1.00 1285. 3087.
36 mu[6,3] -5.57 -5.53 0.677 0.678 -6.72 -4.53 1.00 1678. 3449.
37 mu[7,3] -4.15 -4.13 0.502 0.499 -5.00 -3.37 1.00 1959. 4834.
38 mu[8,3] -6.39 -6.36 0.771 0.771 -7.74 -5.21 1.00 2838. 4657.
39 mu[9,3] -3.23 -3.21 0.421 0.417 -3.94 -2.55 1.00 1593. 3920.
40 mu[10,3] -3.73 -3.71 0.453 0.458 -4.53 -3.02 1.00 1940. 3870.
41 mu[1,4] -3.98 -3.95 0.469 0.457 -4.78 -3.25 1.00 3042. 5545.
42 mu[2,4] -4.92 -4.90 0.575 0.572 -5.90 -4.02 1.00 2739. 5408.
43 mu[3,4] -4.76 -4.74 0.569 0.567 -5.74 -3.86 1.00 3181. 5734.
44 mu[4,4] -4.73 -4.71 0.546 0.543 -5.67 -3.87 1.00 2551. 4422.
45 mu[5,4] -6.75 -6.71 0.838 0.842 -8.20 -5.44 1.00 2242. 4082.
46 mu[6,4] -7.87 -7.80 0.977 0.980 -9.56 -6.38 1.00 3908. 5385.
47 mu[7,4] -5.70 -5.66 0.691 0.691 -6.89 -4.64 1.00 4220. 6232.
48 mu[8,4] -8.44 -8.39 1.08 1.08 -10.3 -6.76 1.00 4873. 5822.
49 mu[9,4] -4.90 -4.87 0.577 0.581 -5.90 -4.01 1.00 2558. 4782.
50 mu[10,4] -3.99 -3.97 0.481 0.486 -4.83 -3.23 1.00 2134. 4261.
Likelihoods with Missing Data
The way we have coded Stan enables estimation by effectively skipping over responses that are missing
For the parameters of an item \(i\), the previous model/data likelihood was:
\[f \left(\boldsymbol{Y}_{i} \mid \boldsymbol{\theta} \right) = \prod_{p=1}^P f \left(Y_{pi} \mid \theta_p \right)\]
Now, we must alter the PDF so that missing data do not contribute:
\[f \left(Y_{pi} \mid \theta_p \right) = \left\{ \begin{array}{lr} f \left(Y_{pi} \mid \theta_p \right) & \text{if } Y_{pi} \text{ observed} \\ 1 & \text{if } Y_{pi} \text{ missing} \\ \end{array} \right.\]
This also applies to the likelihood for a person’s \(\theta\) as any missing items are skipped:
\[f \left(\boldsymbol{Y}_{p} \mid \theta_p \right) = \prod_{i=1}^I f \left(Y_{pi} \mid \theta_p \right)\]
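To make the skipping concrete, here is a small R sketch of one person’s log-likelihood under this convention (all function and argument names are hypothetical; the cutpoint parameterization follows Stan’s ordered_logistic):

# log-likelihood for one person under the graded response model;
# missing items contribute log(1) = 0, i.e., they are skipped
personLogLik = function(y, theta, lambda, thr){
  logLik = 0
  for (item in seq_along(y)){
    if (is.na(y[item])) next  # missing response: contributes log(1) = 0
    eta = theta * lambda[item]
    # cumulative probabilities from the ordered-logit cutpoints:
    # P(Y >= k) = logistic(eta - c_{k-1}), with P(Y >= 1) = 1
    cum = c(1, plogis(eta - thr[[item]]), 0)
    logLik = logLik + log(cum[y[item]] - cum[y[item] + 1])
  }
  return(logLik)
}

With an NA in y, the corresponding item simply drops out of the sum, matching the piecewise definition above.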
It may seem somewhat basic to simply skip over missing responses, but:
It is a stronger method for analysis than list-wise deletion (removing a person entirely from the data)
Moreover, the method we implemented in Stan is equivalent to how missing data are handled in most maximum likelihood algorithms
Wrapping Up
Today, we showed how to skip over missing data in Stan
Of note, we could (but didn’t) also build models for the missing data themselves in Stan
Finally, Stan’s missing data methods are quite different from those of JAGS