[1] "has_annotations" "temp"
[1] "Qmatrix" "responseMatrix"
The data sets you will use in this class are from educational assessments:
Note: These are not the actual assessments; they are similar in structure and content and were created by a bootstrapping process
Each data set is saved in the RData format and is read into R with the load() function
Loading a file adds its objects to the workspace (here, has_annotations and temp):

[1] "has_annotations" "temp"

The temp object has the Q-matrix and the response matrix:

[1] "Qmatrix" "responseMatrix"
Q-matrices are a way to specify which latent variables are measured by which items
RL.9-10.4 RL.9-10.6 RL.9-10.1 RL.9-10.2 RI.9-10.6 RI.9-10.1 RI.9-10.4
item1 1 0 0 0 0 0 0
item2 0 1 0 0 0 0 0
item3 0 0 1 0 0 0 0
item4 0 1 0 0 0 0 0
item5 0 0 0 1 0 0 0
item6 0 0 0 0 1 0 0
A.REI.1 A.REI.2 A.REI.8 A.CED.2 G.CO.7 G.SRT.4 G.CO.2
item1 0 0 0 0 0 0 0
item2 0 0 0 0 0 0 0
item3 0 0 0 0 0 0 0
item4 0 0 0 0 0 0 0
item5 0 0 0 0 0 0 0
item6 0 0 0 0 0 0 0
Latent variables are built by specification:
You create latent variables by specifying which items measure which latent variables in an analysis model
The column names in the class data Q-matrix reflect different educational standards being assessed
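As a minimal sketch, a Q-matrix like the one above could be built in R by starting from all zeros and flagging which item measures which standard (only the first six items and their standards from the output above are shown):

```r
# Item and latent variable (standard) labels for a small illustrative subset
items     <- paste0("item", 1:6)
standards <- c("RL.9-10.4", "RL.9-10.6", "RL.9-10.1", "RL.9-10.2", "RI.9-10.6")

# Start with all zeros: no item measures any latent variable yet
Qmatrix <- matrix(0, nrow = length(items), ncol = length(standards),
                  dimnames = list(items, standards))

# Set q_id = 1 where item i measures latent variable d
Qmatrix["item1", "RL.9-10.4"] <- 1
Qmatrix["item2", "RL.9-10.6"] <- 1
Qmatrix["item3", "RL.9-10.1"] <- 1
Qmatrix["item4", "RL.9-10.6"] <- 1
Qmatrix["item5", "RL.9-10.2"] <- 1
Qmatrix["item6", "RI.9-10.6"] <- 1

Qmatrix
```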
Question for discussion: What are the latent variables we must estimate in the class data?
The alignment provides a specification of which latent variables are measured by which items
The mathematical definition of either term (Q-matrix or alignment) is simply whether or not a latent variable appears as a predictor for an item; for example, the Q-matrix entries for item 1 in the class data are:
RL.9-10.4 RL.9-10.6 RL.9-10.1 RL.9-10.2 RI.9-10.6 RI.9-10.1 RI.9-10.4 A.REI.1
1 0 0 0 0 0 0 0
A.REI.2 A.REI.8 A.CED.2 G.CO.7 G.SRT.4 G.CO.2
0 0 0 0 0 0
The model for the first item is then built with only the factors measured by the item as being present:
\[ f(E\left(Y_{p1} \mid \boldsymbol{\theta}_p\right) ) = \mu_1 + \lambda_{11} \theta_{p1} \]
Psychometric measurement models are used to model the relationship between observed variables and latent variables
The models discussed today all model the expected value (mean) of the observed variables as a function of the latent variables
Binary IRT models are used when the observed variables are binary (0/1) variables
\[f\left(Y_{pi} \right) = \pi_i^{Y_{pi}} \left(1-\pi_i \right)^{\left(1-Y_{pi} \right)} \]
Where \(\pi_i = P\left(Y_{pi} = 1\right)\) is the probability of a correct (or endorsed) response to item \(i\)
There are many different binary IRT models; the most common fall into a family of models identified by the number of item parameters in the model
\[E\left(Y_{pi} | \theta_p \right) = P\left(Y_{pi} = 1 | \theta_p \right) = c_i + (d_i - c_i) \left( \frac{\exp\left( a_i \left(\theta_p -b_i \right) \right)}{1 + \exp\left( a_i \left(\theta_p -b_i \right) \right)} \right) \]
Where \(\theta_p\) is the latent trait for person \(p\), \(a_i\) is the discrimination (slope) parameter for item \(i\), \(b_i\) is the item difficulty (location) parameter, \(c_i\) is the lower asymptote (pseudo-guessing) parameter, and \(d_i\) is the upper asymptote parameter
The model on the previous page is the four-parameter logistic model (4PL)
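To make the 4PL concrete, a minimal sketch of the item response function in R (all parameter values are illustrative, not from the class data):

```r
# 4PL item response function: P(Y = 1 | theta) for a single item
# a = discrimination, b = difficulty, c = lower asymptote, d = upper asymptote
p4PL <- function(theta, a, b, c, d) {
  logit <- a * (theta - b)
  c + (d - c) * exp(logit) / (1 + exp(logit))
}

# Probabilities for three persons at theta = -2, 0, and 2 (illustrative parameters)
p4PL(theta = c(-2, 0, 2), a = 1.2, b = 0.5, c = 0.2, d = 0.95)
```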
Many link functions exist, including the logit (used above) and the probit (normal ogive)
Depending on the context, the slope/intercept parameterization can also be used:
\[E\left(Y_{pi} | \theta_p \right) = P\left(Y_{pi} = 1 | \theta_p \right) = c_i + (d_i - c_i) \left( \frac{\exp \left( \mu_i + \lambda_i \theta_p \right)}{1 + \exp \left( \mu_i + \lambda_i \theta_p \right) }\right) \]
We can then add the Q-matrix to expand the model to multiple latent variables:
\[E\left(Y_{pi} | \theta_p \right) = P\left(Y_{pi} = 1 | \theta_p \right) = c_i + (d_i - c_i) \left( \frac{\exp \left( \mu_i + \sum_{d=1}^D q_{id} \lambda_{id} \theta_{pd} \right)}{1 + \exp \left( \mu_i + \sum_{d=1}^D q_{id} \lambda_{id} \theta_{pd} \right) }\right) \]
Where \(\mu_i\) is the item intercept, \(\lambda_{id}\) is the factor loading (discrimination) of item \(i\) on latent variable \(d\), \(q_{id}\) is the Q-matrix entry indicating whether item \(i\) measures latent variable \(d\), and \(\theta_{pd}\) is person \(p\)'s value on latent variable \(d\)
Note: Not all items need to measure all latent variables (some \(q_{id} = 0\))
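A minimal sketch of this multidimensional form, showing how the Q-matrix row removes latent variables the item does not measure (all values are illustrative):

```r
# Multidimensional binary response function in slope/intercept form
# q: Q-matrix row for the item; lambda: loadings; theta: one person's latent variables
pMIRT <- function(theta, mu, lambda, q, c = 0, d = 1) {
  eta <- mu + sum(q * lambda * theta)   # only dimensions with q_id = 1 contribute
  c + (d - c) * exp(eta) / (1 + exp(eta))
}

# Two latent variables; the item measures only the second one (illustrative values)
pMIRT(theta = c(1.0, -0.5), mu = -0.2, lambda = c(0.8, 1.1), q = c(0, 1))
```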
Polytomous IRT models are used for discrete categorical observed variables
The PMF of the categorical distribution is:
\[f\left(Y_{pi} \right) = \prod_{c=1}^{C_i} \pi_{ic}^{Y_{pic}} \]
Where \(C_i\) is the number of response categories for item \(i\), \(\pi_{ic}\) is the probability of responding in category \(c\), and \(Y_{pic}\) is an indicator that equals one when person \(p\) responds to item \(i\) in category \(c\) (and zero otherwise)
The graded response model is an ordered logistic regression model where:
\[P\left(Y_{ic } \mid \theta \right) = \left\{ \begin{array}{lr} 1-P^*\left(Y_{i1} \mid \theta \right) & \text{if } c=1 \\ P^*\left(Y_{i{c-1}} \mid \theta \right) - P^*\left(Y_{i{c}} \mid \theta \right) & \text{if } 1<c<C_i \\ P^*\left(Y_{i{C_i -1} } \mid \theta \right) & \text{if } c=C_i \\ \end{array} \right.\]
Where \(P^*\left(Y_{ic} \mid \theta \right)\) is the cumulative probability of responding above category \(c\) (that is, in category \(c+1\) or higher), modeled as:
\[P^*\left(Y_{i{c}} \mid \theta \right) = \frac{\exp(\mu_{ic}+\sum_{d=1}^D q_{id} \lambda_{id} \theta_{pd})}{1+\exp(\mu_{ic}+\sum_{d=1}^D q_{id} \lambda_{id} \theta_{pd})}\]
With the intercepts ordered so that the cumulative probabilities decrease across categories: \(\mu_{i1} > \mu_{i2} > \ldots > \mu_{i\left(C_i - 1\right)}\)
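A minimal sketch of the graded response model category probabilities under this parameterization (one latent variable for brevity; intercepts and loading are illustrative):

```r
# Graded response model: probabilities for the C categories of one item
# mu: C - 1 ordered intercepts; lambda: loading; theta: latent variable value
pGRM <- function(theta, mu, lambda) {
  pStar <- plogis(mu + lambda * theta)   # cumulative probabilities P*(above category c)
  c(1 - pStar[1],                        # lowest category
    -diff(pStar),                        # middle categories: adjacent differences
    pStar[length(pStar)])                # highest category
}

# Four categories, so three ordered intercepts (illustrative values)
probs <- pGRM(theta = 0.5, mu = c(2, 0, -1.5), lambda = 1.2)
sum(probs)   # the category probabilities sum to one
```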
The nominal response model is an unordered logistic regression model where:
\[P\left(Y_{ic } \mid \theta \right) = \frac{\exp(\mu_{ic}+\sum_{d=1}^D q_{id} \lambda_{idc} \theta_{pd})}{\sum_{c=1}^{C_i} \exp(\mu_{ic}+\sum_{d=1}^D q_{id} \lambda_{idc} \theta_{pd})}\]
With identification constraints that typically fix one category's parameters to zero (e.g., \(\mu_{i1} = 0\) and \(\lambda_{id1} = 0\)), making it the reference category
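A minimal sketch of the nominal response model probabilities as a softmax across categories, with the first category as the reference (one latent variable; illustrative values):

```r
# Nominal response model: probabilities for C unordered categories of one item
# mu and lambda each have one entry per category; the first is fixed to zero
pNRM <- function(theta, mu, lambda) {
  eta <- mu + lambda * theta    # linear predictor for each category
  exp(eta) / sum(exp(eta))      # softmax across categories
}

# Three categories; reference category parameters fixed to zero (illustrative values)
pNRM(theta = 1, mu = c(0, 0.5, -0.3), lambda = c(0, 1.1, 0.7))
```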
Confirmatory factor models are used when the observed variables are continuous (interval/ratio) variables
\[ f\left(Y_{pi}\right) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp \left( -\frac{\left(Y_{pi} - \mu_i\right)^2}{2\sigma^2} \right) \]
The CFA model is then:
\[ Y_{pi} = \mu_i + \sum_{d=1}^D q_{id} \lambda_{id} \theta_{pd} + e_{pi}\]
Where \(\mu_i\) is the item intercept, \(\lambda_{id}\) is the factor loading of item \(i\) on latent variable \(d\), \(q_{id}\) is the Q-matrix entry, and \(e_{pi}\) is a normally distributed residual (unique factor) with mean zero and variance \(\sigma^2_i\)
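A minimal sketch of generating data from this CFA model for one item that measures the second of two latent variables (all parameter values are illustrative, and the latent variables are drawn as independent standard normals for simplicity):

```r
set.seed(1)

nPersons <- 1000
theta  <- matrix(rnorm(nPersons * 2), nrow = nPersons, ncol = 2)  # latent variables
q      <- c(0, 1)        # Q-matrix row: the item measures only the second factor
lambda <- c(0.9, 1.3)    # possible loadings; only the second is active
mu     <- 2              # item intercept
sigmaE <- 0.8            # residual standard deviation

# CFA model: Y_pi = mu_i + sum_d q_id * lambda_id * theta_pd + e_pi
Y <- mu + theta %*% (q * lambda) + rnorm(nPersons, mean = 0, sd = sigmaE)
```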
We could show the model with the Q-matrix entries:
\[ f(E\left(Y_{p1} \mid \boldsymbol{\theta}_p\right) ) = \mu_1 + q_{11}\left( \lambda_{11} \theta_{p1} \right) + q_{12}\left( \lambda_{12} \theta_{p2} \right) = \mu_1 + \boldsymbol{\theta}_p \text{diag}\left(\boldsymbol{q}_i \right) \boldsymbol{\lambda}_{1} \]
\(\boldsymbol{\lambda}_{1} = \left[ \begin{array}{c} \lambda_{11} \\ \lambda_{12} \end{array} \right]\) contains all possible factor loadings for item 1 (size \(2 \times 1\))
\(\boldsymbol{\theta}_p = \left[ \begin{array}{cc} \theta_{p1} & \theta_{p2} \\ \end{array} \right]\) contains the factor scores for person \(p\) (size \(1 \times 2\))
\(\text{diag}\left(\boldsymbol{q}_i \right) = \left[ \begin{array}{cc} q_{11} & 0 \\ 0 & q_{12} \end{array} \right] = \left[ \begin{array}{cc} 0 & 0 \\ 0 & 1 \end{array} \right]\) is a diagonal matrix with the Q-matrix entries for item 1, \(\boldsymbol{q}_i = \left[ \begin{array}{cc} 0 & 1 \end{array} \right]\), along the diagonal
The matrix product then gives:
\[\boldsymbol{\theta}_p \text{diag}\left(\boldsymbol{q}_i \right) \boldsymbol{\lambda}_{1} = \left[ \begin{array}{cc} \theta_{p1} & \theta_{p2} \\ \end{array} \right]\left[ \begin{array}{cc} 0 & 0 \\ 0 & 1 \end{array} \right] \left[ \begin{array}{c} \lambda_{11} \\ \lambda_{12} \end{array} \right] = \left[ \begin{array}{cc} \theta_{p1} & \theta_{p2} \\ \end{array} \right]\left[ \begin{array}{c} 0 \\ \lambda_{12} \end{array} \right] = \lambda_{12}\theta_{p2}\]
The Q-matrix functions like a partial version of the model (predictor) matrix that we saw in linear models
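A quick numeric check of this matrix product in R, using the same Q-matrix row \(\boldsymbol{q}_i = [0\ \ 1]\) and illustrative factor scores and loadings:

```r
theta_p <- matrix(c(0.4, -1.2), nrow = 1)   # factor scores for one person (1 x 2)
q1      <- c(0, 1)                          # Q-matrix row for item 1
lambda1 <- matrix(c(0.9, 1.3), ncol = 1)    # possible loadings for item 1 (2 x 1)

# diag(q1) zeroes out the loading on the factor the item does not measure,
# so the product reduces to lambda1[2] * theta_p[2]
theta_p %*% diag(q1) %*% lambda1
```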
Although infrequently shown in psychometric measurement models of the IRT/CFA family, latent variable interactions can be modeled
For example, a binary item measuring two latent variables could be modeled as:
\[E\left(Y_{pi} | \theta_p \right) = P\left(Y_{pi} = 1 | \theta_p \right) = c_i + (d_i - c_i) \left( \frac{\exp \left( \mu_i + \lambda_{i1} \theta_{p1} + \lambda_{i2} \theta_{p2} + \lambda_{i12} \theta_{p1} \theta_{p2} \right)}{1 + \exp \left( \mu_i + \lambda_{i1} \theta_{p1} + \lambda_{i2} \theta_{p2} + \lambda_{i12} \theta_{p1} \theta_{p2} \right) }\right) \]
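A minimal sketch of this interaction model, extending the earlier multidimensional sketch with a product term (all values are illustrative):

```r
# Binary item measuring two latent variables and their interaction
# lambdaInt corresponds to lambda_i12 in the formula above
pInteraction <- function(theta, mu, lambda, lambdaInt, c = 0, d = 1) {
  eta <- mu + lambda[1] * theta[1] + lambda[2] * theta[2] +
    lambdaInt * theta[1] * theta[2]
  c + (d - c) * exp(eta) / (1 + exp(eta))
}

# Illustrative values for one person and one item
pInteraction(theta = c(1, -0.5), mu = -0.2, lambda = c(0.8, 1.1), lambdaInt = 0.4)
```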
Notes:
Latent variable structural models are models for the distributions of the latent variables
For our example, we will assume the set of traits follows a multivariate normal distribution
\[ f\left(\boldsymbol{\theta}_p \right) = \left(2 \pi \right)^{-\frac{D}{2}} \det\left(\boldsymbol{\Sigma}_\theta \right)^{-\frac{1}{2}}\exp\left[-\frac{1}{2}\left(\boldsymbol{\theta}_p - \boldsymbol{\mu}_\theta \right)^T\boldsymbol{\Sigma}_\theta^{-1}\left(\boldsymbol{\theta}_p - \boldsymbol{\mu}_\theta \right) \right] \]
Where \(\boldsymbol{\mu}_\theta\) is the latent variable mean vector (size \(D \times 1\)) and \(\boldsymbol{\Sigma}_\theta\) is the latent variable covariance matrix (size \(D \times D\))
Alternatively, we could specify \(\boldsymbol{\theta}_p \sim N_D\left( \boldsymbol{\mu}_\theta, \boldsymbol{\Sigma}_\theta \right)\); however, we cannot always estimate \(\boldsymbol{\mu}_\theta\) and \(\boldsymbol{\Sigma}_\theta\)
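A minimal sketch of evaluating this multivariate normal density in R with the mvtnorm package (assumed installed); the mean vector and covariance matrix are illustrative, e.g., standardized latent variables with a correlation of .5:

```r
library(mvtnorm)

# Illustrative structural model: two standardized latent variables, correlation .5
muTheta    <- c(0, 0)
SigmaTheta <- matrix(c(1.0, 0.5,
                       0.5, 1.0), nrow = 2)

# Density of one person's latent variable vector under the structural model
thetaP <- c(0.4, -1.2)
dmvnorm(thetaP, mean = muTheta, sigma = SigmaTheta)
```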
Psychometric models require two types of identification to be valid:
Bayesian priors can help make models estimable with fewer items than these criteria suggest
Bayesian priors can let you believe you can estimate more parameters than the non-Bayesian standards suggest
Like empirical identification, these estimates are often unstable and are not recommended
Most common: