This chapter focuses on Missing Not At Random (MNAR) processes
Unlike MAR (Missing At Random), MNAR assumes that the unseen (missing) values carry unique information about why they are missing
The MAR assumption is convenient but may not always hold
MNAR: The Challenge
Definition: The probability of missingness is related to the unseen score values themselves
Problem: Standard analyses assuming MAR can lead to biased results if the data are actually MNAR
Goal: Mitigate nonresponse bias when MNAR is suspected
Two Major MNAR Modeling Frameworks
Selection Models:
Model the probability of missingness as an outcome
Use a regression equation with a missing data indicator as the dependent variable
Treat missingness as dependent on the analysis variables
Pattern Mixture Models:
Model data patterns based on missingness
Use the missing data indicator as a predictor or moderator
Assume different parameters for different missing data patterns
Key Considerations & Sensitivity Analysis
Strict Assumptions: MNAR models require strong, often unverifiable, assumptions (e.g., distributional assumptions for selection models, specifying inestimable parameters for pattern mixture models)
Risk of Misspecification: A misspecified MNAR model can produce more bias than an MAR analysis
Best Use Case: Sensitivity analysis, i.e., examining how results change under different missing data assumptions (MAR vs. various MNAR scenarios)
MNAR Processes Revisited
MNAR Definition: The probability of missingness, \(Pr(M_i=1)\), is related to the unseen score values, \(Y_{i(mis)}\)
Gomer & Yuan (2021) propose a useful distinction:
Focused MNAR
Diffuse MNAR
This distinction helps structure sensitivity analyses
Focused MNAR
Definition: Missingness depends only on the unseen score values in \(Y_{i(mis)}\)
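Diffuse MNAR
Definition: Missingness depends on both the unseen score values in \(Y_{i(mis)}\) and the observed score values in \(Y_{i(obs)}\)
\(Y_{i(obs)}\) are the observed score values for individual \(i\)
Other terms are as defined previously
Example: Participants with the worst symptoms (\(Y_{i(mis)}\)) leave the study, and participants with low perceived control (\(Y_{i(obs)}\)) miss assessments due to daily disruptions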
Implications of Focused vs. Diffuse MNAR
Both types imply that nonresponse is entangled with the outcome variable
Diffuse processes:
Generally considered harder to model (e.g., may require larger sample sizes)
Potentially capable of inducing greater nonresponse bias than focused processes
Understanding the potential type of MNAR can guide modeling choices
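A minimal simulation sketch of the two process types, using only the Python standard library. All coefficients, the sample size, and the function name `simulate` are hypothetical choices for illustration, not values from the chapter. It shows only that the respondent mean drifts away from the full-sample mean when missingness depends on Y; it makes no claim about which process is worse.

```python
import random
from statistics import NormalDist, mean

def simulate(n=20000, focused=True, seed=1):
    """Return (mean of Y in the full sample, mean of Y among respondents)."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    all_y, seen_y = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)              # always observed
        y = 0.5 * x + rng.gauss(0, 1)    # outcome that may go missing
        # Focused: propensity depends on Y only. Diffuse: on Y and X.
        propensity = 0.8 * y if focused else 0.8 * y + 0.8 * x
        missing = rng.random() < phi(propensity - 0.5)
        all_y.append(y)
        if not missing:
            seen_y.append(y)
    return mean(all_y), mean(seen_y)

focused_true, focused_seen = simulate(focused=True)
diffuse_true, diffuse_seen = simulate(focused=False)
print(f"focused: full mean {focused_true:.2f}, respondent mean {focused_seen:.2f}")
print(f"diffuse: full mean {diffuse_true:.2f}, respondent mean {diffuse_seen:.2f}")
```

Because higher Y values are more likely to be missing in both settings, the respondent mean falls below the full-sample mean in each run.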
Major Modeling Frameworks
How do we model MNAR processes? Two main approaches:
Selection Models
Pattern Mixture Models
Both aim to mitigate nonresponse bias by modeling the occurrence of missing data
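They start by factorizing the joint distribution of the data (\(Y_i\)) and the missingness indicators (\(M_i\)), \(f(Y_i, M_i)\)
Selection model factorization: \[ f(Y_i, M_i) = f(M_i | Y_i) \times f(Y_i) \]
\(f(M_i | Y_i)\): Model for missingness given the data
\(f(Y_i)\): Analysis model for the data
Pattern mixture factorization: \[ f(Y_i, M_i) = f(Y_i | M_i) \times f(M_i) \]
\(f(Y_i | M_i)\): Analysis model where parameters differ based on missing data pattern (\(M\))
\(f(M_i)\): Model describing the proportions of different missing data patterns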
Interpretation: Treats missingness (\(M\)) as a predictor or moderator. Participants with different missing data patterns form distinct subgroups with potentially different parameter values
Analogy: Similar to moderation or group comparisons
Selection vs. Pattern Mixture: Key Differences
Equivalence: Both attempt to describe the same joint distribution \(f(Y_i, M_i)\)
Differences:
Require different inputs and assumptions
Can yield different estimates for the same data
Tell different “stories” about the missing data process:
Selection: Missingness is an outcome
Pattern Mixture: Missingness is a grouping/moderating variable
Either (or both) could be plausible for a given application
Selection Models for Multiple Regression
Origins: Trace back to Heckman’s (1976, 1979) work on sample selection bias
Core Idea: Pair the focal analysis model (e.g., regression) with an additional model for the missing data indicator (M)
Modern Approach: Typically uses likelihood or Bayesian estimation via factored regressions, rather than Heckman’s original two-step method
The Missingness Model (Probit)
Assumes an underlying latent variable (\(M_i^*\)) representing the propensity for missingness
\(M_i^*\) is typically assumed to be normally distributed
A threshold (\(\tau\)) determines the observed missingness status (\(M_i\)): \[ M_i = \begin{cases} 0 & \text{if } M_i^* \le \tau \\ 1 & \text{if } M_i^* > \tau \end{cases} \]
For identification:
Threshold \(\tau\) is usually fixed at 0
Residual variance of \(M_i^*\) is fixed at 1 (\(\sigma_r^2 = 1\))
Interpretation: Missingness depends on the value of the incomplete variable itself (\(Y\)) and on other variables in the model (e.g., \(X\))
Analogy: Partially mediated model (X influences M directly and through Y)
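The bullets above leave the regression equations implicit. One specification consistent with the symbols used elsewhere in this chapter (\(\beta\)s for the focal model, \(\gamma\)s for the missingness model, with \(\gamma_1\) attached to \(Y\) and \(\gamma_2\) to \(X\)) is sketched below. The single predictor \(X\) is an assumption made for illustration.

\[ \begin{aligned} Y_i &= \beta_0 + \beta_1 X_i + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma_\epsilon^2) \\ M_i^* &= \gamma_0 + \gamma_1 Y_i + \gamma_2 X_i + r_i, \quad r_i \sim N(0, 1) \\ \Pr(M_i = 1 \mid Y_i, X_i) &= \Phi(\gamma_0 + \gamma_1 Y_i + \gamma_2 X_i) \end{aligned} \]

The last line follows from fixing \(\tau = 0\) and \(\sigma_r^2 = 1\), where \(\Phi\) is the standard normal cumulative distribution function.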
Important Assumptions
Bivariate Normality: The residuals of the focal model (\(\epsilon_i\)) and the missingness model (\(r_i\)) must follow a bivariate normal distribution
This is a strict and untestable assumption
It’s the “glue” holding the models together, especially when \(Y_i\) predicts \(M_i^*\)
Correct Specification of Missingness Model:
Must include the correct predictors of missingness
Must use the correct functional form (e.g., linear, curvilinear). Example for curvilinear effect of Y: \[ M_i^* = \gamma_0 + \gamma_1 Y_i + \gamma_2 Y_i^2 + r_i \]
Omitting important predictors \(\rightarrow\) Bias
Including unnecessary predictors \(\rightarrow\) Noisy estimates, reduced precision
Auxiliary variables (those that predict M but not Y, or vice versa) can help identification and estimation
How Selection Models Mitigate Bias
They use a factored regression approach: \[ f(Y, M, X) = f(M|Y, X) \times f(Y|X) \times f(X) \]
The key is how the model defines the distribution of the missing Y values, \(f(Y_{i(mis)} | M_i^*, X_i)\)
This distribution becomes conditional on both the focal model parameters (\(\beta\)s) and the selection model parameters (\(\gamma\)s): \[ E(Y_i|M_i^*, X_i) = \text{Var}(Y_i|M_i^*, X_i) \times \left( \frac{\gamma_1(M_i^* - \gamma_0 - \gamma_2 X_i)}{\sigma_r^2} + \frac{\beta_0 + \beta_1 X_i}{\sigma_\epsilon^2} \right) \] \[ \text{Var}(Y_i|M_i^*, X_i) = \left( \frac{\gamma_1^2}{\sigma_r^2} + \frac{1}{\sigma_\epsilon^2} \right)^{-1} \]
The coefficient \(\gamma_1\) (linking \(Y_i\) to \(M_i^*\)) introduces the MNAR correction. If \(\gamma_1 = 0\), the model simplifies to MAR
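The conditional mean and variance above can be checked numerically. The sketch below assumes a linear focal model \(Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\) and a missingness model \(M_i^* = \gamma_0 + \gamma_1 Y_i + \gamma_2 X_i + r_i\), and compares the precision-weighted form with standard bivariate-normal conditioning. The function names and parameter values are hypothetical.

```python
import math

def cond_moments_precision(m_star, x, b0, b1, g0, g1, g2, var_eps, var_r=1.0):
    """Mean and variance of Y given M* and X, using the precision-weighted form."""
    var = 1.0 / (g1**2 / var_r + 1.0 / var_eps)
    mean = var * (g1 * (m_star - g0 - g2 * x) / var_r + (b0 + b1 * x) / var_eps)
    return mean, var

def cond_moments_bivariate(m_star, x, b0, b1, g0, g1, g2, var_eps, var_r=1.0):
    """Same quantities from textbook bivariate-normal conditioning."""
    mu_y = b0 + b1 * x                   # E(Y | X)
    mu_m = g0 + g1 * mu_y + g2 * x       # E(M* | X)
    cov_ym = g1 * var_eps                # Cov(Y, M* | X)
    var_m = g1**2 * var_eps + var_r      # Var(M* | X)
    mean = mu_y + cov_ym / var_m * (m_star - mu_m)
    var = var_eps - cov_ym**2 / var_m
    return mean, var

# Hypothetical parameter values, chosen only to exercise the formulas
params = dict(b0=1.0, b1=0.5, g0=-0.3, g1=0.4, g2=0.2, var_eps=2.0)
by_precision = cond_moments_precision(m_star=0.7, x=1.2, **params)
by_bivariate = cond_moments_bivariate(m_star=0.7, x=1.2, **params)
print(by_precision, by_bivariate)

# With gamma_1 = 0 the result no longer depends on M*
mar_mean, mar_var = cond_moments_precision(0.7, 1.2, 1.0, 0.5, -0.3, 0.0, 0.2, 2.0)
print(mar_mean, mar_var)
```

The two routes agree. Setting \(\gamma_1 = 0\) returns the focal-model prediction \(\beta_0 + \beta_1 X_i\) with variance \(\sigma_\epsilon^2\), which is the sense in which the model reduces to MAR.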
Practical Recommendations
Model Complexity: Diffuse models are harder to estimate and may require larger N than focused models
Specification: While omitting important predictors is generally worse, overfitting (too many predictors in missingness model) reduces precision and power. Be judicious
Building Strategy: Start with a focused model. Add potential predictors (especially those from the focal model or strong auxiliary variables) stepwise, checking model fit and parameter stability (Ibrahim et al., 2005)
Diagnostics: Carefully inspect parameter estimates and standard errors. Watch for:
Large increases in SEs compared to MAR models
Implausibly large \(R^2\) for the missingness model
Convergence issues
Pattern Mixture Models (PMM) for Regression
Contrast: PMMs align with moderated processes (vs. mediated for selection models)
Core Idea:
Define subgroups based on missing data patterns (e.g., completers vs. those missing Y)
Estimate model parameters within each pattern
Compute overall (marginal) parameters as a weighted average of pattern-specific estimates, weighted by pattern proportions (\(\pi^{(pattern)}\)) \[ \text{e.g., } \mu_Y = \pi^{(0)}\mu_Y^{(0)} + \pi^{(1)}\mu_Y^{(1)} \]
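The weighted average above can be sketched in a few lines. The function name `marginal_mean` and the numeric values are hypothetical. In practice the mean for the pattern with missing Y must come from a specified value or an identifying restriction.

```python
def marginal_mean(pattern_means, pattern_props):
    """Weighted average of pattern-specific means; weights are pattern proportions."""
    if abs(sum(pattern_props) - 1.0) > 1e-9:
        raise ValueError("pattern proportions must sum to 1")
    return sum(p * m for p, m in zip(pattern_props, pattern_means))

# Hypothetical values: 70% completers with mean 50,
# 30% with Y missing and an assumed mean of 44
mu_y = marginal_mean(pattern_means=[50.0, 44.0], pattern_props=[0.7, 0.3])
print(round(mu_y, 2))  # approximately 48.2
```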
PMM: The Challenge of Inestimability
Problem: For patterns with missing data on Y, parameters involving Y (means, variances, covariances/slopes) cannot be estimated directly from the data for that pattern
E.g., In the pattern where Y is missing (\(M=1\)), \(\mu_Y^{(1)}\), \(\sigma_Y^{2(1)}\), \(\beta_1^{(1)}\) are inestimable
Solution Required: Must either:
Provide plausible values for the inestimable parameters
Impose identifying restrictions (borrowing information from other patterns)
PMM Specification: Moderated Regression
Use the missing data indicator (\(M_i\), coded 0/1) as a moderator
Diffuse PMM: Allows intercept and slope(s) to differ by pattern \[ Y_i = \beta_0^{(0)} + \beta_0^{(diff)} M_i + \beta_1^{(0)} X_i + \beta_1^{(diff)} X_i M_i + \epsilon_i \]
\(\beta_0^{(0)}, \beta_1^{(0)}\): Intercept & slope for completers (\(M=0\))
\(\beta_0^{(diff)}, \beta_1^{(diff)}\): Inestimable differences for pattern \(M=1\)
Because these differences cannot be estimated from the data, choose their values using standardized effect sizes (e.g., Cohen’s d) as a guide
Intercept Difference (\(\beta_0^{(diff)}\)): Specify a plausible standardized mean difference (d) between patterns at the mean of X. \[ \beta_0^{(diff)} \approx d \times \hat{\sigma}_Y \quad \text{or} \quad d \times \sqrt{\hat{\sigma}_\epsilon^2} \]
Get \(\hat{\sigma}_Y\) or \(\hat{\sigma}_\epsilon^2\) from a preliminary MAR analysis
Slope Difference (\(\beta_1^{(diff)}\)): Specify a plausible difference in standardized slopes (\(d_\Delta\)) between patterns. \[ \beta_1^{(diff)} \approx d_\Delta \times \frac{\hat{\sigma}_Y}{\hat{\sigma}_X} \quad \text{or} \quad d_\Delta \times \frac{\sqrt{\hat{\sigma}_\epsilon^2}}{\hat{\sigma}_X} \]
Choose plausible values for d and \(d_\Delta\) (e.g., ±0.1, ±0.2, ±0.5)
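The conversions from standardized effect sizes to raw coefficient differences can be sketched as follows. The function names and the standard deviations (10 for Y, 4 for X) are hypothetical stand-ins for estimates from a preliminary MAR analysis.

```python
def intercept_diff(d, sd_y):
    """Raw intercept difference implied by a standardized mean difference d."""
    return d * sd_y

def slope_diff(d_delta, sd_y, sd_x):
    """Raw slope difference implied by a standardized slope difference d_delta."""
    return d_delta * sd_y / sd_x

# Hypothetical MAR-based estimates of the standard deviations
sd_y_hat, sd_x_hat = 10.0, 4.0

b0_diff = intercept_diff(d=-0.2, sd_y=sd_y_hat)                   # -0.2 * 10
b1_diff = slope_diff(d_delta=0.1, sd_y=sd_y_hat, sd_x=sd_x_hat)   # 0.1 * 10 / 4
print(b0_diff, b1_diff)
```

To scale by the residual standard deviation instead, pass \(\sqrt{\hat{\sigma}_\epsilon^2}\) in place of `sd_y`.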
PMM: Assumptions & Pooling
Key Assumption: Correct specification of the pattern differences (which parameters differ, and by how much via specified values or restrictions)
Transparency: Unlike selection models, the assumptions about the missing data mechanism are explicit in the specified differences or restrictions
Pooling: To get marginal estimates (averaging over patterns), you need pattern proportions (\(\pi^{(0)}, \pi^{(1)}\))
Estimate \(\pi\)’s using \(f(M_i)\) (e.g., empty probit model)
Combine estimates and standard errors (e.g., delta method, Bayesian estimation, MI)
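A rough sketch of the pooling step for a mean, under two labeled assumptions: the pattern difference `delta` is a fixed, user-specified value, and the estimates of the completer mean and the pattern proportion are uncorrelated. Function names and inputs are hypothetical. The intercept-only probit is computed in closed form here rather than fitted iteratively.

```python
import math
from statistics import NormalDist

def empty_probit_proportion(m):
    """Intercept-only probit: gamma0 = Phi^{-1}(p), so Phi(gamma0) recovers p."""
    p_missing = sum(m) / len(m)
    gamma0 = NormalDist().inv_cdf(p_missing)   # requires 0 < p_missing < 1
    return gamma0, NormalDist().cdf(gamma0)

def pooled_mean_and_se(mu0, se_mu0, delta, pi1, n):
    """Marginal mean mu0 + pi1 * delta with a first-order (delta method) SE."""
    mu = mu0 + pi1 * delta                     # equals pi0*mu0 + pi1*(mu0 + delta)
    var_pi1 = pi1 * (1.0 - pi1) / n            # sampling variance of a proportion
    se = math.sqrt(se_mu0**2 + delta**2 * var_pi1)
    return mu, se

# Hypothetical data: 100 cases, 30 of them missing Y
indicators = [0] * 70 + [1] * 30
gamma0, pi1 = empty_probit_proportion(indicators)
mu, se = pooled_mean_and_se(mu0=50.0, se_mu0=1.2, delta=-6.0, pi1=pi1, n=len(indicators))
print(f"gamma0 = {gamma0:.3f}, pi1 = {pi1:.3f}, pooled mean = {mu:.2f}, SE = {se:.3f}")
```

Bayesian estimation or multiple imputation would propagate the same uncertainty without the closed-form approximation.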
PMM for Sensitivity Analysis
Challenge: Choosing a single “correct” value for \(d\) or \(d_\Delta\) is difficult, if not impossible
Strategy: Fit the PMM repeatedly using a range of plausible values for the inestimable parameters (e.g., \(d\) from -0.5 to +0.5)
Goal: Assess the sensitivity of results to different MNAR assumptions
“How much does the missing data distribution need to differ (\(d, d_\Delta\)) to meaningfully change MAR-based estimates?”
“How big a difference is needed to change substantive conclusions (e.g., significance tests)?”
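The repeated-fitting strategy can be sketched for the simplest case, the marginal mean of Y. This is an illustration under assumed inputs (completer mean 50, standard deviation 10, 30% missing), not a full regression PMM, and the function name `sensitivity_sweep` is hypothetical.

```python
def sensitivity_sweep(mu_completers, sd_y, pi_missing, d_values):
    """Marginal mean of Y for each assumed standardized difference d."""
    results = []
    for d in d_values:
        mu_missing = mu_completers + d * sd_y   # assumed mean in the M = 1 pattern
        mu_marginal = (1 - pi_missing) * mu_completers + pi_missing * mu_missing
        results.append((d, mu_marginal))
    return results

grid = [-0.5, -0.2, -0.1, 0.0, 0.1, 0.2, 0.5]
sweep = sensitivity_sweep(mu_completers=50.0, sd_y=10.0, pi_missing=0.3, d_values=grid)
for d, mu in sweep:
    print(f"d = {d:+.1f}  marginal mean = {mu:.2f}")
```

Setting \(d = 0\) reproduces the completer mean. The spread of the remaining rows shows how far the estimate moves across the assumed range.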
Summary: MNAR Models (Regression Focus)
MNAR: Missingness depends on unobserved values (focused or diffuse)
Goal: Mitigate bias when MAR is questionable, often via Sensitivity Analysis
Selection Models:
Model \(Pr(M|Y,X)\); missingness is the dependent variable
Assumes bivariate normality of residuals (strict, untestable)
Bias correction via factored regression conditional on \(\gamma\)s
Pattern Mixture Models:
Model \(f(Y|M)\); Missingness defines subgroups/moderates effects
Challenge: Inestimable parameters for patterns missing Y
Solution: Specify differences (e.g., via effect size \(d, d_\Delta\)) or use restrictions
Transparent assumptions, useful for exploring range of MNAR scenarios