Missing Not At Random

Lecture 10: April 30, 2025

Chapter 9 Overview

This chapter focuses on Missing Not At Random (MNAR) processes
Unlike MAR (Missing At Random), MNAR assumes unseen (missing) values do carry unique information about why they are missing
The MAR assumption is convenient but may not always hold

MNAR: The Challenge

Definition: The probability of missingness is related to the unseen score values themselves
Problem: Standard analyses assuming MAR can lead to biased results if the data are actually MNAR
Goal: Mitigate nonresponse bias when MNAR is suspected

Two Major MNAR Modeling Frameworks

Selection Models:
- Model the probability of missingness as an outcome
- Use a regression equation with a missing data indicator as the dependent variable
- Treat missingness as dependent on the analysis variables
Pattern Mixture Models:
- Model data patterns based on missingness
- Use the missing data indicator as a predictor or moderator
- Assume different parameters for different missing data patterns

Key Considerations & Sensitivity Analysis

Strict Assumptions: MNAR models require strong, often unverifiable, assumptions (e.g., distributional assumptions for selection models, specifying inestimable parameters for pattern mixture models)
Risk of Misspecification: Incorrect MNAR models can potentially increase bias compared to MAR analyses
Best Use Case: Often used for Sensitivity Analysis - examining how results change under different missing data assumptions (MAR vs. various MNAR scenarios)

MNAR Processes Revisited

MNAR Definition: The probability of missingness, \(Pr(M_i=1)\), is related to the unseen score values, \(Y_{i(mis)}\)
Gomer & Yuan (2021) propose a useful distinction:
- Focused MNAR
- Diffuse MNAR
This distinction helps structure sensitivity analyses

Focused MNAR

Definition: Missingness depends only on the unseen score values in \(Y_{i(mis)}\)
Conditional Distribution: \[ Pr(M_i = 1 | Y_{i(mis)}, \Phi) \] Where:
- \(M_i\) is the vector of missingness indicators for individual \(i\)
- \(Y_{i(mis)}\) are the unseen score values for individual \(i\)
- \(\Phi\) represents model parameters for the missing data process
Example: Unseen depression scores are the sole determinant of missingness (e.g., participants with worst symptoms leave)

Diffuse MNAR

Definition: Missingness depends on both the unseen score values (\(Y_{i(mis)}\)) and the observed data (\(Y_{i(obs)}\))
Conditional Distribution: \[ Pr(M_i = 1 | Y_{i(obs)}, Y_{i(mis)}, \Phi) \] Where:
- \(Y_{i(obs)}\) are the observed score values for individual \(i\)
- Other terms are as defined previously
Example: Participants with worst symptoms (\(Y_{i(mis)}\)) leave and participants with low perceived control (\(Y_{i(obs)}\)) miss assessments due to daily disruptions

Implications of Focused vs. Diffuse MNAR

The distinction implies nonresponse is entangled with the outcome variable
Diffuse processes:
- Generally considered harder to model (e.g., may require larger sample sizes)
- Potentially capable of inducing greater nonresponse bias than focused processes
Understanding the potential type of MNAR can guide modeling choices

Major Modeling Frameworks

How do we model MNAR processes? Two main approaches:
1. Selection Models
2. Pattern Mixture Models
Both aim to mitigate nonresponse bias by modeling the occurrence of missing data
They start by factorizing the joint distribution of data (\(Y_i\)) and missingness indicators (\(M_i\))

Selection Models

Factorization: \[ f(Y_i, M_i) = f(M_i | Y_i) \times f(Y_i) \]
- \(f(M_i | Y_i)\): Model for missingness given the data (e.g., logistic/probit regression of M on Y, X)
- \(f(Y_i)\): The primary analysis model (e.g., regression of Y on X)
Interpretation: Treats missingness (\(M\)) as an outcome that depends on the analysis variables (\(Y\), potentially \(X\))
Analogy: Similar to statistical mediation (e.g., X -> Y -> M)

Pattern Mixture Models

Factorization: \[ f(Y_i, M_i) = f(Y_i | M_i) \times f(M_i) \]
- \(f(Y_i | M_i)\): Analysis model where parameters differ based on missing data pattern (\(M\))
- \(f(M_i)\): Model describing the proportions of different missing data patterns
Interpretation: Treats missingness (\(M\)) as a predictor or moderator. Participants with different missing data patterns form distinct subgroups with potentially different parameter values
Analogy: Similar to moderation or group comparisons

Selection vs. Pattern Mixture: Key Differences

Equivalence: Both attempt to describe the same joint distribution \(f(Y_i, M_i)\)
Differences:
- Require different inputs and assumptions
- Can yield different estimates for the same data
- Tell different “stories” about the missing data process:
  - Selection: Missingness is an outcome
  - Pattern Mixture: Missingness is a grouping/moderating variable
Either (or both) could be plausible for a given application

Selection Models for Multiple Regression

Origins: Trace back to Heckman’s (1976, 1979) work on sample selection bias
Core Idea: Pair the focal analysis model (e.g., regression) with an additional model for the missing data indicator (M)
Modern Approach: Typically uses likelihood or Bayesian estimation via factored regressions, rather than Heckman’s original two-step method

The Missingness Model (Probit)

Assumes an underlying latent variable (\(M_i^*\)) representing the propensity for missingness
\(M_i^*\) is typically assumed to be normally distributed
A threshold (\(\tau\)) determines the observed missingness status (\(M_i\)): \[ M_i = \begin{cases} 0 & \text{if } M_i^* \le \tau \\ 1 & \text{if } M_i^* > \tau \end{cases} \]
For identification:
- Threshold \(\tau\) is usually fixed at 0
- Residual variance of \(M_i^*\) is fixed at 1 (\(\sigma_r^2 = 1\))

Focused Selection Model

Focal Model: Standard regression \[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \quad \text{where } \epsilon_i \sim N(0, \sigma_\epsilon^2) \]
Missingness Model: Only the dependent variable (\(Y_i\)) predicts missingness \[ M_i^* = \gamma_0 + \gamma_1 Y_i + r_i \quad \text{where } r_i \sim N(0, 1) \]
Interpretation: Missingness depends only on the value of the variable that is missing
Analogy: Single mediator model (X influences M only through Y)

Diffuse Selection Model

Focal Model: Standard regression (same as focused) \[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \quad \text{where } \epsilon_i \sim N(0, \sigma_\epsilon^2) \]
Missingness Model: Both the dependent (\(Y_i\)) and independent (\(X_i\)) variables predict missingness \[ M_i^* = \gamma_0 + \gamma_1 Y_i + \gamma_2 X_i + r_i \quad \text{where } r_i \sim N(0, 1) \]
Interpretation: Missingness depends on the value of the missing variable and other variables in the model
Analogy: Partially mediated model (X influences M directly and through Y)

Important Assumptions

Bivariate Normality: The residuals of the focal model (\(\epsilon_i\)) and the missingness model (\(r_i\)) must follow a bivariate normal distribution
- This is a strict and untestable assumption
- It’s the “glue” holding the models together, especially when \(Y_i\) predicts \(M_i^*\)
Correct Specification of Missingness Model:
- Must include the correct predictors of missingness
- Must use the correct functional form (e.g., linear, curvilinear). Example for curvilinear effect of Y: \[ M_i^* = \gamma_0 + \gamma_1 Y_i + \gamma_2 Y_i^2 + r_i \]
- Omitting important predictors \(\rightarrow\) Bias
- Including unnecessary predictors \(\rightarrow\) Noisy estimates, reduced precision
- Auxiliary variables (predict M but not Y, or vice-versa) can help identification and estimation

How Selection Models Mitigate Bias

They use a factored regression approach: \[ f(Y, M, X) = f(M|Y, X) \times f(Y|X) \times f(X) \]
The key is how the model defines the distribution of the missing Y values, \(f(Y_{i(mis)} | M_i^*, X_i)\)
This distribution becomes conditional on both the focal model parameters (\(\beta\)s) and the selection model parameters (\(\gamma\)s) \[ E(Y_i|M_i^*, X_i) = \text{Var}(Y_i|M_i^*, X_i) \times \left( \frac{\gamma_1(M_i^* - \gamma_0 - \gamma_2 X_i)}{\sigma_r^2} + \frac{\beta_0 + \beta_1 X_i}{\sigma_\epsilon^2} \right) \] \[ \text{Var}(Y_i|M_i^*, X_i) = \left( \frac{\gamma_1^2}{\sigma_r^2} + \frac{1}{\sigma_\epsilon^2} \right)^{-1} \]
The coefficient \(\gamma_1\) (linking \(Y_i\) to \(M_i^*\)) introduces the MNAR correction. If \(\gamma_1 = 0\), the model simplifies to MAR

Practical Recommendations

Model Complexity: Diffuse models are harder to estimate and may require larger N than focused models
Specification: While omitting important predictors is generally worse, overfitting (too many predictors in missingness model) reduces precision and power. Be judicious
Building Strategy: Start with a focused model. Add potential predictors (especially those from the focal model or strong auxiliary variables) stepwise, checking model fit and parameter stability (Ibrahim et al., 2005)
Diagnostics: Carefully inspect parameter estimates and standard errors. Watch for:
- Large increases in SEs compared to MAR models
- Implausibly large \(R^2\) for the missingness model
- Convergence issues

Pattern Mixture Models (PMM) for Regression

Contrast: PMMs align with moderated processes (vs. mediated for selection models)
Core Idea:
1. Define subgroups based on missing data patterns (e.g., completers vs. those missing Y)
2. Estimate model parameters within each pattern
3. Compute overall (marginal) parameters as a weighted average of pattern-specific estimates, weighted by pattern proportions (\(\pi^{(pattern)}\)) \[ \text{e.g., } \mu_Y = \pi^{(0)}\mu_Y^{(0)} + \pi^{(1)}\mu_Y^{(1)} \]

PMM: The Challenge of Inestimability

Problem: For patterns with missing data on Y, parameters involving Y (means, variances, covariances/slopes) cannot be estimated directly from the data for that pattern
- E.g., In the pattern where Y is missing (\(M=1\)), \(\mu_Y^{(1)}\), \(\sigma_Y^{2(1)}\), \(\beta_1^{(1)}\) are inestimable
Solution Required: Must either:
- Provide plausible values for the inestimable parameters
- Impose identifying restrictions (borrowing information from other patterns)

PMM Specification: Moderated Regression

Use the missing data indicator (\(M_i\), coded 0/1) as a moderator
Diffuse PMM: Allows intercept and slope(s) to differ by pattern \[ Y_i = \beta_0^{(0)} + \beta_0^{(diff)} M_i + \beta_1^{(0)} X_i + \beta_1^{(diff)} X_i M_i + \epsilon_i \]
- \(\beta_0^{(0)}, \beta_1^{(0)}\): Intercept & slope for completers (\(M=0\))
- \(\beta_0^{(diff)}, \beta_1^{(diff)}\): Inestimable differences for pattern \(M=1\)
Focused PMM: Allows only intercept to differ (common slope \(\beta_1\)) \[ Y_i = \beta_0^{(0)} + \beta_0^{(diff)} M_i + \beta_1 X_i + \epsilon_i \]
- Only \(\beta_0^{(diff)}\) is inestimable here

Specifying Inestimable Parameters (Heuristic)

Use standardized effect sizes (e.g., Cohen’s d) as a guide
Intercept Difference (\(\beta_0^{(diff)}\)): Specify a plausible standardized mean difference (d) between patterns at the mean of X. \[ \beta_0^{(diff)} \approx d \times \hat{\sigma}_Y \quad \text{or} \quad d \times \sqrt{\hat{\sigma}_\epsilon^2} \]
- Get \(\hat{\sigma}_Y\) or \(\hat{\sigma}_\epsilon^2\) from a preliminary MAR analysis
Slope Difference (\(\beta_1^{(diff)}\)): Specify a plausible difference in standardized slopes (\(d_\Delta\)) between patterns. \[ \beta_1^{(diff)} \approx d_\Delta \times \frac{\hat{\sigma}_Y}{\hat{\sigma}_X} \quad \text{or} \quad d_\Delta \times \frac{\sqrt{\hat{\sigma}_\epsilon^2}}{\hat{\sigma}_X} \]
- Choose plausible values for d and \(d_\Delta\) (e.g., ±0.1, ±0.2, ±0.5)

PMM: Assumptions & Pooling

Key Assumption: Correct specification of the pattern differences (which parameters differ, and by how much via specified values or restrictions)
Transparency: Unlike selection models, the assumptions about the missing data mechanism are explicit in the specified differences or restrictions
Pooling: To get marginal estimates (averaging over patterns), you need pattern proportions (\(\pi^{(0)}, \pi^{(1)}\))
- Estimate \(\pi\)’s using \(f(M_i)\) (e.g., empty probit model)
- Combine estimates and standard errors (e.g., delta method, Bayesian estimation, MI)

PMM for Sensitivity Analysis

Challenge: Choosing the “correct” single value for \(d\) or \(d_\Delta\) is difficult/impossible
Strategy: Fit the PMM repeatedly using a range of plausible values for the inestimable parameters (e.g., \(d\) from -0.5 to +0.5)
Goal: Assess the sensitivity of results to different MNAR assumptions
- “How much does the missing data distribution need to differ (\(d, d_\Delta\)) to meaningfully change MAR-based estimates?”
- “How big a difference is needed to change substantive conclusions (e.g., significance tests)?”

Summary: MNAR Models (Regression Focus)

MNAR: Missingness depends on unobserved values (focused or diffuse)
Goal: Mitigate bias when MAR is questionable, often via Sensitivity Analysis
Selection Models:
- Model \(Pr(M|Y,X)\); Missingness is DV
- Assumes bivariate normality of residuals (strict, untestable)
- Bias correction via factored regression conditional on \(\gamma\)s
Pattern Mixture Models:
- Model \(f(Y|M)\); Missingness defines subgroups/moderates effects
- Challenge: Inestimable parameters for patterns missing Y
- Solution: Specify differences (e.g., via effect size \(d, d_\Delta\)) or use restrictions
- Transparent assumptions, useful for exploring range of MNAR scenarios