Classification Methods: Mixture Models

Latent Class & Latent Profile Analysis

Lecture Outline

  • Latent Class Analysis (LCA)
    • Underlying theory
    • Example analysis in R (poLCA)
    • Interpreting model parameters
    • Assessing model fit
  • Latent Profile Analysis (LPA)
    • Underlying theory
    • Example analysis in R (tidyLPA)
    • Comparing class solutions

Overview

Clusters Versus Classes

  • When a researcher mentions cluster analysis, they often mean one of the following:
    • K-means clustering
    • Hierarchical clustering using distance methods
    • Discriminant analysis
    • Taxometrics
  • Much less often, latent class analysis is included in the group
    • Although it is also useful for detecting clusters of observations
  • For this lecture, we will consider clusters and classes to be synonymous
    • Strictly speaking, clustering discovers unknown groups, whereas classification assigns observations to known groups

LCA Versus Other Methods

  • Although we use the terms classes and clusters synonymously, LCA differs from other methods
  • LCA is a model-based method for clustering
    • LCA fits a statistical model to the data to determine classes
  • Other methods (K-means, hierarchical) do not explicitly state a statistical model
  • Being model-based means we make explicit, testable assumptions about our data
    • Assumptions can be checked against the observed data

Latent Class Analysis

Latent Class Analysis

LCA Introduction

  • Latent class models are commonly attributed to Lazarsfeld and Henry (1968)
  • The number of classes is not specified in advance
    • The number of classes is determined through comparison of fit statistics after fitting multiple models
    • The characteristics of each class are determined following the analysis
    • Similar to K-means and hierarchical clustering in this respect

Variable Types Used in LCA

  • As originally conceived, LCA uses:
    • A set of binary-outcome variables — values coded as 0 or 1. Examples include:
      • Test items — scored correct (1) or incorrect (0)
      • True/false questions
      • Presence/absence indicators
      • Any other binary outcome
  • Extensions allow for ordered categorical or nominal indicators
    • This general form is still called LCA regardless of indicator type

LCA Process

  • For a specified number of classes, LCA estimates:
    1. For each class: the probability that each indicator equals one (\(\pi_{jc}\))
    2. For each observation: the probability of belonging to each class (\(\alpha_{ic}\))
      • These sum to 1 across classes for each person
      • This differs from K-means, where class membership is certain
    3. Across observations: the overall probability that any observation is in each class (\(\eta_c\))

LCA Estimation

  • LCA estimation differs from other clustering methods:
    • Hierarchical clustering creates new distance matrices at each step
    • K-means shifts cases between clusters using distance metrics
    • LCA uses distributional assumptions to find classes — the distribution provides the measure of “distance”
  • Estimation is performed via the Expectation-Maximization (EM) algorithm
    • E-step: compute posterior class probabilities
    • M-step: update model parameters to maximize log-likelihood
    • Iterate until convergence
  • Multiple random starting values are recommended to avoid local maxima

LCA Distributional Assumptions

  • Because LCA uses binary-outcome variables, it relies on a binary-outcome distribution
  • Within each latent class, the variables are assumed to:
    • Be independent of one another (Local Independence)
    • Follow a Bernoulli distribution marginally:

\[f(x_i) = \left(\pi_i \right)^{x_i} \left(1-\pi_i\right)^{(1-x_i)}\]

  • The Bernoulli distribution describes a single binary event — like flipping a coin with probability \(\pi\) of heads

Bernoulli Distribution Illustration

  • Consider a single binary test item, \(X\):

    • Let \(X = 1\) if a student answers correctly, \(X = 0\) if incorrect
    • Suppose the probability of a correct response is \(\pi = 0.75\)
  • If \(X = 1\), the likelihood is: \[f(x_i=1) = (0.75)^{1}(1-0.75)^{0} = 0.75\]

  • If \(X = 0\), the likelihood is: \[f(x_i=0) = (0.75)^{0}(1-0.75)^{1} = 0.25\]

  • For discrete-outcome variables, the likelihood equals the probability of the event occurring

Independent Bernoulli Variables

  • For independent binary variables, the joint probability is the product of the individual probabilities:

\[P(X_1=x_1, X_2=x_2,\ldots,X_J=x_J) = \prod_{j=1}^{J} \pi_j^{x_j} \left(1-\pi_j\right)^{\left(1-x_j\right)}\]

  • In LCA, this independence assumption holds within each class (Local Independence)
    • Conditional on class membership, knowing one item response gives no additional information about another item response
    • Any association among observed variables is explained entirely by the latent class
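Under local independence, the joint probability above is simple to compute. A minimal base-R sketch (the item probabilities and response pattern below are made-up values, not from any dataset):

```r
# Probability of a response pattern under independent Bernoulli items:
# P(X = x) = prod_j pi_j^x_j * (1 - pi_j)^(1 - x_j)
joint_bernoulli <- function(x, pi_j) {
  stopifnot(length(x) == length(pi_j), all(x %in% c(0, 1)))
  prod(pi_j^x * (1 - pi_j)^(1 - x))
}

# Example: four items with correct-response probabilities .75, .80, .40, .70
pi_j <- c(0.75, 0.80, 0.40, 0.70)
joint_bernoulli(c(1, 1, 0, 1), pi_j)  # 0.75 * 0.80 * 0.60 * 0.70 = 0.252
```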

Finite Mixture Models

  • LCA models are special cases of Finite Mixture Models (FMM)
  • A finite mixture model expresses the distribution of X as a weighted sum of class-specific distributions:

\[f(\textbf{X}) = \sum_{g=1}^G \eta_g f(\textbf{X}|g)\]

  • \(\eta_g\) is the mixing proportion for class \(g\) (must sum to 1)
  • \(f(\textbf{X}|g)\) is the class-specific distribution of X
  • In LCA, \(f(\textbf{X}|g)\) is a product of Bernoulli distributions (due to local independence)

Latent Class Analysis as a FMM

A latent class model for \(J\) binary indicators (\(j = 1,\ldots,J\)) with \(C\) classes (\(c = 1,\ldots,C\)):

\[f(\mathbf{x}_i) = \displaystyle {\sum_{c=1}^{C} \eta_c} \prod_{j=1}^{J} \pi_{jc}^{x_{ij}} \left(1-\pi_{jc}\right)^{1-x_{ij}}\]

  • \(\eta_c\) — probability that any individual is in class \(c\) (must sum to 1)
  • \(x_{ij}\) — observed binary response of individual \(i\) to item \(j\)
  • \(\pi_{jc}\) — probability of a positive response (\(x_{ij} = 1\)) for class \(c\) on item \(j\)
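The mixture density above can be written as a short function. A base-R sketch with illustrative (made-up) values for \(\eta_c\) and \(\pi_{jc}\):

```r
# LCA density for one response pattern: weighted sum over classes of
# class-specific Bernoulli products (local independence within each class)
lca_density <- function(x, eta, pi_mat) {
  # eta: C class proportions; pi_mat: C x J matrix of item probabilities
  stopifnot(abs(sum(eta) - 1) < 1e-8, nrow(pi_mat) == length(eta))
  class_lik <- apply(pi_mat, 1, function(p) prod(p^x * (1 - p)^(1 - x)))
  sum(eta * class_lik)
}

# Illustrative two-class "masters / non-masters" setup (values are made up)
eta    <- c(0.59, 0.41)
pi_mat <- rbind(c(0.75, 0.78, 0.43, 0.71),   # class 1: high success
                c(0.21, 0.07, 0.02, 0.05))   # class 2: low success
lca_density(c(1, 1, 1, 1), eta, pi_mat)      # density of an all-correct pattern
```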

Estimation Process

  • Successfully applying LCA involves answering two key questions:
    1. How many classes are present?
      • Fit LCA models with differing numbers of classes
      • Choose based on fit statistics (AIC, BIC, chi-square, entropy)
    2. What does each class represent?
      • Inspect the item response probabilities (\(\hat{\pi}_{jc}\)) for the best-fitting solution
      • Look for meaningful patterns distinguishing the classes

LCA in R: The poLCA Package

  • In R, the poLCA package (Linzer & Lewis, 2011) fits LCA models
    • Short for “Polytomous Variable Latent Class Analysis”
    • Handles binary and polytomous (multi-category) indicators
    • Uses EM algorithm with multiple random starts
  • Install: install.packages("poLCA")

LCA in R: The poLCA Package

Argument Description
formula cbind(item1, item2, ...) ~ 1
data Data frame; item values must start at 1, not 0
nclass Number of latent classes to estimate
nrep Number of random starting points (≥ 10 recommended)
verbose FALSE to suppress iteration output

LCA Example

LCA Example #1

LCA Example: Macready & Dayton (1977)

  • Data discussed in Bartholomew and Knott (Latent Variable Models and Factor Analysis)
  • A four-item binary math test administered to \(N = 142\) students
  • Macready and Dayton’s goal: classify students into two latent groups:
    • Masters — students who have mastered the content
    • Non-masters — students who have not mastered the content
  • We will fit a 2-class LCA model and interpret the results
Variable Description
u1 Math item 1 (0/1)
u2 Math item 2 (0/1)
u3 Math item 3 (0/1)
u4 Math item 4 (0/1)

LCA in R: Data Preparation

  u1 u2 u3 u4
1  2  2  2  2
2  2  2  2  2
3  2  2  2  2
4  2  2  2  2
5  2  2  2  2
6  2  2  2  2

 1  2 
67 75 

Important

poLCA requires each item’s values to be positive integers starting at 1. For binary items originally coded 0/1, add 1 to recode them as 1/2 before fitting.
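A minimal sketch of the recode, using a tiny made-up data frame in place of the real items (the name `math` and the values are illustrative):

```r
# poLCA needs positive-integer codes starting at 1, so shift 0/1 items to 1/2.
math <- data.frame(u1 = c(0, 1, 1), u2 = c(1, 1, 0),
                   u3 = c(0, 0, 1), u4 = c(1, 1, 1))
math[, c("u1", "u2", "u3", "u4")] <- math[, c("u1", "u2", "u3", "u4")] + 1
# All values are now 1 or 2; Pr(2) in the output refers to the original "1"
```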

LCA in R: Fitting the 2-Class Model

Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$u1
           Pr(1)  Pr(2)
class 1:  0.2466 0.7534
class 2:  0.7914 0.2086

$u2
           Pr(1)  Pr(2)
class 1:  0.2197 0.7803
class 2:  0.9317 0.0683

$u3
           Pr(1)  Pr(2)
class 1:  0.5684 0.4316
class 2:  0.9821 0.0179

$u4
           Pr(1)  Pr(2)
class 1:  0.2925 0.7075
class 2:  0.9477 0.0523

Estimated class population shares 
 0.5866 0.4134 
 
Predicted class memberships (by modal posterior prob.) 
 0.5423 0.4577 
 
========================================================= 
Fit for 2 latent classes: 
========================================================= 
number of observations: 142 
number of estimated parameters: 9 
residual degrees of freedom: 6 
maximum log-likelihood: -331.7637 
 
AIC(2): 681.5273
BIC(2): 708.1298
G^2(2): 8.965682 (Likelihood ratio/deviance statistic) 
X^2(2): 9.459244 (Chi-square goodness of fit) 
 
  • nrep = 10 runs the algorithm 10 times with different random starts
  • The solution with the highest log-likelihood is retained
  • Increasing nrep reduces the chance of stopping at a local maximum
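A call of roughly this shape would produce the output above. This is a sketch assuming the recoded items sit in a data frame named `math` (that name is an assumption); it requires the poLCA package:

```r
library(poLCA)

# Indicators are bound together on the left; ~ 1 means no covariates
f <- cbind(u1, u2, u3, u4) ~ 1

mod2 <- poLCA(formula = f, data = math, nclass = 2,
              nrep = 10,       # 10 random starts; best log-likelihood retained
              verbose = FALSE)
mod2$P       # estimated class proportions (eta)
mod2$probs   # conditional item response probabilities (pi)
```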

LCA Parameter Information Types

Three types of information from an LCA model:

  1. Class size estimates (\(\eta_c\))
    • Proportion of the population in each class
  2. Item response probabilities (\(\pi_{jc}\))
    • Probability of a correct response to item \(j\) for a person in class \(c\)
  3. Posterior class probabilities (\(\alpha_{ic}\))
    • Probability that individual \(i\) belongs to class \(c\), given their responses

Class Size Estimates (\(\eta_c\))

From poLCA output:

Estimated class population shares
 0.5866 0.4134

Predicted class memberships (modal posterior prob.)
 0.5423 0.4577
  • Class 1: approximately 58.7% of the population (\(\eta_1 = 0.587\))
  • Class 2: approximately 41.3% of the population (\(\eta_2 = 0.413\))

Note

“Population shares” come from the model. “Predicted memberships” assign each person to their most probable class — these will differ slightly from the model-estimated shares.
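The distinction can be seen with a toy posterior matrix (three people, two classes; the values are made up):

```r
# Model-estimated shares average the posterior probabilities, while
# "predicted memberships" count modal (most-probable) assignments
alpha <- rbind(c(0.9, 0.1),
               c(0.6, 0.4),
               c(0.4, 0.6))
colMeans(alpha)                    # model-based shares: 0.633 0.367
prop.table(table(max.col(alpha)))  # modal shares:       0.667 0.333
```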

Item Response Probability Estimates (\(\pi_{jc}\))

From poLCA output (Pr(2) = probability of correct response):

Conditional item response probabilities by class

$u1      Pr(1)  Pr(2)         $u2      Pr(1)  Pr(2)
 Class 1: 0.247  0.753          Class 1: 0.220  0.780
 Class 2: 0.791  0.209          Class 2: 0.932  0.068

$u3      Pr(1)  Pr(2)         $u4      Pr(1)  Pr(2)
 Class 1: 0.568  0.432          Class 1: 0.292  0.708
 Class 2: 0.982  0.018          Class 2: 0.948  0.052
  • Pr(2) is the probability of a correct response (original code = 1)
  • Class 1 has high correct-response probabilities; Class 2 has very low probabilities

Interpreting the Classes

Item Class 1 \(\pi_{j1}\) Class 2 \(\pi_{j2}\)
u1 0.753 0.209
u2 0.780 0.068
u3 0.432 0.018
u4 0.708 0.052
  • Class 1 — high probability of correct response across all items
    • \(\Rightarrow\) Masters: students who have mastered the material
  • Class 2 — very low probability of correct response across all items
    • \(\Rightarrow\) Non-masters: students who have not mastered the material

Visualizing Item Response Profiles
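One way to draw the profile plot this slide refers to, using base R and the \(\hat{\pi}_{jc}\) estimates reported above:

```r
# Item response probabilities (Pr(correct)) per class, from the 2-class fit
probs <- rbind(masters    = c(0.753, 0.780, 0.432, 0.708),
               nonmasters = c(0.209, 0.068, 0.018, 0.052))

# One line per class across the four items
matplot(t(probs), type = "b", pch = 16, lty = 1, ylim = c(0, 1),
        xaxt = "n", xlab = "Item", ylab = "P(correct)")
axis(1, at = 1:4, labels = paste0("u", 1:4))
legend("topright", legend = rownames(probs), col = 1:2, lty = 1, pch = 16)
```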

Assessing Model Fit

Assessing Model Fit

Approaches to Assessing LCA Fit

  • No single definitive approach — use multiple measures together:
    • Absolute fit: model-based \(\chi^2\) tests
    • Relative fit: information criteria (AIC, BIC)
    • Classification quality: entropy
  • The goal is to identify a parsimonious, substantively meaningful solution

Model Chi-Squared Test

  • The \(\chi^2\) test compares observed response pattern frequencies to model-expected frequencies
  • For \(J\) binary items, there are \(p = 2^J\) possible response patterns
  • Degrees of freedom = number of response patterns \(-\) model parameters \(-\) 1
  • The Pearson statistic:

\[\chi^2_p = \sum_r \frac{(O_r - E_r)^2}{E_r}\]
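The statistic itself is a one-liner in R. The observed and expected counts below are made up purely to illustrate the computation (real expected counts come from the fitted model):

```r
# Pearson X^2 over response patterns: observed vs model-expected counts
pearson_x2 <- function(obs, exp) sum((obs - exp)^2 / exp)

obs <- c(40, 25, 20, 15)   # illustrative pattern counts, N = 100
exp <- c(38, 27, 22, 13)   # illustrative model-expected counts
x2 <- pearson_x2(obs, exp)
x2
pchisq(x2, df = 1, lower.tail = FALSE)  # p-value (df = 1 here for illustration)
```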

Model Chi-Squared Test Limitations

  • For \(J\) binary items, there are \(2^J\) possible response patterns

  • Limitation: invalid when many cells have small expected frequencies

    • With \(J = 4\) items: 16 patterns — feasible
    • With \(J = 20\) items: over 1 million patterns — infeasible

Chi-Squared in R

[1] 9.459244
[1] 8.965682
[1] 0.1493499
[1] 0.1755173

Results for the Macready & Dayton 2-class solution:

Pearson chi-square (df = 6):  9.459   p = 0.149
G-squared       (df = 6):     8.966   p = 0.176

Neither test is significant — the 2-class model fits the data adequately.

Log Likelihood

  • The log-likelihood measures how well the model reproduces the observed data:

\[\log L = \sum_{k=1}^N \log \left( {\sum_{c=1}^{C} \eta_c} \prod_{j=1}^{J} \pi_{jc}^{x_{kj}} \left(1-\pi_{jc}\right)^{1-x_{kj}} \right)\]

Information Criteria

  • The AIC and BIC penalize the log-likelihood for model complexity:

\[AIC = 2q - 2\log L \qquad BIC = q\log(N) - 2\log L\]

  • \(q\) = number of free parameters; \(N\) = sample size
  • Lower values indicate better fit (for comparing models with the same data)
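Both criteria are easy to verify by hand. The sketch below reproduces the 2-class values reported in the poLCA output:

```r
# AIC = 2q - 2 logL ; BIC = q log(N) - 2 logL
aic <- function(logL, q)    2 * q - 2 * logL
bic <- function(logL, q, N) q * log(N) - 2 * logL

# 2-class Macready & Dayton fit: logL = -331.7637, q = 9, N = 142
aic(-331.7637, 9)        # ~681.53
bic(-331.7637, 9, 142)   # ~708.13
```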

Comparing Models in R

  Classes Parameters LogL AIC BIC
1       1          4 -373 754 766
2       2          9 -332 682 708
3       3         14 -331 691 733

Information Criteria: Model Comparison

Fit statistics for the Macready & Dayton data:

Classes Parameters Log L AIC BIC
1 4 −373.04 754.08 766.16
2 9 −331.76 681.53 708.13
3 14 −331.49 690.97 733.10
  • Both AIC and BIC are lowest for the 2-class solution
  • Adding a third class provides minimal improvement in log-likelihood
  • The 2-class model is preferred — consistent with Macready and Dayton’s hypothesis

Entropy

  • Entropy measures how clearly each observation is classified into a class
  • Based on posterior class probabilities (\(\hat{\alpha}_{ic}\))
  • The relative entropy (scaled to \([0, 1]\)):
    • E near 1: high classification certainty (each person clearly belongs to one class)
    • E near 0: high uncertainty (posterior probabilities are diffuse across classes)

\[E = 1 - \frac{-\displaystyle\sum_{i=1}^N \sum_{c=1}^C \hat{\alpha}_{ic} \log \hat{\alpha}_{ic}}{N \log C}\]
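A direct base-R implementation of this formula (handling the \(0 \log 0 = 0\) convention):

```r
# Relative entropy from an N x C posterior probability matrix
relative_entropy <- function(alpha) {
  terms <- ifelse(alpha > 0, alpha * log(alpha), 0)  # treat 0*log(0) as 0
  1 - (-sum(terms)) / (nrow(alpha) * log(ncol(alpha)))
}

# Perfect classification gives E = 1; maximal uncertainty gives E = 0
relative_entropy(rbind(c(1, 0), c(0, 1)))          # 1
relative_entropy(rbind(c(0.5, 0.5), c(0.5, 0.5)))  # 0
```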

Computing Entropy in R

Relative Entropy (2-class): 0.754
  • Entropy of 0.754 indicates good classification quality
  • Most students are clearly classified as masters or non-masters

Latent Profile Analysis

Latent Profile Analysis

LPA Introduction

  • Latent Profile Analysis (LPA) is the continuous-variable analog to LCA
    • Also attributed to Lazarsfeld and Henry (1968)
  • Uses a set of continuous variables rather than binary indicators
  • The estimation process is the same: EM algorithm with multiple random starts
  • The key difference: normal (Gaussian) distributions replace Bernoulli distributions within each class

LPA Process

  • For a specified number of classes, LPA estimates:
    1. For each class: the mean (\(\mu_{jc}\)) and variance (\(\sigma^2_{jc}\)) for each variable
    2. For each observation: the probability of belonging to each class \(\alpha_{ic}\)
      • These sum to 1 across classes for each person
    3. Across observations: the overall class proportions \(\eta_c\)
  • The same three types of information as LCA:
    • Class sizes, class-specific parameters, and posterior probabilities

LPA Distributional Assumptions

  • Within each latent class, the variables are assumed to:
    • Be independent (local independence assumption, same as LCA)
    • Follow a normal distribution marginally:

\[f(x_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(\frac{-(x_i - \mu)^2}{2\sigma^2}\right)\]
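Written out in R and checked against the built-in dnorm() (the mean and variance values are illustrative):

```r
# The class-specific normal density, written out explicitly
normal_pdf <- function(x, mu, sigma2) {
  1 / sqrt(2 * pi * sigma2) * exp(-(x - mu)^2 / (2 * sigma2))
}

normal_pdf(1.46, 1.46, 0.187)  # density at the mean
all.equal(normal_pdf(2, 1.46, 0.187),
          dnorm(2, mean = 1.46, sd = sqrt(0.187)))  # TRUE
```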

LPA Distributional Assumptions

  • Different classes have different means — and possibly different variances
  • The variance structure across classes is an important modeling choice:
    • Equal variances across classes (more parsimonious, common default)
    • Unequal variances (more flexible, more parameters to estimate)

Multivariate Normal Distribution

  • Because LPA assumes independent normal variables within each class, the joint within-class distribution is multivariate normal (MVN) with a diagonal covariance matrix

\[f(\textbf{x}) = \frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\frac{(\textbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\textbf{x}-\boldsymbol{\mu})}{2}\right)\]

  • \(\boldsymbol{\mu}\) = class mean vector
  • \(\boldsymbol{\Sigma}\) = covariance matrix (diagonal for standard LPA — no within-class correlations)
  • Standard notation: \(\mathbf{x} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})\)

Within-Class Covariance Structure

  • In standard LPA, within-class covariance is zero (local independence):
    • Knowing one variable gives no additional information about another, given class
  • When within-class covariance is non-zero:
    • Contours become tilted ellipses
  • The local independence assumption can be checked:
    • Fit a model that allows within-class covariances and compare fit via AIC/BIC
    • Persistent residual correlations within classes suggest the assumption is violated

Tip

If you suspect within-class correlations, try variances = "equal", covariances = "equal" (Model 2) and compare fit statistics to Model 1.

LPA as a Finite Mixture Model

A latent profile model for \(J\) continuous variables \((j = 1,\ldots,J)\) with \(C\) classes (\(c = 1,\ldots,C\)):

\[f(\mathbf{x}_i) = \displaystyle {\sum_{c=1}^{C} \eta_c} \prod_{j=1}^{J} \frac{1}{\sqrt{2\pi\sigma^2_{jc}}} \exp\left(\frac{-(x_{ij} - \mu_{jc})^2}{2\sigma^2_{jc}}\right)\]

  • \(\eta_c\) — class proportion (must sum to 1)
  • \(\mu_{jc}\) — mean of variable \(j\) in class \(c\)
  • \(\sigma^2_{jc}\) — variance of variable \(j\) in class \(c\)
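The LPA mixture density can be sketched directly in base R; the class proportions, means, and variances below are made up for illustration:

```r
# LPA density for one observation: classes mix products of univariate normals
lpa_density <- function(x, eta, mu, sigma2) {
  # mu, sigma2: C x J matrices of class-specific means and variances
  class_lik <- sapply(seq_along(eta), function(c)
    prod(dnorm(x, mean = mu[c, ], sd = sqrt(sigma2[c, ]))))
  sum(eta * class_lik)
}

# Illustrative 2-class, 2-variable model (values are made up)
eta    <- c(0.5, 0.5)
mu     <- rbind(c(0, 0), c(3, 3))
sigma2 <- rbind(c(1, 1), c(1, 1))
lpa_density(c(0, 0), eta, mu, sigma2)
```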

LPA in R: The tidyLPA Package

  • In R, the tidyLPA package (Rosenberg et al., 2018) fits LPA models
    • Wraps the mclust package with a cleaner interface
    • Install: install.packages("tidyLPA")

LPA in R: The tidyLPA Package

Function Description
estimate_profiles() Fits one or more LPA models
get_fit() Extracts fit statistics (log L, AIC, BIC, entropy)
get_estimates() Extracts class-specific means and variances
get_data() Returns data with posterior probabilities and class assignments
plot_profiles() Creates a class profile plot

LPA Variance Model Options

Model Variances Covariances
1 (default) Equal across classes Zero (diagonal)
2 Equal across classes Equal across classes
3 Varying across classes Zero (diagonal)
  • Model 1 matches the standard LPA assumption: equal variances, no within-class covariances

LPA Variance Model Options

Model Variances Covariances
4 Varying across classes Varying across classes
5 Equal across classes Varying across classes
6 Varying across classes Equal across classes
  • More complex models add parameters; use information criteria to compare

LPA Example

LPA Example: Fisher’s Iris Data

LPA Example: Fisher’s Iris Data

  • Fisher (1936) introduced a dataset of measurements on 150 iris flowers
  • Four continuous measurements per flower:
Variable Measurement
x1 Sepal length (cm)
x2 Sepal width (cm)
x3 Petal length (cm)
x4 Petal width (cm)

LPA Example: Fisher’s Iris Data

  • Three species: Iris setosa, I. versicolor, I. virginica (50 flowers each; \(N = 150\))
  • We treat species membership as unknown and attempt to recover it via LPA

LPA in R: Loading Data and Fitting Models

# A tibble: 3 × 20
  Model Classes LogLik parameters     n   AIC   AWE   BIC  CAIC   CLC   KIC
  <dbl>   <int>  <dbl>      <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1       2  -489.         13   150 1004. 1145. 1043. 1056.  980. 1020.
2     1       3  -361.         18   150  759.  955.  813.  831.  725.  780.
3     1       4  -310.         23   150  666.  918.  735.  758.  622.  692.
# ℹ 9 more variables: SABIC <dbl>, ICL <dbl>, Entropy <dbl>, prob_min <dbl>,
#   prob_max <dbl>, n_min <dbl>, n_max <dbl>, BLRT_val <dbl>, BLRT_p <dbl>

Note

variances = "equal" and covariances = "zero" specify Model 1 — equal variances across classes, no within-class covariances. This is the standard LPA assumption.
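A call of roughly this shape produces the fit table above. This is a sketch assuming the tidyLPA and dplyr packages are installed; the x1–x4 names mirror the slides:

```r
library(tidyLPA)
library(dplyr)

# Iris measurements renamed to match the slides (x1-x4)
dat <- iris |>
  select(x1 = Sepal.Length, x2 = Sepal.Width,
         x3 = Petal.Length, x4 = Petal.Width)

fits <- dat |>
  estimate_profiles(n_profiles = 2:4,
                    variances   = "equal",
                    covariances = "zero")  # Model 1
get_fit(fits)   # LogLik, AIC, BIC, entropy, ... per solution
```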

Model Comparison

Fit statistics for the Iris LPA (Model 1):

Classes Parameters Log L AIC BIC Entropy
2 13 −488.92 1003.83 1042.97 0.991
3 18 −361.43 758.85 813.04 0.957
4 23 −310.12 666.23 735.48 0.945
  • AIC and BIC both favor the 4-class solution
  • But the data were generated from exactly 3 species
  • This illustrates a known problem with information criteria in mixture modeling:
    • They can favor over-extraction (more classes than truly exist)
    • Substantive knowledge and external validation are essential
  • We select the 3-class solution based on prior knowledge

Fitting the 3-Class Model

# A tibble: 24 × 8
   Category  Parameter Estimate      se        p Class Model Classes
   <chr>     <chr>        <dbl>   <dbl>    <dbl> <int> <dbl>   <dbl>
 1 Means     x1          5.01   0.0508  0            1     1       3
 2 Means     x2          3.43   0.0535  0            1     1       3
 3 Means     x3          1.46   0.0218  0            1     1       3
 4 Means     x4          0.246  0.0155  7.87e-57     1     1       3
 5 Variances x1          0.235  0.0261  2.38e-19     1     1       3
 6 Variances x2          0.107  0.0143  6.57e-14     1     1       3
 7 Variances x3          0.187  0.0254  1.82e-13     1     1       3
 8 Variances x4          0.0379 0.00572 3.39e-11     1     1       3
 9 Means     x1          5.92   0.0750  0            2     1       3
10 Means     x2          2.75   0.0443  0            2     1       3
# ℹ 14 more rows

Fitting the 3-Class Model

# A tibble: 6 × 10
  model_number classes_number    x1    x2    x3    x4 CPROB1   CPROB2   CPROB3
         <dbl>          <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1            1              3   5.1   3.5   1.4   0.2      1 5.37e-20 3.72e-44
2            1              3   4.9   3     1.4   0.2      1 5.82e-19 5.84e-44
3            1              3   4.7   3.2   1.3   0.2      1 1.63e-20 7.19e-46
4            1              3   4.6   3.1   1.5   0.2      1 4.45e-19 4.35e-44
5            1              3   5     3.6   1.4   0.2      1 1.94e-20 1.25e-44
6            1              3   5.4   3.9   1.7   0.4      1 4.69e-16 8.18e-37
# ℹ 1 more variable: Class <dbl>

The 3-Class Solution: Class Sizes

Estimated class counts and proportions from the fitted 3-class model:

FINAL CLASS COUNTS AND PROPORTIONS
   Class 1:  50.000  (0.333)
   Class 2:  54.888  (0.366)
   Class 3:  45.112  (0.301)
  • Three approximately equal classes, consistent with the true species distribution (50 per species)
  • The model has correctly identified the rough size structure

The 3-Class Solution: Class Means

Class-specific means from get_estimates():

Variable Class 1 Class 2 Class 3
x1 (Sepal Length) 5.01 5.92 6.68
x2 (Sepal Width) 3.43 2.75 3.02
x3 (Petal Length) 1.46 4.33 5.61
x4 (Petal Width) 0.25 1.35 2.07
  • Class 1: Very small petals, wide sepals → Iris setosa
  • Class 2: Medium-sized flowers, narrower sepals → Iris versicolor
  • Class 3: Largest petals and sepals → Iris virginica

Visualizing Class Profiles

Visualizing Class Profiles

Evaluating Classification Quality

Class 1  Class 2  Class 3
  1.000    0.970    0.966
  • Class 1 (setosa) is perfectly separated from the others
  • Classes 2 and 3 have 3–4% overlap — the two larger species are harder to distinguish
  • Matches the known botanical overlap between versicolor and virginica

Concluding Remarks

Concluding Remarks

LCA vs. LPA: Comparison

Feature LCA LPA
Indicator type Binary / categorical Continuous
Within-class distribution Bernoulli Normal
Class parameters Item probabilities (\(\pi_{jc}\)) Means (\(\mu_{jc}\)) and variances (\(\sigma^2_{jc}\))
R package poLCA tidyLPA

Extensions of These Methods

  • Many extensions exist in recent empirical research:
    • Growth Mixture Models — detect groups with differing growth trajectories over time
    • Diagnostic Classification Models — confirmatory LCA specifying class characteristics prior to analysis; used in psychological and educational testing
    • Mixture IRT Models — combine item response theory with latent class structure
    • Mixture Regression Models — latent classes defined by distinct regression relationships
    • General Finite Mixture Models — virtually any statistical distribution can form a mixture model

Concluding Remarks

  • LCA and LPA are model-based techniques for identifying latent clusters
  • Both methods make explicit, testable assumptions about the data
  • Key practical challenges:
    • The number of classes must be inferred — no single definitive criterion
    • Information criteria often favor over-extraction (too many classes)
    • Multiple local maxima are possible — always use many random starts (nrep)
  • Use these methods carefully — they are exploratory and prone to spurious findings
    • Validate results with substantive knowledge
    • Seek convergence across multiple fit indices
    • Replicate with independent samples when possible

References

  • Lazarsfeld, P. F. & Henry, N. W. (1968). Latent structure analysis. Houghton Mifflin.
  • Linzer, D. A. & Lewis, J. B. (2011). poLCA: An R package for polytomous variable latent class analysis. Journal of Statistical Software, 42(10), 1–29.
  • Macready, G. B. & Dayton, C. M. (1977). The use of probabilistic models in the assessment of mastery. Journal of Educational Statistics, 2(2), 99–120.
  • Rosenberg, J. M., van Lissa, C. J., Beymer, P. N., Anderson, D. J., Schell, M. J., & Schmidt, J. A. (2018). tidyLPA: An R package to easily carry out latent profile analysis (LPA). Journal of Open Source Software, 3(30), 978.