The Forgiveness of Situations Subscale includes six items, three of which are reverse-coded, each rated on a seven-point response scale.
For this example, we will be using the ggplot2 (for general plotting), reshape2 (for data reshaping before plotting), and lavaan (for CFA) packages.
if (require(ggplot2)==FALSE){
install.packages("ggplot2")
}
## Loading required package: ggplot2
library(ggplot2)
if (require(reshape2) == FALSE){
install.packages("reshape2")
}
## Loading required package: reshape2
if (require(lavaan) == FALSE){
install.packages("lavaan")
}
## Loading required package: lavaan
## This is lavaan 0.6-19
## lavaan is FREE software! Please report any bugs.
library(lavaan)
if (require(psych) == FALSE){
install.packages("psych")
}
## Loading required package: psych
##
## Attaching package: 'psych'
## The following object is masked from 'package:lavaan':
##
## cor2cov
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
if (require(knitr) == FALSE){
install.packages("knitr")
}
## Loading required package: knitr
if (require(kableExtra) == FALSE){
install.packages("kableExtra")
}
## Loading required package: kableExtra
if (require(semPlot) == FALSE){
install.packages("semPlot")
}
## Loading required package: semPlot
The data are in a text file named Study2.dat that was originally used in Mplus (so no column names are included at the top of the file). The file contains more items than we will use, so we select only the “Forgiveness of Situations” items from the whole file.
#read in data file (Mplus file format, so we have to label the columns ourselves)
hfsData = read.table(file = "Study2.dat", header = FALSE, na.strings = "99999",
                     col.names = c("PersonID", "Self1", "Self2r", "Self3", "Self4r", "Self5", "Self6r",
                                   "Other1r", "Other2", "Other3r", "Other4", "Other5r", "Other6",
                                   "Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6",
                                   "Selfsub", "Othsub", "Sitsub", "HFSsum"))
#select Situations items and PersonID variables
hfsSituations = hfsData[c("PersonID", "Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")]
The observed correlation matrix, rounded to three digits:
#here the c() function selects only the item variables, not the PersonID variable
round(cor(hfsSituations[c("Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")]), digits = 3)
## Sit1r Sit2 Sit3r Sit4 Sit5r Sit6
## Sit1r 1.000 0.240 0.647 0.300 0.453 0.297
## Sit2 0.240 1.000 0.317 0.570 0.255 0.457
## Sit3r 0.647 0.317 1.000 0.369 0.482 0.356
## Sit4 0.300 0.570 0.369 1.000 0.289 0.448
## Sit5r 0.453 0.255 0.482 0.289 1.000 0.304
## Sit6 0.297 0.457 0.356 0.448 0.304 1.000
The observed means, rounded to three digits:
apply(X = hfsSituations[c("Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")], MARGIN = 2, FUN = function(x) round(mean(x), digits = 3))
## Sit1r Sit2 Sit3r Sit4 Sit5r Sit6
## 4.547 5.289 4.896 5.359 4.860 5.321
The observed variances (using \(N\) in the denominator to match ML estimated output and Mplus example), rounded to three digits:
apply(X = hfsSituations[c("Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")], MARGIN = 2, FUN = function(x) round(var(x, na.rm = TRUE)*1102/1103, digits = 3))
## Sit1r Sit2 Sit3r Sit4 Sit5r Sit6
## 3.049 1.903 2.543 1.967 2.945 2.341
To do a CFA analysis, you only really need the means, variances, and either the correlations or covariances among items. That said, modern methods of estimation use the raw data (often called full-information estimation) rather than the summary statistics, as the raw data permit better missing-data assumptions when using maximum likelihood and Bayesian estimation methods.
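As a minimal sketch of that point (the object names items, S, m, and summaryFit are illustrative), the same one-factor model fit later in this example could be estimated from summary statistics alone via lavaan's sample.cov, sample.mean, and sample.nobs arguments; note that the robust MLR corrections require the raw data, so plain ML is used here:
#sketch: CFA from summary statistics rather than raw (full-information) data
items = c("Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")
S = cov(hfsSituations[items]) #unbiased (N-1) covariances; lavaan rescales for ML internally
m = colMeans(hfsSituations[items])
summaryFit = sem(model = "Sit =~ Sit1r + Sit2 + Sit3r + Sit4 + Sit5r + Sit6", sample.cov = S, sample.mean = m, sample.nobs = 1103)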
The sample covariance matrix can be found from the sample correlations and variances. Each covariance between a pair of variables \(y_1\) and \(y_2\) is denoted with a \(\sigma_{y_1, y_2}\) and each correlation is denoted with a \(\rho_{y_1, y_2}\). The variance of a variable is denoted by \(\sigma^2_{y_1}\) and the standard deviation of a variable is the square root of the variance \(\sqrt{\sigma^2_{y_1}}\). The covariance can be found by taking the correlation and multiplying it by the product of the standard deviations.
\[\sigma_{y_1, y_2} = \rho_{y_1, y_2}\sqrt{\sigma^2_{y_1}}\sqrt{\sigma^2_{y_2}}. \]
Inversely, the correlation can be found by taking the covariance and dividing it by the product of the standard deviations:
\[\rho_{y_1, y_2} = \frac{\sigma_{y_1, y_2}}{\sqrt{\sigma^2_{y_1}}\sqrt{\sigma^2_{y_2}}}. \] Again, we change the denominator from \(N-1\) to \(N\) to be consistent with the Mplus example, which calculates covariances using maximum likelihood.
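A quick numeric check of this identity for the first two items (a sketch using the data objects created above):
#covariance of Sit1r and Sit2 rebuilt from their correlation and standard deviations
r12 = cor(hfsSituations$Sit1r, hfsSituations$Sit2)
sd1 = sqrt(var(hfsSituations$Sit1r)*1102/1103) #SDs with N in the denominator
sd2 = sqrt(var(hfsSituations$Sit2)*1102/1103)
r12*sd1*sd2 #returns 0.577, matching the covariance matrix below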
round(cov(hfsSituations[c("Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")])*1102/1103, digits = 3)
## Sit1r Sit2 Sit3r Sit4 Sit5r Sit6
## Sit1r 3.049 0.577 1.802 0.734 1.358 0.795
## Sit2 0.577 1.903 0.697 1.103 0.604 0.965
## Sit3r 1.802 0.697 2.543 0.824 1.319 0.868
## Sit4 0.734 1.103 0.824 1.967 0.695 0.962
## Sit5r 1.358 0.604 1.319 0.695 2.945 0.798
## Sit6 0.795 0.965 0.868 0.962 0.798 2.341
The assumptions of CFA (i.e., normally distributed factors, no item-by-factor interactions, and conditionally normally distributed items) together imply that the item responses should be normally distributed. From the histograms below, do you think these are normally distributed?
#stack data
melted = melt(hfsSituations, id.vars = "PersonID")
#plot by variable
ggplot(melted, aes(value)) + geom_bar() + facet_wrap(~ variable)
lavaan syntax is constructed as a long text string and saved in a character object. In the example below, model01SyntaxLong = " opens the text string, which continues for multiple lines until the final " terminates the string. The variable model01SyntaxLong then contains the text of the lavaan syntax. Within the syntax string, the R comment character # still functions to comment out text around the syntax. Each part of the model syntax corresponds to a set of parameters in the CFA model. You can find more information about lavaan at http://lavaan.ugent.be.
model01SyntaxLong = "
# Model 1 -- Fully Z-Scored Factor Identification Approach
# Item factor loadings --> list the factor to the left of the =~ and the items to the right, separated by a plus
# Once the factor is defined it can be used in syntax as if it is an observed variable
# Parameters can be fixed to constant values by premultiplying with *; starting values can be specified by using start()*; labels are attached by premultiplying with a name (e.g., L1*Sit1r)
Sit =~ Sit1r + Sit2 + Sit3r + Sit4 + Sit5r + Sit6
# Item intercepts --> ~ 1 indicates means or intercepts
# You can put multiple lines of syntax on a single line using a ;
Sit1r ~ 1; Sit2 ~ 1; Sit3r ~ 1; Sit4 ~ 1; Sit5r ~ 1; Sit6 ~ 1;
# Item error (unique) variances and covariances --> use the ~~ command
Sit1r ~~ Sit1r; Sit2 ~~ Sit2; Sit3r ~~ Sit3r; Sit4 ~~ Sit4; Sit5r ~~ Sit5r; Sit6 ~~ Sit6;
# Factor variance
Sit ~~ Sit
# Factor mean (intercept)
Sit ~ 0*1
"
To run lavaan, the syntax string variable is passed to the lavaan(model = model01SyntaxLong, ...) function. In the function call, we also supply:
- data = hfsSituations: The data frame that contains the variables in the syntax.
- estimator = "MLR": Enables robust maximum likelihood estimation, which helps for data that are not entirely normal.
- mimic = "mplus": Ensures compatibility with Mplus output, which is often needed in practice (and is certainly needed for our homework system).
- std.lv = TRUE: Uses the Z-score method of identification, setting all latent variable means to zero, all latent variable variances to one, and estimating all factor loadings.
model01Estimates = lavaan(model = model01SyntaxLong, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
If the model estimation was successful, no errors will be displayed (and no output will be shown). All model information and results are contained within the model01Estimates object. To access a summary of the results, use the summary() function with the following arguments:
- model01Estimates: The analysis to be summarized.
- fit.measures = TRUE: Provides model fit indices in the summary.
- rsquare = TRUE: Provides the proportion of variance “explained” in each variable (\(R^2\)).
- standardized = TRUE: Provides standardized estimates in the summary.
summary(model01Estimates, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 20 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 18
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 427.937 307.803
## Degrees of freedom 9 9
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.390
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.787 0.732
## Tucker-Lewis Index (TLI) 0.645 0.553
##
## Robust Comparative Fit Index (CFI) 0.789
## Robust Tucker-Lewis Index (TLI) 0.648
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11536.404 -11536.404
## Scaling correction factor 1.416
## for the MLR correction
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
## Scaling correction factor 1.407
## for the MLR correction
##
## Akaike (AIC) 23108.808 23108.808
## Bayesian (BIC) 23198.912 23198.912
## Sample-size adjusted Bayesian (SABIC) 23141.739 23141.739
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.205 0.173
## 90 Percent confidence interval - lower 0.189 0.160
## 90 Percent confidence interval - upper 0.222 0.188
## P-value H_0: RMSEA <= 0.050 0.000 0.000
## P-value H_0: RMSEA >= 0.080 1.000 1.000
##
## Robust RMSEA 0.204
## 90 Percent confidence interval - lower 0.184
## 90 Percent confidence interval - upper 0.225
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.086 0.086
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Sit =~
## Sit1r 1.234 0.069 17.906 0.000 1.234 0.707
## Sit2 0.702 0.074 9.441 0.000 0.702 0.509
## Sit3r 1.241 0.063 19.847 0.000 1.241 0.778
## Sit4 0.784 0.069 11.333 0.000 0.784 0.559
## Sit5r 1.023 0.053 19.179 0.000 1.023 0.596
## Sit6 0.819 0.069 11.942 0.000 0.819 0.535
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r 4.547 0.053 86.474 0.000 4.547 2.604
## .Sit2 5.289 0.042 127.346 0.000 5.289 3.834
## .Sit3r 4.896 0.048 101.959 0.000 4.896 3.070
## .Sit4 5.359 0.042 126.896 0.000 5.359 3.821
## .Sit5r 4.860 0.052 94.060 0.000 4.860 2.832
## .Sit6 5.321 0.046 115.492 0.000 5.321 3.477
## Sit 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r 1.526 0.149 10.217 0.000 1.526 0.500
## .Sit2 1.409 0.128 11.014 0.000 1.409 0.741
## .Sit3r 1.004 0.135 7.456 0.000 1.004 0.395
## .Sit4 1.352 0.127 10.672 0.000 1.352 0.687
## .Sit5r 1.899 0.118 16.025 0.000 1.899 0.645
## .Sit6 1.671 0.159 10.517 0.000 1.671 0.714
## Sit 1.000 1.000 1.000
##
## R-Square:
## Estimate
## Sit1r 0.500
## Sit2 0.259
## Sit3r 0.605
## Sit4 0.313
## Sit5r 0.355
## Sit6 0.286
Each section contains different portions of model information.
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Sit =~
Sit1r 1.234 0.069 17.906 0.000 1.234 0.707
Sit2 0.702 0.074 9.441 0.000 0.702 0.509
Sit3r 1.241 0.063 19.847 0.000 1.241 0.778
Sit4 0.784 0.069 11.333 0.000 0.784 0.559
Sit5r 1.023 0.053 19.179 0.000 1.023 0.596
Sit6 0.819 0.069 11.942 0.000 0.819 0.535
Note: in the Intercepts section below, the last term in the list, Sit, is the “intercept” (mean) of the factor. As it does not have a standard error, it was fixed to zero and not estimated.
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.Sit1r 4.547 0.053 86.474 0.000 4.547 2.604
.Sit2 5.289 0.042 127.346 0.000 5.289 3.834
.Sit3r 4.896 0.048 101.959 0.000 4.896 3.070
.Sit4 5.359 0.042 126.896 0.000 5.359 3.821
.Sit5r 4.860 0.052 94.060 0.000 4.860 2.832
.Sit6 5.321 0.046 115.492 0.000 5.321 3.477
Sit 0.000 0.000 0.000
Note: in the Variances section below, the last term in the list, Sit, is the variance of the factor. As it does not have a standard error, it was fixed to one (from the std.lv = TRUE option we used in the lavaan() function call) and not estimated.
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.Sit1r 1.526 0.149 10.217 0.000 1.526 0.500
.Sit2 1.409 0.128 11.014 0.000 1.409 0.741
.Sit3r 1.004 0.135 7.456 0.000 1.004 0.395
.Sit4 1.352 0.127 10.672 0.000 1.352 0.687
.Sit5r 1.899 0.118 16.025 0.000 1.899 0.645
.Sit6 1.671 0.159 10.517 0.000 1.671 0.714
Sit 1.000 1.000 1.000
Writing out the model—individual predicted values: \(Y_{is} = \mu_i + \lambda_i F_s + e_{is}\), where \(\mu_i\) is the item intercept, \(\lambda_i\) the item’s factor loading, \(F_s\) the person’s factor score, and \(e_{is}\) the item-specific error.
Writing out the model—predicted item variances and covariances:
\(Var\left( Y_1 \right) = \left( \lambda^2_1 \right) Var\left(F\right) + Var\left(e_1\right) = (1.234^2)*(1) + 1.526 = 3.049\) (= original item variance)
\(Cov\left( Y_1, Y_2 \right) = \lambda_1 Var\left(F\right) \lambda_2 = (1.234)*(1)*(.702) = .866\) (actual covariance = .577, so the model over-predicted how related items 1 and 2 should be)
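These hand calculations can be checked against the full model-implied covariance matrix; here is a short sketch (est and implied are illustrative names; lavInspect() and fitted() are lavaan functions):
#model-implied covariance matrix: Lambda %*% Psi %*% t(Lambda) + Theta
est = lavInspect(model01Estimates, what = "est")
implied = est$lambda %*% est$psi %*% t(est$lambda) + est$theta
round(implied, digits = 3) #reproduces 3.049 for Sit1r and 0.866 for the Sit1r-Sit2 covariance
fitted(model01Estimates)$cov #the same matrix, directly from lavaan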
The R package semPlot helps provide a path diagram of a model. Here is an example:
semPaths(object = model01Estimates, what = "est")
The last two columns of the summary output, Std.lv and Std.all, contain the standardized model parameter estimates. To understand the difference between standardized and unstandardized parameter estimates, let’s start with the unstandardized estimates. Unstandardized regression coefficients (and factor loadings) have a scale of “units of Y” per “unit of X”: the slope/loading represents the increase in the dependent variable \(Y\) per one-unit increase in the independent variable \(X\) (in our case, the factor). The units of \(Y\) are given by the standard deviation of \(Y\), or \(SD(Y)\); similarly, the units of \(X\) are given by \(SD(X)\). You can think of the units as the fraction \(\frac{SD(Y)}{SD(X)}\). So, the first factor loading (a value of 1.234 for the item Sit1r) indicates that the numeric response to the item goes up by 1.234 for every one-unit increase in the factor Sit.
The process of standardization removes the units attached to the parameter. If the unstandardized factor loadings are expressed in units of \(\frac{SD(Y)}{SD(X)}\), the standardized versions are achieved by dividing or multiplying by the appropriate standard deviation to make the numerator or denominator of \(\frac{SD(Y)}{SD(X)}\) equal to one.
For the standardized estimates under Std.lv, the units of the factor \((SD(X))\) are removed (yielding \(\frac{SD(Y)}{1}\)) by multiplying the estimate by the factor standard deviation. Because the factor standard deviation is set to one here, all estimates in this column are the same as the unstandardized estimates. These estimates show how the parameters would look under the Z-score identification method had another identification method been used.
The standardized estimates listed under Std.all are formed by multiplying the estimate by the standard deviation of the factor and dividing by the unconditional (raw) standard deviation of the item. For instance, the “fully” standardized factor loading of the first item is found by multiplying the unstandardized coefficient (1.234) by one (the factor standard deviation) and dividing by the item’s standard deviation (\(\sqrt{3.049}\), the square root of the item variance shown at the beginning of this example). The resulting value, \(1.234\times\frac{1}{\sqrt{3.049}} = 0.707\), is the factor loading that would have been obtained had the analysis been run (a) with Z-score factor identification and (b) on items that were themselves Z-scores.
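That arithmetic can be verified in one line, and lavaan's standardizedSolution() function returns the full table of standardized estimates along with their standard errors:
1.234*1/sqrt(3.049) #returns 0.707, the Std.all loading for Sit1r
standardizedSolution(model01Estimates)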
The process of standardization is the same for all parameters of the model: intercepts, loadings, and residual (unique) variances. The interpretation of standardized item intercepts is difficult and often these are not reported. The standardized versions of factor loadings and unique variances are commonly reported.
Moreover, for items measuring one factor, the standardized loadings lead directly to the \(R^2\) estimate: the amount of variance in the item responses explained by the factor. The item \(R^2\) is the square of the standardized factor loading. The R-Square information appears at the end of the model summary, so long as the option rsquare = TRUE is specified in the summary() function call.
R-Square:
Estimate
Sit1r 0.500
Sit2 0.259
Sit3r 0.605
Sit4 0.313
Sit5r 0.355
Sit6 0.286
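As a quick check (lavInspect() with what = "rsquare" returns the same values):
0.707^2 #returns 0.500, the R-square for Sit1r
lavInspect(model01Estimates, what = "rsquare")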
The syntax above is designed to show the mapping of all CFA parameters to all lavaan syntax elements. In reality, all you would need to write to define this model is:
model01SyntaxShort = "
Sit =~ Sit1r + Sit2 + Sit3r + Sit4 + Sit5r + Sit6
"
The syntax passed to the sem() function relies on some defaults: the item intercepts, item unique (error) variances, and factor variance do not need to be listed because they are added automatically, and with std.lv = TRUE the factor variance is fixed to one and the factor mean to zero. The sem() function thus simplifies the syntax needed to conduct a confirmatory factor analysis. To run this syntax, the sem() function is called, and the summary() function is then used to see the results, as demonstrated below. Note the output is identical to what was produced previously.
model01EstimatesShort = sem(model = model01SyntaxShort, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(model01EstimatesShort, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 20 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 18
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 427.937 307.803
## Degrees of freedom 9 9
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.390
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.787 0.732
## Tucker-Lewis Index (TLI) 0.645 0.553
##
## Robust Comparative Fit Index (CFI) 0.789
## Robust Tucker-Lewis Index (TLI) 0.648
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11536.404 -11536.404
## Scaling correction factor 1.416
## for the MLR correction
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
## Scaling correction factor 1.407
## for the MLR correction
##
## Akaike (AIC) 23108.808 23108.808
## Bayesian (BIC) 23198.912 23198.912
## Sample-size adjusted Bayesian (SABIC) 23141.739 23141.739
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.205 0.173
## 90 Percent confidence interval - lower 0.189 0.160
## 90 Percent confidence interval - upper 0.222 0.188
## P-value H_0: RMSEA <= 0.050 0.000 0.000
## P-value H_0: RMSEA >= 0.080 1.000 1.000
##
## Robust RMSEA 0.204
## 90 Percent confidence interval - lower 0.184
## 90 Percent confidence interval - upper 0.225
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.086 0.086
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Sit =~
## Sit1r 1.234 0.069 17.906 0.000 1.234 0.707
## Sit2 0.702 0.074 9.441 0.000 0.702 0.509
## Sit3r 1.241 0.063 19.847 0.000 1.241 0.778
## Sit4 0.784 0.069 11.333 0.000 0.784 0.559
## Sit5r 1.023 0.053 19.179 0.000 1.023 0.596
## Sit6 0.819 0.069 11.942 0.000 0.819 0.535
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r 4.547 0.053 86.474 0.000 4.547 2.604
## .Sit2 5.289 0.042 127.346 0.000 5.289 3.834
## .Sit3r 4.896 0.048 101.959 0.000 4.896 3.070
## .Sit4 5.359 0.042 126.896 0.000 5.359 3.821
## .Sit5r 4.860 0.052 94.060 0.000 4.860 2.832
## .Sit6 5.321 0.046 115.492 0.000 5.321 3.477
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r 1.526 0.149 10.217 0.000 1.526 0.500
## .Sit2 1.409 0.128 11.014 0.000 1.409 0.741
## .Sit3r 1.004 0.135 7.456 0.000 1.004 0.395
## .Sit4 1.352 0.127 10.672 0.000 1.352 0.687
## .Sit5r 1.899 0.118 16.025 0.000 1.899 0.645
## .Sit6 1.671 0.159 10.517 0.000 1.671 0.714
## Sit 1.000 1.000 1.000
##
## R-Square:
## Estimate
## Sit1r 0.500
## Sit2 0.259
## Sit3r 0.605
## Sit4 0.313
## Sit5r 0.355
## Sit6 0.286
There are multiple equivalent ways of specifying the same CFA model, each with different scaling for the factor mean and variance (i.e., different means of identification). Now let’s see the model parameters when using the marker item method of identification instead. In the marker item identification method, the factor loading of the first item listed after the =~ symbol in lavaan syntax (the marker item) is set to one, the factor variance is estimated in its place, and the factor mean remains fixed at zero. The lavaan syntax is the same for marker item identification, but the call to the sem() function does not include the std.lv = TRUE option, so the model defaults to the marker item method of identification:
model02Estimates = sem(model = model01SyntaxShort, data = hfsSituations, estimator = "MLR", mimic = "mplus")
summary(model02Estimates, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 28 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 18
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 427.937 307.804
## Degrees of freedom 9 9
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.390
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.787 0.732
## Tucker-Lewis Index (TLI) 0.645 0.553
##
## Robust Comparative Fit Index (CFI) 0.789
## Robust Tucker-Lewis Index (TLI) 0.648
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11536.404 -11536.404
## Scaling correction factor 1.416
## for the MLR correction
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
## Scaling correction factor 1.407
## for the MLR correction
##
## Akaike (AIC) 23108.808 23108.808
## Bayesian (BIC) 23198.912 23198.912
## Sample-size adjusted Bayesian (SABIC) 23141.739 23141.739
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.205 0.173
## 90 Percent confidence interval - lower 0.189 0.160
## 90 Percent confidence interval - upper 0.222 0.188
## P-value H_0: RMSEA <= 0.050 0.000 0.000
## P-value H_0: RMSEA >= 0.080 1.000 1.000
##
## Robust RMSEA 0.204
## 90 Percent confidence interval - lower 0.184
## 90 Percent confidence interval - upper 0.225
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.086 0.086
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Sit =~
## Sit1r 1.000 1.234 0.707
## Sit2 0.569 0.083 6.831 0.000 0.702 0.509
## Sit3r 1.005 0.035 28.553 0.000 1.241 0.778
## Sit4 0.636 0.082 7.741 0.000 0.784 0.559
## Sit5r 0.829 0.053 15.698 0.000 1.023 0.596
## Sit6 0.664 0.081 8.143 0.000 0.819 0.535
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r 4.547 0.053 86.474 0.000 4.547 2.604
## .Sit2 5.289 0.042 127.346 0.000 5.289 3.834
## .Sit3r 4.896 0.048 101.959 0.000 4.896 3.070
## .Sit4 5.359 0.042 126.896 0.000 5.359 3.821
## .Sit5r 4.860 0.052 94.060 0.000 4.860 2.832
## .Sit6 5.321 0.046 115.492 0.000 5.321 3.477
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r 1.526 0.149 10.217 0.000 1.526 0.500
## .Sit2 1.409 0.128 11.014 0.000 1.409 0.741
## .Sit3r 1.004 0.135 7.456 0.000 1.004 0.395
## .Sit4 1.352 0.127 10.672 0.000 1.352 0.687
## .Sit5r 1.899 0.118 16.025 0.000 1.899 0.645
## .Sit6 1.671 0.159 10.517 0.000 1.671 0.714
## Sit 1.523 0.170 8.953 0.000 1.000 1.000
##
## R-Square:
## Estimate
## Sit1r 0.500
## Sit2 0.259
## Sit3r 0.605
## Sit4 0.313
## Sit5r 0.355
## Sit6 0.286
Here, the loading for Sit1r is not tested and has no standard error (Std.Err) because it is fixed to one. Also, note the Std.lv estimates are identical to the unstandardized estimates from the Z-score factor identification method. Further, the Std.all estimates are equal to the Std.all estimates from the Z-score factor identification method; these will be equal across all identification methods.
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Sit =~
Sit1r 1.000 1.234 0.707
Sit2 0.569 0.083 6.831 0.000 0.702 0.509
Sit3r 1.005 0.035 28.553 0.000 1.241 0.778
Sit4 0.636 0.082 7.741 0.000 0.784 0.559
Sit5r 0.829 0.053 15.698 0.000 1.023 0.596
Sit6 0.664 0.081 8.143 0.000 0.819 0.535
Here, the intercepts are all identical to those found under the previous Z-score identification. That is because the factor mean is still fixed at zero.
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.Sit1r 4.547 0.053 86.474 0.000 4.547 2.604
.Sit2 5.289 0.042 127.346 0.000 5.289 3.834
.Sit3r 4.896 0.048 101.959 0.000 4.896 3.070
.Sit4 5.359 0.042 126.896 0.000 5.359 3.821
.Sit5r 4.860 0.052 94.060 0.000 4.860 2.832
.Sit6 5.321 0.046 115.492 0.000 5.321 3.477
Sit 0.000 0.000 0.000
Here, the factor variance is estimated, which is different from the previous identification method.
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.Sit1r 1.526 0.149 10.217 0.000 1.526 0.500
.Sit2 1.409 0.128 11.014 0.000 1.409 0.741
.Sit3r 1.004 0.135 7.456 0.000 1.004 0.395
.Sit4 1.352 0.127 10.672 0.000 1.352 0.687
.Sit5r 1.899 0.118 16.025 0.000 1.899 0.645
.Sit6 1.671 0.159 10.517 0.000 1.671 0.714
Sit 1.523 0.170 8.953 0.000 1.000 1.000
To estimate this model, we write out the full lavaan model syntax again. In the syntax below you see 1*Sit1r, which fixes the factor loading for Sit1r to one (the marker item factor loading). Further, you see Sit1r ~ 0*1, which fixes the item intercept of Sit1r to zero (the marker item intercept). Finally, Sit ~ 1 estimates the factor mean explicitly.
model03Syntax = "
# Model 3 -- Marker Item Identification Approach with Estimated Factor Mean
# Item factor loadings --> list the factor to the left of the =~ and the items to the right, separated by a plus
# Once the factor is defined it can be used in syntax as if it is an observed variable
# Parameters can be fixed to constant values by using the * command; starting values can be specified by using start()*; labels can be implemented with []
Sit =~ 1*Sit1r + Sit2 + Sit3r + Sit4 + Sit5r + Sit6
# Item intercepts --> ~ 1 indicates means or intercepts
# You can put multiple lines of syntax on a single line using a ;
Sit1r ~ 0*1; Sit2 ~ 1; Sit3r ~ 1; Sit4 ~ 1; Sit5r ~ 1; Sit6 ~ 1;
# Item error (unique) variances and covariances --> use the ~~ command
Sit1r ~~ Sit1r; Sit2 ~~ Sit2; Sit3r ~~ Sit3r; Sit4 ~~ Sit4; Sit5r ~~ Sit5r; Sit6 ~~ Sit6;
# Factor variance
Sit ~~ Sit
# Factor mean (intercept)
Sit ~ 1
"
model03Estimates = sem(model = model03Syntax, data = hfsSituations, estimator = "MLR", mimic = "mplus")
summary(model03Estimates, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 55 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 18
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 427.937 307.804
## Degrees of freedom 9 9
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.390
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.787 0.732
## Tucker-Lewis Index (TLI) 0.645 0.553
##
## Robust Comparative Fit Index (CFI) 0.789
## Robust Tucker-Lewis Index (TLI) 0.648
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11536.404 -11536.404
## Scaling correction factor 1.416
## for the MLR correction
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
## Scaling correction factor 1.407
## for the MLR correction
##
## Akaike (AIC) 23108.808 23108.808
## Bayesian (BIC) 23198.912 23198.912
## Sample-size adjusted Bayesian (SABIC) 23141.739 23141.739
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.205 0.173
## 90 Percent confidence interval - lower 0.189 0.160
## 90 Percent confidence interval - upper 0.222 0.188
## P-value H_0: RMSEA <= 0.050 0.000 0.000
## P-value H_0: RMSEA >= 0.080 1.000 1.000
##
## Robust RMSEA 0.204
## 90 Percent confidence interval - lower 0.184
## 90 Percent confidence interval - upper 0.225
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.086 0.086
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Sit =~
## Sit1r 1.000 1.234 0.707
## Sit2 0.569 0.083 6.831 0.000 0.702 0.509
## Sit3r 1.005 0.035 28.553 0.000 1.241 0.778
## Sit4 0.636 0.082 7.741 0.000 0.784 0.559
## Sit5r 0.829 0.053 15.698 0.000 1.023 0.596
## Sit6 0.664 0.081 8.143 0.000 0.819 0.535
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r 0.000 0.000 0.000
## .Sit2 2.701 0.383 7.045 0.000 2.701 1.958
## .Sit3r 0.325 0.171 1.899 0.058 0.325 0.204
## .Sit4 2.469 0.380 6.502 0.000 2.469 1.760
## .Sit5r 1.092 0.246 4.431 0.000 1.092 0.636
## .Sit6 2.304 0.369 6.249 0.000 2.304 1.506
## Sit 4.547 0.053 86.474 0.000 3.684 3.684
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r 1.526 0.149 10.217 0.000 1.526 0.500
## .Sit2 1.409 0.128 11.014 0.000 1.409 0.741
## .Sit3r 1.004 0.135 7.456 0.000 1.004 0.395
## .Sit4 1.352 0.127 10.672 0.000 1.352 0.687
## .Sit5r 1.899 0.118 16.025 0.000 1.899 0.645
## .Sit6 1.671 0.159 10.517 0.000 1.671 0.714
## Sit 1.523 0.170 8.953 0.000 1.000 1.000
##
## R-Square:
## Estimate
## Sit1r 0.500
## Sit2 0.259
## Sit3r 0.605
## Sit4 0.313
## Sit5r 0.355
## Sit6 0.286
Here, you will note the model is identical to the previous two: the log-likelihood is -11536.404, the same as before. Now the intercept of item Sit1r is set to zero and the mean of the factor is estimated.
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.Sit1r 0.000 0.000 0.000
.Sit2 2.701 0.383 7.045 0.000 2.701 1.958
.Sit3r 0.325 0.171 1.899 0.058 0.325 0.204
.Sit4 2.469 0.380 6.502 0.000 2.469 1.760
.Sit5r 1.092 0.246 4.431 0.000 1.092 0.636
.Sit6 2.304 0.369 6.249 0.000 2.304 1.506
Sit 4.547 0.053 86.474 0.000 3.684 3.684
The mean of the factor is 4.547, which, in this case, is the mean of item Sit1r. That is because the factor loading of that item is fixed to 1.0 and its intercept to zero. You will also notice that the other item intercepts have changed from the previous models. In general, an item’s mean is found by adding the item intercept to the product of the factor loading and the factor mean, or \(\mu_i + \lambda_i\mu_F\).
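A one-line check of that formula for Sit2, using the rounded estimates above:
2.701 + 0.569*4.547 #returns 5.288; the observed mean of Sit2 is 5.289 (difference is rounding)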
Calculating model degrees of freedom (with \(v\) observed variables, the data provide \(v\) means plus \(v(v+1)/2\) variances and covariances): here that is \(6 + 21 = 27\) pieces of information, and with 18 estimated parameters the model has \(27 - 18 = 9\) degrees of freedom.
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -11536.404 -11536.404
Scaling correction factor 1.416
for the MLR correction
Loglikelihood unrestricted model (H1) -11322.435 -11322.435
Scaling correction factor 1.407
for the MLR correction
Number of free parameters 18 18
Akaike (AIC) 23108.808 23108.808
Bayesian (BIC) 23198.912 23198.912
Sample-size adjusted Bayesian (BIC) 23141.739 23141.739
Estimator ML Robust
Minimum Function Test Statistic 427.937 307.804
Degrees of freedom 9 9
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.390
for the Yuan-Bentler correction (Mplus variant)
Where does this \(\chi^2\) value for “model fit” come from? A rescaled −2LL model comparison of this one-factor model (H0) against the saturated model (H1) that perfectly reproduces the data covariances:
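Using the rounded values printed in the output above, a quick arithmetic sketch (exact results differ slightly past the decimal):
ll0 = -11536.404; ll1 = -11322.435 #H0 and H1 loglikelihoods
c0 = 1.416; c1 = 1.407 #MLR scaling correction factors for H0 and H1
p0 = 18; p1 = 27 #free parameters in H0 and H1
-2*(ll0 - ll1) #returns 427.938, the standard test statistic
cd = (p1*c1 - p0*c0)/(p1 - p0); cd #returns 1.389, the difference-test scaling factor
-2*(ll0 - ll1)/cd #returns approximately 308, the scaled test statistic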
How to fit the saturated (unstructured) model, which reproduces the item means, variances, and covariances in the original data exactly:
modelSaturated = "
# Item intercepts --> ~ 1 indicates means or intercepts
# You can put multiple lines of syntax on a single line using a ;
Sit1r ~ 1; Sit2 ~ 1; Sit3r ~ 1; Sit4 ~ 1; Sit5r ~ 1; Sit6 ~ 1;
# Item variances and covariances --> use the ~~ command
# Variances:
Sit1r ~~ Sit1r; Sit2 ~~ Sit2; Sit3r ~~ Sit3r; Sit4 ~~ Sit4; Sit5r ~~ Sit5r; Sit6 ~~ Sit6;
# Covariances:
Sit1r ~~ Sit2; Sit1r ~~ Sit3r; Sit1r ~~ Sit4; Sit1r ~~ Sit5r; Sit1r ~~ Sit6;
Sit2 ~~ Sit3r; Sit2 ~~ Sit4; Sit2 ~~ Sit5r; Sit2 ~~ Sit6;
Sit3r ~~ Sit4; Sit3r ~~ Sit5r; Sit3r ~~ Sit6;
Sit4 ~~ Sit5r; Sit4 ~~ Sit6;
Sit5r ~~ Sit6;
"
modelSaturatedEstimates = sem(model = modelSaturated, data = hfsSituations, estimator = "MLR", mimic = "mplus")
summary(modelSaturatedEstimates, fit.measures = TRUE)
## lavaan 0.6-19 ended normally after 45 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 27
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 0.000 0.000
## Degrees of freedom 0 0
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000 1.000
## Tucker-Lewis Index (TLI) 1.000 1.000
##
## Robust Comparative Fit Index (CFI) 1.000
## Robust Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11322.435 -11322.435
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
##
## Akaike (AIC) 22698.870 22698.870
## Bayesian (BIC) 22834.027 22834.027
## Sample-size adjusted Bayesian (SABIC) 22748.268 22748.268
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000 NA
## 90 Percent confidence interval - lower 0.000 NA
## 90 Percent confidence interval - upper 0.000 NA
## P-value H_0: RMSEA <= 0.050 NA NA
## P-value H_0: RMSEA >= 0.080 NA NA
##
## Robust RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: Robust RMSEA <= 0.050 NA
## P-value H_0: Robust RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000 0.000
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## Sit1r ~~
## Sit2 0.577 0.083 6.974 0.000
## Sit3r 1.802 0.091 19.800 0.000
## Sit4 0.734 0.080 9.170 0.000
## Sit5r 1.358 0.100 13.638 0.000
## Sit6 0.795 0.092 8.620 0.000
## Sit2 ~~
## Sit3r 0.697 0.077 9.023 0.000
## Sit4 1.103 0.084 13.112 0.000
## Sit5r 0.604 0.080 7.578 0.000
## Sit6 0.965 0.083 11.687 0.000
## Sit3r ~~
## Sit4 0.824 0.076 10.848 0.000
## Sit5r 1.319 0.091 14.495 0.000
## Sit6 0.868 0.085 10.261 0.000
## Sit4 ~~
## Sit5r 0.695 0.079 8.843 0.000
## Sit6 0.962 0.081 11.821 0.000
## Sit5r ~~
## Sit6 0.798 0.089 8.955 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## Sit1r 4.547 0.053 86.474 0.000
## Sit2 5.289 0.042 127.346 0.000
## Sit3r 4.896 0.048 101.959 0.000
## Sit4 5.359 0.042 126.896 0.000
## Sit5r 4.860 0.052 94.060 0.000
## Sit6 5.321 0.046 115.492 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## Sit1r 3.049 0.098 31.216 0.000
## Sit2 1.903 0.093 20.547 0.000
## Sit3r 2.543 0.092 27.763 0.000
## Sit4 1.967 0.095 20.683 0.000
## Sit5r 2.945 0.100 29.595 0.000
## Sit6 2.341 0.108 21.771 0.000
Note that H0 and H1 are now the same: our H0 model IS the saturated H1 model, so the two log-likelihoods are equal. Also, note there are 27 free parameters (six item means, six item variances, and 15 covariances).
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -11322.435 -11322.435
Loglikelihood unrestricted model (H1) -11322.435 -11322.435
Number of free parameters 27 27
Akaike (AIC) 22698.870 22698.870
Bayesian (BIC) 22834.027 22834.027
Sample-size adjusted Bayesian (BIC) 22748.268 22748.268
Also note how the model \(\chi^2\) test has zero degrees of freedom: this is because all 27 possible free parameters were estimated. This model fits the data perfectly.
Estimator ML Robust
Minimum Function Test Statistic 0.000 0.000
Degrees of freedom 0 0
Minimum Function Value 0.0000000000000
Scaling correction factor NA
for the Yuan-Bentler correction (Mplus variant)
Returning to the one-factor model’s fit indices:
User model versus baseline model:
Comparative Fit Index (CFI) 0.787 0.732
Tucker-Lewis Index (TLI) 0.645 0.553
Robust Comparative Fit Index (CFI) 0.787
Robust Tucker-Lewis Index (TLI) 0.646
Root Mean Square Error of Approximation:
RMSEA 0.205 0.173
90 Percent Confidence Interval 0.189 0.222 0.160 0.188
P-value RMSEA <= 0.05 0.000 0.000
Robust RMSEA 0.205
90 Percent Confidence Interval 0.185 0.224
Standardized Root Mean Square Residual:
SRMR 0.086 0.086
This compares the estimated model to one that has no covariances at all. It is a last-resort comparison: if the baseline model’s test statistic were non-significant, there would be no covariances among the items, and thus no factors to be found in the data.
Model test baseline model:
Minimum Function Test Statistic 1981.034 1128.693
Degrees of freedom 15 15
P-value 0.000 0.000
Where does this \(\chi^2\) value for “fit of the baseline model” come from? A rescaled −2LL model comparison of the independence model with NO covariances to the saturated model:
What’s the point? This baseline model fit test tells us whether there are any covariances at all (i.e., whether it even makes sense to try to fit latent factors to predict them).
modelIndependence = "
# Item intercepts --> ~ 1 indicates means or intercepts
# You can put multiple lines of syntax on a single line using a ;
Sit1r ~ 1; Sit2 ~ 1; Sit3r ~ 1; Sit4 ~ 1; Sit5r ~ 1; Sit6 ~ 1;
# Item variances and covariances --> use the ~~ command
# Variances:
Sit1r ~~ Sit1r; Sit2 ~~ Sit2; Sit3r ~~ Sit3r; Sit4 ~~ Sit4; Sit5r ~~ Sit5r; Sit6 ~~ Sit6;
"
modelIndependenceEstimates = sem(model = modelIndependence, data = hfsSituations, estimator = "MLR", mimic = "mplus")
summary(modelIndependenceEstimates, fit.measures = TRUE)
## lavaan 0.6-19 ended normally after 13 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 12
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 1981.034 1128.694
## Degrees of freedom 15 15
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.755
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.000 0.000
## Tucker-Lewis Index (TLI) -0.000 -0.000
##
## Robust Comparative Fit Index (CFI) 0.000
## Robust Tucker-Lewis Index (TLI) -0.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -12312.952 -12312.952
## Scaling correction factor 0.973
## for the MLR correction
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
## Scaling correction factor 1.407
## for the MLR correction
##
## Akaike (AIC) 24649.904 24649.904
## Bayesian (BIC) 24709.974 24709.974
## Sample-size adjusted Bayesian (SABIC) 24671.859 24671.859
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.345 0.259
## 90 Percent confidence interval - lower 0.332 0.250
## 90 Percent confidence interval - upper 0.358 0.269
## P-value H_0: RMSEA <= 0.050 0.000 0.000
## P-value H_0: RMSEA >= 0.080 1.000 1.000
##
## Robust RMSEA 0.344
## 90 Percent confidence interval - lower 0.329
## 90 Percent confidence interval - upper 0.359
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.300 0.300
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## Sit1r 4.547 0.053 86.474 0.000
## Sit2 5.289 0.042 127.346 0.000
## Sit3r 4.896 0.048 101.959 0.000
## Sit4 5.359 0.042 126.896 0.000
## Sit5r 4.860 0.052 94.060 0.000
## Sit6 5.321 0.046 115.492 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## Sit1r 3.049 0.098 31.216 0.000
## Sit2 1.903 0.093 20.548 0.000
## Sit3r 2.543 0.092 27.763 0.000
## Sit4 1.967 0.095 20.682 0.000
## Sit5r 2.945 0.100 29.595 0.000
## Sit6 2.341 0.108 21.771 0.000
Note how the test of the model vs. saturated is identical to the test of the model vs. baseline: that indicates this model is the baseline model.
Although not 0, this is the worst possible RMSEA while still allowing separate means and variances per item in these data. RMSEA is a parsimony-corrected absolute fit index (so, its fit is relative to the saturated model).
CFI and TLI are 0 because they are “incremental fit” indices relative to the independence model (which this is).
SRMR is also an absolute fit index (relative to saturated model), so this is the worst it gets for these data, too.
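As a check, the baseline \(\chi^2\) is just the -2LL comparison of this independence model against the saturated model fit earlier:
#independence LL = -12312.952 (above); saturated LL = -11322.435
-2*(-12312.952 - (-11322.435)) #returns 1981.034, the baseline test statistic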
For this section, any of the three equivalent single factor models can be used.
So global fit for the one-factor model is not so good (RMSEA = .173, CFI = .732). What do the voodoo modification indices suggest we do to fix it? To get modification indices, use the modificationindices() function. Here, the function is called with options:
- object = model01Estimates: The model estimates object from which to derive modification indices.
- sort. = TRUE: Sort the output from highest to lowest modification index.
modificationindices(object = model01Estimates, sort. = TRUE)
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 27 Sit2 ~~ Sit4 224.295 0.702 0.702 0.509 0.509
## 22 Sit1r ~~ Sit3r 199.679 1.023 1.023 0.827 0.827
## 29 Sit2 ~~ Sit6 88.835 0.486 0.486 0.317 0.317
## 21 Sit1r ~~ Sit2 68.985 -0.464 -0.464 -0.316 -0.316
## 34 Sit4 ~~ Sit6 64.710 0.415 0.415 0.276 0.276
## 23 Sit1r ~~ Sit4 50.442 -0.403 -0.403 -0.280 -0.280
## 26 Sit2 ~~ Sit3r 48.505 -0.357 -0.357 -0.300 -0.300
## 30 Sit3r ~~ Sit4 40.616 -0.336 -0.336 -0.288 -0.288
## 25 Sit1r ~~ Sit6 33.475 -0.358 -0.358 -0.224 -0.224
## 32 Sit3r ~~ Sit6 31.132 -0.319 -0.319 -0.246 -0.246
## 28 Sit2 ~~ Sit5r 7.117 -0.151 -0.151 -0.092 -0.092
## 33 Sit4 ~~ Sit5r 6.906 -0.149 -0.149 -0.093 -0.093
## 24 Sit1r ~~ Sit5r 6.417 0.176 0.176 0.104 0.104
## 31 Sit3r ~~ Sit5r 3.548 0.123 0.123 0.089 0.089
## 35 Sit5r ~~ Sit6 0.736 -0.053 -0.053 -0.030 -0.030
The output includes:
- lhs: Left-hand side of the parameter (in lavaan syntax)
- op: Operation of the parameter: ~~ is for (residual) covariances; =~ is for additional factor loadings. All other lavaan syntax types are possible, too, but here item means and variances fit perfectly (see below)
- rhs: Right-hand side of the parameter (in lavaan syntax)
- mi: Modification index using ML
- mi.scaled: Modification index using MLR (use this one)
- epc: Expected parameter change (from zero)
- sepc.lv: Standardized expected parameter change using std.lv standardization
- sepc.all: Standardized expected parameter change using std.all standardization
- sepc.nox: Standardized expected parameter change using std.nox standardization
The highest values are for error covariances (reflecting unknown multidimensionality).
Another approach—how about we examine local fit and see where the problems seem to be?
The means and variances of the items will be perfectly reproduced, so that’s not an issue. Misfit results from the difference between the observed and model-predicted covariances.
lavaan gives us the “residual” (defined as observed – predicted) or “leftover” covariance matrix, but it is scale dependent and thus not so helpful on its own. We can use the resid() function to obtain the residual covariance matrix (observed covariances minus model-implied covariances):
resid(object = model01Estimates, type = "raw")
## $type
## [1] "raw"
##
## $cov
## Sit1r Sit2 Sit3r Sit4 Sit5r Sit6
## Sit1r 0.000
## Sit2 -0.290 0.000
## Sit3r 0.271 -0.174 0.000
## Sit4 -0.234 0.552 -0.149 0.000
## Sit5r 0.096 -0.114 0.050 -0.108 0.000
## Sit6 -0.216 0.390 -0.149 0.319 -0.040 0.000
##
## $mean
## Sit1r Sit2 Sit3r Sit4 Sit5r Sit6
## 0 0 0 0 0 0
lavaan also gives us “normalized” residuals, which can be thought of as z-scores for how large the leftover covariance is in absolute terms. Because the denominator decreases with sample size, however, these values can look inflated in large samples, so look for relatively large values.
“Normalized” residuals for inter-item covariances = (observed – predicted) / SD(observed):
resid(object = model01Estimates, type = "normalized")
## $type
## [1] "normalized"
##
## $cov
## Sit1r Sit2 Sit3r Sit4 Sit5r Sit6
## Sit1r 0.000
## Sit2 -3.503 0.000
## Sit3r 2.977 -2.253 0.000
## Sit4 -2.928 6.560 -1.959 0.000
## Sit5r 0.960 -1.434 0.548 -1.372 0.000
## Sit6 -2.345 4.721 -1.757 3.925 -0.444 0.000
##
## $mean
## Sit1r Sit2 Sit3r Sit4 Sit5r Sit6
## 0 0 0 0 0 0
NEGATIVE NORMALIZED RESIDUAL: the items are less related than the model predicted (they don’t “want” to be together). POSITIVE NORMALIZED RESIDUAL: the items are more related than the model predicted (they “want” to be more together).
Why might the normalized residuals (leftover covariances) for the positively worded items be larger than those for the negatively worded items?
These results suggest that wording valence is playing a larger role in the pattern of covariance across items than what the one-factor model predicts. Rather than adding voodoo covariances among the residuals for specific items, how about a two-factor model based on wording instead?
Here, we use the shortened syntax and the sem() function to get the results.
model04Syntax = "
# SitP loadings (all estimated)
SitP =~ Sit2 + Sit4 + Sit6
# SitN loadings (all estimated)
SitN =~ Sit1r + Sit3r + Sit5r
"
model04Estimates = sem(model = model04Syntax, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(model04Estimates, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 21 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 19
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 35.410 24.924
## Degrees of freedom 8 8
## P-value (Chi-square) 0.000 0.002
## Scaling correction factor 1.421
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.986 0.985
## Tucker-Lewis Index (TLI) 0.974 0.972
##
## Robust Comparative Fit Index (CFI) 0.988
## Robust Tucker-Lewis Index (TLI) 0.977
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11340.140 -11340.140
## Scaling correction factor 1.402
## for the MLR correction
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
## Scaling correction factor 1.407
## for the MLR correction
##
## Akaike (AIC) 22718.281 22718.281
## Bayesian (BIC) 22813.391 22813.391
## Sample-size adjusted Bayesian (SABIC) 22753.042 22753.042
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.056 0.044
## 90 Percent confidence interval - lower 0.038 0.028
## 90 Percent confidence interval - upper 0.075 0.061
## P-value H_0: RMSEA <= 0.050 0.277 0.708
## P-value H_0: RMSEA >= 0.080 0.018 0.000
##
## Robust RMSEA 0.052
## 90 Percent confidence interval - lower 0.030
## 90 Percent confidence interval - upper 0.076
## P-value H_0: Robust RMSEA <= 0.050 0.397
## P-value H_0: Robust RMSEA >= 0.080 0.026
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.029 0.029
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## SitP =~
## Sit2 1.007 0.052 19.487 0.000 1.007 0.730
## Sit4 1.064 0.050 21.195 0.000 1.064 0.759
## Sit6 0.956 0.053 18.203 0.000 0.956 0.625
## SitN =~
## Sit1r 1.325 0.048 27.698 0.000 1.325 0.759
## Sit3r 1.349 0.044 30.514 0.000 1.349 0.846
## Sit5r 1.009 0.055 18.358 0.000 1.009 0.588
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## SitP ~~
## SitN 0.564 0.041 13.775 0.000 0.564 0.564
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit2 5.289 0.042 127.346 0.000 5.289 3.834
## .Sit4 5.359 0.042 126.896 0.000 5.359 3.821
## .Sit6 5.321 0.046 115.492 0.000 5.321 3.477
## .Sit1r 4.547 0.053 86.474 0.000 4.547 2.604
## .Sit3r 4.896 0.048 101.959 0.000 4.896 3.070
## .Sit5r 4.860 0.052 94.060 0.000 4.860 2.832
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit2 0.888 0.097 9.173 0.000 0.888 0.467
## .Sit4 0.835 0.093 9.003 0.000 0.835 0.425
## .Sit6 1.428 0.134 10.684 0.000 1.428 0.610
## .Sit1r 1.294 0.103 12.547 0.000 1.294 0.425
## .Sit3r 0.724 0.092 7.857 0.000 0.724 0.285
## .Sit5r 1.926 0.119 16.128 0.000 1.926 0.654
## SitP 1.000 1.000 1.000
## SitN 1.000 1.000 1.000
##
## R-Square:
## Estimate
## Sit2 0.533
## Sit4 0.575
## Sit6 0.390
## Sit1r 0.575
## Sit3r 0.715
## Sit5r 0.346
This model fits better (we would call this good fit). We can also compare the fit of this model with Model 1 using the anova() function. Here, the null hypothesis is that Model 1 fits as well as Model 4.
anova(model01Estimates, model04Estimates)
##
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
##
## lavaan->lavTestLRT():
## lavaan NOTE: The "Chisq" column contains standard test statistics, not the
## robust test that should be reported per model. A robust difference test is
## a function of two standard (not robust) statistics.
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## model04Estimates 8 22718 22813 35.41
## model01Estimates 9 23109 23199 427.94 342.29 1 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This function conducts the MLR version of the likelihood ratio test for us. The significant difference indicates that Model 4 fits better than Model 1.
Let’s see: Any more local fit problems? The modification indices are better (values of a little over 10 seem okay).
modificationindices(object = model04Estimates, sort. = TRUE)
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 29 SitN =~ Sit6 15.383 0.245 0.245 0.160 0.160
## 30 Sit2 ~~ Sit4 15.383 0.332 0.332 0.386 0.386
## 35 Sit4 ~~ Sit6 13.886 -0.273 -0.273 -0.250 -0.250
## 27 SitN =~ Sit2 13.885 -0.224 -0.224 -0.162 -0.162
## 24 SitP =~ Sit1r 9.244 -0.223 -0.223 -0.128 -0.128
## 44 Sit3r ~~ Sit5r 9.244 -0.277 -0.277 -0.234 -0.234
## 26 SitP =~ Sit5r 7.549 0.191 0.191 0.111 0.111
## 42 Sit1r ~~ Sit3r 7.549 0.409 0.409 0.422 0.422
## 32 Sit2 ~~ Sit1r 6.685 -0.115 -0.115 -0.107 -0.107
## 41 Sit6 ~~ Sit5r 5.847 0.138 0.138 0.083 0.083
## 40 Sit6 ~~ Sit3r 1.321 0.052 0.052 0.052 0.052
## 25 SitP =~ Sit3r 0.667 0.060 0.060 0.037 0.037
## 43 Sit1r ~~ Sit5r 0.667 0.071 0.071 0.045 0.045
## 33 Sit2 ~~ Sit3r 0.402 -0.025 -0.025 -0.032 -0.032
## 39 Sit6 ~~ Sit1r 0.346 0.030 0.030 0.022 0.022
## 36 Sit4 ~~ Sit1r 0.307 -0.025 -0.025 -0.024 -0.024
## 38 Sit4 ~~ Sit5r 0.201 0.022 0.022 0.017 0.017
## 34 Sit2 ~~ Sit5r 0.129 0.017 0.017 0.013 0.013
## 37 Sit4 ~~ Sit3r 0.101 0.013 0.013 0.017 0.017
## 31 Sit2 ~~ Sit6 0.023 0.010 0.010 0.009 0.009
## 28 SitN =~ Sit4 0.023 0.009 0.009 0.007 0.007
The normalized residuals have two values that are bigger in absolute value than 2:
resid(object = model04Estimates, type = "normalized")
## $type
## [1] "normalized"
##
## $cov
## Sit2 Sit4 Sit6 Sit1r Sit3r Sit5r
## Sit2 0.000
## Sit4 0.370 0.000
## Sit6 0.031 -0.676 0.000
## Sit1r -2.125 -0.768 0.870 0.000
## Sit3r -0.896 0.192 1.658 0.172 0.000
## Sit5r 0.382 1.128 2.847 0.212 -0.464 0.000
##
## $mean
## Sit2 Sit4 Sit6 Sit1r Sit3r Sit5r
## 0 0 0 0 0 0
Because we have no real theoretical or defensible reason to fit any of these suggested parameters, we will not add any new parameters. This will be about as good as it gets.
Here is the path diagram of the model using the standardized estimates:
semPaths(object = model04Estimates, what = "std")
As we have demonstrated good model fit, estimates of trait reliabilities can now be found. Although I do not dispense such advice, many people use a CFA to establish model fit and then proceed to use sum scores for each trait in the model. I do not like this practice because sum scores are perfectly correlated with the factor scores only under the parallel items model, and we never tested that model (should it not fit, we may even get a reversal in the rank ordering of people by score).
That said, reliabilities for sum scores can be found by using CFA model parameters. This reliability coefficient is called “Omega” by Rod McDonald and is calculated:
\[ \rho_\omega = \frac{\sigma^2_F\left( \sum_{i=1}^{Items} \lambda_i \right)^2}{\sigma^2_F\left( \sum_{i=1}^{Items} \lambda_i \right)^2 + \sum_{i=1}^{Items} \sum_{j=1}^{Items} \psi_{ij}}\]
In words, Omega = Var(Factor)* (Sum of loadings)^2 / [ Var(Factor)* (Sum of loadings)^2 + Sum of error variances + 2* Sum of error covariances]
For our data:
Omega for the Positive factor = (1.007 + 1.064 + 0.956)² / [(1.007 + 1.064 + 0.956)² + (0.888 + 0.835 + 1.428) + 2(0)] = .744 (alpha was .746)
Omega for the Negative factor = (1.325 + 1.349 + 1.009)² / [(1.325 + 1.349 + 1.009)² + (1.294 + 0.724 + 1.926) + 2(0)] = .775 (alpha was .780)
(The 2(0) term reflects that no error covariances were included in the model.)
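The same arithmetic in R, using the rounded Model 4 estimates printed above:
lamP = c(1.007, 1.064, 0.956); errP = c(0.888, 0.835, 1.428) #SitP loadings and unique variances
sum(lamP)^2/(sum(lamP)^2 + sum(errP)) #returns 0.744
lamN = c(1.325, 1.349, 1.009); errN = c(1.294, 0.724, 1.926) #SitN loadings and unique variances
sum(lamN)^2/(sum(lamN)^2 + sum(errN)) #returns 0.775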
Calculating Omega in lavaan: through the use of parameter labels and newly defined parameters, you can have lavaan compute Omega directly. In the syntax below, the terms before the items are labels representing the factor loadings and unique variances. They can be used later in the syntax to form new parameters or to put constraints on the model.
model04SyntaxOmega = "
# SitP loadings (all estimated)
SitP =~ L2*Sit2 + L4*Sit4 + L6*Sit6
# SitN loadings (all estimated)
SitN =~ L1*Sit1r + L3*Sit3r + L5*Sit5r
# Unique Variances:
Sit1r ~~ E1*Sit1r; Sit2 ~~ E2*Sit2; Sit3r ~~ E3*Sit3r; Sit4 ~~ E4*Sit4; Sit5r ~~ E5*Sit5r; Sit6 ~~ E6*Sit6;
# Calculate Omega Reliability for Sum Scores:
OmegaP := ((L2 + L4 + L6)^2) / ( ((L2 + L4 + L6)^2) + E2 + E4 + E6)
OmegaN := ((L1 + L3 + L5)^2) / ( ((L1 + L3 + L5)^2) + E1 + E3 + E5)
"
model04EstimatesOmega = sem(model = model04SyntaxOmega, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(model04EstimatesOmega, fit.measures = FALSE, rsquare = FALSE, standardized = FALSE, header = FALSE)
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## SitP =~
## Sit2 (L2) 1.007 0.052 19.487 0.000
## Sit4 (L4) 1.064 0.050 21.195 0.000
## Sit6 (L6) 0.956 0.053 18.203 0.000
## SitN =~
## Sit1r (L1) 1.325 0.048 27.698 0.000
## Sit3r (L3) 1.349 0.044 30.514 0.000
## Sit5r (L5) 1.009 0.055 18.358 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## SitP ~~
## SitN 0.564 0.041 13.775 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .Sit2 5.289 0.042 127.346 0.000
## .Sit4 5.359 0.042 126.896 0.000
## .Sit6 5.321 0.046 115.492 0.000
## .Sit1r 4.547 0.053 86.474 0.000
## .Sit3r 4.896 0.048 101.959 0.000
## .Sit5r 4.860 0.052 94.060 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .Sit1r (E1) 1.294 0.103 12.547 0.000
## .Sit2 (E2) 0.888 0.097 9.173 0.000
## .Sit3r (E3) 0.724 0.092 7.857 0.000
## .Sit4 (E4) 0.835 0.093 9.003 0.000
## .Sit5r (E5) 1.926 0.119 16.128 0.000
## .Sit6 (E6) 1.428 0.134 10.684 0.000
## SitP 1.000
## SitN 1.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## OmegaP 0.744 0.020 37.956 0.000
## OmegaN 0.775 0.014 56.802 0.000
Calculating Omega through lavaan also provides its standard error. Note: the Omega equation used above is for the Z-score factor identification method. If the marker-item method is used, the squared sum of the loadings must also be multiplied by the variance of the factor, as in the equation given earlier.
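For instance, here is a minimal sketch of the marker-item version (the labels VP and VN and the object names are illustrative; the fixed first loading of each factor enters its sum as 1):
model04SyntaxOmegaMarker = "
SitP =~ Sit2 + L4*Sit4 + L6*Sit6
SitN =~ Sit1r + L3*Sit3r + L5*Sit5r
SitP ~~ VP*SitP; SitN ~~ VN*SitN
Sit2 ~~ E2*Sit2; Sit4 ~~ E4*Sit4; Sit6 ~~ E6*Sit6
Sit1r ~~ E1*Sit1r; Sit3r ~~ E3*Sit3r; Sit5r ~~ E5*Sit5r
OmegaP := (VP*(1 + L4 + L6)^2) / ( VP*(1 + L4 + L6)^2 + E2 + E4 + E6 )
OmegaN := (VN*(1 + L3 + L5)^2) / ( VN*(1 + L3 + L5)^2 + E1 + E3 + E5 )
"
model04EstimatesOmegaMarker = sem(model = model04SyntaxOmegaMarker, data = hfsSituations, estimator = "MLR", mimic = "mplus")
#OmegaP and OmegaN reproduce the .744 and .775 values above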
Of note about Omega: if we constrain all factor loadings of a trait to be equal (a model called tau-equivalent), Omega is equal to Alpha. In the syntax below, using the same labels LP and LN in multiple places constrains the corresponding factor loadings to be equal:
model04SyntaxAlpha = "
# SitP loadings (constrained equal via the shared label LP)
SitP =~ LP*Sit2 + LP*Sit4 + LP*Sit6
# SitN loadings (constrained equal via the shared label LN)
SitN =~ LN*Sit1r + LN*Sit3r + LN*Sit5r
# Unique Variances:
Sit1r ~~ E1*Sit1r; Sit2 ~~ E2*Sit2; Sit3r ~~ E3*Sit3r; Sit4 ~~ E4*Sit4; Sit5r ~~ E5*Sit5r; Sit6 ~~ E6*Sit6;
# Calculate Omega Reliability for Sum Scores:
OmegaP := ((3*LP)^2) / ( ((3*LP)^2) + E2 + E4 + E6)
OmegaN := ((3*LN)^2) / ( ((3*LN)^2) + E1 + E3 + E5)
"
model04EstimatesAlpha = sem(model = model04SyntaxAlpha, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(model04EstimatesAlpha, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 19 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 19
## Number of equality constraints 4
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 73.500 55.487
## Degrees of freedom 12 12
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.325
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.969 0.961
## Tucker-Lewis Index (TLI) 0.961 0.951
##
## Robust Comparative Fit Index (CFI) 0.971
## Robust Tucker-Lewis Index (TLI) 0.963
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11359.185 -11359.185
## Scaling correction factor 1.163
## for the MLR correction
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
## Scaling correction factor 1.407
## for the MLR correction
##
## Akaike (AIC) 22748.370 22748.370
## Bayesian (BIC) 22823.457 22823.457
## Sample-size adjusted Bayesian (SABIC) 22775.813 22775.813
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.068 0.057
## 90 Percent confidence interval - lower 0.054 0.044
## 90 Percent confidence interval - upper 0.084 0.071
## P-value H_0: RMSEA <= 0.050 0.021 0.167
## P-value H_0: RMSEA >= 0.080 0.105 0.003
##
## Robust RMSEA 0.066
## 90 Percent confidence interval - lower 0.049
## 90 Percent confidence interval - upper 0.084
## P-value H_0: Robust RMSEA <= 0.050 0.061
## P-value H_0: Robust RMSEA >= 0.080 0.103
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.061 0.061
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## SitP =~
## Sit2 (LP) 1.014 0.036 28.384 0.000 1.014 0.734
## Sit4 (LP) 1.014 0.036 28.384 0.000 1.014 0.733
## Sit6 (LP) 1.014 0.036 28.384 0.000 1.014 0.653
## SitN =~
## Sit1r (LN) 1.254 0.032 38.954 0.000 1.254 0.735
## Sit3r (LN) 1.254 0.032 38.954 0.000 1.254 0.804
## Sit5r (LN) 1.254 0.032 38.954 0.000 1.254 0.682
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## SitP ~~
## SitN 0.578 0.041 14.230 0.000 0.578 0.578
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit2 5.289 0.042 127.346 0.000 5.289 3.827
## .Sit4 5.359 0.042 126.896 0.000 5.359 3.872
## .Sit6 5.321 0.046 115.492 0.000 5.321 3.427
## .Sit1r 4.547 0.053 86.474 0.000 4.547 2.667
## .Sit3r 4.896 0.048 101.959 0.000 4.896 3.141
## .Sit5r 4.860 0.052 94.060 0.000 4.860 2.645
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r (E1) 1.335 0.083 16.168 0.000 1.335 0.459
## .Sit2 (E2) 0.882 0.083 10.610 0.000 0.882 0.462
## .Sit3r (E3) 0.857 0.069 12.339 0.000 0.857 0.353
## .Sit4 (E4) 0.887 0.075 11.779 0.000 0.887 0.463
## .Sit5r (E5) 1.805 0.115 15.710 0.000 1.805 0.534
## .Sit6 (E6) 1.382 0.118 11.703 0.000 1.382 0.573
## SitP 1.000 1.000 1.000
## SitN 1.000 1.000 1.000
##
## R-Square:
## Estimate
## Sit1r 0.541
## Sit2 0.538
## Sit3r 0.647
## Sit4 0.537
## Sit5r 0.466
## Sit6 0.427
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## OmegaP 0.746 0.020 38.193 0.000 0.746 0.764
## OmegaN 0.780 0.013 58.288 0.000 0.780 0.783
Compared with what the psych package gives:
alpha(x = hfsSituations[c("Sit2", "Sit4", "Sit6")], use = "all.obs")
##
## Reliability analysis
## Call: alpha(x = hfsSituations[c("Sit2", "Sit4", "Sit6")], use = "all.obs")
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.74 0.74 0.67 0.49 2.9 0.014 5.3 1.2 0.46
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.71 0.74 0.77
## Duhachek 0.71 0.74 0.77
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## Sit2 0.62 0.62 0.45 0.45 1.6 0.023 NA 0.45
## Sit4 0.63 0.63 0.46 0.46 1.7 0.022 NA 0.46
## Sit6 0.73 0.73 0.57 0.57 2.7 0.016 NA 0.57
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## Sit2 1103 0.82 0.83 0.71 0.60 5.3 1.4
## Sit4 1103 0.82 0.83 0.70 0.59 5.4 1.4
## Sit6 1103 0.80 0.78 0.59 0.51 5.3 1.5
##
## Non missing response frequency for each item
## 1 2 3 4 5 6 7 miss
## Sit2 0.02 0.02 0.05 0.13 0.31 0.24 0.22 0
## Sit4 0.02 0.02 0.04 0.15 0.27 0.24 0.25 0
## Sit6 0.03 0.03 0.06 0.14 0.23 0.25 0.27 0
alpha(x = hfsSituations[c("Sit1r", "Sit3r", "Sit5r")], use = "all.obs")
##
## Reliability analysis
## Call: alpha(x = hfsSituations[c("Sit1r", "Sit3r", "Sit5r")], use = "all.obs")
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.77 0.77 0.7 0.53 3.3 0.012 4.8 1.4 0.48
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.74 0.77 0.79
## Duhachek 0.74 0.77 0.79
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## Sit1r 0.65 0.65 0.48 0.48 1.9 0.021 NA 0.48
## Sit3r 0.62 0.62 0.45 0.45 1.7 0.023 NA 0.45
## Sit5r 0.78 0.79 0.65 0.65 3.7 0.013 NA 0.65
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## Sit1r 1103 0.85 0.85 0.74 0.63 4.5 1.7
## Sit3r 1103 0.85 0.86 0.76 0.66 4.9 1.6
## Sit5r 1103 0.78 0.78 0.58 0.51 4.9 1.7
##
## Non missing response frequency for each item
## 1 2 3 4 5 6 7 miss
## Sit1r 0.05 0.08 0.16 0.19 0.18 0.17 0.17 0
## Sit3r 0.03 0.05 0.12 0.20 0.21 0.20 0.20 0
## Sit5r 0.04 0.06 0.13 0.18 0.18 0.19 0.22 0
Factor scores have a long history with many critiques and complaints. That said, factor scores can be constructed analogously across measurement models using Bayesian estimates; we will describe this more in a later lecture. For our purposes in CFA, the two main types of Bayesian estimates (MAP: Maximum A Posteriori and EAP: Expected A Posteriori) are identical. lavaan will not provide standard errors for factor scores, so I created a function factorScores() to not only create the factor scores using EAP (again, identical to MAP here) but also to give a standard error for each score. The function takes the lavaan model output and returns a list of three elements: scores, containing the factor scores and their standard errors; factorCov, containing the theoretical covariance matrix of the factor scores; and factorCorr, the theoretical correlation matrix of the factor scores.
To determine the reliability of a factor score, we must return to our classical notion of what reliability means: \[ \rho = \frac{Var(True)}{Var(Total)} = \frac{Var(True)}{Var(True) + Var(Error)} = \frac{Var(F)}{Var(F)+SE(F)^2}\] In CFA, \(Var(True)\) is found by the estimated factor variance and \(Var(Error)\) comes from the squared standard error of a factor score (from an observation with complete data).
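With z-score factor identification, \(Var(F) = 1\), so the reliability simplifies to \(1 / (1 + SE(F)^2)\). A quick check using the SitP factor score SE of .472 (reported below):
# factor score reliability with Var(F) = 1 and SE = .472
1 / (1 + 0.472^2) # approximately 0.818, matching factorScoreReliability() below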
I created a function named factorScoreReliability() to calculate the reliability of factor scores. Using this function, we can see that the reliability of our factor scores is .818 for SitP and .851 for SitN, both higher than the Omega reliabilities for the sum scores. In general, factor score reliabilities are greater than Omega and Alpha due to the use of a prior distribution for the factor itself, and because the factor correlation allows indirect information from the other factors to help in the estimation of each factor.
factorScores = function(lavObject){
# get estimated parameter matrices (lambda, psi, theta, nu, alpha)
output = inspect(object = lavObject, what = "est")
# model-implied covariance matrix of the items
sigma = output$lambda %*% output$psi %*% t(output$lambda) + output$theta
# raw data used in estimation
modelData = lavObject@Data@X[[1]]
# EAP factor scores: factor means plus the regression of the factors on the items
scores = t(output$alpha%*%matrix(1, nrow=1, ncol=dim(modelData)[1]) + output$psi %*% t(output$lambda) %*% solve(sigma)%*%(t(modelData) - output$nu%*%matrix(1, nrow=1, ncol=dim(modelData)[1])))
# posterior covariance matrix of the factor scores (constant across complete cases)
varscores = output$psi - output$psi %*% t(output$lambda) %*% solve(sigma) %*% output$lambda %*% output$psi
# standard errors are the square roots of the posterior variances
factorSE = sqrt(diag(varscores))
names(factorSE) = paste0(names(factorSE), ".SE")
factorSEmat = matrix(1, nrow=nrow(scores), ncol = 1) %*% matrix(factorSE, nrow = 1, ncol = ncol(scores))
colnames(factorSEmat) = names(factorSE)
result = data.frame(cbind(scores, factorSEmat))
# reorder the columns to pair each factor score with its SE
odds = seq(1, ncol(result)-1, 2)
evens = seq(2, ncol(result), 2)
result = result[c(odds,evens)]
factorCov = varscores
# convert the posterior covariance matrix into a correlation matrix
if (dim(varscores)[1] == 1 & dim(varscores)[2] == 1){
factorCorr = solve(sqrt(varscores)) %*% varscores %*% solve(sqrt(varscores))
} else {
factorCorr = solve(sqrt(diag(diag(varscores)))) %*% varscores %*% solve(sqrt(diag(diag(varscores))))
}
return(list(scores = result, factorCov = factorCov, factorCorr = factorCorr))
}
factorScoreReliability = function(lavObject){
output = inspect(object = lavObject, what = "est")
# model-implied covariance matrix of the items
sigma = output$lambda %*% output$psi %*% t(output$lambda) + output$theta
# posterior (error) variance of the factor scores
varscores = output$psi - output$psi %*% t(output$lambda) %*% solve(sigma) %*% output$lambda %*% output$psi
# reliability = Var(F) / (Var(F) + SE^2)
return(diag(output$psi)/(diag(output$psi) + diag(varscores)))
}
factorScoreReliability(lavObject = model04Estimates)
## SitP SitN
## 0.8177842 0.8510698
We can also compare the model fit of the CFA model with that of the tau-equivalent model, essentially testing whether CTT (and alpha) is appropriate:
anova(model04EstimatesAlpha, model04EstimatesOmega)
##
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
##
## lavaan->lavTestLRT():
## lavaan NOTE: The "Chisq" column contains standard test statistics, not the
## robust test that should be reported per model. A robust difference test is
## a function of two standard (not robust) statistics.
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## model04EstimatesOmega 8 22718 22813 35.41
## model04EstimatesAlpha 12 22748 22824 73.50 33.635 4 8.851e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The significant difference indicates that the tau-equivalent assumption of equal item factor loadings does not hold across all of the data. In the Mplus handout, Lesa Hoffman shows that tau equivalence holds for one of the subscales, but we do not need tau equivalence to move forward with results, as CFA subsumes the tau-equivalent model (meaning the CFA solution will be tau-equivalent if the data are tau-equivalent, but it can also be more general).
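As a check, the scaled difference statistic that anova() reports can be reproduced by hand from the standard test statistics and scaling correction factors (the two-factor model's standard statistic of 35.41 comes from the anova() output above, and its scaling factor of 1.421 is reported in Table 1 later in this document):
# Satorra-Bentler (2001) scaled chi-square difference test by hand
T0 = 73.500; df0 = 12; c0 = 1.325 # tau-equivalent (more constrained) model
T1 = 35.410; df1 = 8; c1 = 1.421 # congeneric two-factor model
cd = (df0*c0 - df1*c1) / (df0 - df1) # scaling correction for the difference
(T0 - T1) / cd # approximately 33.6, matching the anova() result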
Another model often discussed in CTT is the parallel items model. The CFA version of the parallel items model constrains all factor loadings for a factor to be equal and all unique variances for that factor to be equal, making it one step more restrictive than the tau-equivalent items model. Syntax for this model is as follows:
model04SyntaxSB = "
# SitP loadings (constrained equal via the shared label LP)
SitP =~ LP*Sit2 + LP*Sit4 + LP*Sit6
# SitN loadings (constrained equal via the shared label LN)
SitN =~ LN*Sit1r + LN*Sit3r + LN*Sit5r
# Unique variances (constrained equal within each factor via labels EP and EN):
Sit1r ~~ EN*Sit1r; Sit2 ~~ EP*Sit2; Sit3r ~~ EN*Sit3r; Sit4 ~~ EP*Sit4; Sit5r ~~ EN*Sit5r; Sit6 ~~ EP*Sit6;
# Calculate Omega (here, Spearman-Brown) reliability for sum scores:
OmegaP := ((3*LP)^2) / ( ((3*LP)^2) + 3*EP)
OmegaN := ((3*LN)^2) / ( ((3*LN)^2) + 3*EN)
"
model04EstimatesSB = sem(model = model04SyntaxSB, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(model04EstimatesSB, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 11 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 19
## Number of equality constraints 8
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 181.247 132.341
## Degrees of freedom 16 16
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.370
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.916 0.896
## Tucker-Lewis Index (TLI) 0.921 0.902
##
## Robust Comparative Fit Index (CFI) 0.919
## Robust Tucker-Lewis Index (TLI) 0.924
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11413.059 -11413.059
## Scaling correction factor 0.847
## for the MLR correction
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
## Scaling correction factor 1.407
## for the MLR correction
##
## Akaike (AIC) 22848.117 22848.117
## Bayesian (BIC) 22903.181 22903.181
## Sample-size adjusted Bayesian (SABIC) 22868.242 22868.242
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.097 0.081
## 90 Percent confidence interval - lower 0.084 0.070
## 90 Percent confidence interval - upper 0.110 0.092
## P-value H_0: RMSEA <= 0.050 0.000 0.000
## P-value H_0: RMSEA >= 0.080 0.986 0.586
##
## Robust RMSEA 0.095
## 90 Percent confidence interval - lower 0.080
## 90 Percent confidence interval - upper 0.110
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 0.953
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.088 0.088
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## SitP =~
## Sit2 (LP) 1.005 0.035 28.455 0.000 1.005 0.698
## Sit4 (LP) 1.005 0.035 28.455 0.000 1.005 0.698
## Sit6 (LP) 1.005 0.035 28.455 0.000 1.005 0.698
## SitN =~
## Sit1r (LN) 1.222 0.032 38.250 0.000 1.222 0.724
## Sit3r (LN) 1.222 0.032 38.250 0.000 1.222 0.724
## Sit5r (LN) 1.222 0.032 38.250 0.000 1.222 0.724
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## SitP ~~
## SitN 0.596 0.041 14.422 0.000 0.596 0.596
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit2 5.289 0.042 127.346 0.000 5.289 3.676
## .Sit4 5.359 0.042 126.896 0.000 5.359 3.724
## .Sit6 5.321 0.046 115.492 0.000 5.321 3.698
## .Sit1r 4.547 0.053 86.474 0.000 4.547 2.695
## .Sit3r 4.896 0.048 101.959 0.000 4.896 2.902
## .Sit5r 4.860 0.052 94.060 0.000 4.860 2.881
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r (EN) 1.353 0.060 22.722 0.000 1.353 0.475
## .Sit2 (EP) 1.060 0.061 17.452 0.000 1.060 0.512
## .Sit3r (EN) 1.353 0.060 22.722 0.000 1.353 0.475
## .Sit4 (EP) 1.060 0.061 17.452 0.000 1.060 0.512
## .Sit5r (EN) 1.353 0.060 22.722 0.000 1.353 0.475
## .Sit6 (EP) 1.060 0.061 17.452 0.000 1.060 0.512
## SitP 1.000 1.000 1.000
## SitN 1.000 1.000 1.000
##
## R-Square:
## Estimate
## Sit1r 0.525
## Sit2 0.488
## Sit3r 0.525
## Sit4 0.488
## Sit5r 0.525
## Sit6 0.488
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## OmegaP 0.741 0.020 36.909 0.000 0.741 0.741
## OmegaN 0.768 0.015 52.547 0.000 0.768 0.768
The Omega values resulting from this model are the Spearman-Brown reliability estimates. Note that the parallel-items estimates are the lowest here, and that all three sets of estimates (parallel, tau-equivalent, and the congeneric CFA) are very close, which is common when a model fits the data.
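As a check, the Spearman-Brown prophecy formula recovers nearly the same value from the average inter-item correlation (average_r = .49 for the positive items in the psych::alpha() output above):
# Spearman-Brown prophecy formula from the average inter-item correlation
k = 3; rbarP = 0.49
k*rbarP / (1 + (k - 1)*rbarP) # approximately .742, close to OmegaP = .741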
The positive factor scores have an estimated mean of 0 with a variance of 0.778 instead of 1.000 (due to the effect of the prior distribution; this is called “shrinkage”). The SE for each person’s factor score is .472. Treating factor scores as observed variables is like saying SE = 0.
The 95% confidence interval for a positive factor score = Score ± 2(.472) = Score ± .944.
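Both quantities can be checked with quick arithmetic (the 0.50 example score here is hypothetical):
# shrunken score variance equals Var(F) - SE^2 under EAP scoring
1 - 0.472^2 # approximately 0.777, matching var(fscores$scores$SitP) below
# 95% confidence interval for a hypothetical person scoring 0.50 on SitP
0.50 + c(-2, 2)*0.472 # approximately -0.44 to 1.44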
# calculate factor scores using function above:
fscores = factorScores(lavObject = model04Estimates)
#show variance of factor score:
var(fscores$scores$SitP)
## [1] 0.7778891
# Histogram overlaid with kernel density curve
ggplot(fscores$scores, aes(x=SitP)) +
geom_histogram(aes(y=after_stat(density)), # Histogram with density instead of count on y-axis
binwidth=.5,
colour="black", fill="white") + xlim(c(-4,4)) + labs(title = "Positive Situation Forgiveness Factor Score") +
geom_density(alpha=.2, fill="#FF6666") # Overlay with transparent density plot
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_bar()`).
The negative factor scores have an estimated mean of 0 with a variance of 0.825 instead of 1.000. The SE for each person’s factor score is .418, so the 95% confidence interval is Score ± .836.
The negative factor scores retain more variance (and have a smaller SE) because there is more information in them, due to higher factor loadings (greater reliability) of their items.
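The same arithmetic check applies here:
# SitN shrunken variance = Var(F) - SE^2
1 - 0.418^2 # approximately 0.825, matching var(fscores$scores$SitN) below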
#show variance of factor score:
var(fscores$scores$SitN)
## [1] 0.8257568
# Histogram overlaid with kernel density curve
ggplot(fscores$scores, aes(x=SitN)) +
geom_histogram(aes(y=after_stat(density)), # Histogram with density instead of count on y-axis
binwidth=.5,
colour="black", fill="white") + xlim(c(-4,4)) + labs(title = "Negative Situation Forgiveness Factor Score") +
geom_density(alpha=.2, fill="#FF6666") # Overlay with transparent density plot
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_bar()`).
The parallel items model constrains the item factor loadings to be equal and the unique variances to be equal for a given factor. When each factor is estimated separately (or, equivalently, when the factor correlation is fixed to zero), the correlation between the factor scores from the parallel items model and the sum scores from CTT is equal to one. What this means is that to justify using sum scores in an analysis, the parallel items model must hold. To demonstrate, the lavaan syntax below sets the factor covariance to zero (and therefore the correlation between the factors to zero) by using 0* in the statement SitP ~~ 0*SitN:
model04SyntaxSBzeroCor = "
# SitP loadings (constrained equal via the shared label LP)
SitP =~ LP*Sit2 + LP*Sit4 + LP*Sit6
# SitN loadings (constrained equal via the shared label LN)
SitN =~ LN*Sit1r + LN*Sit3r + LN*Sit5r
# Unique variances (constrained equal within each factor via labels EP and EN):
Sit1r ~~ EN*Sit1r; Sit2 ~~ EP*Sit2; Sit3r ~~ EN*Sit3r; Sit4 ~~ EP*Sit4; Sit5r ~~ EN*Sit5r; Sit6 ~~ EP*Sit6;
# Fix the factor covariance (and correlation) to zero:
SitP ~~ 0*SitN
# Calculate Omega Reliability for Sum Scores:
OmegaP := ((3*LP)^2) / ( ((3*LP)^2) + 3*EP)
OmegaN := ((3*LN)^2) / ( ((3*LN)^2) + 3*EN)
"
model04SyntaxSBzeroCorEstimates = sem(model = model04SyntaxSBzeroCor, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
sumscoresP = apply(X = hfsSituations[c("Sit2", "Sit4", "Sit6")], MARGIN = 1, FUN = sum)
sumscoresN = apply(X = hfsSituations[c("Sit1r", "Sit3r", "Sit5r")], MARGIN = 1, FUN = sum)
fscoresPI = factorScores(lavObject = model04SyntaxSBzeroCorEstimates)
par(mfrow = c(1,2))
plot(x = fscores$scores$SitP, y = sumscoresP, xlab = "SitP: Full CFA Factor Scores", ylab = "SitP: Sum Score", xlim = c(-3.5, 2))
text(x = -2, y = 20, labels = paste("r =", as.character(round(x = cor(fscores$scores$SitP, sumscoresP), digits = 3))))
plot(x = fscores$scores$SitN, y = sumscoresN, xlab = "SitN: Full CFA Factor Scores", ylab = "SitN: Sum Score", xlim = c(-3.5, 2))
text(x = -2, y = 20, labels = paste("r =", as.character(round(x = cor(fscores$scores$SitN, sumscoresN), digits = 3))))
plot(x = fscoresPI$scores$SitP, y = sumscoresP, xlab = "SitP: Parallel Items Factor Score", ylab = "SitP: Sum Score", xlim=c(-3.5, 2))
text(x = -2, y = 20, labels = paste("r =", as.character(round(x = cor(fscoresPI$scores$SitP, sumscoresP), digits = 3))))
plot(x = fscoresPI$scores$SitN, y = sumscoresN, xlab = "SitN: Parallel Items Factor Score", ylab = "SitN: Sum Score", xlim=c(-3.5, 2))
text(x = -2, y = 20, labels = paste("r =", as.character(round(x = cor(fscoresPI$scores$SitN, sumscoresN), digits = 3))))
par(mfrow = c(1,1))
We can also see how reasonable our choice of a continuous distribution was for these Likert data by plotting the predicted item scores for each item across the range of observed factor scores. If the lines fall outside the range of 1 through 7 (the bounds of the Likert items), then we may have an issue using CFA (a categorical version would be more appropriate).
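For example, using the two-factor model estimates for Sit3r reported above (intercept 4.896, loading 1.349), the predicted response at a factor score of +2 already exceeds the top of the scale:
# predicted Sit3r response at a factor score of +2 is nu + lambda*2
4.896 + 1.349*2 # = 7.594, above the maximum response option of 7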
cfaPlots = function(lavObject){
output = inspect(object = lavObject, what = "est")
#build matrix plot elements
#get factor scores
fscores = factorScores(lavObject = lavObject)
nfactors = ncol(fscores$scores)/2
#get max observed data
itemMax = max(apply(X = lavObject@Data@X[[1]], MARGIN = 2, FUN = max, na.rm = TRUE))
itemMin = min(apply(X = lavObject@Data@X[[1]], MARGIN = 2, FUN = min, na.rm = TRUE))
#get range for all scores
factorMax = max(apply(X = fscores$scores[seq(1, ncol(fscores$scores), 2)], MARGIN = 2, FUN = max, na.rm = TRUE))
factorMin = min(apply(X = fscores$scores[seq(1, ncol(fscores$scores), 2)], MARGIN = 2, FUN = min, na.rm = TRUE))
#set up x values
x = seq(factorMin, factorMax, .01)
par(mfrow = c(1, nfactors))
#make plots by factor
for (factor in 1:nfactors){
xmat = NULL
ymat = NULL
inames = NULL
for (item in 1:nrow(output$lambda)){
if (output$lambda[item, factor] != 0){
inames = c(inames, rownames(output$lambda)[item])
y = output$nu[item] + output$lambda[item, factor]*x
xmat = cbind(xmat, x)
ymat = cbind(ymat, y)
}
}
matplot(x = xmat, y = ymat, type = "l", lwd = 5, lty=2:(ncol(xmat)+1), ylim = c(itemMin-1, itemMax+1), xlim = c(factorMin, factorMax),
ylab = "Predicted Item Response", xlab = colnames(output$lambda)[factor], col = 2:(ncol(xmat)+1))
lines(x = c(factorMin,factorMax), y = c(itemMin, itemMin), lty = 3, lwd = 5)
lines(x = c(factorMin,factorMax), y = c(itemMax, itemMax), lty = 3, lwd = 5)
legend(x = -3, y = 7, legend = inames, lty = 2:(ncol(xmat)+1), lwd = 5, col = 2:(ncol(xmat)+1))
}
par(mfrow = c(1,1))
}
par(mfrow = c(1,2))
cfaPlots(lavObject = model04Estimates)
What if we had just taken the sum (or, here, the mean) of the three items for each subscale?
PosMean = apply(X = hfsSituations[c("Sit2", "Sit4", "Sit6")], MARGIN = 1, FUN = mean)
NegMean = apply(X = hfsSituations[c("Sit1r", "Sit3r", "Sit5r")], MARGIN = 1, FUN = mean)
par(mfrow = c(1,2))
plot(y = PosMean, x = fscores$scores$SitP, xlim = c(-3.5,2), ylim = c(1,7), xlab = "SitP Factor Score", ylab = "Positive Items Mean")
text(x = -3, y = 6, labels = paste("r = ", as.character(round(cor(PosMean, fscores$scores$SitP), digits = 3))))
plot(y = NegMean, x = fscores$scores$SitN, xlim = c(-3.5,2), ylim = c(1,7), xlab = "SitN Factor Score", ylab = "Negative Items Mean")
text(x = -3, y = 6, labels = paste("r = ", as.character(round(cor(NegMean, fscores$scores$SitN), digits = 3))))
par(mfrow = c(1,1))
We can also plot how the two types of scores look together:
par(mfrow = c(1,2))
plot(x = PosMean, y = NegMean, xlim = c(1,8), ylim = c(1,8), xlab = "Positive Items Mean", ylab = "Negative Items Mean")
text(x = 1.85, y = 8, labels = paste("r = ", as.character(round(cor(PosMean, NegMean), digits = 3))))
plot(x = fscores$scores$SitP, y = fscores$scores$SitN, xlim = c(-3.5,2), ylim = c(-3.5,2), xlab = "SitP Factor Score", ylab = "SitN Factor Score")
text(x = -2.5, y = 1.9, labels = paste("r = ", as.character(round(cor(fscores$scores$SitP, fscores$scores$SitN), digits = 3))))
par(mfrow = c(1,1))
There are problems with either of these observed-variable approaches: the mean of the items has less variability (i.e., fewer possible score values) and assumes that all items should be weighted equally and have no error, while the estimated factor scores do not have the same properties as estimated for the factors in the model (i.e., less variance for each factor, a higher correlation between the factors). Further, the rank ordering of subjects differs between the two approaches, which is problematic for other analyses.
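One quick way to see the rank-order differences is a Spearman correlation between the two score types; values below 1 would confirm that the orderings differ:
# rank-order agreement between item means and factor scores
cor(PosMean, fscores$scores$SitP, method = "spearman")
cor(NegMean, fscores$scores$SitN, method = "spearman")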
What to do instead of either of these? Stay tuned for how to use plausible values, Bayesian methods, or SEM.
R code to build objects reported in results (don’t show this typically)
#number of observations
format(x = model04Estimates@Data@nobs[[1]][1], big.mark = ",")
## [1] "1,103"
#low and high standardized loadings for one factor model
stdLoadings = inspect(object = model01Estimates, what = "std.all")
stdLoadingsLow = round(min(apply(X = stdLoadings$lambda, MARGIN = 2, FUN = function(x) min(x[which(x !=0)]))), digits = 3)
stdLoadingsHigh = round(max(apply(X = stdLoadings$lambda, MARGIN = 2, FUN = function(x) max(x[which(x !=0)]))), digits = 3)
# estimated correlation between factors in two-factor model
stdCov04 = inspect(object = model04Estimates, what = "std.all")$psi
estCorr = format(x = stdCov04[2,1], digits = 3)
#standardized loadings positive
stdLoadings04P = inspect(object = model04Estimates, what = "std.all")$lambda[,1]
minSitP = round(min(stdLoadings04P[which(stdLoadings04P != 0)]), digits = 3)
maxSitP = round(max(stdLoadings04P[which(stdLoadings04P != 0)]), digits = 3)
#standardized loadings negative
stdLoadings04N = inspect(object = model04Estimates, what = "std.all")$lambda[,2]
minSitN = round(min(stdLoadings04N[which(stdLoadings04N != 0)]), digits = 3)
maxSitN = round(max(stdLoadings04N[which(stdLoadings04N != 0)]), digits = 3)
factorRel = factorScoreReliability(lavObject = model04Estimates)
(Note: You may borrow the phrasing contained in this example to describe various aspects of your analyses, but your own results sections will not mimic this example exactly—they should be customized to describe the how and the why of what you did, specifically).
(Descriptive information for the sample and items would have already been given in the method section…)
The reliability and dimensionality of six items each assessing forgiveness of situations was assessed in a sample of 1,103 persons with a confirmatory factor analysis using robust maximum likelihood estimation (MLR) in the lavaan package (Rosseel, 2012) in R (R Core Team, 2017). All models were identified by setting any latent factor means to 0 and latent factor variances to 1, such that all item intercepts, item factor loadings, and item residual variances were then estimated. The six items utilized a seven-point response scale, and three items were reverse-coded prior to analysis such that higher values then indicated greater levels of forgiveness of situations for all items. Model fit statistics reported in Table 1 include the obtained model \(\chi^2\), its scaling factor (in which values different than 1.000 indicate deviations from normality), its degrees of freedom, and its p-value (in which non-significance is desirable for good fit), CFI, or Comparative Fit Index (in which values higher than .95 are desirable for good fit), and the RMSEA, or Root Mean Square Error of Approximation, point estimate and 90% confidence interval (in which values lower than .06 are desirable for good fit). As reported in Table 2, nested model comparisons were conducted using the rescaled \(−2\Delta LL\) with degrees of freedom equal to the rescaled difference in the number of parameters between models (i.e., a rescaled likelihood ratio test). The specific models examined are described in detail below.
Although a one-factor model was initially posited to account for the pattern of covariance across these six items, it resulted in poor fit, as shown in Table 1. Although each item had a significant factor loading (with standardized loadings ranging from 0.509 to 0.778), a single latent factor did not adequately describe the pattern of relationship across these six items as initially hypothesized. Sources of local misfit were identified using the normalized residual covariance matrix, available via the resid() function in lavaan, in which individual values were calculated as: (observed covariance – expected covariance) / SD(observed covariance). Relatively large positive residual covariances were observed among items 2, 4, and 6 (the positively-worded items), indicating that these items were more related than was predicted by the single-factor model. Modification indices, available via the modificationindices() function in lavaan, corroborated this pattern, further suggesting additional remaining relationships among the negatively-worded items as well.
The necessity of separate latent factors for the positively-worded and negatively-worded items was tested by specifying a two-factor model in which the positively-worded items 2, 4, and 6 indicated a forgiveness factor, and in which negatively-worded items 1, 3, and 5 indicated a not unforgiveness factor, and in which the two factors were allowed to correlate. The two-factor model fit was acceptable by every criterion except the significant \(\chi^2\), likely due to the large sample. In addition, the two-factor model fit significantly better than the one-factor model, as reported in Table 2, indicating that the estimated correlation between the two factors of 0.564 was significantly less than 1.000. Thus, the six items appeared to measure two separate but related constructs. Further examination of local fit via normalized residual covariances and modification indices yielded no interpretable remaining relationships, and thus this two-factor model was retained.
Table 3 provides the estimates and their standard errors for the item factor loadings, intercepts, and residual variances from both the unstandardized and standardized solutions. All factor loadings and the factor covariance were statistically significant. As shown in Table 3, standardized loadings for the forgiveness factor items ranged from 0.625 to 0.759 (with \(R^2\) values for the amount of item variance accounted for by the factor ranging from 0.391 to 0.576), and standardized loadings for the not unforgiveness factor ranged from 0.588 to 0.846 (with \(R^2\) values of 0.346 to 0.716), suggesting the factor loadings were practically significant as well. Factor score reliability was .818 for the forgiveness factor and .851 for the not unforgiveness factor, suggesting marginal reliability for both of the three-item scales.
The resulting distributions of the EAP factor score estimates are shown in Figure 1. Figure 2 shows the predicted response for each item as a linear function of the latent factor based on the estimated model parameters. As shown, the predicted item response goes above the highest response option just before a latent factor score of 2 (i.e., 2 SDs above the mean), resulting in a ceiling effect for both sets of factor scores, as also shown in Figure 1. In addition, for the not unforgiveness factor, the predicted item response goes below the lowest response option just before a latent factor score of -3 (i.e., 3 SDs below the mean), resulting in a floor effect for the not unforgiveness factor, as also shown in Figure 1.
The extent to which the items within each factor could be seen as exchangeable was then examined via an additional set of nested model comparisons, as reported in Table 1 (for fit) and Table 2 (for comparisons of fit). First, the assumption of tau-equivalence (i.e., true-score equivalence, equal discrimination across items) was examined by constraining the factor loadings to be equal within a factor. For the not unforgiveness factor, the tau-equivalent model fit was acceptable but was significantly worse than the original two-factor model fit (i.e., in which all loadings were estimated freely). For the forgiveness factor, however, the tau-equivalent model fit was acceptable and was not significantly worse than the original two-factor model fit. Thus, the assumption of tau-equivalence held for the forgiveness factor items only. Finally, the assumption of parallel items (i.e., equal factor loadings and equal residual variances, or equal reliability across items) was examined for the forgiveness factor items only, and the resulting model fit was acceptable but was significantly worse than the tau-equivalent forgiveness factor model fit. Thus, the assumption of parallel items did not hold for the forgiveness factor items. In summary, while the not unforgiveness factor items were not exchangeable, the forgiveness factor items were exchangeable with respect to their factor loadings only (i.e., equal discrimination, but not equal residual variances or reliability).
R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1-36. URL http://www.jstatsoft.org/v48/i02/.
citation()
## To cite R in publications use:
##
## R Core Team (2025). _R: A Language and Environment for Statistical
## Computing_. R Foundation for Statistical Computing, Vienna, Austria.
## <https://www.R-project.org/>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {R: A Language and Environment for Statistical Computing},
## author = {{R Core Team}},
## organization = {R Foundation for Statistical Computing},
## address = {Vienna, Austria},
## year = {2025},
## url = {https://www.R-project.org/},
## }
##
## We have invested a lot of time and effort in creating R, please cite it
## when using it for data analysis. See also 'citation("pkgname")' for
## citing R packages.
citation("lavaan")
## To cite lavaan in publications use:
##
## Yves Rosseel (2012). lavaan: An R Package for Structural Equation
## Modeling. Journal of Statistical Software, 48(2), 1-36.
## https://doi.org/10.18637/jss.v048.i02
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## title = {{lavaan}: An {R} Package for Structural Equation Modeling},
## author = {Yves Rosseel},
## journal = {Journal of Statistical Software},
## year = {2012},
## volume = {48},
## number = {2},
## pages = {1--36},
## doi = {10.18637/jss.v048.i02},
## }
fitstats01 = fitmeasures(object = model01Estimates)
fitstats04 = fitmeasures(object = model04Estimates)
fitstats04A = fitmeasures(object = model04EstimatesAlpha)
table1 = cbind(ncol(model01Estimates@SampleStats@cov[[1]]), fitstats01[1], fitstats01[6], fitstats01[9], fitstats01[7], fitstats01[8], fitstats01[25],
fitstats01[48], fitstats01[49], fitstats01[50], fitstats01[51])
colnames(table1) = c("# Items", "# Parameters", "Scaled Chi-Square", "Chi-Square Scale Factor", "DF", "p-value", "CFI", "RMSEA", "RMSEA Lower",
"RMSEA Upper", "RMSEA p-value")
rownames(table1) = "One-Factor"
table1 = rbind(table1, cbind(ncol(model04Estimates@SampleStats@cov[[1]]), fitstats04[1], fitstats04[6], fitstats04[9], fitstats04[7], fitstats04[8],
fitstats04[25], fitstats04[48], fitstats04[49], fitstats04[50], fitstats04[51]))
rownames(table1)[2] = "Two-Factor"
table1 = rbind(table1, cbind(ncol(model04Estimates@SampleStats@cov[[1]]), fitstats04A[1], fitstats04A[6], fitstats04A[9], fitstats04A[7], fitstats04A[8],
fitstats04A[25], fitstats04A[48], fitstats04A[49], fitstats04A[50], fitstats04A[51]))
rownames(table1)[3] = "Two-Factor Tau-Equivalent"
kable(table1, format = "html", digits = 3) %>% kable_styling("striped")
|  | # Items | # Parameters | Scaled Chi-Square | Chi-Square Scale Factor | DF | p-value | CFI | RMSEA | RMSEA Lower | RMSEA Upper | RMSEA p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| One-Factor | 6 | 18 | 307.803 | 1.390 | 9 | 0.000 | 0.784 | 0.9 | 0.000 | 0.05 | 1.000 |
| Two-Factor | 6 | 19 | 24.924 | 1.421 | 8 | 0.002 | 0.982 | 0.9 | 0.277 | 0.05 | 0.018 |
| Two-Factor Tau-Equivalent | 6 | 15 | 55.487 | 1.325 | 12 | 0.000 | 0.963 | 0.9 | 0.021 | 0.05 | 0.105 |
table2 = anova(modelSaturatedEstimates, model01Estimates)[2,-c(2:4)]
rownames(table2) = "One-Factor vs. Saturated"
table2 = rbind(table2,anova(model01Estimates, model04Estimates)[2,-c(2:4)])
rownames(table2)[2] = "One-Factor vs. Two-Factor"
table2 = rbind(table2,anova(modelSaturatedEstimates, model04Estimates)[2,-c(2:4)])
rownames(table2)[3] = "Two-Factor vs. Saturated"
table2 = rbind(table2,anova(model04EstimatesAlpha, model04Estimates)[2,-c(2:4)])
rownames(table2)[4] = "Two-Factor vs. Two-Factor Tau-Equivalent"
kable(table2, format = "html", digits = 3) %>% kable_styling("striped")
|  | Df | Chisq diff | Df diff | Pr(>Chisq) |
|---|---|---|---|---|
| One-Factor vs. Saturated | 9 | 307.803 | 9 | 0.000 |
| One-Factor vs. Two-Factor | 9 | 342.289 | 1 | 0.000 |
| Two-Factor vs. Saturated | 8 | 24.924 | 8 | 0.002 |
| Two-Factor vs. Two-Factor Tau-Equivalent | 12 | 33.635 | 4 | 0.000 |
unstandardizedLoadings = inspect(object = model04Estimates, what = "est")
unstandardizedSE = inspect(object = model04Estimates, what = "se")
standardizedLoadings = inspect(object = model04Estimates, what = "std.all")
table3 = cbind(unstandardizedLoadings$lambda[1:3,1], unstandardizedSE$lambda[1:3,1], standardizedLoadings$lambda[1:3,1])
colnames(table3) = c("Estimate", "SE", "Estimate")
rownames(table3) = c("Item 2", "Item 4", "Item 6")
table3 = rbind(table3, cbind(cbind(unstandardizedLoadings$lambda[4:6,2]), cbind(unstandardizedSE$lambda[4:6,2]), cbind(standardizedLoadings$lambda[4:6,2])))
rownames(table3)[4:6] = c("Item 1", "Item 3", "Item 5")
table3 = rbind(table3, c(unstandardizedLoadings$psi[2,1], unstandardizedSE$psi[2,1], standardizedLoadings$psi[2,1]))
rownames(table3)[7] = "Factor Covariance"
interceptsOrder = c(4,1,5,2,6,3)
table3 = rbind(table3, cbind(cbind(unstandardizedLoadings$nu[1:6,1][interceptsOrder]), cbind(unstandardizedSE$nu[1:6,1][interceptsOrder]),
cbind(standardizedLoadings$nu[1:6,1][interceptsOrder])))
rownames(table3)[8:13] = paste("Item", 1:6)
table3 = rbind(table3, cbind(cbind(diag(unstandardizedLoadings$theta)[interceptsOrder]), cbind(diag(unstandardizedSE$theta)[interceptsOrder]),
cbind(diag(standardizedLoadings$theta)[interceptsOrder])))
rownames(table3)[14:19] = paste("Item", 1:6)
kable(table3, format = "html", digits = 3) %>% kable_styling("striped") %>% add_header_above(c(" " = 1, "Unstandardized" = 2, "Standardized" = 1)) %>% group_rows("Forgiveness Factor Loadings", 1,3) %>% group_rows("Not Unforgiveness Factor Loadings", 4,6) %>%
group_rows("Factor Covariance", 7,7) %>% group_rows("Item Intercepts", 8,13) %>% group_rows("Item Unique Variances", 14, 19)
|  | Unstandardized Estimate | SE | Standardized Estimate |
|---|---|---|---|
| **Forgiveness Factor Loadings** |  |  |  |
| Item 2 | 1.007 | 0.052 | 0.730 |
| Item 4 | 1.064 | 0.050 | 0.759 |
| Item 6 | 0.956 | 0.053 | 0.625 |
| **Not Unforgiveness Factor Loadings** |  |  |  |
| Item 1 | 1.325 | 0.048 | 0.759 |
| Item 3 | 1.349 | 0.044 | 0.846 |
| Item 5 | 1.009 | 0.055 | 0.588 |
| **Factor Covariance** |  |  |  |
| Factor Covariance | 0.564 | 0.041 | 0.564 |
| **Item Intercepts** |  |  |  |
| Item 1 | 4.547 | 0.053 | 2.604 |
| Item 2 | 5.289 | 0.042 | 3.834 |
| Item 3 | 4.896 | 0.048 | 3.070 |
| Item 4 | 5.359 | 0.042 | 3.821 |
| Item 5 | 4.860 | 0.052 | 2.832 |
| Item 6 | 5.321 | 0.046 | 3.477 |
| **Item Unique Variances** |  |  |  |
| Item 1 | 1.294 | 0.103 | 0.425 |
| Item 2 | 0.888 | 0.097 | 0.467 |
| Item 3 | 0.724 | 0.092 | 0.285 |
| Item 4 | 0.835 | 0.093 | 0.425 |
| Item 5 | 1.926 | 0.119 | 0.654 |
| Item 6 | 1.428 | 0.134 | 0.610 |
# Histogram overlaid with kernel density curve
ggplot(fscores$scores, aes(x=SitP)) +
geom_histogram(aes(y=after_stat(density)), # Histogram with density instead of count on y-axis
binwidth=.5,
colour="black", fill="white") + xlim(c(-4,4)) + labs(title = "Positive Situation Forgiveness Factor Score") +
geom_density(alpha=.2, fill="#FF6666") # Overlay with transparent density plot
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_bar()`).
# Histogram overlaid with kernel density curve
ggplot(fscores$scores, aes(x=SitN)) +
geom_histogram(aes(y=after_stat(density)), # Histogram with density instead of count on y-axis
binwidth=.5,
colour="black", fill="white") + xlim(c(-4,4)) + labs(title = "Negative Situation Forgiveness Factor Score") +
geom_density(alpha=.2, fill="#FF6666") # Overlay with transparent density plot
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_bar()`).
par(mfrow = c(1,2))
cfaPlots(lavObject = model04Estimates)
par(mfrow = c(1,1))