The Forgiveness of Situations Subscale includes six items, three of which are reverse-coded, each rated on a seven-point response scale.
For this example, we will be using the ggplot2 (for general plotting), reshape2 (for data reshaping before plotting), and lavaan (for CFA) packages.
if (require(ggplot2)==FALSE){
install.packages("ggplot2")
}
## Loading required package: ggplot2
library(ggplot2)
if (require(reshape2) == FALSE){
install.packages("reshape2")
}
## Loading required package: reshape2
if (require(lavaan) == FALSE){
install.packages("lavaan")
}
## Loading required package: lavaan
## This is lavaan 0.6-19
## lavaan is FREE software! Please report any bugs.
library(lavaan)
if (require(psych) == FALSE){
install.packages("psych")
}
## Loading required package: psych
##
## Attaching package: 'psych'
## The following object is masked from 'package:lavaan':
##
## cor2cov
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
if (require(knitr) == FALSE){
install.packages("knitr")
}
## Loading required package: knitr
if (require(kableExtra) == FALSE){
install.packages("kableExtra")
}
## Loading required package: kableExtra
if (require(semPlot) == FALSE){
install.packages("semPlot")
}
## Loading required package: semPlot
The data are in a text file named Study2.dat that was originally used in Mplus (so no column names are included at the top of the file). The file contains more items than we will use, so we select only the “Forgiveness of Situations” items from the whole file.
#read in data file (Mplus file format, so we have to label the columns ourselves)
hfsData = read.table(file = "Study2.dat", header = FALSE, na.strings = "99999",
                     col.names = c("PersonID", "Self1", "Self2r", "Self3", "Self4r", "Self5", "Self6r",
                                   "Other1r", "Other2", "Other3r", "Other4", "Other5r", "Other6",
                                   "Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6",
                                   "Selfsub", "Othsub", "Sitsub", "HFSsum"))
#select Situations items and PersonID variables
hfsSituations = hfsData[c("PersonID", "Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")]
The observed correlation matrix, rounded to three digits:
#here the c() function selects only the item variables, not the PersonID variable
round(cor(hfsSituations[c("Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")]), digits = 3)
## Sit1r Sit2 Sit3r Sit4 Sit5r Sit6
## Sit1r 1.000 0.240 0.647 0.300 0.453 0.297
## Sit2 0.240 1.000 0.317 0.570 0.255 0.457
## Sit3r 0.647 0.317 1.000 0.369 0.482 0.356
## Sit4 0.300 0.570 0.369 1.000 0.289 0.448
## Sit5r 0.453 0.255 0.482 0.289 1.000 0.304
## Sit6 0.297 0.457 0.356 0.448 0.304 1.000
The observed means, rounded to three digits:
apply(X = hfsSituations[c("Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")], MARGIN = 2, FUN = function(x) round(mean(x), digits = 3))
## Sit1r Sit2 Sit3r Sit4 Sit5r Sit6
## 4.547 5.289 4.896 5.359 4.860 5.321
The observed variances (using \(N\) in the denominator to match ML estimated output and Mplus example), rounded to three digits:
apply(X = hfsSituations[c("Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")], MARGIN = 2, FUN = function(x) round(var(x, na.rm = TRUE)*1102/1103, digits = 3))
## Sit1r Sit2 Sit3r Sit4 Sit5r Sit6
## 3.049 1.903 2.543 1.967 2.945 2.341
To do a CFA analysis, you only really need the means, variances, and either the correlations or covariances among items. That said, modern methods of estimation use the raw data (often called full-information estimation) rather than the summary statistics, as the raw data permit better missing-data assumptions when using maximum likelihood and Bayesian estimation methods.
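As a minimal sketch of that point (the object names items, S, m, and summaryFit are illustrative), the same one-factor model fit later in this example could be estimated from summary statistics alone via lavaan's sample.cov, sample.mean, and sample.nobs arguments; note that the robust MLR corrections require the raw data, so plain ML is used here:
#sketch: CFA from summary statistics rather than raw (full-information) data
items = c("Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")
S = cov(hfsSituations[items]) #unbiased (N-1) covariances; lavaan rescales for ML internally
m = colMeans(hfsSituations[items])
summaryFit = sem(model = "Sit =~ Sit1r + Sit2 + Sit3r + Sit4 + Sit5r + Sit6", sample.cov = S, sample.mean = m, sample.nobs = 1103)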
The sample covariance matrix can be found from the sample correlations and variances. Each covariance between a pair of variables \(y_1\) and \(y_2\) is denoted with a \(\sigma_{y_1, y_2}\) and each correlation is denoted with a \(\rho_{y_1, y_2}\). The variance of a variable is denoted by \(\sigma^2_{y_1}\) and the standard deviation of a variable is the square root of the variance \(\sqrt{\sigma^2_{y_1}}\). The covariance can be found by taking the correlation and multiplying it by the product of the standard deviations.
\[\sigma_{y_1, y_2} = \rho_{y_1, y_2}\sqrt{\sigma^2_{y_1}}\sqrt{\sigma^2_{y_2}}. \]
Inversely, the correlation can be found by taking the covariance and dividing it by the product of the standard deviations:
\[\rho_{y_1, y_2} = \frac{\sigma_{y_1, y_2}}{\sqrt{\sigma^2_{y_1}}\sqrt{\sigma^2_{y_2}}}. \] Again, we change the denominator from \(N-1\) to \(N\) to be consistent with the Mplus example, which calculates covariances using maximum likelihood.
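A quick numeric check of this identity for the first two items (a sketch using the data objects created above):
#covariance of Sit1r and Sit2 rebuilt from their correlation and standard deviations
r12 = cor(hfsSituations$Sit1r, hfsSituations$Sit2)
sd1 = sqrt(var(hfsSituations$Sit1r)*1102/1103) #SDs with N in the denominator
sd2 = sqrt(var(hfsSituations$Sit2)*1102/1103)
r12*sd1*sd2 #returns 0.577, matching the covariance matrix below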
round(cov(hfsSituations[c("Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")])*1102/1103, digits = 3)
## Sit1r Sit2 Sit3r Sit4 Sit5r Sit6
## Sit1r 3.049 0.577 1.802 0.734 1.358 0.795
## Sit2 0.577 1.903 0.697 1.103 0.604 0.965
## Sit3r 1.802 0.697 2.543 0.824 1.319 0.868
## Sit4 0.734 1.103 0.824 1.967 0.695 0.962
## Sit5r 1.358 0.604 1.319 0.695 2.945 0.798
## Sit6 0.795 0.965 0.868 0.962 0.798 2.341
The assumptions of CFA (i.e., normally distributed factors, no item-by-factor interactions, and conditionally normally distributed items) together imply that the item responses should be normally distributed. From the histograms below, do you think these are normally distributed?
#stack data
melted = melt(hfsSituations, id.vars = "PersonID")
#plot by variable
ggplot(melted, aes(value)) + geom_bar() + facet_wrap(~ variable)
lavaan syntax is constructed as a long text string and saved in a character object. In the example below, model01SyntaxLong = " opens the text string, which continues for multiple lines until the final " terminates the string. The variable model01SyntaxLong then contains the text of the lavaan syntax. Within the syntax string, the R comment character # still functions to comment out text around the syntax. Each part of the model syntax corresponds to a set of parameters in the CFA model. You can find more information about lavaan at http://lavaan.ugent.be.
model01SyntaxLong = "
# Model 1 -- Fully Z-Scored Factor Identification Approach
# Item factor loadings --> list the factor to the left of the =~ and the items to the right, separated by a plus
# Once the factor is defined it can be used in syntax as if it is an observed variable
# Parameters can be fixed to constant values by premultiplying with *; starting values can be specified by using start()*; labels are attached by premultiplying with a name (e.g., L1*Sit1r)
Sit =~ Sit1r + Sit2 + Sit3r + Sit4 + Sit5r + Sit6
# Item intercepts --> ~ 1 indicates means or intercepts
# You can put multiple lines of syntax on a single line using a ;
Sit1r ~ 1; Sit2 ~ 1; Sit3r ~ 1; Sit4 ~ 1; Sit5r ~ 1; Sit6 ~ 1;
# Item error (unique) variances and covariances --> use the ~~ command
Sit1r ~~ Sit1r; Sit2 ~~ Sit2; Sit3r ~~ Sit3r; Sit4 ~~ Sit4; Sit5r ~~ Sit5r; Sit6 ~~ Sit6;
# Factor variance
Sit ~~ Sit
# Factor mean (intercept)
Sit ~ 0*1
"
To run lavaan, the syntax string variable is passed to the lavaan(model = model01SyntaxLong, ...) function. In the function call, we also supply:
- data = hfsSituations: The data frame that contains the variables in the syntax.
- estimator = "MLR": Enables robust maximum likelihood estimation, which helps for data that are not entirely normal.
- mimic = "mplus": Ensures compatibility with Mplus output, which is often needed in practice (and is certainly needed for our homework system).
- std.lv = TRUE: Uses the Z-score method of identification, setting all latent variable means to zero, all latent variable variances to one, and estimating all factor loadings.
model01Estimates = lavaan(model = model01SyntaxLong, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
If the model estimation was successful, no errors will be displayed (and no output will be shown). All model information and results are contained within the model01Estimates object. To access a summary of the results, use the summary() function with the following arguments:
- model01Estimates: The analysis to be summarized.
- fit.measures = TRUE: Provides model fit indices in the summary.
- rsquare = TRUE: Provides the proportion of variance “explained” in each variable (\(R^2\)).
- standardized = TRUE: Provides standardized estimates in the summary.
summary(model01Estimates, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 20 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 18
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 427.937 307.803
## Degrees of freedom 9 9
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.390
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.787 0.732
## Tucker-Lewis Index (TLI) 0.645 0.553
##
## Robust Comparative Fit Index (CFI) 0.789
## Robust Tucker-Lewis Index (TLI) 0.648
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11536.404 -11536.404
## Scaling correction factor 1.416
## for the MLR correction
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
## Scaling correction factor 1.407
## for the MLR correction
##
## Akaike (AIC) 23108.808 23108.808
## Bayesian (BIC) 23198.912 23198.912
## Sample-size adjusted Bayesian (SABIC) 23141.739 23141.739
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.205 0.173
## 90 Percent confidence interval - lower 0.189 0.160
## 90 Percent confidence interval - upper 0.222 0.188
## P-value H_0: RMSEA <= 0.050 0.000 0.000
## P-value H_0: RMSEA >= 0.080 1.000 1.000
##
## Robust RMSEA 0.204
## 90 Percent confidence interval - lower 0.184
## 90 Percent confidence interval - upper 0.225
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.086 0.086
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Sit =~
## Sit1r 1.234 0.069 17.906 0.000 1.234 0.707
## Sit2 0.702 0.074 9.441 0.000 0.702 0.509
## Sit3r 1.241 0.063 19.847 0.000 1.241 0.778
## Sit4 0.784 0.069 11.333 0.000 0.784 0.559
## Sit5r 1.023 0.053 19.179 0.000 1.023 0.596
## Sit6 0.819 0.069 11.942 0.000 0.819 0.535
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r 4.547 0.053 86.474 0.000 4.547 2.604
## .Sit2 5.289 0.042 127.346 0.000 5.289 3.834
## .Sit3r 4.896 0.048 101.959 0.000 4.896 3.070
## .Sit4 5.359 0.042 126.896 0.000 5.359 3.821
## .Sit5r 4.860 0.052 94.060 0.000 4.860 2.832
## .Sit6 5.321 0.046 115.492 0.000 5.321 3.477
## Sit 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r 1.526 0.149 10.217 0.000 1.526 0.500
## .Sit2 1.409 0.128 11.014 0.000 1.409 0.741
## .Sit3r 1.004 0.135 7.456 0.000 1.004 0.395
## .Sit4 1.352 0.127 10.672 0.000 1.352 0.687
## .Sit5r 1.899 0.118 16.025 0.000 1.899 0.645
## .Sit6 1.671 0.159 10.517 0.000 1.671 0.714
## Sit 1.000 1.000 1.000
##
## R-Square:
## Estimate
## Sit1r 0.500
## Sit2 0.259
## Sit3r 0.605
## Sit4 0.313
## Sit5r 0.355
## Sit6 0.286
Each section contains different portions of model information.
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Sit =~
Sit1r 1.234 0.069 17.906 0.000 1.234 0.707
Sit2 0.702 0.074 9.441 0.000 0.702 0.509
Sit3r 1.241 0.063 19.847 0.000 1.241 0.778
Sit4 0.784 0.069 11.333 0.000 0.784 0.559
Sit5r 1.023 0.053 19.179 0.000 1.023 0.596
Sit6 0.819 0.069 11.942 0.000 0.819 0.535
Note: in the Intercepts section below, the last term in the list, Sit, is the “intercept” (mean) of the factor. As it does not have a standard error, it was fixed to zero and not estimated.
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.Sit1r 4.547 0.053 86.474 0.000 4.547 2.604
.Sit2 5.289 0.042 127.346 0.000 5.289 3.834
.Sit3r 4.896 0.048 101.959 0.000 4.896 3.070
.Sit4 5.359 0.042 126.896 0.000 5.359 3.821
.Sit5r 4.860 0.052 94.060 0.000 4.860 2.832
.Sit6 5.321 0.046 115.492 0.000 5.321 3.477
Sit 0.000 0.000 0.000
Note: in the Variances section below, the last term in the list, Sit, is the variance of the factor. As it does not have a standard error, it was fixed to one (from the std.lv = TRUE option we used in the lavaan() function call) and not estimated.
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.Sit1r 1.526 0.149 10.217 0.000 1.526 0.500
.Sit2 1.409 0.128 11.014 0.000 1.409 0.741
.Sit3r 1.004 0.135 7.456 0.000 1.004 0.395
.Sit4 1.352 0.127 10.672 0.000 1.352 0.687
.Sit5r 1.899 0.118 16.025 0.000 1.899 0.645
.Sit6 1.671 0.159 10.517 0.000 1.671 0.714
Sit 1.000 1.000 1.000
Writing out the model—individual predicted values: \(Y_{is} = \mu_i + \lambda_i F_s + e_{is}\), where \(\mu_i\) is the item intercept, \(\lambda_i\) the item’s factor loading, \(F_s\) the person’s factor score, and \(e_{is}\) the item-specific error.
Writing out the model—predicted item variances and covariances:
\(Var\left( Y_1 \right) = \left( \lambda^2_1 \right) Var\left(F\right) + Var\left(e_1\right) = (1.234^2)*(1) + 1.526 = 3.049\) (= original item variance)
\(Cov\left( Y_1, Y_2 \right) = \lambda_1 Var\left(F\right) \lambda_2 = (1.234)*(1)*(.702) = .866\) (actual covariance = .577, so the model over-predicted how related items 1 and 2 should be)
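These hand calculations can be checked against the full model-implied covariance matrix; here is a short sketch (est and implied are illustrative names; lavInspect() and fitted() are lavaan functions):
#model-implied covariance matrix: Lambda %*% Psi %*% t(Lambda) + Theta
est = lavInspect(model01Estimates, what = "est")
implied = est$lambda %*% est$psi %*% t(est$lambda) + est$theta
round(implied, digits = 3) #reproduces 3.049 for Sit1r and 0.866 for the Sit1r-Sit2 covariance
fitted(model01Estimates)$cov #the same matrix, directly from lavaan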
The R package semPlot helps provide a path diagram of a model. Here is an example:
semPaths(object = model01Estimates, what = "est")
The last two columns of the summary output, Std.lv and Std.all, contain the standardized model parameter estimates. To understand the difference between standardized and unstandardized parameter estimates, let’s start with the unstandardized estimates. Unstandardized regression coefficients (and factor loadings) have a scale of “units of Y” per “unit of X”: the slope/loading represents the increase in the dependent variable \(Y\) per one-unit increase in the independent variable \(X\) (in our case, the factor). The units of \(Y\) are given by the standard deviation of \(Y\), or \(SD(Y)\); similarly, the units of \(X\) are given by \(SD(X)\). You can think of the units as the fraction \(\frac{SD(Y)}{SD(X)}\). So, the first factor loading (a value of 1.234 for the item Sit1r) indicates that the numeric response to the item goes up by 1.234 for every one-unit increase in the factor Sit.
The process of standardization removes the units attached to the parameter. If the unstandardized factor loadings are expressed in units of \(\frac{SD(Y)}{SD(X)}\), the standardized versions are achieved by dividing or multiplying by the appropriate standard deviation to make the numerator or denominator of \(\frac{SD(Y)}{SD(X)}\) equal to one.
For the standardized estimates under Std.lv, the units of the factor \((SD(X))\) are removed (yielding \(\frac{SD(Y)}{1}\)) by multiplying the estimate by the factor standard deviation. Because the factor standard deviation is set to one here, all estimates in this column are the same as the unstandardized estimates. These estimates show how the parameters would look under the Z-score identification method had another identification method been used.
The standardized estimates listed under Std.all are formed by multiplying the estimate by the standard deviation of the factor and dividing by the unconditional (raw) standard deviation of the item. For instance, the “fully” standardized factor loading of the first item is found by multiplying the unstandardized coefficient (1.234) by one (the factor standard deviation) and dividing by the item’s standard deviation (\(\sqrt{3.049}\), the square root of the item variance shown at the beginning of this example). The resulting value, \(1.234\times\frac{1}{\sqrt{3.049}} = 0.707\), is the factor loading that would have been obtained had the analysis been run (a) with Z-score factor identification and (b) on items that were themselves Z-scores.
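That arithmetic can be verified in one line, and lavaan's standardizedSolution() function returns the full table of standardized estimates along with their standard errors:
1.234*1/sqrt(3.049) #returns 0.707, the Std.all loading for Sit1r
standardizedSolution(model01Estimates)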
The process of standardization is the same for all parameters of the model: intercepts, loadings, and residual (unique) variances. The interpretation of standardized item intercepts is difficult and often these are not reported. The standardized versions of factor loadings and unique variances are commonly reported.
Moreover, for items measuring one factor, the standardized loadings lead directly to the \(R^2\) estimate: the amount of variance in the item responses explained by the factor. The item \(R^2\) is the square of the standardized factor loading. The R-Square information appears at the end of the model summary, so long as the option rsquare = TRUE is specified in the summary() function call.
R-Square:
Estimate
Sit1r 0.500
Sit2 0.259
Sit3r 0.605
Sit4 0.313
Sit5r 0.355
Sit6 0.286
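As a quick check (lavInspect() with what = "rsquare" returns the same values):
0.707^2 #returns 0.500, the R-square for Sit1r
lavInspect(model01Estimates, what = "rsquare")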
The syntax above is designed to show the mapping of all CFA parameters to all lavaan syntax elements. In reality, all you would need to write to define this model is:
model01SyntaxShort = "
Sit =~ Sit1r + Sit2 + Sit3r + Sit4 + Sit5r + Sit6
"
The syntax passed to the sem() function relies on some defaults: the item intercepts, item unique (error) variances, and factor variance do not need to be listed because they are added automatically, and with std.lv = TRUE the factor variance is fixed to one and the factor mean to zero. The sem() function thus simplifies the syntax needed to conduct a confirmatory factor analysis. To run this syntax, the sem() function is called, and the summary() function is then used to see the results, as demonstrated below. Note the output is identical to what was produced previously.
model01EstimatesShort = sem(model = model01SyntaxShort, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(model01EstimatesShort, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 20 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 18
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 427.937 307.803
## Degrees of freedom 9 9
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.390
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.787 0.732
## Tucker-Lewis Index (TLI) 0.645 0.553
##
## Robust Comparative Fit Index (CFI) 0.789
## Robust Tucker-Lewis Index (TLI) 0.648
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11536.404 -11536.404
## Scaling correction factor 1.416
## for the MLR correction
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
## Scaling correction factor 1.407
## for the MLR correction
##
## Akaike (AIC) 23108.808 23108.808
## Bayesian (BIC) 23198.912 23198.912
## Sample-size adjusted Bayesian (SABIC) 23141.739 23141.739
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.205 0.173
## 90 Percent confidence interval - lower 0.189 0.160
## 90 Percent confidence interval - upper 0.222 0.188
## P-value H_0: RMSEA <= 0.050 0.000 0.000
## P-value H_0: RMSEA >= 0.080 1.000 1.000
##
## Robust RMSEA 0.204
## 90 Percent confidence interval - lower 0.184
## 90 Percent confidence interval - upper 0.225
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.086 0.086
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Sit =~
## Sit1r 1.234 0.069 17.906 0.000 1.234 0.707
## Sit2 0.702 0.074 9.441 0.000 0.702 0.509
## Sit3r 1.241 0.063 19.847 0.000 1.241 0.778
## Sit4 0.784 0.069 11.333 0.000 0.784 0.559
## Sit5r 1.023 0.053 19.179 0.000 1.023 0.596
## Sit6 0.819 0.069 11.942 0.000 0.819 0.535
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r 4.547 0.053 86.474 0.000 4.547 2.604
## .Sit2 5.289 0.042 127.346 0.000 5.289 3.834
## .Sit3r 4.896 0.048 101.959 0.000 4.896 3.070
## .Sit4 5.359 0.042 126.896 0.000 5.359 3.821
## .Sit5r 4.860 0.052 94.060 0.000 4.860 2.832
## .Sit6 5.321 0.046 115.492 0.000 5.321 3.477
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r 1.526 0.149 10.217 0.000 1.526 0.500
## .Sit2 1.409 0.128 11.014 0.000 1.409 0.741
## .Sit3r 1.004 0.135 7.456 0.000 1.004 0.395
## .Sit4 1.352 0.127 10.672 0.000 1.352 0.687
## .Sit5r 1.899 0.118 16.025 0.000 1.899 0.645
## .Sit6 1.671 0.159 10.517 0.000 1.671 0.714
## Sit 1.000 1.000 1.000
##
## R-Square:
## Estimate
## Sit1r 0.500
## Sit2 0.259
## Sit3r 0.605
## Sit4 0.313
## Sit5r 0.355
## Sit6 0.286
There are multiple equivalent ways of specifying the same CFA model, each with different scaling for the factor mean and variance (i.e., different means of identification). Now let’s see the model parameters when using the marker item method of identification instead. In the marker item identification method, the factor loading of the first item listed after the =~ symbol in lavaan syntax (the marker item) is set to one, the factor variance is estimated in its place, and the factor mean remains fixed at zero. The lavaan syntax is the same for marker item identification, but the call to the sem() function does not include the std.lv = TRUE option, so the model defaults to the marker item method of identification:
model02Estimates = sem(model = model01SyntaxShort, data = hfsSituations, estimator = "MLR", mimic = "mplus")
summary(model02Estimates, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 28 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 18
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 427.937 307.804
## Degrees of freedom 9 9
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.390
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.787 0.732
## Tucker-Lewis Index (TLI) 0.645 0.553
##
## Robust Comparative Fit Index (CFI) 0.789
## Robust Tucker-Lewis Index (TLI) 0.648
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11536.404 -11536.404
## Scaling correction factor 1.416
## for the MLR correction
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
## Scaling correction factor 1.407
## for the MLR correction
##
## Akaike (AIC) 23108.808 23108.808
## Bayesian (BIC) 23198.912 23198.912
## Sample-size adjusted Bayesian (SABIC) 23141.739 23141.739
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.205 0.173
## 90 Percent confidence interval - lower 0.189 0.160
## 90 Percent confidence interval - upper 0.222 0.188
## P-value H_0: RMSEA <= 0.050 0.000 0.000
## P-value H_0: RMSEA >= 0.080 1.000 1.000
##
## Robust RMSEA 0.204
## 90 Percent confidence interval - lower 0.184
## 90 Percent confidence interval - upper 0.225
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.086 0.086
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Sit =~
## Sit1r 1.000 1.234 0.707
## Sit2 0.569 0.083 6.831 0.000 0.702 0.509
## Sit3r 1.005 0.035 28.553 0.000 1.241 0.778
## Sit4 0.636 0.082 7.741 0.000 0.784 0.559
## Sit5r 0.829 0.053 15.698 0.000 1.023 0.596
## Sit6 0.664 0.081 8.143 0.000 0.819 0.535
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r 4.547 0.053 86.474 0.000 4.547 2.604
## .Sit2 5.289 0.042 127.346 0.000 5.289 3.834
## .Sit3r 4.896 0.048 101.959 0.000 4.896 3.070
## .Sit4 5.359 0.042 126.896 0.000 5.359 3.821
## .Sit5r 4.860 0.052 94.060 0.000 4.860 2.832
## .Sit6 5.321 0.046 115.492 0.000 5.321 3.477
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r 1.526 0.149 10.217 0.000 1.526 0.500
## .Sit2 1.409 0.128 11.014 0.000 1.409 0.741
## .Sit3r 1.004 0.135 7.456 0.000 1.004 0.395
## .Sit4 1.352 0.127 10.672 0.000 1.352 0.687
## .Sit5r 1.899 0.118 16.025 0.000 1.899 0.645
## .Sit6 1.671 0.159 10.517 0.000 1.671 0.714
## Sit 1.523 0.170 8.953 0.000 1.000 1.000
##
## R-Square:
## Estimate
## Sit1r 0.500
## Sit2 0.259
## Sit3r 0.605
## Sit4 0.313
## Sit5r 0.355
## Sit6 0.286
Here, the loading for Sit1r is not tested and has no standard error (Std.Err) because it is fixed to one. Also, note the Std.lv estimates are identical to the unstandardized estimates from the Z-score factor identification method. Further, the Std.all estimates are equal to the Std.all estimates from the Z-score factor identification method; these will be equal across all identification methods.
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Sit =~
Sit1r 1.000 1.234 0.707
Sit2 0.569 0.083 6.831 0.000 0.702 0.509
Sit3r 1.005 0.035 28.553 0.000 1.241 0.778
Sit4 0.636 0.082 7.741 0.000 0.784 0.559
Sit5r 0.829 0.053 15.698 0.000 1.023 0.596
Sit6 0.664 0.081 8.143 0.000 0.819 0.535
Here, the intercepts are all identical to those found under the previous Z-score identification. That is because the factor mean is still fixed at zero.
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.Sit1r 4.547 0.053 86.474 0.000 4.547 2.604
.Sit2 5.289 0.042 127.346 0.000 5.289 3.834
.Sit3r 4.896 0.048 101.959 0.000 4.896 3.070
.Sit4 5.359 0.042 126.896 0.000 5.359 3.821
.Sit5r 4.860 0.052 94.060 0.000 4.860 2.832
.Sit6 5.321 0.046 115.492 0.000 5.321 3.477
Sit 0.000 0.000 0.000
Here, the factor variance is estimated, which is different from the previous identification method.
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.Sit1r 1.526 0.149 10.217 0.000 1.526 0.500
.Sit2 1.409 0.128 11.014 0.000 1.409 0.741
.Sit3r 1.004 0.135 7.456 0.000 1.004 0.395
.Sit4 1.352 0.127 10.672 0.000 1.352 0.687
.Sit5r 1.899 0.118 16.025 0.000 1.899 0.645
.Sit6 1.671 0.159 10.517 0.000 1.671 0.714
Sit 1.523 0.170 8.953 0.000 1.000 1.000
To estimate this model, we write out the full lavaan model syntax again. In the syntax below you see 1*Sit1r, which fixes the factor loading for Sit1r to one (the marker item factor loading). Further, you see Sit1r ~ 0*1, which fixes the item intercept of Sit1r to zero (the marker item intercept). Finally, Sit ~ 1 estimates the factor mean explicitly.
model03Syntax = "
# Model 3 -- Marker Item Identification Approach with Estimated Factor Mean
# Item factor loadings --> list the factor to the left of the =~ and the items to the right, separated by a plus
# Once the factor is defined it can be used in syntax as if it is an observed variable
# Parameters can be fixed to constant values by using the * command; starting values can be specified by using start()*; labels can be implemented with []
Sit =~ 1*Sit1r + Sit2 + Sit3r + Sit4 + Sit5r + Sit6
# Item intercepts --> ~ 1 indicates means or intercepts
# You can put multiple lines of syntax on a single line using a ;
Sit1r ~ 0*1; Sit2 ~ 1; Sit3r ~ 1; Sit4 ~ 1; Sit5r ~ 1; Sit6 ~ 1;
# Item error (unique) variances and covariances --> use the ~~ command
Sit1r ~~ Sit1r; Sit2 ~~ Sit2; Sit3r ~~ Sit3r; Sit4 ~~ Sit4; Sit5r ~~ Sit5r; Sit6 ~~ Sit6;
# Factor variance
Sit ~~ Sit
# Factor mean (intercept)
Sit ~ 1
"
model03Estimates = sem(model = model03Syntax, data = hfsSituations, estimator = "MLR", mimic = "mplus")
summary(model03Estimates, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 55 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 18
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 427.937 307.804
## Degrees of freedom 9 9
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.390
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.787 0.732
## Tucker-Lewis Index (TLI) 0.645 0.553
##
## Robust Comparative Fit Index (CFI) 0.789
## Robust Tucker-Lewis Index (TLI) 0.648
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11536.404 -11536.404
## Scaling correction factor 1.416
## for the MLR correction
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
## Scaling correction factor 1.407
## for the MLR correction
##
## Akaike (AIC) 23108.808 23108.808
## Bayesian (BIC) 23198.912 23198.912
## Sample-size adjusted Bayesian (SABIC) 23141.739 23141.739
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.205 0.173
## 90 Percent confidence interval - lower 0.189 0.160
## 90 Percent confidence interval - upper 0.222 0.188
## P-value H_0: RMSEA <= 0.050 0.000 0.000
## P-value H_0: RMSEA >= 0.080 1.000 1.000
##
## Robust RMSEA 0.204
## 90 Percent confidence interval - lower 0.184
## 90 Percent confidence interval - upper 0.225
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.086 0.086
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Sit =~
## Sit1r 1.000 1.234 0.707
## Sit2 0.569 0.083 6.831 0.000 0.702 0.509
## Sit3r 1.005 0.035 28.553 0.000 1.241 0.778
## Sit4 0.636 0.082 7.741 0.000 0.784 0.559
## Sit5r 0.829 0.053 15.698 0.000 1.023 0.596
## Sit6 0.664 0.081 8.143 0.000 0.819 0.535
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r 0.000 0.000 0.000
## .Sit2 2.701 0.383 7.045 0.000 2.701 1.958
## .Sit3r 0.325 0.171 1.899 0.058 0.325 0.204
## .Sit4 2.469 0.380 6.502 0.000 2.469 1.760
## .Sit5r 1.092 0.246 4.431 0.000 1.092 0.636
## .Sit6 2.304 0.369 6.249 0.000 2.304 1.506
## Sit 4.547 0.053 86.474 0.000 3.684 3.684
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r 1.526 0.149 10.217 0.000 1.526 0.500
## .Sit2 1.409 0.128 11.014 0.000 1.409 0.741
## .Sit3r 1.004 0.135 7.456 0.000 1.004 0.395
## .Sit4 1.352 0.127 10.672 0.000 1.352 0.687
## .Sit5r 1.899 0.118 16.025 0.000 1.899 0.645
## .Sit6 1.671 0.159 10.517 0.000 1.671 0.714
## Sit 1.523 0.170 8.953 0.000 1.000 1.000
##
## R-Square:
## Estimate
## Sit1r 0.500
## Sit2 0.259
## Sit3r 0.605
## Sit4 0.313
## Sit5r 0.355
## Sit6 0.286
Here, you will note the model is identical to the previous two: the log-likelihood is -11536.404, the same as before. Now the intercept of item Sit1r is set to zero and the mean of the factor is estimated.
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.Sit1r 0.000 0.000 0.000
.Sit2 2.701 0.383 7.045 0.000 2.701 1.958
.Sit3r 0.325 0.171 1.899 0.058 0.325 0.204
.Sit4 2.469 0.380 6.502 0.000 2.469 1.760
.Sit5r 1.092 0.246 4.431 0.000 1.092 0.636
.Sit6 2.304 0.369 6.249 0.000 2.304 1.506
Sit 4.547 0.053 86.474 0.000 3.684 3.684
The mean of the factor is 4.547, which, in this case, is the mean of item Sit1r. That is because the factor loading of that item is fixed to 1.0 and its intercept to zero. You will also notice that the other item intercepts have changed from the previous models. In general, an item’s mean is found by adding the item intercept to the product of the factor loading and the factor mean, or \(\mu_i + \lambda_i\mu_F\).
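A one-line check of that formula for Sit2, using the rounded estimates above:
2.701 + 0.569*4.547 #returns 5.288; the observed mean of Sit2 is 5.289 (difference is rounding)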
Calculating model degrees of freedom (with \(v\) observed variables, the data provide \(v\) means plus \(v(v+1)/2\) variances and covariances): here that is \(6 + 21 = 27\) pieces of information, and with 18 estimated parameters the model has \(27 - 18 = 9\) degrees of freedom.
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -11536.404 -11536.404
Scaling correction factor 1.416
for the MLR correction
Loglikelihood unrestricted model (H1) -11322.435 -11322.435
Scaling correction factor 1.407
for the MLR correction
Number of free parameters 18 18
Akaike (AIC) 23108.808 23108.808
Bayesian (BIC) 23198.912 23198.912
Sample-size adjusted Bayesian (BIC) 23141.739 23141.739
Estimator ML Robust
Minimum Function Test Statistic 427.937 307.804
Degrees of freedom 9 9
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.390
for the Yuan-Bentler correction (Mplus variant)
Where does this \(\chi^2\) value for “model fit” come from? A rescaled −2LL model comparison of this one-factor model (H0) against the saturated model (H1) that perfectly reproduces the data covariances:
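Using the rounded values printed in the output above, a quick arithmetic sketch (exact results differ slightly past the decimal):
ll0 = -11536.404; ll1 = -11322.435 #H0 and H1 loglikelihoods
c0 = 1.416; c1 = 1.407 #MLR scaling correction factors for H0 and H1
p0 = 18; p1 = 27 #free parameters in H0 and H1
-2*(ll0 - ll1) #returns 427.938, the standard test statistic
cd = (p1*c1 - p0*c0)/(p1 - p0); cd #returns 1.389, the difference-test scaling factor
-2*(ll0 - ll1)/cd #returns approximately 308, the scaled test statistic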
How to fit the saturated (unstructured) model, which reproduces the item means, variances, and covariances in the original data exactly:
modelSaturated = "
# Item intercepts --> ~ 1 indicates means or intercepts
# You can put multiple lines of syntax on a single line using a ;
Sit1r ~ 1; Sit2 ~ 1; Sit3r ~ 1; Sit4 ~ 1; Sit5r ~ 1; Sit6 ~ 1;
# Item variances and covariances --> use the ~~ command
# Variances:
Sit1r ~~ Sit1r; Sit2 ~~ Sit2; Sit3r ~~ Sit3r; Sit4 ~~ Sit4; Sit5r ~~ Sit5r; Sit6 ~~ Sit6;
# Covariances:
Sit1r ~~ Sit2; Sit1r ~~ Sit3r; Sit1r ~~ Sit4; Sit1r ~~ Sit5r; Sit1r ~~ Sit6;
Sit2 ~~ Sit3r; Sit2 ~~ Sit4; Sit2 ~~ Sit5r; Sit2 ~~ Sit6;
Sit3r ~~ Sit4; Sit3r ~~ Sit5r; Sit3r ~~ Sit6;
Sit4 ~~ Sit5r; Sit4 ~~ Sit6;
Sit5r ~~ Sit6;
"
modelSaturatedEstimates = sem(model = modelSaturated, data = hfsSituations, estimator = "MLR", mimic = "mplus")
summary(modelSaturatedEstimates, fit.measures = TRUE)
## lavaan 0.6-19 ended normally after 45 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 27
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 0.000 0.000
## Degrees of freedom 0 0
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000 1.000
## Tucker-Lewis Index (TLI) 1.000 1.000
##
## Robust Comparative Fit Index (CFI) 1.000
## Robust Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11322.435 -11322.435
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
##
## Akaike (AIC) 22698.870 22698.870
## Bayesian (BIC) 22834.027 22834.027
## Sample-size adjusted Bayesian (SABIC) 22748.268 22748.268
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000 NA
## 90 Percent confidence interval - lower 0.000 NA
## 90 Percent confidence interval - upper 0.000 NA
## P-value H_0: RMSEA <= 0.050 NA NA
## P-value H_0: RMSEA >= 0.080 NA NA
##
## Robust RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: Robust RMSEA <= 0.050 NA
## P-value H_0: Robust RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000 0.000
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## Sit1r ~~
## Sit2 0.577 0.083 6.974 0.000
## Sit3r 1.802 0.091 19.800 0.000
## Sit4 0.734 0.080 9.170 0.000
## Sit5r 1.358 0.100 13.638 0.000
## Sit6 0.795 0.092 8.620 0.000
## Sit2 ~~
## Sit3r 0.697 0.077 9.023 0.000
## Sit4 1.103 0.084 13.112 0.000
## Sit5r 0.604 0.080 7.578 0.000
## Sit6 0.965 0.083 11.687 0.000
## Sit3r ~~
## Sit4 0.824 0.076 10.848 0.000
## Sit5r 1.319 0.091 14.495 0.000
## Sit6 0.868 0.085 10.261 0.000
## Sit4 ~~
## Sit5r 0.695 0.079 8.843 0.000
## Sit6 0.962 0.081 11.821 0.000
## Sit5r ~~
## Sit6 0.798 0.089 8.955 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## Sit1r 4.547 0.053 86.474 0.000
## Sit2 5.289 0.042 127.346 0.000
## Sit3r 4.896 0.048 101.959 0.000
## Sit4 5.359 0.042 126.896 0.000
## Sit5r 4.860 0.052 94.060 0.000
## Sit6 5.321 0.046 115.492 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## Sit1r 3.049 0.098 31.216 0.000
## Sit2 1.903 0.093 20.547 0.000
## Sit3r 2.543 0.092 27.763 0.000
## Sit4 1.967 0.095 20.683 0.000
## Sit5r 2.945 0.100 29.595 0.000
## Sit6 2.341 0.108 21.771 0.000
Note that H0 and H1 are now the same: our H0 model IS the saturated H1 model, so the two log-likelihoods are equal. Also, note there are 27 free parameters (six item means, six item variances, and 15 covariances).
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -11322.435 -11322.435
Loglikelihood unrestricted model (H1) -11322.435 -11322.435
Number of free parameters 27 27
Akaike (AIC) 22698.870 22698.870
Bayesian (BIC) 22834.027 22834.027
Sample-size adjusted Bayesian (BIC) 22748.268 22748.268
Also note how the model \(\chi^2\) test has zero degrees of freedom: this is because all 27 possible free parameters were estimated. This model fits the data perfectly.
Estimator ML Robust
Minimum Function Test Statistic 0.000 0.000
Degrees of freedom 0 0
Minimum Function Value 0.0000000000000
Scaling correction factor NA
for the Yuan-Bentler correction (Mplus variant)
Returning to the one-factor model’s fit indices:
User model versus baseline model:
Comparative Fit Index (CFI) 0.787 0.732
Tucker-Lewis Index (TLI) 0.645 0.553
Robust Comparative Fit Index (CFI) 0.787
Robust Tucker-Lewis Index (TLI) 0.646
Root Mean Square Error of Approximation:
RMSEA 0.205 0.173
90 Percent Confidence Interval 0.189 0.222 0.160 0.188
P-value RMSEA <= 0.05 0.000 0.000
Robust RMSEA 0.205
90 Percent Confidence Interval 0.185 0.224
Standardized Root Mean Square Residual:
SRMR 0.086 0.086
This compares the estimated model to one that has no covariances at all. It is a last-resort comparison: if the baseline model’s test statistic were non-significant, there would be no covariances among the items, and thus no factors to be found in the data.
Model test baseline model:
Minimum Function Test Statistic 1981.034 1128.693
Degrees of freedom 15 15
P-value 0.000 0.000
Where does this \(\chi^2\) value for “fit of the baseline model” come from? A rescaled −2LL model comparison of the independence model with NO covariances to the saturated model:
What’s the point? This baseline model fit test tells us whether there are any covariances at all (i.e., whether it even makes sense to try to fit latent factors to predict them).
modelIndependence = "
# Item intercepts --> ~ 1 indicates means or intercepts
# You can put multiple lines of syntax on a single line using a ;
Sit1r ~ 1; Sit2 ~ 1; Sit3r ~ 1; Sit4 ~ 1; Sit5r ~ 1; Sit6 ~ 1;
# Item variances and covariances --> use the ~~ command
# Variances:
Sit1r ~~ Sit1r; Sit2 ~~ Sit2; Sit3r ~~ Sit3r; Sit4 ~~ Sit4; Sit5r ~~ Sit5r; Sit6 ~~ Sit6;
"
modelIndependenceEstimates = sem(model = modelIndependence, data = hfsSituations, estimator = "MLR", mimic = "mplus")
summary(modelIndependenceEstimates, fit.measures = TRUE)
## lavaan 0.6-19 ended normally after 13 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 12
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 1981.034 1128.694
## Degrees of freedom 15 15
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.755
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.000 0.000
## Tucker-Lewis Index (TLI) -0.000 -0.000
##
## Robust Comparative Fit Index (CFI) 0.000
## Robust Tucker-Lewis Index (TLI) -0.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -12312.952 -12312.952
## Scaling correction factor 0.973
## for the MLR correction
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
## Scaling correction factor 1.407
## for the MLR correction
##
## Akaike (AIC) 24649.904 24649.904
## Bayesian (BIC) 24709.974 24709.974
## Sample-size adjusted Bayesian (SABIC) 24671.859 24671.859
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.345 0.259
## 90 Percent confidence interval - lower 0.332 0.250
## 90 Percent confidence interval - upper 0.358 0.269
## P-value H_0: RMSEA <= 0.050 0.000 0.000
## P-value H_0: RMSEA >= 0.080 1.000 1.000
##
## Robust RMSEA 0.344
## 90 Percent confidence interval - lower 0.329
## 90 Percent confidence interval - upper 0.359
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.300 0.300
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## Sit1r 4.547 0.053 86.474 0.000
## Sit2 5.289 0.042 127.346 0.000
## Sit3r 4.896 0.048 101.959 0.000
## Sit4 5.359 0.042 126.896 0.000
## Sit5r 4.860 0.052 94.060 0.000
## Sit6 5.321 0.046 115.492 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## Sit1r 3.049 0.098 31.216 0.000
## Sit2 1.903 0.093 20.548 0.000
## Sit3r 2.543 0.092 27.763 0.000
## Sit4 1.967 0.095 20.682 0.000
## Sit5r 2.945 0.100 29.595 0.000
## Sit6 2.341 0.108 21.771 0.000
Note how the test of the model vs. saturated is identical to the test of the model vs. baseline: that indicates this model is the baseline model.
Although not 0, this is the worst possible RMSEA while still allowing separate means and variances per item in these data. RMSEA is a parsimony-corrected absolute fit index (so, its fit is relative to the saturated model).
CFI and TLI are 0 because they are “incremental fit” indices relative to the independence model (which this is).
SRMR is also an absolute fit index (relative to saturated model), so this is the worst it gets for these data, too.
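As a check, the baseline \(\chi^2\) is just the -2LL comparison of this independence model against the saturated model fit earlier:
#independence LL = -12312.952 (above); saturated LL = -11322.435
-2*(-12312.952 - (-11322.435)) #returns 1981.034, the baseline test statistic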
For this section, any of the three equivalent single factor models can be used.
So global fit for the one-factor model is not so good (RMSEA = .173, CFI = .732). What do the voodoo modification indices suggest we do to fix it? To get modification indices, use the modificationindices() function. Here, the function is called with options:
- object = model01Estimates: The model estimates object from which to derive modification indices.
- sort. = TRUE: Sort the output from highest to lowest modification index.
modificationindices(object = model01Estimates, sort. = TRUE)
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 27 Sit2 ~~ Sit4 224.295 0.702 0.702 0.509 0.509
## 22 Sit1r ~~ Sit3r 199.679 1.023 1.023 0.827 0.827
## 29 Sit2 ~~ Sit6 88.835 0.486 0.486 0.317 0.317
## 21 Sit1r ~~ Sit2 68.985 -0.464 -0.464 -0.316 -0.316
## 34 Sit4 ~~ Sit6 64.710 0.415 0.415 0.276 0.276
## 23 Sit1r ~~ Sit4 50.442 -0.403 -0.403 -0.280 -0.280
## 26 Sit2 ~~ Sit3r 48.505 -0.357 -0.357 -0.300 -0.300
## 30 Sit3r ~~ Sit4 40.616 -0.336 -0.336 -0.288 -0.288
## 25 Sit1r ~~ Sit6 33.475 -0.358 -0.358 -0.224 -0.224
## 32 Sit3r ~~ Sit6 31.132 -0.319 -0.319 -0.246 -0.246
## 28 Sit2 ~~ Sit5r 7.117 -0.151 -0.151 -0.092 -0.092
## 33 Sit4 ~~ Sit5r 6.906 -0.149 -0.149 -0.093 -0.093
## 24 Sit1r ~~ Sit5r 6.417 0.176 0.176 0.104 0.104
## 31 Sit3r ~~ Sit5r 3.548 0.123 0.123 0.089 0.089
## 35 Sit5r ~~ Sit6 0.736 -0.053 -0.053 -0.030 -0.030
The output includes:
- lhs: Left-hand side of the parameter (in lavaan syntax)
- op: Operation of the parameter: ~~ is for (residual) covariances; =~ is for additional factor loadings. All other lavaan syntax types are possible, too, but here item means and variances fit perfectly (see below)
- rhs: Right-hand side of the parameter (in lavaan syntax)
- mi: Modification index using ML
- mi.scaled: Modification index using MLR (use this one)
- epc: Expected parameter change (from zero)
- sepc.lv: Standardized expected parameter change using std.lv standardization
- sepc.all: Standardized expected parameter change using std.all standardization
- sepc.nox: Standardized expected parameter change using std.nox standardization
The highest values are for error covariances (reflecting unknown multidimensionality).
Another approach—how about we examine local fit and see where the problems seem to be?
The means and variances of the items will be perfectly reproduced, so that’s not an issue. Misfit results from the difference between the observed and model-predicted covariances.
lavaan gives us the “residual” (defined as observed – predicted) or “leftover” covariance matrix, but it is scale dependent and thus not so helpful on its own. We can use the resid() function to obtain the residual covariance matrix (observed covariances minus model-implied covariances):
resid(object = model01Estimates, type = "raw")
## $type
## [1] "raw"
##
## $cov
## Sit1r Sit2 Sit3r Sit4 Sit5r Sit6
## Sit1r 0.000
## Sit2 -0.290 0.000
## Sit3r 0.271 -0.174 0.000
## Sit4 -0.234 0.552 -0.149 0.000
## Sit5r 0.096 -0.114 0.050 -0.108 0.000
## Sit6 -0.216 0.390 -0.149 0.319 -0.040 0.000
##
## $mean
## Sit1r Sit2 Sit3r Sit4 Sit5r Sit6
## 0 0 0 0 0 0
lavaan also gives us “normalized” residuals, which can be thought of as z-scores for how large the leftover covariance is in absolute terms. Because the denominator decreases with sample size, however, these values can look inflated in large samples, so look for relatively large values.
“Normalized” residuals for inter-item covariances = (observed – predicted) / SD(observed):
resid(object = model01Estimates, type = "normalized")
## $type
## [1] "normalized"
##
## $cov
## Sit1r Sit2 Sit3r Sit4 Sit5r Sit6
## Sit1r 0.000
## Sit2 -3.503 0.000
## Sit3r 2.977 -2.253 0.000
## Sit4 -2.928 6.560 -1.959 0.000
## Sit5r 0.960 -1.434 0.548 -1.372 0.000
## Sit6 -2.345 4.721 -1.757 3.925 -0.444 0.000
##
## $mean
## Sit1r Sit2 Sit3r Sit4 Sit5r Sit6
## 0 0 0 0 0 0
NEGATIVE NORMALIZED RESIDUAL: the items are less related than the model predicted (they don’t “want” to be together). POSITIVE NORMALIZED RESIDUAL: the items are more related than the model predicted (they “want” to be more together).
Why might the normalized residuals (leftover covariances) for the positively worded items be larger than those for the negatively worded items?
These results suggest that wording valence is playing a larger role in the pattern of covariance across items than what the one-factor model predicts. Rather than adding voodoo covariances among the residuals for specific items, how about a two-factor model based on wording instead?
Here, we use the shortened syntax and the sem() function to get the results.
model04Syntax = "
# SitP loadings (all estimated)
SitP =~ Sit2 + Sit4 + Sit6
# SitN loadings (all estimated)
SitN =~ Sit1r + Sit3r + Sit5r
"
model04Estimates = sem(model = model04Syntax, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(model04Estimates, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 21 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 19
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 35.410 24.924
## Degrees of freedom 8 8
## P-value (Chi-square) 0.000 0.002
## Scaling correction factor 1.421
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.986 0.985
## Tucker-Lewis Index (TLI) 0.974 0.972
##
## Robust Comparative Fit Index (CFI) 0.988
## Robust Tucker-Lewis Index (TLI) 0.977
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11340.140 -11340.140
## Scaling correction factor 1.402
## for the MLR correction
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
## Scaling correction factor 1.407
## for the MLR correction
##
## Akaike (AIC) 22718.281 22718.281
## Bayesian (BIC) 22813.391 22813.391
## Sample-size adjusted Bayesian (SABIC) 22753.042 22753.042
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.056 0.044
## 90 Percent confidence interval - lower 0.038 0.028
## 90 Percent confidence interval - upper 0.075 0.061
## P-value H_0: RMSEA <= 0.050 0.277 0.708
## P-value H_0: RMSEA >= 0.080 0.018 0.000
##
## Robust RMSEA 0.052
## 90 Percent confidence interval - lower 0.030
## 90 Percent confidence interval - upper 0.076
## P-value H_0: Robust RMSEA <= 0.050 0.397
## P-value H_0: Robust RMSEA >= 0.080 0.026
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.029 0.029
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## SitP =~
## Sit2 1.007 0.052 19.487 0.000 1.007 0.730
## Sit4 1.064 0.050 21.195 0.000 1.064 0.759
## Sit6 0.956 0.053 18.203 0.000 0.956 0.625
## SitN =~
## Sit1r 1.325 0.048 27.698 0.000 1.325 0.759
## Sit3r 1.349 0.044 30.514 0.000 1.349 0.846
## Sit5r 1.009 0.055 18.358 0.000 1.009 0.588
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## SitP ~~
## SitN 0.564 0.041 13.775 0.000 0.564 0.564
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit2 5.289 0.042 127.346 0.000 5.289 3.834
## .Sit4 5.359 0.042 126.896 0.000 5.359 3.821
## .Sit6 5.321 0.046 115.492 0.000 5.321 3.477
## .Sit1r 4.547 0.053 86.474 0.000 4.547 2.604
## .Sit3r 4.896 0.048 101.959 0.000 4.896 3.070
## .Sit5r 4.860 0.052 94.060 0.000 4.860 2.832
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit2 0.888 0.097 9.173 0.000 0.888 0.467
## .Sit4 0.835 0.093 9.003 0.000 0.835 0.425
## .Sit6 1.428 0.134 10.684 0.000 1.428 0.610
## .Sit1r 1.294 0.103 12.547 0.000 1.294 0.425
## .Sit3r 0.724 0.092 7.857 0.000 0.724 0.285
## .Sit5r 1.926 0.119 16.128 0.000 1.926 0.654
## SitP 1.000 1.000 1.000
## SitN 1.000 1.000 1.000
##
## R-Square:
## Estimate
## Sit2 0.533
## Sit4 0.575
## Sit6 0.390
## Sit1r 0.575
## Sit3r 0.715
## Sit5r 0.346
This model fits better (we would call this good fit). We can also compare the fit of this model with Model 1 using the anova() function. Here, the null hypothesis is that Model 1 fits as well as Model 4.
anova(model01Estimates, model04Estimates)
##
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
##
## lavaan->lavTestLRT():
## lavaan NOTE: The "Chisq" column contains standard test statistics, not the
## robust test that should be reported per model. A robust difference test is
## a function of two standard (not robust) statistics.
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## model04Estimates 8 22718 22813 35.41
## model01Estimates 9 23109 23199 427.94 342.29 1 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This function conducts the MLR version of the likelihood ratio test for us. The significant difference indicates that Model 4 fits better than Model 1.
Let’s see: Any more local fit problems? The modification indices are better (values of a little over 10 seem okay).
modificationindices(object = model04Estimates, sort. = TRUE)
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 29 SitN =~ Sit6 15.383 0.245 0.245 0.160 0.160
## 30 Sit2 ~~ Sit4 15.383 0.332 0.332 0.386 0.386
## 35 Sit4 ~~ Sit6 13.886 -0.273 -0.273 -0.250 -0.250
## 27 SitN =~ Sit2 13.885 -0.224 -0.224 -0.162 -0.162
## 24 SitP =~ Sit1r 9.244 -0.223 -0.223 -0.128 -0.128
## 44 Sit3r ~~ Sit5r 9.244 -0.277 -0.277 -0.234 -0.234
## 26 SitP =~ Sit5r 7.549 0.191 0.191 0.111 0.111
## 42 Sit1r ~~ Sit3r 7.549 0.409 0.409 0.422 0.422
## 32 Sit2 ~~ Sit1r 6.685 -0.115 -0.115 -0.107 -0.107
## 41 Sit6 ~~ Sit5r 5.847 0.138 0.138 0.083 0.083
## 40 Sit6 ~~ Sit3r 1.321 0.052 0.052 0.052 0.052
## 25 SitP =~ Sit3r 0.667 0.060 0.060 0.037 0.037
## 43 Sit1r ~~ Sit5r 0.667 0.071 0.071 0.045 0.045
## 33 Sit2 ~~ Sit3r 0.402 -0.025 -0.025 -0.032 -0.032
## 39 Sit6 ~~ Sit1r 0.346 0.030 0.030 0.022 0.022
## 36 Sit4 ~~ Sit1r 0.307 -0.025 -0.025 -0.024 -0.024
## 38 Sit4 ~~ Sit5r 0.201 0.022 0.022 0.017 0.017
## 34 Sit2 ~~ Sit5r 0.129 0.017 0.017 0.013 0.013
## 37 Sit4 ~~ Sit3r 0.101 0.013 0.013 0.017 0.017
## 31 Sit2 ~~ Sit6 0.023 0.010 0.010 0.009 0.009
## 28 SitN =~ Sit4 0.023 0.009 0.009 0.007 0.007
The normalized residuals have two values that are bigger in absolute value than 2:
resid(object = model04Estimates, type = "normalized")
## $type
## [1] "normalized"
##
## $cov
## Sit2 Sit4 Sit6 Sit1r Sit3r Sit5r
## Sit2 0.000
## Sit4 0.370 0.000
## Sit6 0.031 -0.676 0.000
## Sit1r -2.125 -0.768 0.870 0.000
## Sit3r -0.896 0.192 1.658 0.172 0.000
## Sit5r 0.382 1.128 2.847 0.212 -0.464 0.000
##
## $mean
## Sit2 Sit4 Sit6 Sit1r Sit3r Sit5r
## 0 0 0 0 0 0
Because we have no real theoretical or defensible reason to fit any of these suggested parameters, we will not add any new parameters. This will be about as good as it gets.
Here is the path diagram of the model using the standardized estimates:
semPaths(object = model04Estimates, what = "std")
As we have demonstrated good model fit, estimates of trait reliabilities can now be found. Although I do not dispense such advice, many people use a CFA to establish model fit and then proceed to use sum scores for each trait in the model. I do not like this practice because sum scores are perfectly correlated with the factor scores only under the parallel items model, and we never tested that model (should it not fit, we may even get a reversal in the rank ordering of people by score).
That said, reliabilities for sum scores can be found by using CFA model parameters. This reliability coefficient is called “Omega” by Rod McDonald and is calculated:
\[ \rho_\omega = \frac{\sigma^2_F\left( \sum_{i=1}^{Items} \lambda_i \right)^2}{\sigma^2_F\left( \sum_{i=1}^{Items} \lambda_i \right)^2 + \sum_{i=1}^{Items} \sum_{j=1}^{Items} \psi_{ij}}\]
In words, Omega = Var(Factor)* (Sum of loadings)^2 / [ Var(Factor)* (Sum of loadings)^2 + Sum of error variances + 2* Sum of error covariances]
For our data:
Omega for the Positive factor = (1.007 + 1.064 + 0.956)² / [(1.007 + 1.064 + 0.956)² + (0.888 + 0.835 + 1.428) + 2(0)] = .744 (alpha was .746)
Omega for the Negative factor = (1.325 + 1.349 + 1.009)² / [(1.325 + 1.349 + 1.009)² + (1.294 + 0.724 + 1.926) + 2(0)] = .775 (alpha was .780)
(The 2(0) term reflects that no error covariances were included in the model.)
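The same arithmetic in R, using the rounded Model 4 estimates printed above:
lamP = c(1.007, 1.064, 0.956); errP = c(0.888, 0.835, 1.428) #SitP loadings and unique variances
sum(lamP)^2/(sum(lamP)^2 + sum(errP)) #returns 0.744
lamN = c(1.325, 1.349, 1.009); errN = c(1.294, 0.724, 1.926) #SitN loadings and unique variances
sum(lamN)^2/(sum(lamN)^2 + sum(errN)) #returns 0.775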
Calculating Omega in lavaan: through the use of parameter labels and newly defined parameters, you can have lavaan compute Omega directly. In the syntax below, the terms before the items are labels representing the factor loadings and unique variances. They can be used later in the syntax to form new parameters or to put constraints on the model.
model04SyntaxOmega = "
# SitP loadings (all estimated)
SitP =~ L2*Sit2 + L4*Sit4 + L6*Sit6
# SitN loadings (all estimated)
SitN =~ L1*Sit1r + L3*Sit3r + L5*Sit5r
# Unique Variances:
Sit1r ~~ E1*Sit1r; Sit2 ~~ E2*Sit2; Sit3r ~~ E3*Sit3r; Sit4 ~~ E4*Sit4; Sit5r ~~ E5*Sit5r; Sit6 ~~ E6*Sit6;
# Calculate Omega Reliability for Sum Scores:
OmegaP := ((L2 + L4 + L6)^2) / ( ((L2 + L4 + L6)^2) + E2 + E4 + E6)
OmegaN := ((L1 + L3 + L5)^2) / ( ((L1 + L3 + L5)^2) + E1 + E3 + E5)
"
model04EstimatesOmega = sem(model = model04SyntaxOmega, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(model04EstimatesOmega, fit.measures = FALSE, rsquare = FALSE, standardized = FALSE, header = FALSE)
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## SitP =~
## Sit2 (L2) 1.007 0.052 19.487 0.000
## Sit4 (L4) 1.064 0.050 21.195 0.000
## Sit6 (L6) 0.956 0.053 18.203 0.000
## SitN =~
## Sit1r (L1) 1.325 0.048 27.698 0.000
## Sit3r (L3) 1.349 0.044 30.514 0.000
## Sit5r (L5) 1.009 0.055 18.358 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## SitP ~~
## SitN 0.564 0.041 13.775 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .Sit2 5.289 0.042 127.346 0.000
## .Sit4 5.359 0.042 126.896 0.000
## .Sit6 5.321 0.046 115.492 0.000
## .Sit1r 4.547 0.053 86.474 0.000
## .Sit3r 4.896 0.048 101.959 0.000
## .Sit5r 4.860 0.052 94.060 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .Sit1r (E1) 1.294 0.103 12.547 0.000
## .Sit2 (E2) 0.888 0.097 9.173 0.000
## .Sit3r (E3) 0.724 0.092 7.857 0.000
## .Sit4 (E4) 0.835 0.093 9.003 0.000
## .Sit5r (E5) 1.926 0.119 16.128 0.000
## .Sit6 (E6) 1.428 0.134 10.684 0.000
## SitP 1.000
## SitN 1.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## OmegaP 0.744 0.020 37.956 0.000
## OmegaN 0.775 0.014 56.802 0.000
Calculating Omega through lavaan also provides its standard error. Note: the Omega equation used above is for the Z-score factor identification method. If the marker-item method is used, the squared sum of the loadings must also be multiplied by the variance of the factor, as in the equation given earlier.
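For instance, here is a minimal sketch of the marker-item version (the labels VP and VN and the object names are illustrative; the fixed first loading of each factor enters its sum as 1):
model04SyntaxOmegaMarker = "
SitP =~ Sit2 + L4*Sit4 + L6*Sit6
SitN =~ Sit1r + L3*Sit3r + L5*Sit5r
SitP ~~ VP*SitP; SitN ~~ VN*SitN
Sit2 ~~ E2*Sit2; Sit4 ~~ E4*Sit4; Sit6 ~~ E6*Sit6
Sit1r ~~ E1*Sit1r; Sit3r ~~ E3*Sit3r; Sit5r ~~ E5*Sit5r
OmegaP := (VP*(1 + L4 + L6)^2) / ( VP*(1 + L4 + L6)^2 + E2 + E4 + E6 )
OmegaN := (VN*(1 + L3 + L5)^2) / ( VN*(1 + L3 + L5)^2 + E1 + E3 + E5 )
"
model04EstimatesOmegaMarker = sem(model = model04SyntaxOmegaMarker, data = hfsSituations, estimator = "MLR", mimic = "mplus")
#OmegaP and OmegaN reproduce the .744 and .775 values above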
Of note about Omega: if we constrain all factor loadings of a trait to be equal (a model called tau-equivalent), Omega is equal to Alpha. In the syntax below, using the same labels LP and LN in multiple places constrains the corresponding factor loadings to be equal:
model04SyntaxAlpha = "
# SitP loadings (constrained equal via the shared label LP)
SitP =~ LP*Sit2 + LP*Sit4 + LP*Sit6
# SitN loadings (constrained equal via the shared label LN)
SitN =~ LN*Sit1r + LN*Sit3r + LN*Sit5r
# Unique Variances:
Sit1r ~~ E1*Sit1r; Sit2 ~~ E2*Sit2; Sit3r ~~ E3*Sit3r; Sit4 ~~ E4*Sit4; Sit5r ~~ E5*Sit5r; Sit6 ~~ E6*Sit6;
# Calculate Omega Reliability for Sum Scores:
OmegaP := ((3*LP)^2) / ( ((3*LP)^2) + E2 + E4 + E6)
OmegaN := ((3*LN)^2) / ( ((3*LN)^2) + E1 + E3 + E5)
"
model04EstimatesAlpha = sem(model = model04SyntaxAlpha, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(model04EstimatesAlpha, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 19 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 19
## Number of equality constraints 4
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 73.500 55.487
## Degrees of freedom 12 12
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.325
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.969 0.961
## Tucker-Lewis Index (TLI) 0.961 0.951
##
## Robust Comparative Fit Index (CFI) 0.971
## Robust Tucker-Lewis Index (TLI) 0.963
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11359.185 -11359.185
## Scaling correction factor 1.163
## for the MLR correction
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
## Scaling correction factor 1.407
## for the MLR correction
##
## Akaike (AIC) 22748.370 22748.370
## Bayesian (BIC) 22823.457 22823.457
## Sample-size adjusted Bayesian (SABIC) 22775.813 22775.813
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.068 0.057
## 90 Percent confidence interval - lower 0.054 0.044
## 90 Percent confidence interval - upper 0.084 0.071
## P-value H_0: RMSEA <= 0.050 0.021 0.167
## P-value H_0: RMSEA >= 0.080 0.105 0.003
##
## Robust RMSEA 0.066
## 90 Percent confidence interval - lower 0.049
## 90 Percent confidence interval - upper 0.084
## P-value H_0: Robust RMSEA <= 0.050 0.061
## P-value H_0: Robust RMSEA >= 0.080 0.103
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.061 0.061
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## SitP =~
## Sit2 (LP) 1.014 0.036 28.384 0.000 1.014 0.734
## Sit4 (LP) 1.014 0.036 28.384 0.000 1.014 0.733
## Sit6 (LP) 1.014 0.036 28.384 0.000 1.014 0.653
## SitN =~
## Sit1r (LN) 1.254 0.032 38.954 0.000 1.254 0.735
## Sit3r (LN) 1.254 0.032 38.954 0.000 1.254 0.804
## Sit5r (LN) 1.254 0.032 38.954 0.000 1.254 0.682
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## SitP ~~
## SitN 0.578 0.041 14.230 0.000 0.578 0.578
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit2 5.289 0.042 127.346 0.000 5.289 3.827
## .Sit4 5.359 0.042 126.896 0.000 5.359 3.872
## .Sit6 5.321 0.046 115.492 0.000 5.321 3.427
## .Sit1r 4.547 0.053 86.474 0.000 4.547 2.667
## .Sit3r 4.896 0.048 101.959 0.000 4.896 3.141
## .Sit5r 4.860 0.052 94.060 0.000 4.860 2.645
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r (E1) 1.335 0.083 16.168 0.000 1.335 0.459
## .Sit2 (E2) 0.882 0.083 10.610 0.000 0.882 0.462
## .Sit3r (E3) 0.857 0.069 12.339 0.000 0.857 0.353
## .Sit4 (E4) 0.887 0.075 11.779 0.000 0.887 0.463
## .Sit5r (E5) 1.805 0.115 15.710 0.000 1.805 0.534
## .Sit6 (E6) 1.382 0.118 11.703 0.000 1.382 0.573
## SitP 1.000 1.000 1.000
## SitN 1.000 1.000 1.000
##
## R-Square:
## Estimate
## Sit1r 0.541
## Sit2 0.538
## Sit3r 0.647
## Sit4 0.537
## Sit5r 0.466
## Sit6 0.427
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## OmegaP 0.746 0.020 38.193 0.000 0.746 0.764
## OmegaN 0.780 0.013 58.288 0.000 0.780 0.783
Compared with what the psych package gives:
alpha(x = hfsSituations[c("Sit2", "Sit4", "Sit6")], use = "all.obs")
##
## Reliability analysis
## Call: alpha(x = hfsSituations[c("Sit2", "Sit4", "Sit6")], use = "all.obs")
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.74 0.74 0.67 0.49 2.9 0.014 5.3 1.2 0.46
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.71 0.74 0.77
## Duhachek 0.71 0.74 0.77
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## Sit2 0.62 0.62 0.45 0.45 1.6 0.023 NA 0.45
## Sit4 0.63 0.63 0.46 0.46 1.7 0.022 NA 0.46
## Sit6 0.73 0.73 0.57 0.57 2.7 0.016 NA 0.57
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## Sit2 1103 0.82 0.83 0.71 0.60 5.3 1.4
## Sit4 1103 0.82 0.83 0.70 0.59 5.4 1.4
## Sit6 1103 0.80 0.78 0.59 0.51 5.3 1.5
##
## Non missing response frequency for each item
## 1 2 3 4 5 6 7 miss
## Sit2 0.02 0.02 0.05 0.13 0.31 0.24 0.22 0
## Sit4 0.02 0.02 0.04 0.15 0.27 0.24 0.25 0
## Sit6 0.03 0.03 0.06 0.14 0.23 0.25 0.27 0
alpha(x = hfsSituations[c("Sit1r", "Sit3r", "Sit5r")], use = "all.obs")
##
## Reliability analysis
## Call: alpha(x = hfsSituations[c("Sit1r", "Sit3r", "Sit5r")], use = "all.obs")
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.77 0.77 0.7 0.53 3.3 0.012 4.8 1.4 0.48
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.74 0.77 0.79
## Duhachek 0.74 0.77 0.79
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## Sit1r 0.65 0.65 0.48 0.48 1.9 0.021 NA 0.48
## Sit3r 0.62 0.62 0.45 0.45 1.7 0.023 NA 0.45
## Sit5r 0.78 0.79 0.65 0.65 3.7 0.013 NA 0.65
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## Sit1r 1103 0.85 0.85 0.74 0.63 4.5 1.7
## Sit3r 1103 0.85 0.86 0.76 0.66 4.9 1.6
## Sit5r 1103 0.78 0.78 0.58 0.51 4.9 1.7
##
## Non missing response frequency for each item
## 1 2 3 4 5 6 7 miss
## Sit1r 0.05 0.08 0.16 0.19 0.18 0.17 0.17 0
## Sit3r 0.03 0.05 0.12 0.20 0.21 0.20 0.20 0
## Sit5r 0.04 0.06 0.13 0.18 0.18 0.19 0.22 0
Factor scores have a long history with many critiques and complaints. That said, factor scores can be constructed analogously across measurement models using Bayesian estimates; we will describe this more in a later lecture. For our purposes in CFA, the two main types of Bayesian estimates (MAP: Maximum A Posteriori and EAP: Expected A Posteriori) are identical. lavaan will not provide standard errors for factor scores, so I created a function factorScores() to not only create the factor scores using EAP (again, identical to MAP here) but also to give a standard error for each score. The function takes the lavaan model output and returns a list of three elements: scores, containing the factor scores and their standard errors; factorCov, containing the theoretical covariance matrix of the factor scores; and factorCorr, the theoretical correlation matrix of the factor scores.
To determine the reliability of a factor score, we must return to our classical notion of what reliability means: \[ \rho = \frac{Var(True)}{Var(Total)} = \frac{Var(True)}{Var(True) + Var(Error)} = \frac{Var(F)}{Var(F)+SE(F)^2}\] In CFA, \(Var(True)\) is found by the estimated factor variance and \(Var(Error)\) comes from the squared standard error of a factor score (from an observation with complete data).
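With z-score factor identification, \(Var(F) = 1\), so the reliability simplifies to \(1 / (1 + SE(F)^2)\). A quick check using the SitP factor score SE of .472 (reported below):
# factor score reliability with Var(F) = 1 and SE = .472
1 / (1 + 0.472^2) # approximately 0.818, matching factorScoreReliability() below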
I created a function named factorScoreReliability() to calculate the reliability of factor scores. Using this function, we can see that the reliability of our factor scores is .818 for SitP and .851 for SitN, both higher than the Omega reliabilities for the sum scores. In general, factor score reliabilities are greater than Omega and Alpha due to the use of a prior distribution for the factor itself, and because the factor correlation allows indirect information from the other factors to help in the estimation of each factor.
factorScores = function(lavObject){
# get estimated parameter matrices (lambda, psi, theta, nu, alpha)
output = inspect(object = lavObject, what = "est")
# model-implied covariance matrix of the items
sigma = output$lambda %*% output$psi %*% t(output$lambda) + output$theta
# raw data used in estimation
modelData = lavObject@Data@X[[1]]
# EAP factor scores: factor means plus the regression of the factors on the items
scores = t(output$alpha%*%matrix(1, nrow=1, ncol=dim(modelData)[1]) + output$psi %*% t(output$lambda) %*% solve(sigma)%*%(t(modelData) - output$nu%*%matrix(1, nrow=1, ncol=dim(modelData)[1])))
# posterior covariance matrix of the factor scores (constant across complete cases)
varscores = output$psi - output$psi %*% t(output$lambda) %*% solve(sigma) %*% output$lambda %*% output$psi
# standard errors are the square roots of the posterior variances
factorSE = sqrt(diag(varscores))
names(factorSE) = paste0(names(factorSE), ".SE")
factorSEmat = matrix(1, nrow=nrow(scores), ncol = 1) %*% matrix(factorSE, nrow = 1, ncol = ncol(scores))
colnames(factorSEmat) = names(factorSE)
result = data.frame(cbind(scores, factorSEmat))
# reorder the columns to pair each factor score with its SE
odds = seq(1, ncol(result)-1, 2)
evens = seq(2, ncol(result), 2)
result = result[c(odds,evens)]
factorCov = varscores
# convert the posterior covariance matrix into a correlation matrix
if (dim(varscores)[1] == 1 & dim(varscores)[2] == 1){
factorCorr = solve(sqrt(varscores)) %*% varscores %*% solve(sqrt(varscores))
} else {
factorCorr = solve(sqrt(diag(diag(varscores)))) %*% varscores %*% solve(sqrt(diag(diag(varscores))))
}
return(list(scores = result, factorCov = factorCov, factorCorr = factorCorr))
}
factorScoreReliability = function(lavObject){
output = inspect(object = lavObject, what = "est")
# model-implied covariance matrix of the items
sigma = output$lambda %*% output$psi %*% t(output$lambda) + output$theta
# posterior (error) variance of the factor scores
varscores = output$psi - output$psi %*% t(output$lambda) %*% solve(sigma) %*% output$lambda %*% output$psi
# reliability = Var(F) / (Var(F) + SE^2)
return(diag(output$psi)/(diag(output$psi) + diag(varscores)))
}
factorScoreReliability(lavObject = model04Estimates)
## SitP SitN
## 0.8177842 0.8510698
We can also compare the model fit of the CFA model with that of the tau-equivalent model, essentially testing whether CTT (and alpha) is appropriate:
anova(model04EstimatesAlpha, model04EstimatesOmega)
##
## Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
##
## lavaan->lavTestLRT():
## lavaan NOTE: The "Chisq" column contains standard test statistics, not the
## robust test that should be reported per model. A robust difference test is
## a function of two standard (not robust) statistics.
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## model04EstimatesOmega 8 22718 22813 35.41
## model04EstimatesAlpha 12 22748 22824 73.50 33.635 4 8.851e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The significant difference indicates that the tau-equivalent assumption of equal item factor loadings does not hold across all of the data. In the Mplus handout, Lesa Hoffman shows that tau equivalence holds for one of the subscales, but we do not need tau equivalence to move forward with results, as CFA subsumes the tau-equivalent model (meaning the CFA solution will be tau-equivalent if the data are tau-equivalent, but it can also be more general).
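As a check, the scaled difference statistic that anova() reports can be reproduced by hand from the standard test statistics and scaling correction factors (the two-factor model's standard statistic of 35.41 comes from the anova() output above, and its scaling factor of 1.421 is reported in Table 1 later in this document):
# Satorra-Bentler (2001) scaled chi-square difference test by hand
T0 = 73.500; df0 = 12; c0 = 1.325 # tau-equivalent (more constrained) model
T1 = 35.410; df1 = 8; c1 = 1.421 # congeneric two-factor model
cd = (df0*c0 - df1*c1) / (df0 - df1) # scaling correction for the difference
(T0 - T1) / cd # approximately 33.6, matching the anova() result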
Another model often discussed in CTT is the parallel items model. The CFA version of the parallel items model constrains all factor loadings for a factor to be equal and all unique variances for that factor to be equal, making it one step more restrictive than the tau-equivalent items model. Syntax for this model is as follows:
model04SyntaxSB = "
# SitP loadings (constrained equal via the shared label LP)
SitP =~ LP*Sit2 + LP*Sit4 + LP*Sit6
# SitN loadings (constrained equal via the shared label LN)
SitN =~ LN*Sit1r + LN*Sit3r + LN*Sit5r
# Unique variances (constrained equal within each factor via labels EP and EN):
Sit1r ~~ EN*Sit1r; Sit2 ~~ EP*Sit2; Sit3r ~~ EN*Sit3r; Sit4 ~~ EP*Sit4; Sit5r ~~ EN*Sit5r; Sit6 ~~ EP*Sit6;
# Calculate Omega (here, Spearman-Brown) reliability for sum scores:
OmegaP := ((3*LP)^2) / ( ((3*LP)^2) + 3*EP)
OmegaN := ((3*LN)^2) / ( ((3*LN)^2) + 3*EN)
"
model04EstimatesSB = sem(model = model04SyntaxSB, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(model04EstimatesSB, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 11 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 19
## Number of equality constraints 8
##
## Number of observations 1103
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 181.247 132.341
## Degrees of freedom 16 16
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.370
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1981.034 1128.693
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 1.755
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.916 0.896
## Tucker-Lewis Index (TLI) 0.921 0.902
##
## Robust Comparative Fit Index (CFI) 0.919
## Robust Tucker-Lewis Index (TLI) 0.924
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11413.059 -11413.059
## Scaling correction factor 0.847
## for the MLR correction
## Loglikelihood unrestricted model (H1) -11322.435 -11322.435
## Scaling correction factor 1.407
## for the MLR correction
##
## Akaike (AIC) 22848.117 22848.117
## Bayesian (BIC) 22903.181 22903.181
## Sample-size adjusted Bayesian (SABIC) 22868.242 22868.242
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.097 0.081
## 90 Percent confidence interval - lower 0.084 0.070
## 90 Percent confidence interval - upper 0.110 0.092
## P-value H_0: RMSEA <= 0.050 0.000 0.000
## P-value H_0: RMSEA >= 0.080 0.986 0.586
##
## Robust RMSEA 0.095
## 90 Percent confidence interval - lower 0.080
## 90 Percent confidence interval - upper 0.110
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 0.953
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.088 0.088
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## SitP =~
## Sit2 (LP) 1.005 0.035 28.455 0.000 1.005 0.698
## Sit4 (LP) 1.005 0.035 28.455 0.000 1.005 0.698
## Sit6 (LP) 1.005 0.035 28.455 0.000 1.005 0.698
## SitN =~
## Sit1r (LN) 1.222 0.032 38.250 0.000 1.222 0.724
## Sit3r (LN) 1.222 0.032 38.250 0.000 1.222 0.724
## Sit5r (LN) 1.222 0.032 38.250 0.000 1.222 0.724
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## SitP ~~
## SitN 0.596 0.041 14.422 0.000 0.596 0.596
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit2 5.289 0.042 127.346 0.000 5.289 3.676
## .Sit4 5.359 0.042 126.896 0.000 5.359 3.724
## .Sit6 5.321 0.046 115.492 0.000 5.321 3.698
## .Sit1r 4.547 0.053 86.474 0.000 4.547 2.695
## .Sit3r 4.896 0.048 101.959 0.000 4.896 2.902
## .Sit5r 4.860 0.052 94.060 0.000 4.860 2.881
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Sit1r (EN) 1.353 0.060 22.722 0.000 1.353 0.475
## .Sit2 (EP) 1.060 0.061 17.452 0.000 1.060 0.512
## .Sit3r (EN) 1.353 0.060 22.722 0.000 1.353 0.475
## .Sit4 (EP) 1.060 0.061 17.452 0.000 1.060 0.512
## .Sit5r (EN) 1.353 0.060 22.722 0.000 1.353 0.475
## .Sit6 (EP) 1.060 0.061 17.452 0.000 1.060 0.512
## SitP 1.000 1.000 1.000
## SitN 1.000 1.000 1.000
##
## R-Square:
## Estimate
## Sit1r 0.525
## Sit2 0.488
## Sit3r 0.525
## Sit4 0.488
## Sit5r 0.525
## Sit6 0.488
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## OmegaP 0.741 0.020 36.909 0.000 0.741 0.741
## OmegaN 0.768 0.015 52.547 0.000 0.768 0.768
The Omega values resulting from this model are the Spearman-Brown reliability estimates. Note that the parallel-items estimates are the lowest here, and that all three sets of estimates (parallel, tau-equivalent, and the congeneric CFA) are very close, which is common when a model fits the data.
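As a check, the Spearman-Brown prophecy formula recovers nearly the same value from the average inter-item correlation (average_r = .49 for the positive items in the psych::alpha() output above):
# Spearman-Brown prophecy formula from the average inter-item correlation
k = 3; rbarP = 0.49
k*rbarP / (1 + (k - 1)*rbarP) # approximately .742, close to OmegaP = .741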
The positive factor scores have an estimated mean of 0 with a variance of 0.778 instead of 1.000 (due to the effect of the prior distribution; this is called “shrinkage”). The SE for each person’s factor score is .472. Treating factor scores as observed variables is like saying SE = 0.
The 95% confidence interval for a positive factor score = Score ± 2(.472) = Score ± .944.
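Both quantities can be checked with quick arithmetic (the 0.50 example score here is hypothetical):
# shrunken score variance equals Var(F) - SE^2 under EAP scoring
1 - 0.472^2 # approximately 0.777, matching var(fscores$scores$SitP) below
# 95% confidence interval for a hypothetical person scoring 0.50 on SitP
0.50 + c(-2, 2)*0.472 # approximately -0.44 to 1.44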
# calculate factor scores using function above:
fscores = factorScores(lavObject = model04Estimates)
#show variance of factor score:
var(fscores$scores$SitP)
## [1] 0.7778891
# Histogram overlaid with kernel density curve
ggplot(fscores$scores, aes(x=SitP)) +
geom_histogram(aes(y=after_stat(density)), # Histogram with density instead of count on y-axis
binwidth=.5,
colour="black", fill="white") + xlim(c(-4,4)) + labs(title = "Positive Situation Forgiveness Factor Score") +
geom_density(alpha=.2, fill="#FF6666") # Overlay with transparent density plot
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_bar()`).
The negative factor scores have an estimated mean of 0 with a variance of 0.825 instead of 1.000. The SE for each person’s factor score is .418, so the 95% confidence interval is Score ± .836.
The negative factor scores retain more variance (and have a smaller SE) because there is more information in them, due to higher factor loadings (greater reliability) of their items.
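The same arithmetic check applies here:
# SitN shrunken variance = Var(F) - SE^2
1 - 0.418^2 # approximately 0.825, matching var(fscores$scores$SitN) below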
#show variance of factor score:
var(fscores$scores$SitN)
## [1] 0.8257568
# Histogram overlaid with kernel density curve
ggplot(fscores$scores, aes(x=SitN)) +
geom_histogram(aes(y=after_stat(density)), # Histogram with density instead of count on y-axis
binwidth=.5,
colour="black", fill="white") + xlim(c(-4,4)) + labs(title = "Negative Situation Forgiveness Factor Score") +
geom_density(alpha=.2, fill="#FF6666") # Overlay with transparent density plot
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_bar()`).
The parallel items model constrains the item factor loadings to be equal and the unique variances to be equal for a given factor. When each factor is estimated separately (or, equivalently, when the factor correlation is fixed to zero), the correlation between the factor scores from the parallel items model and the sum scores from CTT is equal to one. What this means is that to justify using sum scores in an analysis, the parallel items model must hold. To demonstrate, the lavaan syntax below sets the factor covariance to zero (and therefore the correlation between the factors to zero) by using 0* in the statement SitP ~~ 0*SitN:
model04SyntaxSBzeroCor = "
# SitP loadings (constrained equal via the shared label LP)
SitP =~ LP*Sit2 + LP*Sit4 + LP*Sit6
# SitN loadings (constrained equal via the shared label LN)
SitN =~ LN*Sit1r + LN*Sit3r + LN*Sit5r
# Unique variances (constrained equal within each factor via labels EP and EN):
Sit1r ~~ EN*Sit1r; Sit2 ~~ EP*Sit2; Sit3r ~~ EN*Sit3r; Sit4 ~~ EP*Sit4; Sit5r ~~ EN*Sit5r; Sit6 ~~ EP*Sit6;
# Fix the factor covariance (and correlation) to zero:
SitP ~~ 0*SitN
# Calculate Omega Reliability for Sum Scores:
OmegaP := ((3*LP)^2) / ( ((3*LP)^2) + 3*EP)
OmegaN := ((3*LN)^2) / ( ((3*LN)^2) + 3*EN)
"
model04SyntaxSBzeroCorEstimates = sem(model = model04SyntaxSBzeroCor, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
sumscoresP = apply(X = hfsSituations[c("Sit2", "Sit4", "Sit6")], MARGIN = 1, FUN = sum)
sumscoresN = apply(X = hfsSituations[c("Sit1r", "Sit3r", "Sit5r")], MARGIN = 1, FUN = sum)
fscoresPI = factorScores(lavObject = model04SyntaxSBzeroCorEstimates)
par(mfrow = c(1,2))
plot(x = fscores$scores$SitP, y = sumscoresP, xlab = "SitP: Full CFA Factor Scores", ylab = "SitP: Sum Score", xlim = c(-3.5, 2))
text(x = -2, y = 20, labels = paste("r =", as.character(round(x = cor(fscores$scores$SitP, sumscoresP), digits = 3))))
plot(x = fscores$scores$SitN, y = sumscoresN, xlab = "SitN: Full CFA Factor Scores", ylab = "SitN: Sum Score", xlim = c(-3.5, 2))
text(x = -2, y = 20, labels = paste("r =", as.character(round(x = cor(fscores$scores$SitN, sumscoresN), digits = 3))))
plot(x = fscoresPI$scores$SitP, y = sumscoresP, xlab = "SitP: Parallel Items Factor Score", ylab = "SitP: Sum Score", xlim=c(-3.5, 2))
text(x = -2, y = 20, labels = paste("r =", as.character(round(x = cor(fscoresPI$scores$SitP, sumscoresP), digits = 3))))
plot(x = fscoresPI$scores$SitN, y = sumscoresN, xlab = "SitN: Parallel Items Factor Score", ylab = "SitN: Sum Score", xlim=c(-3.5, 2))
text(x = -2, y = 20, labels = paste("r =", as.character(round(x = cor(fscoresPI$scores$SitN, sumscoresN), digits = 3))))
par(mfrow = c(1,1))
We can also see how reasonable our choice of a continuous distribution was for these Likert data by plotting the predicted item scores for each item across the range of observed factor scores. If the lines fall outside the range of 1 through 7 (the bounds of the Likert items), then we may have an issue using CFA (a categorical version would be more appropriate).
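For example, using the two-factor model estimates for Sit3r reported above (intercept 4.896, loading 1.349), the predicted response at a factor score of +2 already exceeds the top of the scale:
# predicted Sit3r response at a factor score of +2 is nu + lambda*2
4.896 + 1.349*2 # = 7.594, above the maximum response option of 7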
cfaPlots = function(lavObject){
output = inspect(object = lavObject, what = "est")
#build matrix plot elements
#get factor scores
fscores = factorScores(lavObject = lavObject)
nfactors = ncol(fscores$scores)/2
#get max observed data
itemMax = max(apply(X = lavObject@Data@X[[1]], MARGIN = 2, FUN = max, na.rm = TRUE))
itemMin = min(apply(X = lavObject@Data@X[[1]], MARGIN = 2, FUN = min, na.rm = TRUE))
#get range for all scores
factorMax = max(apply(X = fscores$scores[seq(1, ncol(fscores$scores), 2)], MARGIN = 2, FUN = max, na.rm = TRUE))
factorMin = min(apply(X = fscores$scores[seq(1, ncol(fscores$scores), 2)], MARGIN = 2, FUN = min, na.rm = TRUE))
#set up x values
x = seq(factorMin, factorMax, .01)
par(mfrow = c(1, nfactors))
#make plots by factor
for (factor in 1:nfactors){
xmat = NULL
ymat = NULL
inames = NULL
for (item in 1:nrow(output$lambda)){
if (output$lambda[item, factor] != 0){
inames = c(inames, rownames(output$lambda)[item])
y = output$nu[item] + output$lambda[item, factor]*x
xmat = cbind(xmat, x)
ymat = cbind(ymat, y)
}
}
matplot(x = xmat, y = ymat, type = "l", lwd = 5, lty=2:(ncol(xmat)+1), ylim = c(itemMin-1, itemMax+1), xlim = c(factorMin, factorMax),
ylab = "Predicted Item Response", xlab = colnames(output$lambda)[factor], col = 2:(ncol(xmat)+1))
lines(x = c(factorMin,factorMax), y = c(itemMin, itemMin), lty = 3, lwd = 5)
lines(x = c(factorMin,factorMax), y = c(itemMax, itemMax), lty = 3, lwd = 5)
legend(x = -3, y = 7, legend = inames, lty = 2:(ncol(xmat)+1), lwd = 5, col = 2:(ncol(xmat)+1))
}
par(mfrow = c(1,1))
}
par(mfrow = c(1,2))
cfaPlots(lavObject = model04Estimates)
What if we had just taken the sum (or, here, the mean) of the three items for each subscale?
PosMean = apply(X = hfsSituations[c("Sit2", "Sit4", "Sit6")], MARGIN = 1, FUN = mean)
NegMean = apply(X = hfsSituations[c("Sit1r", "Sit3r", "Sit5r")], MARGIN = 1, FUN = mean)
par(mfrow = c(1,2))
plot(y = PosMean, x = fscores$scores$SitP, xlim = c(-3.5,2), ylim = c(1,7), xlab = "SitP Factor Score", ylab = "Positive Items Mean")
text(x = -3, y = 6, labels = paste("r = ", as.character(round(cor(PosMean, fscores$scores$SitP), digits = 3))))
plot(y = NegMean, x = fscores$scores$SitN, xlim = c(-3.5,2), ylim = c(1,7), xlab = "SitN Factor Score", ylab = "Negative Items Mean")
text(x = -3, y = 6, labels = paste("r = ", as.character(round(cor(NegMean, fscores$scores$SitN), digits = 3))))
par(mfrow = c(1,1))
We can also plot how the two types of scores look together:
par(mfrow = c(1,2))
plot(x = PosMean, y = NegMean, xlim = c(1,8), ylim = c(1,8), xlab = "Positive Items Mean", ylab = "Negative Items Mean")
text(x = 1.85, y = 8, labels = paste("r = ", as.character(round(cor(PosMean, NegMean), digits = 3))))
plot(x = fscores$scores$SitP, y = fscores$scores$SitN, xlim = c(-3.5,2), ylim = c(-3.5,2), xlab = "SitP Factor Score", ylab = "SitN Factor Score")
text(x = -2.5, y = 1.9, labels = paste("r = ", as.character(round(cor(fscores$scores$SitP, fscores$scores$SitN), digits = 3))))
par(mfrow = c(1,1))
There are problems with either of these observed-variable approaches: the mean of the items has less variability (i.e., fewer possible score values) and assumes that all items should be weighted equally and have no error, while the estimated factor scores do not have the same properties as estimated for the factors in the model (i.e., less variance for each factor, a higher correlation between the factors). Further, the rank ordering of subjects differs between the two approaches, which is problematic for other analyses.
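One quick way to see the rank-order differences is a Spearman correlation between the two score types; values below 1 would confirm that the orderings differ:
# rank-order agreement between item means and factor scores
cor(PosMean, fscores$scores$SitP, method = "spearman")
cor(NegMean, fscores$scores$SitN, method = "spearman")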
What to do instead of either of these? Stay tuned for how to use plausible values, Bayesian methods, or SEM.
R code to build objects reported in results (don’t show this typically)
#number of observations
format(x = model04Estimates@Data@nobs[[1]][1], big.mark = ",")
## [1] "1,103"
#low and high standardized loadings for one factor model
stdLoadings = inspect(object = model01Estimates, what = "std.all")
stdLoadingsLow = round(min(apply(X = stdLoadings$lambda, MARGIN = 2, FUN = function(x) min(x[which(x !=0)]))), digits = 3)
stdLoadingsHigh = round(max(apply(X = stdLoadings$lambda, MARGIN = 2, FUN = function(x) max(x[which(x !=0)]))), digits = 3)
# estimated correlation between factors in two-factor model
stdCov04 = inspect(object = model04Estimates, what = "std.all")$psi
estCorr = format(x = stdCov04[2,1], digits = 3)
#standardized loadings positive
stdLoadings04P = inspect(object = model04Estimates, what = "std.all")$lambda[,1]
minSitP = round(min(stdLoadings04P[which(stdLoadings04P != 0)]), digits = 3)
maxSitP = round(max(stdLoadings04P[which(stdLoadings04P != 0)]), digits = 3)
#standardized loadings negative
stdLoadings04N = inspect(object = model04Estimates, what = "std.all")$lambda[,2]
minSitN = round(min(stdLoadings04N[which(stdLoadings04N != 0)]), digits = 3)
maxSitN = round(max(stdLoadings04N[which(stdLoadings04N != 0)]), digits = 3)
factorRel = factorScoreReliability(lavObject = model04Estimates)
(Note: You may borrow the phrasing contained in this example to describe various aspects of your analyses, but your own results sections will not mimic this example exactly—they should be customized to describe the how and the why of what you did, specifically).
(Descriptive information for the sample and items would have already been given in the method section…)
The reliability and dimensionality of six items each assessing forgiveness of situations was assessed in a sample of 1,103 persons with a confirmatory factor analysis using robust maximum likelihood estimation (MLR) in the lavaan package (Rosseel, 2012) in R (R Core Team, 2017). All models were identified by setting any latent factor means to 0 and latent factor variances to 1, such that all item intercepts, item factor loadings, and item residual variances were then estimated. The six items utilized a seven-point response scale, and three items were reverse-coded prior to analysis such that higher values then indicated greater levels of forgiveness of situations for all items. Model fit statistics reported in Table 1 include the obtained model \(\chi^2\), its scaling factor (in which values different than 1.000 indicate deviations from normality), its degrees of freedom, and its p-value (in which non-significance is desirable for good fit), CFI, or Comparative Fit Index (in which values higher than .95 are desirable for good fit), and the RMSEA, or Root Mean Square Error of Approximation, point estimate and 90% confidence interval (in which values lower than .06 are desirable for good fit). As reported in Table 2, nested model comparisons were conducted using the rescaled \(−2\Delta LL\) with degrees of freedom equal to the rescaled difference in the number of parameters between models (i.e., a rescaled likelihood ratio test). The specific models examined are described in detail below.
Although a one-factor model was initially posited to account for the pattern of covariance across these six items, it resulted in poor fit, as shown in Table 1. Although each item had a significant factor loading (with standardized loadings ranging from 0.509 to 0.778), a single latent factor did not adequately describe the pattern of relationship across these six items as initially hypothesized. Sources of local misfit were identified using the normalized residual covariance matrix, available via the resid() function in lavaan, in which individual values were calculated as: (observed covariance – expected covariance) / SD(observed covariance). Relatively large positive residual covariances were observed among items 2, 4, and 6 (the positively-worded items), indicating that these items were more related than was predicted by the single-factor model. Modification indices, available via the modificationindices() function in lavaan, corroborated this pattern, further suggesting additional remaining relationships among the negatively-worded items as well.
The necessity of separate latent factors for the positively-worded and negatively-worded items was tested by specifying a two-factor model in which the positively-worded items 2, 4, and 6 indicated a forgiveness factor, and in which negatively-worded items 1, 3, and 5 indicated a not unforgiveness factor, and in which the two factors were allowed to correlate. The two-factor model fit was acceptable by every criterion except the significant \(\chi^2\), likely due to the large sample. In addition, the two-factor model fit significantly better than the one-factor model, as reported in Table 2, indicating that the estimated correlation between the two factors of 0.564 was significantly less than 1.000. Thus, the six items appeared to measure two separate but related constructs. Further examination of local fit via normalized residual covariances and modification indices yielded no interpretable remaining relationships, and thus this two-factor model was retained.
Table 3 provides the estimates and their standard errors for the item factor loadings, intercepts, and residual variances from both the unstandardized and standardized solutions. All factor loadings and the factor covariance were statistically significant. As shown in Table 3, standardized loadings for the forgiveness factor items ranged from 0.625 to 0.759 (with \(R^2\) values for the amount of item variance accounted for by the factor ranging from 0.391 to 0.576), and standardized loadings for the not unforgiveness factor ranged from 0.588 to 0.846 (with \(R^2\) values of 0.346 to 0.716), suggesting the factor loadings were practically significant as well. Factor score reliability was .818 for the forgiveness factor and .851 for the not unforgiveness factor, suggesting marginal reliability for both of the three-item scales.
The resulting distributions of the EAP factor score estimates are shown in Figure 1. Figure 2 shows the predicted response for each item as a linear function of the latent factor based on the estimated model parameters. As shown, the predicted item response goes above the highest response option just before a latent factor score of 2 (i.e., 2 SDs above the mean), resulting in a ceiling effect for both sets of factor scores, as also shown in Figure 1. In addition, for the not unforgiveness factor, the predicted item response goes below the lowest response option just before a latent factor score of -3 (i.e., 3 SDs below the mean), resulting in a floor effect for the not unforgiveness factor, as also shown in Figure 1.
The extent to which the items within each factor could be seen as exchangeable was then examined via an additional set of nested model comparisons, as reported in Table 1 (for fit) and Table 2 (for comparisons of fit). First, the assumption of tau-equivalence (i.e., true-score equivalence, equal discrimination across items) was examined by constraining the factor loadings to be equal within a factor. For the not unforgiveness factor, the tau-equivalent model fit was acceptable but was significantly worse than the original two-factor model fit (i.e., in which all loadings were estimated freely). For the forgiveness factor, however, the tau-equivalent model fit was acceptable and was not significantly worse than the original two-factor model fit. Thus, the assumption of tau-equivalence held for the forgiveness factor items only. Finally, the assumption of parallel items (i.e., equal factor loadings and equal residual variances, or equal reliability across items) was examined for the forgiveness factor items only, and the resulting model fit was acceptable but was significantly worse than the tau-equivalent forgiveness factor model fit. Thus, the assumption of parallel items did not hold for the forgiveness factor items. In summary, while the not unforgiveness factor items were not exchangeable, the forgiveness factor items were exchangeable with respect to their factor loadings only (i.e., equal discrimination, but not equal residual variances or reliability).
R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1-36. URL http://www.jstatsoft.org/v48/i02/.
citation()
## To cite R in publications use:
##
## R Core Team (2025). _R: A Language and Environment for Statistical
## Computing_. R Foundation for Statistical Computing, Vienna, Austria.
## <https://www.R-project.org/>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {R: A Language and Environment for Statistical Computing},
## author = {{R Core Team}},
## organization = {R Foundation for Statistical Computing},
## address = {Vienna, Austria},
## year = {2025},
## url = {https://www.R-project.org/},
## }
##
## We have invested a lot of time and effort in creating R, please cite it
## when using it for data analysis. See also 'citation("pkgname")' for
## citing R packages.
citation("lavaan")
## To cite lavaan in publications use:
##
## Yves Rosseel (2012). lavaan: An R Package for Structural Equation
## Modeling. Journal of Statistical Software, 48(2), 1-36.
## https://doi.org/10.18637/jss.v048.i02
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## title = {{lavaan}: An {R} Package for Structural Equation Modeling},
## author = {Yves Rosseel},
## journal = {Journal of Statistical Software},
## year = {2012},
## volume = {48},
## number = {2},
## pages = {1--36},
## doi = {10.18637/jss.v048.i02},
## }
fitstats01 = fitmeasures(object = model01Estimates)
fitstats04 = fitmeasures(object = model04Estimates)
fitstats04A = fitmeasures(object = model04EstimatesAlpha)
table1 = cbind(ncol(model01Estimates@SampleStats@cov[[1]]), fitstats01[1], fitstats01[6], fitstats01[9], fitstats01[7], fitstats01[8], fitstats01[25],
fitstats01[48], fitstats01[49], fitstats01[50], fitstats01[51])
colnames(table1) = c("# Items", "# Parameters", "Scaled Chi-Square", "Chi-Square Scale Factor", "DF", "p-value", "CFI", "RMSEA", "RMSEA Lower",
"RMSEA Upper", "RMSEA p-value")
rownames(table1) = "One-Factor"
table1 = rbind(table1, cbind(ncol(model04Estimates@SampleStats@cov[[1]]), fitstats04[1], fitstats04[6], fitstats04[9], fitstats04[7], fitstats04[8],
fitstats04[25], fitstats04[48], fitstats04[49], fitstats04[50], fitstats04[51]))
rownames(table1)[2] = "Two-Factor"
table1 = rbind(table1, cbind(ncol(model04Estimates@SampleStats@cov[[1]]), fitstats04A[1], fitstats04A[6], fitstats04A[9], fitstats04A[7], fitstats04A[8],
fitstats04A[25], fitstats04A[48], fitstats04A[49], fitstats04A[50], fitstats04A[51]))
rownames(table1)[3] = "Two-Factor Tau-Equivalent"
kable(table1, format = "html", digits = 3) %>% kable_styling("striped")
|  | # Items | # Parameters | Scaled Chi-Square | Chi-Square Scale Factor | DF | p-value | CFI | RMSEA | RMSEA Lower | RMSEA Upper | RMSEA p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| One-Factor | 6 | 18 | 307.803 | 1.390 | 9 | 0.000 | 0.784 | 0.9 | 0.000 | 0.05 | 1.000 |
| Two-Factor | 6 | 19 | 24.924 | 1.421 | 8 | 0.002 | 0.982 | 0.9 | 0.277 | 0.05 | 0.018 |
| Two-Factor Tau-Equivalent | 6 | 15 | 55.487 | 1.325 | 12 | 0.000 | 0.963 | 0.9 | 0.021 | 0.05 | 0.105 |
table2 = anova(modelSaturatedEstimates, model01Estimates)[2,-c(2:4)]
rownames(table2) = "One-Factor vs. Saturated"
table2 = rbind(table2,anova(model01Estimates, model04Estimates)[2,-c(2:4)])
rownames(table2)[2] = "One-Factor vs. Two-Factor"
table2 = rbind(table2,anova(modelSaturatedEstimates, model04Estimates)[2,-c(2:4)])
rownames(table2)[3] = "Two-Factor vs. Saturated"
table2 = rbind(table2,anova(model04EstimatesAlpha, model04Estimates)[2,-c(2:4)])
rownames(table2)[4] = "Two-Factor vs. Two-Factor Tau-Equivalent"
kable(table2, format = "html", digits = 3) %>% kable_styling("striped")
|  | Df | Chisq diff | Df diff | Pr(>Chisq) |
|---|---|---|---|---|
| One-Factor vs. Saturated | 9 | 307.803 | 9 | 0.000 |
| One-Factor vs. Two-Factor | 9 | 342.289 | 1 | 0.000 |
| Two-Factor vs. Saturated | 8 | 24.924 | 8 | 0.002 |
| Two-Factor vs. Two-Factor Tau-Equivalent | 12 | 33.635 | 4 | 0.000 |
unstandardizedLoadings = inspect(object = model04Estimates, what = "est")
unstandardizedSE = inspect(object = model04Estimates, what = "se")
standardizedLoadings = inspect(object = model04Estimates, what = "std.all")
table3 = cbind(unstandardizedLoadings$lambda[1:3,1], unstandardizedSE$lambda[1:3,1], standardizedLoadings$lambda[1:3,1])
colnames(table3) = c("Estimate", "SE", "Estimate")
rownames(table3) = c("Item 2", "Item 4", "Item 6")
table3 = rbind(table3, cbind(cbind(unstandardizedLoadings$lambda[4:6,2]), cbind(unstandardizedSE$lambda[4:6,2]), cbind(standardizedLoadings$lambda[4:6,2])))
rownames(table3)[4:6] = c("Item 1", "Item 3", "Item 5")
table3 = rbind(table3, c(unstandardizedLoadings$psi[2,1], unstandardizedSE$psi[2,1], standardizedLoadings$psi[2,1]))
rownames(table3)[7] = "Factor Covariance"
interceptsOrder = c(4,1,5,2,6,3)
table3 = rbind(table3, cbind(cbind(unstandardizedLoadings$nu[1:6,1][interceptsOrder]), cbind(unstandardizedSE$nu[1:6,1][interceptsOrder]),
cbind(standardizedLoadings$nu[1:6,1][interceptsOrder])))
rownames(table3)[8:13] = paste("Item", 1:6)
table3 = rbind(table3, cbind(cbind(diag(unstandardizedLoadings$theta)[interceptsOrder]), cbind(diag(unstandardizedSE$theta)[interceptsOrder]),
cbind(diag(standardizedLoadings$theta)[interceptsOrder])))
rownames(table3)[14:19] = paste("Item", 1:6)
kable(table3, format = "html", digits = 3) %>% kable_styling("striped") %>% add_header_above(c(" " = 1, "Unstandardized" = 2, "Standardized" = 1)) %>% group_rows("Forgiveness Factor Loadings", 1,3) %>% group_rows("Not Unforgiveness Factor Loadings", 4,6) %>%
group_rows("Factor Covariance", 7,7) %>% group_rows("Item Intercepts", 8,13) %>% group_rows("Item Unique Variances", 14, 19)
|  | Unstandardized Estimate | SE | Standardized Estimate |
|---|---|---|---|
| **Forgiveness Factor Loadings** |  |  |  |
| Item 2 | 1.007 | 0.052 | 0.730 |
| Item 4 | 1.064 | 0.050 | 0.759 |
| Item 6 | 0.956 | 0.053 | 0.625 |
| **Not Unforgiveness Factor Loadings** |  |  |  |
| Item 1 | 1.325 | 0.048 | 0.759 |
| Item 3 | 1.349 | 0.044 | 0.846 |
| Item 5 | 1.009 | 0.055 | 0.588 |
| **Factor Covariance** |  |  |  |
| Factor Covariance | 0.564 | 0.041 | 0.564 |
| **Item Intercepts** |  |  |  |
| Item 1 | 4.547 | 0.053 | 2.604 |
| Item 2 | 5.289 | 0.042 | 3.834 |
| Item 3 | 4.896 | 0.048 | 3.070 |
| Item 4 | 5.359 | 0.042 | 3.821 |
| Item 5 | 4.860 | 0.052 | 2.832 |
| Item 6 | 5.321 | 0.046 | 3.477 |
| **Item Unique Variances** |  |  |  |
| Item 1 | 1.294 | 0.103 | 0.425 |
| Item 2 | 0.888 | 0.097 | 0.467 |
| Item 3 | 0.724 | 0.092 | 0.285 |
| Item 4 | 0.835 | 0.093 | 0.425 |
| Item 5 | 1.926 | 0.119 | 0.654 |
| Item 6 | 1.428 | 0.134 | 0.610 |
# Histogram overlaid with kernel density curve
ggplot(fscores$scores, aes(x=SitP)) +
geom_histogram(aes(y=after_stat(density)), # Histogram with density instead of count on y-axis
binwidth=.5,
colour="black", fill="white") + xlim(c(-4,4)) + labs(title = "Positive Situation Forgiveness Factor Score") +
geom_density(alpha=.2, fill="#FF6666") # Overlay with transparent density plot
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_bar()`).
# Histogram overlaid with kernel density curve
ggplot(fscores$scores, aes(x=SitN)) +
geom_histogram(aes(y=after_stat(density)), # Histogram with density instead of count on y-axis
binwidth=.5,
colour="black", fill="white") + xlim(c(-4,4)) + labs(title = "Negative Situation Forgiveness Factor Score") +
geom_density(alpha=.2, fill="#FF6666") # Overlay with transparent density plot
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_bar()`).
par(mfrow = c(1,2))
cfaPlots(lavObject = model04Estimates)
par(mfrow = c(1,1))