## Chapter Highlights

• Principles of Bayesian methods for psychometric models
• Comparisons with maximum likelihood methods

## Review of Frequentist Inference with Maximum Likelihood

• Notation:
  • $$\mathbf{x}$$ represents data
  • $$\boldsymbol{\theta}$$ represents model parameters
  • $$p \left(\mathbf{x} \mid \boldsymbol{\theta} \right)$$ is the conditional distribution of data given model parameters
  • $$L \left(\boldsymbol{\theta} \mid \mathbf{x} \right) = p \left(\mathbf{x} \mid \boldsymbol{\theta} \right)$$ is the likelihood function

In ML, the likelihood function is numerically equal to the conditional distribution of the data given the model parameters. This expresses that, once the data are observed, they are treated as known, and the same function is re-read as a function of the parameters:

$L \left(\boldsymbol{\theta} \mid \mathbf{x} \right) = p \left(\mathbf{x} \mid \boldsymbol{\theta} \right)$

## ML Specifics

• Data are random
• Parameters are fixed (but unknown) constants in ML
• “What are the values of the parameters that yield the highest probability of observing the values of data that were observed?”
• The ML estimator is a function of the data
  • With the data having a distribution, the ML estimator is itself a random variable
• Standard errors quantify uncertainty in the estimates
• Maximization of the likelihood, together with asymptotic theory, gives rise to calculations of (see the sketch after this list):
  • Parameter estimates
  • Standard errors
  • The assumed (normal) shape of the sampling distribution of the estimator
  • The test distribution for parameters
  • Confidence intervals for parameters
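As a sketch of these calculations in R (the data vector below is made up for illustration; any string of 0s and 1s works):

```r
# A minimal sketch, assuming made-up Bernoulli data: 10 trials, 7 successes
x = c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1)
J = length(x)

thetaHat = mean(x)                                      # MLE of the Bernoulli probability
seThetaHat = sqrt(thetaHat * (1 - thetaHat) / J)        # asymptotic SE from the Fisher information
ci95 = thetaHat + c(-1, 1) * qnorm(.975) * seThetaHat   # Wald-type 95% confidence interval

thetaHat; seThetaHat; ci95
```

The interval is a statement about the estimator under repeated sampling, not a probability statement about $$\theta$$ itself.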

## ML Philosophy

• Once computed from a given sample, parameter point estimates, SEs, and confidence intervals are fixed numbers, not random
  • They cannot be discussed probabilistically
• Probability (or uncertainty) enters when talking about repeated sampling of the data
• Statements of uncertainty refer to the parameter estimator, not the parameter itself
• Supports deductive arguments from parameters (general) to the data (particulars)
• Does not support inductive arguments from data (particulars) to general (parameters)

## Bayesian Specifics

• Key is that $$L \left(\boldsymbol{\theta} \mid \mathbf{x} \right) = p \left(\mathbf{x} \mid \boldsymbol{\theta} \right)$$
• Likelihood function is central to Bayesian estimation
• Treats all parameters as random
• Model parameters are assigned prior distributions
• Prior distributions are statements of uncertainty
• The posterior distribution is a blend of the prior and the likelihood, as Bayes' theorem below shows
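
Formally, the blend is Bayes' theorem: the posterior is the likelihood times the prior, normalized by the marginal probability of the data.

$p \left(\boldsymbol{\theta} \mid \mathbf{x} \right) = \frac{p \left(\mathbf{x} \mid \boldsymbol{\theta} \right) p \left(\boldsymbol{\theta} \right)}{p \left(\mathbf{x} \right)} \propto L \left(\boldsymbol{\theta} \mid \mathbf{x} \right) p \left(\boldsymbol{\theta} \right)$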

## Bernoulli and Binomial Models

### First, Bernoulli:

Imagine $$J$$ independent and identically distributed Bernoulli random variables $$\mathbf{x} = \left(x_1, \ldots, x_J \right)$$, each with probability parameter $$\theta$$.

Conditional distribution of the data given the parameter for any one variable $$j$$:

$p\left( x_j \mid \theta \right) = \theta^{x_j}\left(1-\theta\right)^{1-x_j}$

Conditional distribution of the sample $$\mathbf{x}$$:

$p\left( \mathbf{x} \mid \theta \right) = \prod_{j=1}^{J} p\left( x_j \mid \theta \right) = \prod_{j=1}^{J} \theta^{x_j}\left(1-\theta\right)^{1-x_j}$

### Binomial

Let $$y = \sum_{j=1}^J x_j$$ be the number of 1s in $$\mathbf{x}$$

The conditional distribution of $$y$$ given $$\theta$$ and $$J$$ is a binomial:

$p \left(y \mid \theta, J \right) = {J \choose y} \theta^y\left(1-\theta \right)^{J-y}$
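
As a quick numerical check in R (a sketch; the data vector and $$\theta$$ value below are arbitrary), the product of Bernoulli terms equals the binomial probability once the $${J \choose y}$$ constant is divided out:

```r
# Sketch: the Bernoulli product matches the binomial kernel (made-up x, arbitrary theta)
x = c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1)
theta = .6
J = length(x)
y = sum(x)

prod(dbinom(x = x, size = 1, prob = theta))           # product of Bernoulli probabilities
dbinom(x = y, size = J, prob = theta) / choose(J, y)  # binomial with choose(J, y) removed
```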

## Now, the data

Suppose $$y = 7$$ and $$J=10$$. Let’s compare the ML and Bayesian analyses.

We could derive the MLE for $$\theta$$ and find that it is:

$\widehat{\theta} = \frac{y}{J}$
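
The derivation takes one step: differentiate the log-likelihood and set it to zero.

$\log L \left(\theta \mid y \right) = \log {J \choose y} + y \log \theta + \left(J - y\right) \log \left(1 - \theta\right)$

$\frac{\partial \log L}{\partial \theta} = \frac{y}{\theta} - \frac{J - y}{1 - \theta} = 0 \quad \Rightarrow \quad \widehat{\theta} = \frac{y}{J}$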

## Analysis with ML:

Under ML, we know that the MLE for $$\theta$$ is $$7/10 = .7$$. We can plot the likelihood function $$L(\theta \mid y) = p \left(y \mid \theta, J \right)$$ and confirm this:

```r
# Plot the binomial likelihood over a fine grid of theta values
theta = seq(0.001, .999, .001)
plot(x = theta, y = dbinom(x = 7, size = 10, prob = theta), type = "l",
     xlab = expression(theta), ylab = expression(paste("p(y|", theta, ",J)")))

# Mark the MLE at theta = .7 with a dashed vertical line
lines(x = c(.7, .7), y = c(0, dbinom(x = 7, size = 10, prob = .7)), lty = 2)
```
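
We can also confirm the maximum numerically; a sketch using R's built-in optimize():

```r
# Sketch: numerically maximize the likelihood over theta in (0, 1)
optimize(f = function(theta) dbinom(x = 7, size = 10, prob = theta),
         interval = c(0, 1), maximum = TRUE)
# $maximum is approximately .7, matching y/J
```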

## Analysis with Bayesian Methods:

For a Bayesian analysis, we need to plot the posterior distribution, or $$p \left(\theta \mid y, J \right) \propto p\left(y \mid \theta, J \right) p\left(\theta\right)$$.

To do so, we must pick a prior distribution for $$\theta$$, or $$p\left(\theta \right)$$. I argue that finding good priors is difficult and not made any easier by example analyses like this one. But, we’ll continue.

Let’s pick a prior distribution whose support (the range of possible values) matches that of $$\theta$$: the interval $$[0, 1]$$. One such distribution is the uniform distribution. However, the uniform distribution doesn’t make calculations of the posterior very easy (but…we could use one!). Instead, we will use a beta distribution, or $$\theta \sim Beta\left( \alpha, \beta \right)$$. You can change alpha and beta in the code below to alter the shape of the prior.

```r
# Plot a Beta(alpha, beta) prior for theta; change alpha and beta to alter its shape
alpha = 1
beta = 1
x = seq(.001, .999, .001)
y = dbeta(x = x, shape1 = alpha, shape2 = beta)
plot(x = x, y = y, type = "l", xlab = expression(theta),
     ylab = expression(paste("p(", theta, ")")))
```

With the Beta distribution as a prior for the binomial likelihood, it turns out the posterior is also a Beta distribution, making the Beta a conjugate prior for this model.

The posterior is then:

$p \left(\theta \mid y, J \right) = Beta \left( \theta \mid y + \alpha, J - y + \beta \right)$
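
To see why the Beta prior is conjugate here, multiply the likelihood kernel by the prior kernel; the product is again a Beta kernel:

$p \left(\theta \mid y, J \right) \propto \underbrace{\theta^{y} \left(1-\theta\right)^{J-y}}_{\text{likelihood}} \, \underbrace{\theta^{\alpha - 1} \left(1-\theta\right)^{\beta - 1}}_{\text{prior}} = \theta^{\left(y + \alpha\right) - 1} \left(1-\theta\right)^{\left(J - y + \beta\right) - 1}$

which is the kernel of the $$Beta \left(y + \alpha, J - y + \beta \right)$$ distribution above.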

```r
# Posterior for y = 7 successes in J = 10 trials under a Beta(1, 1) prior
y = 7
J = 10
priorAlpha = 1
priorBeta = 1

# Conjugate update: posterior is Beta(y + alpha, J - y + beta)
posteriorAlpha = y + priorAlpha
posteriorBeta = J - y + priorBeta

# Evaluate the posterior density over a fine grid of theta values
xCoord = seq(.001, .999, .001)
yCoord = dbeta(x = xCoord, shape1 = posteriorAlpha, shape2 = posteriorBeta)

# Grid-based MAP estimate; should match the analytic posterior mode below
mapEstimate = xCoord[which(yCoord == max(yCoord))]
posteriorMode = (posteriorAlpha - 1)/(posteriorAlpha + posteriorBeta - 2)

# Analytic posterior mean (EAP), variance, and SD of the Beta distribution
posteriorMean = (posteriorAlpha)/(posteriorAlpha + posteriorBeta)
posteriorVariance = (posteriorAlpha*posteriorBeta)/
  (((posteriorAlpha + posteriorBeta)^2)*(posteriorAlpha + posteriorBeta + 1))
posteriorSD = sqrt(posteriorVariance)

# Plot the posterior with the MAP (red) and EAP (green) marked
plot(x = xCoord, y = yCoord, type = "l", xlab = expression(theta),
     ylab = expression(paste("p(", theta, "|y,J)")),
     main = paste("Red = MAP = ", round(x = mapEstimate, digits = 2),
                  "; Green = EAP = ", round(x = posteriorMean, digits = 2),
                  "; SD = ", round(x = posteriorSD, digits = 2)))
lines(x = c(mapEstimate, mapEstimate),
      y = c(0, dbeta(x = mapEstimate, shape1 = posteriorAlpha, shape2 = posteriorBeta)),
      lty = 2, col = 2)
lines(x = c(posteriorMean, posteriorMean),
      y = c(0, dbeta(x = posteriorMean, shape1 = posteriorAlpha, shape2 = posteriorBeta)),
      lty = 2, col = 3)
```
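
Because the posterior is a full distribution for $$\theta$$, interval statements about the parameter itself come directly from its quantiles; a sketch of a central 95% credible interval using qbeta():

```r
# Sketch: central 95% credible interval from the Beta posterior quantiles
qbeta(p = c(.025, .975), shape1 = posteriorAlpha, shape2 = posteriorBeta)
```

Unlike the ML confidence interval, this is a probability statement about $$\theta$$.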

## Comparison of All Three Functions: Prior, Likelihood, and Posterior

Now, let’s put all three functions onto the same plot to compare the prior, the likelihood, and the posterior (this time with a more informative $$Beta(6, 6)$$ prior):

```r
# Same data as before, but now with a more informative Beta(6, 6) prior
y = 7
J = 10
priorAlpha = 6
priorBeta = 6

posteriorAlpha = y + priorAlpha
posteriorBeta = J - y + priorBeta

# Evaluate all three functions over the same grid of theta values
xCoord = seq(.001, .999, .001)
yCoordLikelihood = dbinom(x = y, size = J, prob = xCoord)
yCoordPosterior = dbeta(x = xCoord, shape1 = posteriorAlpha, shape2 = posteriorBeta)
yCoordPrior = dbeta(x = xCoord, shape1 = priorAlpha, shape2 = priorBeta)

# Grid-based MAP (note: search the posterior values, not the yCoord from the earlier block)
mapEstimate = xCoord[which(yCoordPosterior == max(yCoordPosterior))]
posteriorMode = (posteriorAlpha - 1)/(posteriorAlpha + posteriorBeta - 2)
posteriorMean = (posteriorAlpha)/(posteriorAlpha + posteriorBeta)
posteriorVariance = (posteriorAlpha*posteriorBeta)/
  (((posteriorAlpha + posteriorBeta)^2)*(posteriorAlpha + posteriorBeta + 1))
posteriorSD = sqrt(posteriorVariance)

# Stack the three curves for matplot (one column per function)
yMat = cbind(yCoordPrior, yCoordLikelihood, yCoordPosterior)
xMat = cbind(xCoord, xCoord, xCoord)

matplot(x = xMat, y = yMat, type = "l", xlab = expression(theta),
        ylab = "density (likelihood unnormalized)",
        main = paste("Red = MAP = ", round(x = mapEstimate, digits = 2),
                     "; Green = EAP = ", round(x = posteriorMean, digits = 2),
                     "; SD = ", round(x = posteriorSD, digits = 2)),
        lwd = 3)
legend(x = 0, y = max(yMat), legend = c("Prior", "Likelihood", "Posterior"),
       col = 1:3, lty = 1:3, lwd = 3)
```
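
Note that the likelihood curve sits on a different vertical scale than the two densities, because $$p \left(y \mid \theta, J \right)$$ does not integrate to 1 over $$\theta$$. If you want the three curves on comparable scales, one option (a sketch) is to plot the normalized likelihood, which for these data is a $$Beta \left(y + 1, J - y + 1 \right)$$ density:

```r
# Sketch: the likelihood, normalized over theta, is a Beta(y + 1, J - y + 1) density
yCoordLikelihoodScaled = dbeta(x = xCoord, shape1 = y + 1, shape2 = J - y + 1)
```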