- Concepts that reappear in Bayesian inference
- Features where Bayesian methods differ from maximum likelihood methods

- Covered in each of the past few lectures
- For an example, see Lecture 05: JAGS introduction
- See how inferences change when prior distribution changes

- Key to understanding: because the likelihood function is at the core of Bayes' theorem, properties of the likelihood from ML carry over to Bayesian inference
- For instance, limiting distributions
- For continuous parameters, the posterior distribution approaches a normal distribution as N approaches infinity
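The large-N normal limit above can be sketched numerically. As a hedged illustration (assuming a simple Beta-Binomial model with the observed proportion fixed at 0.3; the function name is mine), the posterior's skewness shrinks toward zero, the value a normal distribution has:

```python
import math

def beta_posterior_skewness(successes, n, a0=1.0, b0=1.0):
    """Skewness of the Beta posterior for a binomial proportion.

    Prior Beta(a0, b0); posterior Beta(a0 + successes, b0 + n - successes).
    A normal distribution has skewness 0, so this measures how close
    the posterior is to its large-N normal limit.
    """
    a = a0 + successes
    b = b0 + (n - successes)
    return 2 * (b - a) * math.sqrt(a + b + 1) / ((a + b + 2) * math.sqrt(a * b))

# Fix the observed proportion at 0.3 and let N grow: skewness shrinks toward 0.
for n in (10, 100, 10_000):
    print(n, round(beta_posterior_skewness(int(0.3 * n), n), 4))
```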

- Some exceptions do apply:
- Models with priors outside the sample space of the parameter
- When estimates land at a local rather than global mode of the likelihood (for multimodal models)

- The easy parts to understand
- Priors should quantify uncertainty about parameters
- Choosing flexible priors seems better than inflexible ones (e.g., Beta vs. Uniform)

- Harder:
- Picking priors from previous research
- Interpretability: How do you choose a good prior for parameters that are hard to interpret in the first place?
- Having input from “experts”
- “Experts” is often an easy way to say “uncertainty is not easily quantifiable”
- “Experts” are often wrong, especially with respect to determining numeric quantities (see the Judgment and Decision Making literature on eliciting human judgment about stochastic processes)

- In between:
- Choosing priors that lead to easily obtainable posteriors
- Used to mean picking conjugate priors (e.g., Normal-Normal or Beta-Binomial)
- Now, with software like JAGS or STAN, it is largely a matter of computation time
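A minimal sketch of the conjugate case mentioned above (Beta-Binomial; the function name is illustrative): the posterior is available in closed form, with no sampler needed.

```python
def beta_binomial_update(a0, b0, successes, failures):
    """Conjugate update: Beta(a0, b0) prior + binomial data -> Beta posterior.

    Because the Beta prior is conjugate to the binomial likelihood,
    the posterior is available in closed form -- no MCMC required.
    """
    return a0 + successes, b0 + failures

# Beta(2, 2) prior, 7 successes in 10 trials.
a, b = beta_binomial_update(2, 2, 7, 3)
post_mean = a / (a + b)  # posterior mean of the proportion
print(a, b, round(post_mean, 3))
```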


- Pro ML: Priors are somewhat (possibly entirely) subjective
- Counter: Data can overwhelm prior
- Counter, part deux: Choice of a model is a type of prior present in ML
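The "data can overwhelm the prior" counterargument can be sketched numerically. Assuming a Beta-Binomial model (the two priors are arbitrary choices of mine), posterior means under very different priors converge as N grows:

```python
def posterior_mean(a0, b0, successes, n):
    """Posterior mean of a binomial proportion under a Beta(a0, b0) prior."""
    return (a0 + successes) / (a0 + b0 + n)

# Two quite different priors: Beta(1, 1) (flat) vs Beta(20, 2) (optimistic).
# With little data the priors pull the answers apart; with lots of data
# both posteriors concentrate at the observed proportion.
for n in (10, 100_000):
    s = int(0.4 * n)
    m_flat = posterior_mean(1, 1, s, n)
    m_strong = posterior_mean(20, 2, s, n)
    print(n, round(m_flat, 4), round(m_strong, 4))
```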

- Pro Bayes: ML is “hard” – difficult to calculate derivatives for using numerical optimization of likelihood function
- Counter: Many new Bayesian techniques need same derivatives (or other quantities)

- Remember: Sample size is key to determining many differences
- ML beliefs: Estimators are random, so probability statements are about estimators, not parameters
- Bayesian beliefs: Parameters are random, so probability statements about them are direct
- Bayes theorem makes inductive reasoning transparently about the parameters

- Pro Bayes: Models are explicit – so more transparent
- Counter: Not unique to Bayes, probably more about didactic traditions in teaching statistics

- Exchangeability: belief that quantities (data or parameters) can have labels erased and be treated the same by the model
- Shows up in ML and in Bayes
- de Finetti’s theorems (which show that exchangeable quantities are conditionally independent given a parameter)

- Conditional independence:

\[p(x_1, x_2 \mid \theta) = p(x_1 \mid \theta)\, p(x_2 \mid \theta)\]

- Conditional independence is important for building efficient (fast) algorithms
- Things that are exchangeable are conditionally independent
- Independent processes can be split computationally (more processors; Hello GPUs)
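The computational payoff of conditional independence can be sketched directly: the joint log-likelihood is a sum over observations, so the data can be split into chunks, evaluated separately (on separate processors), and recombined. A minimal illustration assuming i.i.d. normal data:

```python
import math

def normal_loglik(chunk, mu, sigma):
    """Log-likelihood of one chunk of conditionally independent observations."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu)**2 / (2 * sigma**2) for x in chunk)

data = [0.2, -1.1, 0.5, 1.7, -0.3, 0.9]
mu, sigma = 0.0, 1.0

# Conditional independence makes the joint log-likelihood additive,
# so split pieces can be computed independently and summed back up.
whole = normal_loglik(data, mu, sigma)
split = normal_loglik(data[:3], mu, sigma) + normal_loglik(data[3:], mu, sigma)
print(round(whole, 6), round(split, 6))
```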

- Text sidebar (p. 66): The use of a uniform prior \(U\left(-\infty, \infty \right)\) is indeed improper; however, when computers derive the results, \(\infty\) isn’t a representable number.
- All integer and floating point numbers have a maximum and minimum value – so such a prior would be proper
- Bigger issue: Belief that uniform priors may be “bad”
- Yes, some results say they can be bad – often these are extreme cases
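A quick check of the "no \(\infty\) on a computer" point, using Python's float limits: IEEE 754 doubles have a largest finite value, so a "uniform over all representable floats" prior is uniform over a huge but bounded interval.

```python
import sys

# The largest finite double-precision value (about 1.8e308); anything
# beyond it overflows to inf, which a sampler cannot draw from.
print(sys.float_info.max)
print(sys.float_info.max * 2)  # overflows to inf
```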

- Managing and propagating error: important in both Bayes and ML…can be done more easily in Bayes
- Updating results based on new information: Bayes is built for it
- Today’s posterior is tomorrow’s prior
- Probably one of the easier ways to justify the use of a prior
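"Today's posterior is tomorrow's prior" can be verified in the conjugate Beta-Binomial case: updating sequentially, with each day's posterior serving as the next day's prior, gives the same posterior as a single update on all the data. A minimal sketch (the prior and the day-by-day counts are made up):

```python
def update(a, b, successes, failures):
    """One conjugate Beta-Binomial update step."""
    return a + successes, b + failures

# Start from a Beta(1, 1) prior.
# Day 1: 6 successes, 4 failures; Day 2: 3 successes, 7 failures.
day1 = update(1, 1, 6, 4)      # today's posterior...
day2 = update(*day1, 3, 7)     # ...is tomorrow's prior

# Updating on all the data at once gives the same posterior.
all_at_once = update(1, 1, 6 + 3, 4 + 7)
print(day2, all_at_once)
```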

- Several conceptualizations used in book
- Bayesian inference is a belief updating process
- Prior beliefs are updated when they meet data and become a posterior distribution

- Bayesian methods augment information in the data
- Joint distribution of data is primary conceptual basis for analysis (consistent with ML)
- Model building/expansion
