Highlights of the Chapter
- Concepts that reappear in Bayesian inference
- Features where Bayesian methods differ from maximum likelihood methods
Issue Number 1: Influence of Prior vs. Data
- Covered in each of the past few lectures
- For an example, see Lecture 05 (JAGS introduction)
- Note how the inferences change when the prior distribution changes
- Key to understanding: because the likelihood function sits at the core of Bayes' theorem, properties of the likelihood from ML carry over to Bayesian analysis
- For instance, limiting distributions
- For continuous parameters, the posterior distribution becomes approximately normal as N approaches infinity (the Bernstein–von Mises result); see the sketch after this list
- Some exceptions do apply:
- Models whose priors place zero probability on parts of the parameter space (possibly including the true value)
- When estimates land in a suboptimal region of the likelihood (for models with multiple modes)
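To make the limiting-distribution point concrete, here is a minimal sketch (not from the lecture) assuming Bernoulli data with a Beta(1, 1) prior; the posterior is Beta in closed form, and its distance from a matching normal shrinks as N grows:

```python
# Posterior normality as N grows: Bernoulli data, Beta(1, 1) prior.
# The model, seed, and true value are illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_theta = 0.3
grid = np.linspace(0, 1, 1001)

for n in (10, 100, 10_000):
    y = rng.binomial(1, true_theta, size=n).sum()
    post = stats.beta(1 + y, 1 + n - y)            # conjugate posterior
    approx = stats.norm(post.mean(), post.std())   # normal with matching moments
    gap = np.abs(post.cdf(grid) - approx.cdf(grid)).max()
    print(f"N={n:>6}: max CDF gap vs normal = {gap:.4f}")
```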
Issue Number 2: Motivating the Selection of Prior Distributions
- The easy parts to understand
- Priors should quantify uncertainty about parameters
- Choosing flexible priors seems better than inflexible ones (e.g., Beta vs. Uniform)
- Harder:
- Picking priors from previous research
- Interpretability: How do you choose a good prior for parameters that are hard to interpret in the first place?
- Having input from “experts”
- “Experts” is often an easy way to say “uncertainty is not easily quantifiable”
- “Experts” are often wrong, especially with respect to determining numeric quantities (see Judgment and Decision Making literature about eliciting human judgement about stochastic processes)
- In between:
- Choosing priors that lead to easily obtainable posteriors
- This used to mean picking conjugate priors (e.g., Normal-Normal or Beta-Binomial); a worked conjugate update follows this list
- Now, with software like JAGS or Stan, it is mostly a matter of computing time
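As a concrete instance of the conjugacy mentioned above, here is the Beta-Binomial update in closed form; the prior and data values are made up for illustration:

```python
# Conjugate Beta-Binomial update: a Beta(a, b) prior combined with y successes
# in n trials gives a Beta(a + y, b + n - y) posterior -- no sampler needed.
a, b = 2.0, 2.0            # prior pseudo-counts (illustrative)
y, n = 7, 10               # observed successes and trials (illustrative)

a_post, b_post = a + y, b + n - y
post_mean = a_post / (a_post + b_post)
print(f"Posterior: Beta({a_post:g}, {b_post:g}), mean = {post_mean:.3f}")
```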
Issue Number 3: Bayesian vs. ML
- Pro ML: Priors are somewhat (possibly entirely) subjective
- Counter: Data can overwhelm the prior (see the sketch at the end of this list)
- Counter, part deux: the choice of a model is itself a kind of prior, and it is present in ML too
- Pro Bayes: ML is “hard” – it can be difficult to calculate the derivatives needed for numerical optimization of the likelihood function
- Counter: Many newer Bayesian techniques need the same derivatives (or other quantities; for example, Hamiltonian Monte Carlo needs gradients of the log posterior)
- Remember: Sample size is key to determining many differences
- ML view: Estimators are random, so probability statements are about the sampling distribution of estimators, not about the parameters themselves
- Bayesian view: Parameters are random, so probability statements about them are direct
- Bayes theorem makes inductive reasoning transparently about the parameters
- Pro Bayes: Models are explicit – so more transparent
- Counter: Not unique to Bayes, probably more about didactic traditions in teaching statistics
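To illustrate the sample-size point and the "data can overwhelm the prior" counter, here is a small sketch (illustrative model and numbers, not from the lecture) comparing the MLE with posterior means under a flat and an informative prior:

```python
# Binomial data: the MLE and the posterior means under a flat Beta(1, 1)
# prior and an informative Beta(20, 2) prior converge as N grows.
import numpy as np

rng = np.random.default_rng(7)
true_theta = 0.4

for n in (10, 100, 10_000):
    y = rng.binomial(n, true_theta)
    mle = y / n
    flat = (1 + y) / (2 + n)       # posterior mean under Beta(1, 1)
    info = (20 + y) / (22 + n)     # posterior mean under Beta(20, 2)
    print(f"N={n:>6}: MLE={mle:.3f}  flat={flat:.3f}  informative={info:.3f}")
```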
Issue Number 4: Exchangeability and Conditional Independence
- Exchangeability: the belief that quantities (data or parameters) can have their labels erased and be treated the same by the model
- Shows up in ML and in Bayes
- de Finetti’s theorem (exchangeable sequences behave as conditionally independent draws given some parameter)
- Conditional independence (given parameters \(\theta\)):
\[p(x_1, x_2 \mid \theta) = p(x_1 \mid \theta)\, p(x_2 \mid \theta)\]
- Conditional independence is important for building efficient (fast) algorithms
- Things that are exchangeable are conditionally independent given the model’s parameters
- Independent pieces can be split computationally across processors (hello, GPUs); see the sketch just below
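A minimal sketch of why this matters computationally (the normal model and chunk count are illustrative): given \(\theta\), the joint log-likelihood is a sum of per-observation terms, so it can be evaluated in separate chunks and added back together:

```python
# Given theta, conditional independence makes the joint log-likelihood a sum
# of per-observation terms, so chunks can be evaluated separately (e.g., on
# different cores or GPUs) and summed. The normal model is illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)
mu, sigma = 1.0, 2.0    # parameter values at which to evaluate the likelihood

full = stats.norm.logpdf(x, mu, sigma).sum()
chunked = sum(stats.norm.logpdf(chunk, mu, sigma).sum()
              for chunk in np.array_split(x, 4))    # stand-in for 4 workers
print(np.isclose(full, chunked))                    # True: exact factorization
```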
- Text sidebar (p. 66): The uniform prior \(U\left(-\infty, \infty\right)\) is indeed improper; however, when computers derive the results, \(\infty\) isn’t a representable number
- All integer and floating-point types have maximum and minimum values, so the prior a computer actually uses is proper (see the quick check below)
- Bigger issue: the belief that uniform priors may be “bad”
- Yes, some results show they can be bad, but these are often extreme cases
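A quick numerical check of the sidebar’s point, using NumPy to read off the largest representable double (an illustration, not from the text):

```python
# IEEE-754 doubles top out at a finite maximum M, so a "uniform over all
# floats" prior is really U(-M, M), which integrates to 1 and is proper.
import numpy as np

M = np.finfo(np.float64).max
print(M)          # about 1.8e308, the largest double
print(0.5 / M)    # the proper U(-M, M) density height: tiny, but nonzero
```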
- Managing and propagating error: important in both Bayes and ML, and often easier to do in Bayes
- Updating results based on new information: Bayes is built for it
- Today’s posterior is tomorrow’s prior (see the sequential update sketch below)
- Probably one of the easier ways to justify the use of a prior
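A minimal sketch of “today’s posterior is tomorrow’s prior,” reusing the conjugate Beta-Binomial update from earlier (the batch sizes and counts are made up):

```python
# Updating a Beta prior with data in two batches gives the same posterior as
# one combined update -- the mechanics behind "today's posterior is
# tomorrow's prior". Counts below are illustrative.
a, b = 1.0, 1.0                  # flat Beta(1, 1) prior

a, b = a + 3, b + (10 - 3)       # day 1: 3 successes in 10 trials
a, b = a + 9, b + (20 - 9)       # day 2: 9 successes in 20 trials

print(a, b)                      # 13.0 19.0
print(1 + 12, 1 + 18)            # one-shot update with all 30 trials: same
```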
Conceptualizations of Bayesian Modeling
- Several conceptualizations used in book
- Bayesian inference is a belief updating process
- Prior beliefs are updated when they meet data and become a posterior distribution
- Bayesian methods augment the information in the data (via the prior)
- Joint distribution of data is primary conceptual basis for analysis (consistent with ML)
- Model building/expansion