Introduction to Bayesian Statistics
Philosophy
The pivotal difference between frequentist and Bayesian statistics is that the former considers the population parameter to be fixed, while the latter treats it as a random variable. This difference in interpreting the nature of population parameters carries far-reaching implications. In the eyes of Bayesians, it makes sense to assign probability distributions to parameters and compute expected values and variances, just as we do for any random variable.
Bayes' Theorem
Bayes' theorem is as follows:

If the events $B_1,B_2,\dots,B_k$ constitute a partition of the sample space $S$ such that $\mathbb{P}(B_i)\ne0$ for $i=1,2,\dots,k$, then for any event $A$ in $S$ such that $\mathbb{P}(A)\ne0$:

$$\mathbb{P}(B_j\,|\,A)=\frac{\mathbb{P}(B_j)\,\mathbb{P}(A\,|\,B_j)}{\sum_{i=1}^{k}\mathbb{P}(B_i)\,\mathbb{P}(A\,|\,B_i)},\qquad j=1,2,\dots,k$$
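To make the discrete form concrete, here is a minimal Python sketch. The three machines and their defect rates below are made-up numbers for illustration:

```python
# Hypothetical setup: machines B1, B2, B3 produce 50%, 30% and 20% of
# a factory's output, with defect rates of 1%, 2% and 3% respectively.
prior = [0.50, 0.30, 0.20]        # P(B_i): the partition of S
defect_rate = [0.01, 0.02, 0.03]  # P(A | B_i), where A = "item is defective"

# Rule of elimination: P(A) = sum_i P(B_i) * P(A | B_i)
p_defect = sum(p * d for p, d in zip(prior, defect_rate))

# Bayes' theorem: P(B_i | A) = P(B_i) * P(A | B_i) / P(A)
posterior = [p * d / p_defect for p, d in zip(prior, defect_rate)]
print(posterior)  # [0.294..., 0.352..., 0.352...]
```

Observing a defective item shifts probability towards machine $B_3$, whose defect rate is highest.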
For the continuous case, where $f(\theta)$ is the prior density of the parameter and $f(x|\theta)$ is the likelihood:

$$f(\theta\,|\,x)=\frac{f(x\,|\,\theta)\,f(\theta)}{\int f(x\,|\,\theta')\,f(\theta')\,d\theta'}$$
Here, the denominator is a normalising constant for the posterior distribution and does not depend on $\theta$. For this reason, we often write the above as:

$$f(\theta\,|\,x)\propto f(x\,|\,\theta)\,f(\theta)$$
Proof. Using the definition of conditional probability and the rule of elimination, we can derive Bayes' theorem:

$$\mathbb{P}(B_j\,|\,A)=\frac{\mathbb{P}(B_j\cap A)}{\mathbb{P}(A)}=\frac{\mathbb{P}(B_j)\,\mathbb{P}(A\,|\,B_j)}{\sum_{i=1}^{k}\mathbb{P}(B_i)\,\mathbb{P}(A\,|\,B_i)}$$
The continuous case follows in the same way, combining the definition of conditional probability with the rule of elimination:

$$f(\theta\,|\,x)=\frac{f(x,\theta)}{f(x)}=\frac{f(x\,|\,\theta)\,f(\theta)}{\int f(x\,|\,\theta')\,f(\theta')\,d\theta'}$$
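Since the denominator $f(x)$ does not depend on $\theta$, it can always be recovered numerically by integrating the numerator over the parameter space. Below is a minimal grid-approximation sketch; the Beta(2, 8) prior and the data (3 defects in 20 trials) are illustrative assumptions:

```python
import numpy as np
from scipy import stats

theta = np.linspace(0, 1, 1001)             # grid over the parameter space
dtheta = theta[1] - theta[0]

prior = stats.beta.pdf(theta, 2, 8)         # assumed prior f(theta)
likelihood = stats.binom.pmf(3, 20, theta)  # assumed data: 3 defects in n = 20

unnormalised = likelihood * prior           # numerator f(x|theta) * f(theta)
evidence = np.sum(unnormalised) * dtheta    # normalising constant f(x)
posterior = unnormalised / evidence         # density that integrates to ~1

print(np.sum(posterior) * dtheta)           # ~1.0
```

This is exactly the sense in which $f(\theta|x)\propto f(x|\theta)\,f(\theta)$: dropping the constant loses nothing, because it can be reinstated by normalising at the end.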
Terminologies
Prior Distribution
The initial probability distribution assigned to the parameter is referred to as the prior distribution. Note that the prior distribution is specified before we observe the data - hence the name "prior".
Likelihood Function
The likelihood function we are used to is often denoted $L(\theta|x)$, which makes explicit that $\theta$ is treated as the variable. In the Bayesian world, however, the likelihood is written in what looks like the reverse order, $f(x|\theta)$. Despite the notation, $x$ is still treated as a fixed constant (the observed data) while $\theta$ is treated as the variable. For this very reason, we do not regard $f(x|\theta)$, viewed as a function of $\theta$, as a probability distribution.
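We can verify this numerically: holding the data fixed and sweeping over $\theta$, the binomial likelihood does not integrate to one. The values $n=20$ and $x=3$ below are illustrative:

```python
import numpy as np
from scipy import stats

n, x = 20, 3                               # observed data, held fixed
theta = np.linspace(0, 1, 1001)            # theta is the variable

likelihood = stats.binom.pmf(x, n, theta)  # f(x|theta) as a function of theta
area = np.sum(likelihood) * (theta[1] - theta[0])
print(area)  # ~0.0476 = 1/(n+1), not 1, so this is not a density in theta
```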
Posterior Distribution
Once we incorporate the observed data, the prior distribution is updated to the posterior distribution, $f(\theta|\mathbf{x})$. The posterior distribution is the basis for statistical inference in the Bayesian world.
Example
Question. Company A is trying to estimate what proportion of its products are defective. Out of the thousands of products made, the company took a random sample of size $n$ and found that $k$ of them are defective. As an additional insight, suppose company A knows from past experience that around 5% of its products are defective. Determine the posterior distribution.
Solution. We know directly from the question that $X\sim\mathcal{B}(n,\theta)$, which means that the likelihood function is:

$$f(k\,|\,\theta)=\binom{n}{k}\theta^{k}(1-\theta)^{n-k}$$
One way of modelling the company's insight about $\theta$ is to use the beta distribution:

$$f(\theta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1},\qquad 0<\theta<1$$
We need to come up with parameters $\alpha$ and $\beta$ that align with the company's insight; that is, we want the beta distribution to concentrate its mass near small values of $\theta$. Setting $\alpha=2$ and $\beta=8$ seems adequate. This means that the prior distribution is:

$$f(\theta)=\frac{\Gamma(10)}{\Gamma(2)\,\Gamma(8)}\,\theta\,(1-\theta)^{7}=72\,\theta\,(1-\theta)^{7},\qquad 0<\theta<1$$
As it turns out, the posterior distribution is again a beta distribution:

$$f(\theta\,|\,k)\propto f(k\,|\,\theta)\,f(\theta)\propto\theta^{(k+2)-1}(1-\theta)^{(n-k+8)-1}$$

That is, $\theta\,|\,k\sim\mathrm{Beta}(k+2,\,n-k+8)$.
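Because the beta prior is conjugate to the binomial likelihood, the posterior can be handled entirely in closed form. A short sketch with the illustrative values $n=100$ and $k=4$ (any sample would do):

```python
from scipy import stats

n, k = 100, 4        # illustrative sample: 4 defective items out of 100
a, b = 2, 8          # the Beta(2, 8) prior chosen above

# Conjugacy: Beta(a, b) prior + binomial data (k out of n)
# yields a Beta(a + k, b + n - k) posterior.
posterior = stats.beta(a + k, b + n - k)

print(posterior.mean())          # (a + k)/(a + b + n) = 6/110 ≈ 0.0545
print(posterior.interval(0.95))  # 95% credible interval for theta
```

The posterior mean, roughly 5.5%, sits between the prior mean of 20% and the sample proportion of 4%, pulled strongly towards the data because $n$ is large.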