Comprehensive Guide on Geometric Distribution
Before we formally define the geometric distribution, let's go through a motivating example.
Motivating example
Suppose we repeatedly toss an unfair coin with the following probabilities:
Where $\mathrm{H}$ and $\mathrm{T}$ represent heads and tails respectively.
Let the random variable $X$ denote the trial at which heads occurs for the first time. To obtain heads for the very first time at the $X$-th trial, we must have obtained $X-1$ tails before that. For instance, suppose we are interested in the probability of obtaining a heads for the first time at the 3rd trial. This means that the outcome of our tosses has to be:
The probability of this specific outcome is:
What would the probability of obtaining a heads for the first time at the 5th trial be? In this case, the outcome of the tosses must be:
The probability that this specific outcome occurs is:
Hopefully you can see that, in general, the probability of obtaining a heads for the first time at the $x$-th trial is given by:
Let's generalize further - instead of heads and tails, let's denote a heads as a success and a tails as a failure. If the probability of success is $p$, then the probability of failure is $1-p$. Therefore, the probability of observing a success for the first time at the $x$-th trial is given by:
Random variables with this specific distribution are said to follow the geometric distribution.
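As a quick sanity check, here is a short Python sketch (assuming the coin from the motivating example with $\mathbb{P}(\mathrm{H})=0.2$) that evaluates $(1-p)^{x-1}p$ for the two cases worked out above. Up to floating-point rounding, it prints the probabilities for the 3rd and 5th trials.

p = 0.2   # probability of success (heads), as in the motivating example

def prob_first_success(x, p):
    # x - 1 failures followed by a single success
    return (1 - p) ** (x - 1) * p

print(prob_first_success(3, p))   # (1-0.2)^2 * 0.2 = 0.128
print(prob_first_success(5, p))   # (1-0.2)^4 * 0.2 = 0.08192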
Assumptions of the geometric distribution
While deriving the geometric distribution, we made the following implicit assumptions:
the probability of success is constant for every trial. For our example, the probability of heads is $0.2$ for each coin toss.
the trials are independent. For our example, the outcome of a coin toss does not affect the outcome of the next coin toss.
the outcome of each trial is binary. For our example, the outcome of a coin toss is either heads or tails.
Statistical experiments that satisfy these conditions are called repeated Bernoulli trials.
Geometric distribution
A discrete random variable $X$ is said to follow a geometric distribution with parameter $p$ if and only if the probability mass function of $X$ is:
Where $0\le{p}\le{1}$. If a random variable $X$ follows a geometric distribution with parameter $p$, then we can write $X\sim\text{Geom}(p)$.
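As an illustrative check (using SciPy's geom, which is covered in more detail at the end of this guide), we can confirm that the masses $(1-p)^{x-1}p$ agree with geom.pmf(~) and sum to $1$ over $x=1,2,3,\cdots$:

from scipy.stats import geom

p = 0.3              # any 0 < p < 1 works here
xs = range(1, 200)   # truncate the infinite support for a numerical check

manual = [(1 - p) ** (x - 1) * p for x in xs]                      # masses computed by hand
print(sum(manual))                                                 # very close to 1.0
print(max(abs(m - geom.pmf(x, p)) for x, m in zip(xs, manual)))    # essentially 0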
Drawing balls from a bag
Suppose we randomly draw with replacement from a bag containing 3 red balls and 2 green balls until a green ball is drawn. Answer the following questions:
What is the probability of drawing a green ball at the 3rd trial?
What is the probability of drawing a green ball at or before the 3rd trial?
Solution. Let's start by confirming that this experiment consists of repeated Bernoulli trials:
the probability of success, that is, of drawing a green ball, is constant ($p=2/5$).
the trials are independent because we are drawing with replacement.
the outcome is binary - we either draw a green ball or we don't.
Let $X$ be a geometric random variable representing the trial at which we draw a green ball for the first time. The probability of success, that is, the probability of drawing a green ball at each trial, is $p=2/5$. Therefore, the geometric probability mass function is:
The probability of drawing a green ball at the 3rd trial is:
The probability of drawing a green ball at or before the 3rd trial is:
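Before graphing, here is a minimal Python sketch that reproduces both answers by plugging into the probability mass function directly; up to floating-point rounding, it prints $0.144$ and $0.784$:

p = 2/5   # probability of drawing a green ball

# P(X = 3): two reds (failures) followed by a green (success)
print((1 - p) ** 2 * p)

# P(X <= 3): add up the masses at x = 1, 2, 3
print(sum((1 - p) ** (x - 1) * p for x in range(1, 4)))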
Finally, let's graph our geometric probability mass function:
We can see that $\mathbb{P}(X=3)$ is indeed roughly $0.144$. Note that we've truncated the graph at $x=10$ but $x$ can be any positive integer.
Properties of geometric distribution
Expected value of a geometric random variable
If $X$ follows a geometric distribution with parameter $p$, then the expected value of $X$ is given by:
Proof. Let $X\sim\mathrm{Geom}(p)$. By the definition of expected values, we have that:
We now want to compute the summation:
Let's rewrite each term as:
To get \eqref{eq:tmFosgxf7NJZupdcACh}, we must take the summation - but the trick is to do so vertically:
Notice that each of these is an infinite geometric series with common ratio $(1-p)$. The only difference between them is the starting value. Because $p$ is a probability, we have that $0\lt{p}\lt1$. This also means that $0\lt{1-p}\lt1$. In our guide on geometric series, we have shown that all infinite geometric series converge to the following sum when the common ratio is between $-1$ and $1$:
Where $a$ is the starting value of the series and $r$ is the common ratio. Therefore, in our case, the sum would be:
The orange sum can therefore be written as:
Once again, we end up with yet another infinite geometric series with starting value $1/p$ and common ratio $(1-p)$. Using the formula for the sum of infinite geometric series \eqref{eq:ETLEUmzvGgK9qpXwgIf} once again gives:
Substituting \eqref{eq:jkILy4J9QlCfNqZ0wQI} into \eqref{eq:wX1Je2CfjYCv3cV1Sgl} gives:
This completes the proof.
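As an informal numerical check of this result (not a substitute for the proof above), we can simulate repeated Bernoulli trials and compare the sample mean against $1/p$:

import random

random.seed(0)
p = 0.2   # probability of success per trial

def trial_of_first_success(p):
    # keep performing Bernoulli trials until the first success occurs
    trial = 1
    while random.random() >= p:
        trial += 1
    return trial

samples = [trial_of_first_success(p) for _ in range(100_000)]
print(sum(samples) / len(samples))   # close to 1/p = 5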
Variance of a geometric random variable
If $X$ follows a geometric distribution with parameter $p$, then the variance of $X$ is given by:
Proof. We know from the property of variance that:
We already know what $\mathbb{E}(X)$ is from earlier, so we have that:
We now need to derive the expression for $\mathbb{E}(X^2)$. From the definition of expected values, we have that:
Now, notice how we can obtain the purple term by taking the derivative like so:
Note that we are using the power rule of differentiation here.
Let's now use the properties of geometric series to find an expression for the green summation in \eqref{eq:ctA9d0vBqyejpCEhDWm}. We define a new variable $k$ such that $k=x+1$, which also means that $x=k-1$. Rewriting the green summation in terms of $k$ gives:
Now, recall that we derived the following lemma \eqref{eq:jkILy4J9QlCfNqZ0wQI} when proving the expected value earlier:
Notice how the only difference between the summation in \eqref{eq:QL6R8IRppv3v1qFGqed} and \eqref{eq:BCJF0TeNFEcRdzNrRqs} is the symbol used. Since the symbol itself doesn't matter, we have that:
Taking the derivative of both sides with respect to $p$ gives:
Equating \eqref{eq:ctA9d0vBqyejpCEhDWm} and \eqref{eq:C5c0gN5zpfWoNmLmv79} gives:
Substituting \eqref{eq:JMbtP2c4X0XzUrqu7Pj} into \eqref{eq:Liebvt5zDji9PljQRHJ} gives:
Finally, substituting \eqref{eq:VqPsbnb96WEAsuLui4w} into \eqref{eq:BFUrEk4tUwaopDxTSb9} gives:
This completes the proof.
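Again as an informal check, SciPy's geom exposes the theoretical mean and variance directly, which we can compare against $1/p$ and $(1-p)/p^2$ respectively:

from scipy.stats import geom

p = 0.5
mean, var = geom.stats(p, moments='mv')
print(mean, 1 / p)              # 2.0 2.0, matching 1/p
print(var, (1 - p) / p ** 2)    # 2.0 2.0, matching (1-p)/p^2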
Cumulative distribution function of the geometric distribution
The cumulative distribution function of the geometric distribution with success probability $p$ is given by:
Proof. We use the definitions of the cumulative distribution function and the geometric distribution:
Notice that this is a finite geometric series with starting value $p$ and common ratio $(1-p)$. Using the formula for the sum of a finite geometric series, we have that:
This completes the proof.
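As a quick check, we can compare the closed form $1-(1-p)^x$ against a direct summation of the probability mass function and against SciPy's geom.cdf(~):

from scipy.stats import geom

p = 2/5
for x in range(1, 6):
    closed_form = 1 - (1 - p) ** x                              # the CDF derived above
    direct_sum = sum(geom.pmf(k, p) for k in range(1, x + 1))   # summing the PMF directly
    print(x, closed_form, direct_sum, geom.cdf(x, p))           # all three agree up to floating-point rounding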
Memoryless property of the geometric distribution
The geometric distribution satisfies the memoryless property, that is:
Where $m$ and $n$ are non-negative integers. Note that the geometric distribution is the only discrete probability distribution with the memoryless property.
Intuition. Suppose we keep tossing a coin until we observe our first heads. We know that if we let the random variable $X$ represent the trial at which heads first occurs, then $X$ is a geometric random variable with success probability $p$. Suppose we are interested in the probability that we get heads for the first time after trial $5$, that is:
Now, suppose we have already observed $2$ tails. We can update our probability \eqref{eq:ooYrmktgCHfXpKBZwTL} to include this information:
We now use the formula for conditional probability:
Notice how $\mathbb{P}(X\gt5\;\text{and}\;X\gt2)$ is equal to $\mathbb{P}(X\gt5)$ because when $X\gt5$, then $X\gt2$ is always true. Therefore, we have that:
To calculate the two probabilities, we can use the geometric cumulative distribution function that we derived earlier:
Taking the complement on both sides:
Therefore, $\mathbb{P}(X\gt5)$ and $\mathbb{P}(X\gt2)$ in \eqref{eq:XDyz0O6555xlpibL8kt} can be expressed as:
We can simplify this further using \eqref{eq:DVNJ1bsgANPTzCUJUqP} again:
This means that the probability of observing a heads after the $5$-th trial given that we have already observed $2$ tails is equal to the probability of starting over and observing the first heads after $3$ trials. This makes sense because the past outcomes ($2$ tails in this case) do not affect subsequent outcomes, and hence we can forget about them and act as if we're starting a new coin-toss experiment with the remaining number of trials ($3$ in this case).
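Here is a small numerical check of this example, assuming a heads probability of $p=0.2$ and the complement formula $\mathbb{P}(X\gt x)=(1-p)^x$ from the cumulative distribution function above; up to floating-point rounding, both printed values equal $0.8^3=0.512$:

p = 0.2   # probability of heads per toss

def prob_greater(x, p):
    # P(X > x) = (1 - p)^x for a geometric random variable
    return (1 - p) ** x

# P(X > 5 | X > 2) = P(X > 5) / P(X > 2)
print(prob_greater(5, p) / prob_greater(2, p))

# P(X > 3), i.e. starting over and needing more than 3 trials
print(prob_greater(3, p))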
Proof. The proof of the memoryless property follows the same logic. Consider a geometric random variable $X$ with probability of success $p$. Recall from earlier that the probability of the first heads occurring after the $5$-th trial given that we have already observed $2$ tails is:
Instead of using these concrete numbers, we replace $5$ with $m+n$ and $2$ with $n$.
This completes the proof.
Alternate parametrization of the geometric distribution
A discrete random variable $X$ is also said to follow a geometric distribution with parameter $p$ if the probability mass function of $X$ is:
Where $0\le{p}\le{1}$.
Intuition and proof. We have introduced the geometric random variable $X$ as observing the first success at the $X$-th trial. The probability mass function of $X$ was derived to be:
Where $p$ is the probability of success and $x=1,2,3,\cdots$.
There exists an equivalent formulation of the geometric distribution where we let random variable $X$ represent the number of failures before the first success. The key is to notice that observing the first success at the $X$-th trial is logically equivalent to observing $X-1$ failures before the first success. For instance, observing the first success at the $5$-th trial is the same as observing $5-1=4$ failures before the first success:
Let's go the other way now - if we let random variable $X$ represent the number of failures before the first success, then we must observe the first success at the $(X+1)^\text{th}$ trial. We know that $X+1$ follows a geometric distribution with probability mass function:
Let's simplify the left-hand side:
Next, we simplify the right-hand side:
Therefore, \eqref{eq:sTwVrvAjmdD3HLG0aA0} is:
Finally, since $X$ represents the number of failures before the first success, $X$ can take on the values $X=0,1,2,\cdots$. This is slightly different from what $X$ can take on in the original definition of the geometric distribution, which was $X=1,2,3,\cdots$. This completes the proof.
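In SciPy, this failures-before-first-success version can be obtained from geom by shifting its support with the loc argument; the sketch below assumes loc=-1 shifts the support from $\{1,2,3,\cdots\}$ down to $\{0,1,2,\cdots\}$:

from scipy.stats import geom

p = 2/5

# original parametrization: X is the trial of the first success, support {1, 2, 3, ...}
print(geom.pmf(3, p))           # (1-p)^2 * p = 0.144

# alternate parametrization: X is the number of failures before the first success,
# support {0, 1, 2, ...}, obtained here by shifting the support with loc=-1
print(geom.pmf(2, p, loc=-1))   # (1-p)^2 * p = 0.144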
Working with the geometric distribution using Python
Computing probabilities
Consider the example from earlier:
Suppose we randomly draw with replacement from a bag containing 3 red balls and 2 green balls until a green ball is drawn. What is the probability of drawing a green ball at the 3rd trial?
If we define the random variable $X$ as the trial at which we first draw a green ball, then $X\sim\text{Geom}(2/5)$. We then use the geometric probability mass function to compute the probability of $X=3$:
Instead of computing the probability by hand, we can use Python's SciPy library:
from scipy.stats import geom

x = 3
p = 2/5
geom.pmf(x, p)
0.144
Notice that the computed result is identical to the hand-calculated result.
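Similarly, the second part of the earlier example, the probability of drawing a green ball at or before the 3rd trial, can be computed with geom.cdf(~):

from scipy.stats import geom

x = 3
p = 2/5
geom.cdf(x, p)   # roughly 0.784, matching 1 - (1 - p)**3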
Drawing probability mass function
Suppose we wanted to draw the probability mass function of $X\sim\text{Geom}(2/5)$:
We can call the geom.pmf(~) function on a list of positive integers:
import matplotlib.pyplot as plt
from scipy.stats import geom

p = 2/5
n = 15
xs = list(range(1, n+1))   # [1, 2, ..., 15]
pmfs = geom.pmf(xs, p)
str_xs = [str(x) for x in xs]   # convert the list of integers into a list of string labels
plt.bar(str_xs, pmfs)
plt.xlabel('$x$')
plt.ylabel('$p(x)$')
plt.show()
This generates the following plot:
Note that we converted the list of integers into a list of string labels, otherwise the $x$-axis would contain decimal tick values.