Properties of expected values
This guide will go over the mathematical properties of the expected value of a random variable. For those unfamiliar with the concept of expected values, please check out our comprehensive guide on expected value first.
The proofs we provide here will be for discrete random variables, but the properties hold for continuous random variables as well. The proofs for the continuous case are analogous, with the summation sign essentially replaced by an integral sign.
Expected value of a constant
The expected value of a scalar constant $c$ is the constant itself:

$$\mathbb{E}(c)=c$$
Proof. Let $X$ denote a discrete random variable with probability mass function $p(x)$, and let $g$ be the constant function defined by $g(x)=c$ for every $x$, so that $c=g(X)$.
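Now, using the definition of expected values and writing the possible values of $X$ as $x_1,x_2,\cdots$ (a sketch of the chain of equalities):

$$\mathbb{E}(c)=\mathbb{E}\big(g(X)\big)=\sum_i g(x_i)\,p(x_i)=\sum_i c\,p(x_i)=c\sum_i p(x_i)=c$$

The last equality holds because the probabilities of all possible values of $X$ sum to one.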
This completes the proof.
Expected value of a constant times a random variable
If $X$ is a random variable and $c$ is a constant, then the expected value of their product is:

$$\mathbb{E}(cX)=c\,\mathbb{E}(X)$$
Proof. Let $p(x)$ be the probability mass function of $X$. From the definition of expected value:
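$$\mathbb{E}(cX)=\sum_i c\,x_i\,p(x_i)=c\sum_i x_i\,p(x_i)=c\,\mathbb{E}(X)$$

Here, the sum runs over the possible values $x_1,x_2,\cdots$ of $X$, and the constant $c$ is simply factored out of the summation.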
This completes the proof.
Expected value of a random variable plus a constant
If $X$ is a random variable and $c$ is a constant, then the expected value of their sum is:

$$\mathbb{E}(X+c)=\mathbb{E}(X)+c$$
Proof. Let $p(x)$ be the probability mass function of $X$. From the definition of expected value:
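$$\mathbb{E}(X+c)=\sum_i(x_i+c)\,p(x_i)=\sum_i x_i\,p(x_i)+c\sum_i p(x_i)=\mathbb{E}(X)+c$$

The last equality uses the fact that the probabilities $p(x_i)$ sum to one.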
This completes the proof.
Expected value of XY where X and Y are independent
If $X$ and $Y$ are independent random variables, then:

$$\mathbb{E}(XY)=\mathbb{E}(X)\,\mathbb{E}(Y)$$
Proof. Let $X$ and $Y$ be discrete random variables with probability mass functions $p_X(x)$ and $p_Y(y)$, respectively, and let $p_{X,Y}(x,y)$ denote their joint probability mass function. From the definition of expected value, we have that:
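$$\mathbb{E}(XY)=\sum_i\sum_j x_i\,y_j\,p_{X,Y}(x_i,y_j)=\sum_i\sum_j x_i\,y_j\,p_X(x_i)\,p_Y(y_j)=\Big(\sum_i x_i\,p_X(x_i)\Big)\Big(\sum_j y_j\,p_Y(y_j)\Big)=\mathbb{E}(X)\,\mathbb{E}(Y)$$

Here, the sums run over the possible values $x_i$ of $X$ and $y_j$ of $Y$.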
Here, the second equality holds because $p_{X,Y}(x_i,y_j)=p_X(x_i)\cdot p_Y(y_j)$ when $X$ and $Y$ are independent random variables. This completes the proof.
Expected value of a sum of two random variables (X+Y)
If $X$ and $Y$ are random variables, then the expected value of their sum $X+Y$ is:

$$\mathbb{E}(X+Y)=\mathbb{E}(X)+\mathbb{E}(Y)$$
Proof. Let $X$ and $Y$ be discrete random variables with probability mass functions $p_X(x)$ and $p_Y(y)$, respectively, and let $p_{X,Y}(x,y)$ denote their joint probability mass function. From the definition of expected value, we have that:
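$$\mathbb{E}(X+Y)=\sum_i\sum_j(x_i+y_j)\,p_{X,Y}(x_i,y_j)=\sum_i x_i\sum_j p_{X,Y}(x_i,y_j)+\sum_j y_j\sum_i p_{X,Y}(x_i,y_j)=\sum_i x_i\,p_X(x_i)+\sum_j y_j\,p_Y(y_j)=\mathbb{E}(X)+\mathbb{E}(Y)$$

The third equality uses the fact that summing the joint probability mass function over the values of one variable gives the marginal probability mass function of the other.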
This completes the proof.
Linearity of expected values
If $X$ and $Y$ are two random variables and $a$ and $b$ are some constants, then:

$$\mathbb{E}(aX+bY)=a\,\mathbb{E}(X)+b\,\mathbb{E}(Y)$$
Proof. The linearity of expected values follows from two of the properties we have already proven: the expected value of a sum of two random variables, $\mathbb{E}(X+Y)=\mathbb{E}(X)+\mathbb{E}(Y)$, and the expected value of a constant times a random variable, $\mathbb{E}(cX)=c\,\mathbb{E}(X)$.
The proof is as follows:
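$$\mathbb{E}(aX+bY)=\mathbb{E}(aX)+\mathbb{E}(bY)=a\,\mathbb{E}(X)+b\,\mathbb{E}(Y)$$

The first equality treats $aX$ and $bY$ as two random variables and applies the sum property, while the second factors out the constants $a$ and $b$.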
This completes the proof.
Taking the summation sign in and out of expected values
If $X_1,X_2,\cdots,X_n$ are random variables, then:

$$\mathbb{E}\Big(\sum_{i=1}^nX_i\Big)=\sum_{i=1}^n\mathbb{E}(X_i)$$
Proof. We can use the linearity of expected values to prove this easily:
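$$\mathbb{E}\Big(\sum_{i=1}^nX_i\Big)=\mathbb{E}\Big(X_1+\sum_{i=2}^nX_i\Big)=\mathbb{E}(X_1)+\mathbb{E}\Big(\sum_{i=2}^nX_i\Big)=\cdots=\sum_{i=1}^n\mathbb{E}(X_i)$$

Here, we repeatedly peel off one random variable at a time and apply the sum property; a fully formal argument would proceed by induction on $n$.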
This completes the proof.
Expected value of X given Y where X and Y are independent
If $X$ and $Y$ are independent random variables, then:

$$\mathbb{E}(X|Y)=\mathbb{E}(X)$$
Proof. We use the fact that $p(x|y)=p(x)$ when $X$ and $Y$ are independent:
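$$\mathbb{E}(X|Y=y)=\sum_i x_i\,p(x_i|y)=\sum_i x_i\,p(x_i)=\mathbb{E}(X)$$

Since this holds for every value $y$ of $Y$, we conclude that $\mathbb{E}(X|Y)=\mathbb{E}(X)$.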
This completes the proof.
Expected value of a bounded random variable
If a random variable $X$ is bounded between scalars $a$ and $b$, the expected value of $X$ is also bounded between $a$ and $b$, that is:

$$a\le\mathbb{E}(X)\le b$$
Proof. The idea is to incrementally apply transformations to the inequality $a\le X\le b$ such that $X$ becomes $\mathbb{E}(X)$. We begin from the fact that every possible value $x_i$ of $X$ satisfies $a\le x_i\le b$.
Multiplying every term by the probability mass function of $X$ evaluated at $x_i$ gives:

$$a\,p(x_i)\le x_i\,p(x_i)\le b\,p(x_i)$$
This is allowed because $p(x)\ge0$, so the direction of the inequalities is preserved. Let's take a moment to understand what this inequality means. Suppose the possible values that $X$ can take are $\{x_1,x_2,x_3\}$. The inequality then implies that the following are all true:
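$$a\,p(x_1)\le x_1\,p(x_1)\le b\,p(x_1)$$

$$a\,p(x_2)\le x_2\,p(x_2)\le b\,p(x_2)$$

$$a\,p(x_3)\le x_3\,p(x_3)\le b\,p(x_3)$$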
Let's add the three inequalities:
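$$a\,p(x_1)+a\,p(x_2)+a\,p(x_3)\le x_1\,p(x_1)+x_2\,p(x_2)+x_3\,p(x_3)\le b\,p(x_1)+b\,p(x_2)+b\,p(x_3)$$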
To generalize, let's say $X$ can take on the values $\{x_1,x_2,\cdots,x_n\}$, which will lead to:
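$$\sum_{i=1}^na\,p(x_i)\le\sum_{i=1}^nx_i\,p(x_i)\le\sum_{i=1}^nb\,p(x_i)$$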
Notice how the middle term is the definition of $\mathbb{E}(X)$! We can also take out $a$ and $b$ from the summation since they are constants:
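$$a\sum_{i=1}^np(x_i)\le\mathbb{E}(X)\le b\sum_{i=1}^np(x_i)$$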
Finally, by the definition of a probability mass function, the probabilities of all possible values of a random variable must sum up to one, that is, $\sum_{i=1}^np(x_i)=1$.
Using this property on the previous inequality gives us the desired result:
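$$a\le\mathbb{E}(X)\le b$$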
This completes the proof.
Expected value of a function of a random variable
Let $X$ be a random variable with a known probability mass function $p(x)$, and let $Y$ be another random variable that is a function of $X$, say $Y=g(X)$. The expected value of $Y$ is:

$$\mathbb{E}(Y)=\mathbb{E}\big(g(X)\big)=\sum_i g(x_i)\,p(x_i)$$

Where the sum runs over the possible values $x_i$ of $X$.
Proof. Let's work with a simple example first for some intuition before we tackle the formal proof. Suppose $X$ is a random variable that can take on the following $3$ different values that occur with some known probabilities:
| $X$ | $\mathbb{P}(X=x)$ |
| --- | --- |
| $-1$ | $\mathbb{P}(X=-1)$ |
| $1$ | $\mathbb{P}(X=1)$ |
| $2$ | $\mathbb{P}(X=2)$ |
Here, instead of explicitly assigning probabilities (e.g. $\mathbb{P}(X=-1)=0.3$), we stick with the general form.
Now, let $Y$ be a random variable defined by $Y=X^2$. Our goal is to compute the expected value of $Y$, that is, $\mathbb{E}(Y)$. Let's start by visualizing the function $y=g(x)=x^2$, which maps each possible value of $X$ to a value of $Y$.
Here, notice how $X=-1$ and $X=1$ are mapped to the same $Y$ value. This is because our function $g(x)$ is a many-to-one function, that is, multiple inputs can map to the same output value. To compute the probability $\mathbb{P}(Y=1)$, we need to sum up the probabilities of all the values of $X$ that result in $Y=1$. In this case, $X=-1$ and $X=1$ both result in $Y=1$, so:
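$$\mathbb{P}(Y=1)=\mathbb{P}(X=-1)+\mathbb{P}(X=1)$$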
We can write this more generally as $\mathbb{P}(Y=y_1)=p(x_1)+p(x_2)$, where $y_1=g(x_1)=g(x_2)$. Similarly, $\mathbb{P}(Y=4)$ can be computed by $\mathbb{P}(Y=4)=\mathbb{P}(X=2)$, or more generally by $\mathbb{P}(Y=y_2)=p(x_3)$, where $y_2=g(x_3)$.
Now, by referring to these two general expressions, we can generalize the process of computing the probability of any value $y_j$ of $Y$ using the following summation:
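$$\mathbb{P}(Y=y_j)=\sum_{i:\,g(x_i)=y_j}p(x_i)$$

Here, the sum runs over all values $x_i$ of $X$ that $g$ maps to $y_j$.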
Again, the reason why the summation is necessary here is that the function $g(x)$ may not necessarily be one-to-one (e.g. $g(x)=x^2$).
Now, let's go back to our original goal of computing $\mathbb{E}(Y)$. From the definition of expected value, we have that:
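$$\mathbb{E}(Y)=\sum_j y_j\,\mathbb{P}(Y=y_j)$$

In our example, $Y$ takes on the two values $y_1=1$ and $y_2=4$.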
Substituting the expression for $\mathbb{P}(Y=y_j)$ into this definition gives:
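$$\mathbb{E}(Y)=y_1\big(p(x_1)+p(x_2)\big)+y_2\,p(x_3)=g(x_1)\,p(x_1)+g(x_2)\,p(x_2)+g(x_3)\,p(x_3)=\sum_{i=1}^3g(x_i)\,p(x_i)$$

The second equality uses the fact that $y_1=g(x_1)=g(x_2)$ and $y_2=g(x_3)$.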
Let's generalize this further:
- instead of $3$ values for random variable $X$, let's make this $m$.
- instead of $2$ values for random variable $Y$, let's make this $n$.
- if $g(x)$ is a one-to-one function, then $m=n$.
- if $g(x)$ is a many-to-one function (as in our example), then $m\ge{n}$.
- instead of a probability $\mathbb{P}(X=x)$, we use a probability mass (or density) function $p(x)$.
The formal proof simply involves using these general variables; the logic is exactly the same:
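$$\mathbb{E}(Y)=\sum_{j=1}^ny_j\,\mathbb{P}(Y=y_j)=\sum_{j=1}^ny_j\sum_{i:\,g(x_i)=y_j}p(x_i)=\sum_{j=1}^n\sum_{i:\,g(x_i)=y_j}g(x_i)\,p(x_i)=\sum_{i=1}^mg(x_i)\,p(x_i)$$

Here, each inner sum runs over the values $x_i$ that $g$ maps to $y_j$. The last equality holds because every value $x_i$ of $X$ is mapped by $g$ to exactly one value $y_j$ of $Y$, so the double summation visits each $x_i$ exactly once.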
This completes the proof.
Law of total expectation
If $X$ and $Y$ are random variables, then the Law of Total Expectation states that:

$$\mathbb{E}(X)=\mathbb{E}_Y\big(\mathbb{E}(X|Y)\big)$$
To emphasize that $\mathbb{E}(X|Y)$ is a random quantity based on $Y$, the outer expected value is subscripted with $Y$.
From the definition of expected values, the Law of Total Expectation can also be written as:

$$\mathbb{E}(X)=\sum_y\mathbb{E}(X|Y=y)\,p(y)$$

Where $p(y)$ is the probability mass function of $Y$.
Proof. We use the definition of expected values to rewrite the outer expected value, treating $\mathbb{E}(X|Y)$ as a function of the random variable $Y$:
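$$\mathbb{E}_Y\big(\mathbb{E}(X|Y)\big)=\sum_y\mathbb{E}(X|Y=y)\,p(y)$$

The steps that follow are written for discrete $X$ and $Y$.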
We again use the definition of expected values:
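$$\mathbb{E}_Y\big(\mathbb{E}(X|Y)\big)=\sum_y\Big(\sum_x x\,p(x|y)\Big)p(y)$$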
We rearrange the summations:
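$$\mathbb{E}_Y\big(\mathbb{E}(X|Y)\big)=\sum_x x\sum_y p(x|y)\,p(y)$$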
We know from the definition of conditional probability that $p(x|y)\,p(y)=p(x,y)$.
Therefore, the previous expression becomes:
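$$\mathbb{E}_Y\big(\mathbb{E}(X|Y)\big)=\sum_x x\sum_y p(x,y)$$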
Now, the marginal distribution $p(x)$ is defined as $p(x)=\sum_yp(x,y)$.
Substituting this into the expression above gives the desired result:
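$$\mathbb{E}_Y\big(\mathbb{E}(X|Y)\big)=\sum_x x\,p(x)=\mathbb{E}(X)$$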
This completes the proof.