Properties of expected values
This guide will go over the mathematical properties of the expected value of a random variable. For those unfamiliar with the concept of expected values, please check out our comprehensive guide on expected value first.
The proofs we provide here will be for discrete random variables, but the properties hold for continuous random variables as well. The proofs for the continuous case are analogous, with the summation sign essentially replaced by an integral sign.
Expected value of a constant
The expected value of a scalar constant $c$ is the constant itself:

$$\mathbb{E}(c)=c$$
Proof. Let $X$ denote a discrete random variable with probability mass function $p(x)$, and let $g$ be the constant function defined by $g(x)=c$ for every $x$, so that $c=g(X)$.
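Now, using the definition of expected values and writing the possible values of $X$ as $x_1,x_2,\cdots$ (a sketch of the chain of equalities):

$$\mathbb{E}(c)=\mathbb{E}\big(g(X)\big)=\sum_i g(x_i)\,p(x_i)=\sum_i c\,p(x_i)=c\sum_i p(x_i)=c$$

The last equality holds because the probabilities of all possible values of $X$ sum to one.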
This completes the proof.
Expected value of a constant times a random variable
If $X$ is a random variable and $c$ is a constant, then the expected value of their product is:

$$\mathbb{E}(cX)=c\,\mathbb{E}(X)$$
Proof. Let $p(x)$ be the probability mass function of $X$. From the definition of expected value:
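$$\mathbb{E}(cX)=\sum_i c\,x_i\,p(x_i)=c\sum_i x_i\,p(x_i)=c\,\mathbb{E}(X)$$

Here, the sum runs over the possible values $x_1,x_2,\cdots$ of $X$, and the constant $c$ is simply factored out of the summation.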
This completes the proof.
Expected value of a random variable plus a constant
If $X$ is a random variable and $c$ is a constant, then the expected value of their sum is:

$$\mathbb{E}(X+c)=\mathbb{E}(X)+c$$
Proof. Let $p(x)$ be the probability mass function of $X$. From the definition of expected value:
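$$\mathbb{E}(X+c)=\sum_i(x_i+c)\,p(x_i)=\sum_i x_i\,p(x_i)+c\sum_i p(x_i)=\mathbb{E}(X)+c$$

The last equality uses the fact that the probabilities $p(x_i)$ sum to one.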
This completes the proof.
Expected value of XY where X and Y are independent
If $X$ and $Y$ are independent random variables, then:

$$\mathbb{E}(XY)=\mathbb{E}(X)\,\mathbb{E}(Y)$$
Proof. Let $X$ and $Y$ be discrete random variables with probability mass functions $p_X(x)$ and $p_Y(y)$, respectively, and let $p_{X,Y}(x,y)$ denote their joint probability mass function. From the definition of expected value, we have that:
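$$\mathbb{E}(XY)=\sum_i\sum_j x_i\,y_j\,p_{X,Y}(x_i,y_j)=\sum_i\sum_j x_i\,y_j\,p_X(x_i)\,p_Y(y_j)=\Big(\sum_i x_i\,p_X(x_i)\Big)\Big(\sum_j y_j\,p_Y(y_j)\Big)=\mathbb{E}(X)\,\mathbb{E}(Y)$$

Here, the sums run over the possible values $x_i$ of $X$ and $y_j$ of $Y$.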
Here, the second equality holds because $p_{X,Y}(x_i,y_j)=p_X(x_i)\cdot p_Y(y_j)$ when $X$ and $Y$ are independent random variables. This completes the proof.
Expected value of a sum of two random variables (X+Y)
If $X$ and $Y$ are random variables, then the expected value of their sum $X+Y$ is:

$$\mathbb{E}(X+Y)=\mathbb{E}(X)+\mathbb{E}(Y)$$
Proof. Let $X$ and $Y$ be discrete random variables with probability mass functions $p_X(x)$ and $p_Y(y)$, respectively, and let $p_{X,Y}(x,y)$ denote their joint probability mass function. From the definition of expected value, we have that:
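$$\mathbb{E}(X+Y)=\sum_i\sum_j(x_i+y_j)\,p_{X,Y}(x_i,y_j)=\sum_i x_i\sum_j p_{X,Y}(x_i,y_j)+\sum_j y_j\sum_i p_{X,Y}(x_i,y_j)=\sum_i x_i\,p_X(x_i)+\sum_j y_j\,p_Y(y_j)=\mathbb{E}(X)+\mathbb{E}(Y)$$

The third equality uses the fact that summing the joint probability mass function over the values of one variable gives the marginal probability mass function of the other.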
This completes the proof.
Linearity of expected values
If $X$ and $Y$ are two random variables and $a$ and $b$ are some constants, then:

$$\mathbb{E}(aX+bY)=a\,\mathbb{E}(X)+b\,\mathbb{E}(Y)$$
Proof. The linearity of expected values follows from two of the properties we have already proven: the expected value of a sum of two random variables, $\mathbb{E}(X+Y)=\mathbb{E}(X)+\mathbb{E}(Y)$, and the expected value of a constant times a random variable, $\mathbb{E}(cX)=c\,\mathbb{E}(X)$.
The proof is as follows:
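$$\mathbb{E}(aX+bY)=\mathbb{E}(aX)+\mathbb{E}(bY)=a\,\mathbb{E}(X)+b\,\mathbb{E}(Y)$$

The first equality treats $aX$ and $bY$ as two random variables and applies the sum property, while the second factors out the constants $a$ and $b$.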
This completes the proof.
Taking the summation sign in and out of expected values
If $X_1,X_2,\cdots,X_n$ are random variables, then:

$$\mathbb{E}\Big(\sum_{i=1}^nX_i\Big)=\sum_{i=1}^n\mathbb{E}(X_i)$$
Proof. We can use the linearity of expected values to prove this easily:
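$$\mathbb{E}\Big(\sum_{i=1}^nX_i\Big)=\mathbb{E}\Big(X_1+\sum_{i=2}^nX_i\Big)=\mathbb{E}(X_1)+\mathbb{E}\Big(\sum_{i=2}^nX_i\Big)=\cdots=\sum_{i=1}^n\mathbb{E}(X_i)$$

Here, we repeatedly peel off one random variable at a time and apply the sum property; a fully formal argument would proceed by induction on $n$.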
This completes the proof.
Expected value of X given Y where X and Y are independent
If $X$ and $Y$ are independent random variables, then:

$$\mathbb{E}(X|Y)=\mathbb{E}(X)$$
Proof. We use the fact that $p(x|y)=p(x)$ when $X$ and $Y$ are independent:
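$$\mathbb{E}(X|Y=y)=\sum_i x_i\,p(x_i|y)=\sum_i x_i\,p(x_i)=\mathbb{E}(X)$$

Since this holds for every value $y$ of $Y$, we conclude that $\mathbb{E}(X|Y)=\mathbb{E}(X)$.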
This completes the proof.
Expected value of a bounded random variable
If a random variable $X$ is bounded between scalars $a$ and $b$, the expected value of $X$ is also bounded between $a$ and $b$, that is:

$$a\le\mathbb{E}(X)\le b$$
Proof. The idea is to incrementally apply transformations to the inequality $a\le X\le b$ such that $X$ becomes $\mathbb{E}(X)$. We begin from the fact that every possible value $x_i$ of $X$ satisfies $a\le x_i\le b$.
Multiplying every term by the probability mass function of $X$ evaluated at $x_i$ gives:

$$a\,p(x_i)\le x_i\,p(x_i)\le b\,p(x_i)$$
This is allowed because $p(x)\ge0$, so the direction of the inequalities is preserved. Let's take a moment to understand what this inequality means. Suppose the possible values that $X$ can take are $\{x_1,x_2,x_3\}$. The inequality then implies that the following are all true:
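$$a\,p(x_1)\le x_1\,p(x_1)\le b\,p(x_1)$$

$$a\,p(x_2)\le x_2\,p(x_2)\le b\,p(x_2)$$

$$a\,p(x_3)\le x_3\,p(x_3)\le b\,p(x_3)$$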
Let's add the three inequalities:
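$$a\,p(x_1)+a\,p(x_2)+a\,p(x_3)\le x_1\,p(x_1)+x_2\,p(x_2)+x_3\,p(x_3)\le b\,p(x_1)+b\,p(x_2)+b\,p(x_3)$$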
To generalize, let's say $X$ can take on the values $\{x_1,x_2,\cdots,x_n\}$, which will lead to:
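$$\sum_{i=1}^na\,p(x_i)\le\sum_{i=1}^nx_i\,p(x_i)\le\sum_{i=1}^nb\,p(x_i)$$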
Notice how the middle term is the definition of $\mathbb{E}(X)$! We can also take out $a$ and $b$ from the summation since they are constants:
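$$a\sum_{i=1}^np(x_i)\le\mathbb{E}(X)\le b\sum_{i=1}^np(x_i)$$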
Finally, by the definition of a probability mass function, the probabilities of all possible values of a random variable must sum up to one, that is, $\sum_{i=1}^np(x_i)=1$.
Using this property on the previous inequality gives us the desired result:
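$$a\le\mathbb{E}(X)\le b$$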
This completes the proof.
Expected value of a function of a random variable
Let $X$ be a random variable with a known probability mass function $p(x)$, and let $Y$ be another random variable that is a function of $X$, say $Y=g(X)$. The expected value of $Y$ is:

$$\mathbb{E}(Y)=\mathbb{E}\big(g(X)\big)=\sum_i g(x_i)\,p(x_i)$$

Where the sum runs over the possible values $x_i$ of $X$.
Proof. Let's work with a simple example first for some intuition before we tackle the formal proof. Suppose $X$ is a random variable that can take on the following $3$ different values that occur with some known probabilities:
| $X$ | $\mathbb{P}(X=x)$ |
| --- | --- |
| $-1$ | $\mathbb{P}(X=-1)$ |
| $1$ | $\mathbb{P}(X=1)$ |
| $2$ | $\mathbb{P}(X=2)$ |
Here, instead of explicitly assigning probabilities (e.g. $\mathbb{P}(X=-1)=0.3$), we stick with the general form.
Now, let $Y$ be a random variable defined by $Y=X^2$. Our goal is to compute the expected value of $Y$, that is, $\mathbb{E}(Y)$. Let's start by visualizing the function $y=g(x)=x^2$, which maps each possible value of $X$ to a value of $Y$.
Here, notice how $X=-1$ and $X=1$ are mapped to the same $Y$ value. This is because our function $g(x)$ is a many-to-one function, that is, multiple inputs can map to the same output value. To compute the probability $\mathbb{P}(Y=1)$, we need to sum up the probabilities of all the values of $X$ that result in $Y=1$. In this case, $X=-1$ and $X=1$ both result in $Y=1$, so:
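$$\mathbb{P}(Y=1)=\mathbb{P}(X=-1)+\mathbb{P}(X=1)$$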
We can write this more generally as $\mathbb{P}(Y=y_1)=p(x_1)+p(x_2)$, where $y_1=g(x_1)=g(x_2)$. Similarly, $\mathbb{P}(Y=4)$ can be computed by $\mathbb{P}(Y=4)=\mathbb{P}(X=2)$, or more generally by $\mathbb{P}(Y=y_2)=p(x_3)$, where $y_2=g(x_3)$.
Now, by referring to these two general expressions, we can generalize the process of computing the probability of any value $y_j$ of $Y$ using the following summation:
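$$\mathbb{P}(Y=y_j)=\sum_{i:\,g(x_i)=y_j}p(x_i)$$

Here, the sum runs over all values $x_i$ of $X$ that $g$ maps to $y_j$.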
Again, the reason why the summation is necessary here is that the function $g(x)$ may not necessarily be one-to-one (e.g. $g(x)=x^2$).
Now, let's go back to our original goal of computing $\mathbb{E}(Y)$. From the definition of expected value, we have that:
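$$\mathbb{E}(Y)=\sum_j y_j\,\mathbb{P}(Y=y_j)$$

In our example, $Y$ takes on the two values $y_1=1$ and $y_2=4$.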
Substituting the expression for $\mathbb{P}(Y=y_j)$ into this definition gives:
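$$\mathbb{E}(Y)=y_1\big(p(x_1)+p(x_2)\big)+y_2\,p(x_3)=g(x_1)\,p(x_1)+g(x_2)\,p(x_2)+g(x_3)\,p(x_3)=\sum_{i=1}^3g(x_i)\,p(x_i)$$

The second equality uses the fact that $y_1=g(x_1)=g(x_2)$ and $y_2=g(x_3)$.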
Let's generalize this further:
- instead of $3$ values for random variable $X$, let's make this $m$.
- instead of $2$ values for random variable $Y$, let's make this $n$.
- if $g(x)$ is a one-to-one function, then $m=n$.
- if $g(x)$ is a many-to-one function (as in our example), then $m\ge{n}$.
- instead of a probability $\mathbb{P}(X=x)$, we use a probability mass (or density) function $p(x)$.
The formal proof simply involves using these general variables; the logic is exactly the same:
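$$\mathbb{E}(Y)=\sum_{j=1}^ny_j\,\mathbb{P}(Y=y_j)=\sum_{j=1}^ny_j\sum_{i:\,g(x_i)=y_j}p(x_i)=\sum_{j=1}^n\sum_{i:\,g(x_i)=y_j}g(x_i)\,p(x_i)=\sum_{i=1}^mg(x_i)\,p(x_i)$$

Here, each inner sum runs over the values $x_i$ that $g$ maps to $y_j$. The last equality holds because every value $x_i$ of $X$ is mapped by $g$ to exactly one value $y_j$ of $Y$, so the double summation visits each $x_i$ exactly once.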
This completes the proof.
Law of total expectation
If $X$ and $Y$ are random variables, then the Law of Total Expectation states that:

$$\mathbb{E}(X)=\mathbb{E}_Y\big(\mathbb{E}(X|Y)\big)$$
To emphasize that $\mathbb{E}(X|Y)$ is a random quantity based on $Y$, the outer expected value is subscripted with $Y$.
From the definition of expected values, the Law of Total Expectation can also be written as:

$$\mathbb{E}(X)=\sum_y\mathbb{E}(X|Y=y)\,p(y)$$

Where $p(y)$ is the probability mass function of $Y$.
Proof. We use the definition of expected values to rewrite the outer expected value, treating $\mathbb{E}(X|Y)$ as a function of the random variable $Y$:
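$$\mathbb{E}_Y\big(\mathbb{E}(X|Y)\big)=\sum_y\mathbb{E}(X|Y=y)\,p(y)$$

The steps that follow are written for discrete $X$ and $Y$.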
We again use the definition of expected values:
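$$\mathbb{E}_Y\big(\mathbb{E}(X|Y)\big)=\sum_y\Big(\sum_x x\,p(x|y)\Big)p(y)$$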
We rearrange the summations:
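$$\mathbb{E}_Y\big(\mathbb{E}(X|Y)\big)=\sum_x x\sum_y p(x|y)\,p(y)$$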
We know from the definition of conditional probability that $p(x|y)\,p(y)=p(x,y)$.
Therefore, the previous expression becomes:
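$$\mathbb{E}_Y\big(\mathbb{E}(X|Y)\big)=\sum_x x\sum_y p(x,y)$$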
Now, the marginal distribution $p(x)$ is defined as $p(x)=\sum_yp(x,y)$.
Substituting this into the expression above gives the desired result:
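$$\mathbb{E}_Y\big(\mathbb{E}(X|Y)\big)=\sum_x x\,p(x)=\mathbb{E}(X)$$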
This completes the proof.