Pandas DataFrame | cov method
Pandas' DataFrame.cov(~) method computes the covariance matrix of the columns in the source DataFrame. Note that the unbiased estimator of the covariance is used:
$$\mathrm{cov}(\mathbf{x},\mathbf{y})=\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})$$
Where,
$N$ is the number of values in a column
$\bar{x}$ is the sample mean of column $\mathbf{x}$
$\bar{y}$ is the sample mean of column $\mathbf{y}$
$x_i$ and $y_i$ are the $i$th values in columns $\mathbf{x}$ and $\mathbf{y}$, respectively.
All NaN values are ignored.
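To make the estimator concrete, here is a minimal sketch that computes the unbiased sample covariance by hand and compares it with the result of cov(). The helper name sample_cov is hypothetical, and the sketch assumes that a row is skipped for a given pair of columns whenever either value in that row is NaN.

```python
import pandas as pd

def sample_cov(x: pd.Series, y: pd.Series) -> float:
    """Unbiased sample covariance (hypothetical helper, not part of pandas).

    Assumption: rows where either x or y is NaN are dropped before computing.
    """
    valid = x.notna() & y.notna()
    x, y = x[valid], y[valid]
    n = len(x)
    if n < 2:
        # The 1 / (N - 1) factor is undefined for fewer than two values
        return float("nan")
    return float(((x - x.mean()) * (y - y.mean())).sum() / (n - 1))

df = pd.DataFrame({"A": [2, 4, 6], "B": [3, 4, 5]})

manual = sample_cov(df["A"], df["B"])   # computed from the formula
builtin = df.cov().loc["A", "B"]        # computed by pandas
print(manual, builtin)
```

Both values should agree up to floating-point rounding.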
Parameters
1. min_periods | int | optional
The minimum number of non-NaN values required to compute the covariance.
Return Value
A DataFrame that represents the covariance matrix of the values in the source DataFrame.
Examples
Basic usage
Consider the following DataFrame:
df
   A  B
0  2  3
1  4  4
2  6  5
To compute the covariance matrix of the two columns:
df.cov()
     A    B
A  4.0  2.0
B  2.0  1.0
Here, we get the following results:
the sample covariance of columns A and B is 2.0.
the sample variance of column A is 4.0 and that of column B is 1.0.
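For reference, the example above can be reproduced with the following short sketch. The variable names cov_matrix and variances are illustrative only. It also checks that the diagonal of the covariance matrix matches the per-column sample variances returned by var().

```python
import pandas as pd

# Rebuild the DataFrame shown above
df = pd.DataFrame({"A": [2, 4, 6], "B": [3, 4, 5]})

cov_matrix = df.cov()
print(cov_matrix)

# The diagonal entries are the sample variances of each column,
# so they should line up with DataFrame.var() (which also divides by N - 1)
variances = df.var()
print(variances)
```

The off-diagonal entries are equal because covariance is symmetric in its two arguments.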
Specifying min_periods
Consider the following DataFrame with some missing values:
df
     A    B
0  3.0  5.0
1  NaN  6.0
2  4.0  7.0
Setting min_periods=3 yields:
df.cov(min_periods=3)
     A    B
A  NaN  NaN
B  NaN  1.0
Here, the reason why we get NaN is that, since the method ignores NaN, column A only has 2 values. Since we've set the minimum threshold to compute the covariance to be 3, every entry involving column A is NaN. The variance of column B is still computed because column B has 3 non-NaN values.
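As a rough sketch of the effect of min_periods, the snippet below rebuilds the DataFrame with the missing value and compares the thresholded result with the default call. The variable names strict and default are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame with one missing value in column A
df = pd.DataFrame({"A": [3.0, np.nan, 4.0], "B": [5.0, 6.0, 7.0]})

# Require at least 3 non-NaN observations per pair of columns
strict = df.cov(min_periods=3)
print(strict)

# Without the threshold, pairs involving A use the 2 rows where A is present
default = df.cov()
print(default)
```

With the default call, the entries involving column A are computed from the two rows in which A is not NaN, so they are no longer NaN.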