Pandas DataFrame | corr method
Pandas DataFrame.corr(~) method computes the pairwise correlation of the columns in the source DataFrame. NaN values are excluded pairwise: for each pair of columns, only the rows where both values are present are used.
Parameters
1. method | string or callable | optional
The type of correlation coefficient to compute:
Value | Description |
---|---|
"pearson" | Compute the standard (Pearson) correlation coefficient. |
"kendall" | Compute the Kendall Tau correlation coefficient. |
"spearman" | Compute the Spearman rank correlation. |
callable | A function that takes two 1D NumPy arrays as arguments and returns a single float. The returned matrix will always be symmetric, with 1s along the main diagonal. |
By default, method="pearson".
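As a rough sketch of how the method parameter behaves, the snippet below compares the default with "spearman" and with a callable. The DataFrame values and the helper name pearson_via_numpy are assumptions made for illustration only; the helper is not part of Pandas.

```python
import numpy as np
import pandas as pd

# Small illustrative DataFrame (values chosen for this sketch)
df = pd.DataFrame({"A": [8, 5, 2, 1], "B": [3, 4, 5, 9]})

def pearson_via_numpy(x, y):
    # Hypothetical helper: takes two 1D NumPy arrays and returns a single float
    return float(np.corrcoef(x, y)[0, 1])

pearson = df.corr()                          # default, same as method="pearson"
spearman = df.corr(method="spearman")        # rank-based correlation
custom = df.corr(method=pearson_via_numpy)   # user-supplied callable

print(pearson)
print(spearman)
print(custom)
```

Since the helper simply re-computes the Pearson coefficient, custom should match pearson here. A callable is mainly useful for coefficients that Pandas does not provide out of the box.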
2. min_periods | int | optional
The minimum number of non-NaN observations required, per pair of columns, to compute the correlation. Pairs with fewer observations yield NaN.
Return Value
A DataFrame that represents the correlation matrix of the values in the source DataFrame.
Examples
Basic usage
Consider the following DataFrame:
df
   A  B
0  8  3
1  5  4
2  2  5
3  1  9
To compute the Pearson correlation between the columns:
df.corr()
          A         B
A  1.000000 -0.841685
B -0.841685  1.000000
We get the result that columns A and B have a correlation of -0.84.
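To see where this number comes from, here is a minimal sketch that rebuilds the DataFrame and computes the Pearson coefficient by hand from the deviations of each column. The variable names dev_a, dev_b, manual_r and pandas_r are illustrative choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [8, 5, 2, 1], "B": [3, 4, 5, 9]})

# Deviations of each column from its own mean
dev_a = df["A"] - df["A"].mean()
dev_b = df["B"] - df["B"].mean()

# Pearson coefficient: sum of the products of deviations, divided by the
# square root of the product of the sums of squared deviations
manual_r = (dev_a * dev_b).sum() / np.sqrt((dev_a ** 2).sum() * (dev_b ** 2).sum())

# The off-diagonal entry of the correlation matrix
pandas_r = df.corr().loc["A", "B"]

print(round(manual_r, 6), round(pandas_r, 6))
```

Both values should agree with the off-diagonal entry of the matrix returned by df.corr().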
Specifying min_periods
Consider the following DataFrame:
df
     A    B
0  3.0  5.0
1  NaN  6.0
2  4.0  NaN
Setting min_periods=3 yields:
df.corr(min_periods=3)
    A   B
A NaN NaN
B NaN NaN
Here, we get all NaN because the method ignores NaN, which leaves each column with only 2 values (and columns A and B share just one row in which both values are present). Since we've set the minimum number of values required to compute the correlation to 3, we end up with a DataFrame filled with NaN.
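For contrast, the following sketch runs the same DataFrame with a lower threshold. The variable names strict and relaxed are illustrative; the comments describe what we would expect given how the threshold is applied to each pair of columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3.0, np.nan, 4.0], "B": [5.0, 6.0, np.nan]})

# No pair of columns has 3 rows where both values are present
strict = df.corr(min_periods=3)

# Each column paired with itself has 2 such rows; A paired with B has only 1
relaxed = df.corr(min_periods=2)

print(strict)
print(relaxed)
```

With min_periods=2, the diagonal entries should become 1.0, while the A-B entries should stay NaN because only one row has values in both columns.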