Pandas DataFrame | corr method
Pandas' DataFrame.corr(~) method computes the pairwise correlation between the columns of the source DataFrame.
NaN values are excluded pair by pair: for each pair of columns, only the rows where both values are non-NaN are used.
Parameters
1. method | string or callable | optional
The type of correlation coefficient to compute:
| Value | Description |
|---|---|
| "pearson" | Compute the standard (Pearson) correlation coefficient. |
| "kendall" | Compute the Kendall Tau correlation coefficient. |
| "spearman" | Compute the Spearman rank correlation. |
| callable | A function that takes two 1D NumPy arrays as arguments and returns a single float. The returned matrix is always symmetric, with 1 along the main diagonal. |
By default, method="pearson".
2. min_periods | int | optional
The minimum number of non-NaN observations required per pair of columns to compute the correlation. Pairs with fewer observations yield NaN. By default, min_periods=1.
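The callable form of the method parameter can be sketched as follows. This is a minimal illustration, not the only way to write one: the helper name pearson_np and the sample values are assumptions made for this example. The helper recomputes the Pearson coefficient with NumPy, so its result should agree with method="pearson".

```python
import numpy as np
import pandas as pd

# Sample data chosen for illustration.
df = pd.DataFrame({"A": [8, 5, 2, 1], "B": [3, 4, 5, 9]})

def pearson_np(x, y):
    # x and y are passed in as 1D NumPy arrays; a single float is returned.
    return float(np.corrcoef(x, y)[0, 1])

# Pass the function itself (not a string) as the method.
custom = df.corr(method=pearson_np)
print(custom)
```

The diagonal of the result is filled with 1 by Pandas itself, and the function is evaluated once per pair of distinct columns, with the value mirrored across the diagonal.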
Return Value
A DataFrame that represents the correlation matrix of the values in the source DataFrame.
Examples
Basic usage
Consider the following DataFrame:
df
   A  B
0  8  3
1  5  4
2  2  5
3  1  9
To compute the "pearson" correlation between the columns:
df.corr()
          A         B
A  1.000000 -0.841685
B -0.841685  1.000000
Columns A and B have a correlation of about -0.84, which indicates a strong negative linear relationship.
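For comparison, here is a short sketch of a rank-based coefficient on the same values, using method="spearman". The DataFrame is rebuilt so the snippet runs on its own. Because A strictly decreases while B strictly increases, the ranks are perfectly reversed and the Spearman correlation should come out as -1 (up to floating-point rounding), even though the Pearson value is only about -0.84.

```python
import pandas as pd

# Same values as the DataFrame in the basic-usage example.
df = pd.DataFrame({"A": [8, 5, 2, 1], "B": [3, 4, 5, 9]})

# Spearman correlates the ranks of the values rather than the values themselves.
spearman = df.corr(method="spearman")
print(spearman)
```

Passing method="kendall" works the same way, though in the Pandas versions I am aware of it relies on SciPy being installed.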
Specifying min_periods
Consider the following DataFrame:
df
     A    B
0  3.0  5.0
1  NaN  6.0
2  4.0  NaN
Setting min_periods=3 yields:
df.corr(min_periods=3)
    A   B
A NaN NaN
B NaN NaN
We get all NaN because the method ignores NaN values, which leaves each column with only 2 usable values, and the pair of A and B with only 1 row where both values are present. Since we've set the minimum number of observations to 3, no pair meets the threshold and we end up with a DataFrame filled with NaN.
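To see that the threshold is applied per pair of columns, the same data can be run with a lower threshold. This is a sketch under the assumption that min_periods=2 is a useful contrast; the variable name relaxed is made up for the example. Each column paired with itself has 2 non-NaN values, so the diagonal is computed, while A and B share only one complete row (index 0), so their entry stays NaN.

```python
import numpy as np
import pandas as pd

# Same values as the DataFrame in the min_periods example.
df = pd.DataFrame({"A": [3.0, np.nan, 4.0], "B": [5.0, 6.0, np.nan]})

# Require at least 2 rows where both columns of a pair are non-NaN.
relaxed = df.corr(min_periods=2)
print(relaxed)
```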