Pandas DataFrame | corr method

Last updated: Aug 12, 2023
Tags: Python, Pandas

Pandas' DataFrame.corr(~) method computes the pairwise correlation between the columns of the source DataFrame.

NOTE

NaN values are ignored: for each pair of columns, only the rows in which both values are non-NaN are used.

Parameters

1. method | string or callable | optional

The type of correlation coefficient to compute:

Value | Description
"pearson" | Compute the standard (Pearson) correlation coefficient.
"kendall" | Compute the Kendall Tau correlation coefficient.
"spearman" | Compute the Spearman rank correlation.
callable | A function that takes two 1D NumPy arrays as arguments and returns a single float. The returned matrix is always symmetric, with 1 along the main diagonal.

By default, method="pearson".
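
As a rough illustration of the method parameter, the sketch below passes both a built-in name and a callable. The function name pearson_via_numpy is hypothetical (it is not part of Pandas), and it simply recomputes the Pearson coefficient with NumPy so that the result can be compared against the default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [8, 5, 2, 1], "B": [3, 4, 5, 9]})

# Built-in coefficient selected by name.
spearman = df.corr(method="spearman")

# Hypothetical user-defined callable: takes two 1D NumPy arrays
# and returns a single float, as the method parameter requires.
def pearson_via_numpy(x, y):
    return float(np.corrcoef(x, y)[0, 1])

custom = df.corr(method=pearson_via_numpy)

print(spearman)
print(custom)
```

Since A strictly decreases while B strictly increases, the Spearman rank correlation between them should be -1, and the callable version should agree with the default "pearson" result.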

2. min_periods | int | optional

The minimum number of non-NaN observations required per pair of columns to compute the correlation. Pairs with fewer observations yield NaN.

Return Value

A DataFrame that represents the correlation matrix of the values in the source DataFrame.

Examples

Basic usage

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({"A":[8,5,2,1],"B":[3,4,5,9]})
df
   A  B
0  8  3
1  5  4
2  2  5
3  1  9

To compute the Pearson correlation between columns A and B:

df.corr()
          A         B
A  1.000000 -0.841685
B -0.841685  1.000000

Columns A and B have a Pearson correlation of about -0.84, which indicates a strong negative linear relationship.
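
To see where this number comes from, the Pearson coefficient can be recomputed by hand from its definition (the sum of products of deviations divided by the square root of the product of the summed squared deviations). This is only a sanity-check sketch; the variable names are illustrative.

```python
import numpy as np

a = np.array([8, 5, 2, 1], dtype=float)
b = np.array([3, 4, 5, 9], dtype=float)

# Deviations of each value from its column mean.
da = a - a.mean()
db = b - b.mean()

# Pearson correlation: covariance term over the product of spreads.
r = float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))
print(round(r, 6))  # should agree with the -0.841685 reported by df.corr()
```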

Specifying min_periods

Consider the following DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({"A":[3,np.nan,4],"B":[5,6,np.nan]})
df
     A    B
0  3.0  5.0
1  NaN  6.0
2  4.0  NaN

Setting min_periods=3 yields:

df.corr(min_periods=3)
   A    B
A  NaN  NaN
B  NaN  NaN

All entries are NaN because NaN values are excluded before the correlation is computed, which leaves each column with only 2 non-NaN values (and columns A and B with only 1 row in which both are non-NaN). Since min_periods=3 requires at least 3 observations, every entry of the result is NaN.
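
One way to check how many observations each pair of columns actually has is to count the rows in which both values are non-NaN. The sketch below does this with a matrix product of the non-NaN indicator table, and then relaxes the threshold to min_periods=2; the names valid, pair_counts and relaxed are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, np.nan, 4], "B": [5, 6, np.nan]})

# 1 where a value is present, 0 where it is NaN.
valid = df.notna().astype(int)

# Entry (i, j) counts the rows where columns i and j are both non-NaN.
pair_counts = valid.T @ valid
print(pair_counts)

# With a threshold of 2, each column paired with itself has enough
# observations, but the A-B pair (1 shared row) still does not.
relaxed = df.corr(min_periods=2)
print(relaxed)
```

Under these assumptions, the diagonal of relaxed should be 1 while the A-B entries remain NaN.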

Published by Isshin Inada