Pandas DataFrame | rank method

Last updated: Aug 10, 2023
Tags: Python, Pandas

Pandas' DataFrame.rank(~) method computes the rank of each value within its column (or row) of the DataFrame.

Parameters

1. axis · int or string · optional

Whether to compute the ranks column-wise or row-wise:

  • 0 or "index": ranks are computed within each column.

  • 1 or "columns": ranks are computed within each row.

By default, axis=0.

2. method · string · optional

How to rank a group of tied (duplicate) values:

  • "average": assign each value in the group the average of the group's ranks.

  • "min": assign each value the lowest rank in the group.

  • "max": assign each value the highest rank in the group.

  • "first": assign ranks in the order in which the values appear in the DataFrame.

  • "dense": like "min", but the rank increases by exactly one from one group to the next.

See the examples below for clarification. By default, method="average".

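3. numeric_only · boolean · optional

If True, only numeric columns are ranked and non-numeric columns are dropped from the result. By default, numeric_only=False (in pandas 2.0 and later), which is why the string column B is ranked in the examples below.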
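A minimal sketch of the effect of numeric_only, reusing the DataFrame from the examples below, where column B holds strings. It assumes a pandas version in which numeric_only accepts both True and False:

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})

# numeric_only=True drops the non-numeric column B before ranking
numeric_ranks = df.rank(numeric_only=True)
print(numeric_ranks.columns.tolist())  # ['A']

# numeric_only=False also ranks the strings in B (alphabetically)
all_ranks = df.rank(numeric_only=False)
print(all_ranks.columns.tolist())  # ['A', 'B']
```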

4. na_option · string · optional

How to deal with NaN values:

  • "keep": exclude the NaNs from the ranking and leave them as NaN in the result.

  • "top": assign the lowest ranks (1, 2, ...) to the NaNs.

  • "bottom": assign the highest ranks to the NaNs.

By default, na_option="keep".

5. ascending · boolean · optional

  • If True, then the smallest value will have a rank of 1.

  • If False, then the largest value will have a rank of 1.

By default, ascending=True.

6. pct · boolean · optional

If True, the ranks are returned in percentile form, that is, as fractions between 0 and 1. By default, pct=False.

Return Value

A DataFrame holding the ranks (as floating-point numbers) of the values in the source DataFrame.

Examples

Consider the following DataFrame:

import pandas as pd
df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})
df
A B
0 4 b
1 5 a
2 3 c
3 3 d

Ranking column-wise

To obtain the ordering of the values of each column:

df.rank() # axis=0
A B
0 3.0 2.0
1 4.0 1.0
2 1.5 3.0
3 1.5 4.0

Notice the two 1.5 values in column A. This is because of a tie: rows 2 and 3 both hold the value 3, so the rank(~) method assigns each of them the average of their ranks 1 and 2 (method="average" by default), that is, 1.5.

Ranking row-wise

Consider the following DataFrame:

df = pd.DataFrame({"A":[3,4],"B":[1,2],"C":[5,6]})
df
A B C
0 3 1 5
1 4 2 6

To rank the values for each row, set axis=1:

df.rank(axis=1)
A B C
0 2.0 1.0 3.0
1 2.0 1.0 3.0
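As a quick sanity check (a sketch, not how pandas implements it internally), ranking with axis=1 should match transposing df, ranking column-wise, and transposing back:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [1, 2], "C": [5, 6]})

row_wise = df.rank(axis=1)

# Transposing turns rows into columns, so the default column-wise ranking
# of df.T, transposed back, should equal the row-wise ranking of df
via_transpose = df.T.rank().T

print(row_wise.equals(via_transpose))  # True
```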

Specifying method

Consider the following DataFrame:

df = pd.DataFrame({"A":[8,6,6,8]})
df
A
0 8
1 6
2 6
3 8

average

By default, method="average", which means that the average rank is computed for duplicate values:

df.rank()
A
0 3.5
1 1.5
2 1.5
3 3.5

max

To use the largest rank of each group:

df.rank(method="max")
A
0 4.0
1 2.0
2 2.0
3 4.0

Here's df again for your reference:

df
A
0 8
1 6
2 6
3 8

min

To use the smallest rank of each group:

df.rank(method="min")
A
0 3.0
1 1.0
2 1.0
3 3.0
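Since tied values occupy consecutive positions, the "average" rank of a group is the midpoint of its "min" and "max" ranks. A small sketch verifying this on the same df:

```python
import pandas as pd

df = pd.DataFrame({"A": [8, 6, 6, 8]})

lowest = df.rank(method="min")   # ranks 3, 1, 1, 3
highest = df.rank(method="max")  # ranks 4, 2, 2, 4

# The mean of the consecutive ranks min, min+1, ..., max is (min + max) / 2
midpoint = (lowest + highest) / 2
print(midpoint.equals(df.rank()))  # True
```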

first

To use the ordering of the values in the original DataFrame:

df.rank(method="first")
A
0 3.0
1 1.0
2 2.0
3 4.0

Here, notice how the first value 8 is assigned a rank of 3, while the last value 8 is assigned a rank of 4. This is because of their ordering in df, that is, the first 8 is assigned a lower rank since it appears earlier in df.
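One way to see what "first" does is to rebuild it from a stable sort. This is an illustrative sketch using NumPy's argsort (assuming the column has no NaNs), not the pandas implementation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [8, 6, 6, 8]})

values = df["A"].to_numpy()
# A stable sort keeps tied values in their original order
order = np.argsort(values, kind="stable")

# The row sitting at sorted position i receives rank i + 1
ranks = np.empty(len(values), dtype=float)
ranks[order] = np.arange(1, len(values) + 1)

print(ranks)                                                # [3. 1. 2. 4.]
print(np.array_equal(ranks, df["A"].rank(method="first")))  # True
```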

Here's df again for your reference:

df
A
0 8
1 6
2 6
3 8

dense

This is similar to "min", except that the ranks are incremented by one after each duplicate group:

df.rank(method="dense")
A
0 2.0
1 1.0
2 1.0
3 2.0

To clarify: with "min", the 8s were assigned a rank of 3 because the two 6s occupy ranks 1 and 2. With "dense", the rank only increases by 1 from one group to the next, so the 8s end up with a rank of 2.
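Similarly, "dense" ranks can be sketched with np.unique, whose return_inverse output gives each value's position among the sorted distinct values. This sketch assumes the column has no NaNs:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [8, 6, 6, 8]})

# np.unique returns the sorted distinct values [6, 8] and, with
# return_inverse=True, the index of each original value in that array
_, inverse = np.unique(df["A"].to_numpy(), return_inverse=True)
dense = inverse + 1.0  # shift from 0-based positions to 1-based ranks

print(dense)                                                # [2. 1. 1. 2.]
print(np.array_equal(dense, df["A"].rank(method="dense")))  # True
```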

Specifying na_option

Consider the following DataFrame with some missing values:

import numpy as np
df = pd.DataFrame({"A": [np.nan, 6, np.nan, 5]})
df
A
0 NaN
1 6.0
2 NaN
3 5.0

By default, na_option="keep", which means that NaNs are ignored during the ranking and kept in the resulting DataFrame:

df.rank() # na_option="keep"
A
0 NaN
1 2.0
2 NaN
3 1.0

To assign the lowest ranks (1, 2, ...) to missing values:

df.rank(na_option="top")
A
0 1.5
1 4.0
2 1.5
3 3.0

Here, the two NaNs are tied, so each is assigned the average of their ranks 1 and 2, that is, 1.5.

To assign the highest ranks to the missing values:

df.rank(na_option="bottom")
A
0 3.5
1 2.0
2 3.5
3 1.0
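For the default ascending=True, "top" and "bottom" behave as if the NaNs were replaced by negative and positive infinity respectively. A sketch checking this, assuming column A contains no real infinities (they would otherwise tie with the NaNs):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, 6, np.nan, 5]})

# -inf sorts before every number, +inf after every number
top = df.fillna(-np.inf).rank()
bottom = df.fillna(np.inf).rank()

print(top.equals(df.rank(na_option="top")))        # True
print(bottom.equals(df.rank(na_option="bottom")))  # True
```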

Ranking in descending order

Consider the same DataFrame we had before:

df = pd.DataFrame({"A":[4,5,3,3], "B":["b","a","c","d"]})
df
A B
0 4 b
1 5 a
2 3 c
3 3 d

To rank in descending order (largest value has a rank of 1), simply set ascending=False:

df.rank(ascending=False)
A B
0 2.0 3.0
1 1.0 4.0
2 3.5 2.0
3 3.5 1.0
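For numeric columns, descending ranks can also be obtained by negating the values and ranking them in ascending order. A sketch on column A (the strings in B cannot be negated, so ascending=False is the general option):

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})

# Negating reverses the order of the values, so an ascending rank of -A
# matches a descending rank of A
negated = (-df["A"]).rank()
descending = df["A"].rank(ascending=False)

print(negated.equals(descending))  # True
```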

Ranking using percentiles

Consider the following DataFrame:

df = pd.DataFrame({"A":[4,5,3,3], "B":["b","a","c","d"]})
df
A B
0 4 b
1 5 a
2 3 c
3 3 d

To rank using percentiles, set pct=True:

df.rank(pct=True)
A B
0 0.750 0.50
1 1.000 0.25
2 0.375 0.75
3 0.375 1.00
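In this example, with the default method="average", each percentile rank is simply the rank divided by the number of non-missing values in the column (here 4), for example 3 / 4 = 0.75 for the first value of A. A sketch confirming this:

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})

# df.count() gives the number of non-missing values per column;
# dividing aligns on the column labels
manual = df.rank() / df.count()

print(manual.equals(df.rank(pct=True)))  # True
```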

Ranking by multiple columns

Consider the following DataFrame:

df = pd.DataFrame({"A":[8,9,9], "B":[7,6,5]})
df
A B
0 8 7
1 9 6
2 9 5

To rank by column A while using column B as a tie-breaker:

df[["A","B"]].apply(tuple, axis=1).rank()
0 1.0
1 3.0
2 2.0
dtype: float64

Note the following:

  • the first row is assigned a rank of 1 because its value of A is the lowest.

  • the second and third rows both have the same value of A. Therefore, we use their values of B as a tie-breaker; since the third row has a smaller value of B (5 versus 6), it is assigned the lower rank of 2.

Let's now break down the code. We first use the apply(~) method to combine the two columns into a single column of tuples:

df[["A","B"]].apply(tuple, axis=1)
0 (8, 7)
1 (9, 6)
2 (9, 5)
dtype: object

We then use the rank method like so:

df[["A","B"]].apply(tuple, axis=1).rank()
0 1.0
1 3.0
2 2.0
dtype: float64
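If you would rather avoid a column of tuples, a sketch of an alternative is to sort by A and then B, and use each row's sorted position as its rank. Note that, unlike rank(), this gives rows that tie on both A and B distinct ranks rather than an averaged one:

```python
import pandas as pd

df = pd.DataFrame({"A": [8, 9, 9], "B": [7, 6, 5]})

# Sort by A first, using B to break ties in A
order = df.sort_values(["A", "B"]).index

# The 1-based position in the sorted order becomes the rank;
# reindex puts the ranks back in df's original row order
ranks = pd.Series(range(1, len(df) + 1), index=order, dtype="float64").reindex(df.index)
print(ranks.tolist())  # [1.0, 3.0, 2.0]
```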
Published by Isshin Inada