Pandas DataFrame | rank method
Pandas DataFrame.rank(~) method computes the ordering (rank) of the values in each row or column of the DataFrame.
Parameters
1. axis · int or string · optional
Whether to compute the ordering row-wise or column-wise:
Axis | Description |
---|---|
0 or "index" | Ordering is computed for each column. |
1 or "columns" | Ordering is computed for each row. |
By default, axis=0.
2. method · string · optional
How to rank duplicate values in a group:
Value | Description |
---|---|
"average" | Return the average of the ranks. |
"min" | Return the minimum of the ranks. |
"max" | Return the maximum of the ranks. |
"first" | Return the ranks based on the ordering in the DataFrame. |
"dense" | Similar to "min", except that the ranks are incremented by one after each duplicate group. |
Check the examples below for clarification. By default, method="average".
3. numeric_only · boolean · optional
If True, ordering is performed only on numeric columns. By default, numeric_only=False in pandas 2.0 and later, so non-numeric columns such as strings are ranked as well.
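As a minimal sketch of numeric_only, the snippet below ranks a small DataFrame that has one numeric column and one string column. The sample data and the variable name numeric_ranks are chosen here for illustration.

```python
import pandas as pd

# Illustrative data: column A is numeric, column B holds strings.
df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})

# With numeric_only=True, only numeric columns are ranked,
# so column B does not appear in the result.
numeric_ranks = df.rank(numeric_only=True)
print(numeric_ranks)
```

With numeric_only=True the result should contain column A only.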
4. na_option · string · optional
How to deal with NaN values:
Value | Description |
---|---|
"keep" | Leave the NaN values as NaN in the result. |
"top" | Assign the lowest (1, 2, ...) ordering to the NaN values. |
"bottom" | Assign the highest ordering to the NaN values. |
By default, na_option="keep".
5. ascending · boolean · optional
If True, then the smallest value will have a rank of 1. If False, then the largest value will have a rank of 1.
By default, ascending=True.
6. pct · boolean · optional
If True, then the ranks are returned in percentile form (values between 0 and 1) instead of as ordinal ranks. By default, pct=False.
Return Value
A DataFrame containing the ordering of the values in the source DataFrame.
Examples
Consider the following DataFrame:
df
   A  B
0  4  b
1  5  a
2  3  c
3  3  d
Ranking column-wise
To obtain the ordering of the values of each column:
df.rank() # axis=0
     A    B
0  3.0  2.0
1  4.0  1.0
2  1.5  3.0
3  1.5  4.0
Notice how we have two 1.5 values in column A. This is because we had a tie: entries A2 and A3 shared the same value, and so the rank(~) method computed the average of their ranks (method="average" by default), that is, the average of 1 and 2.
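To make the tie handling concrete, here is a small sketch that rebuilds the DataFrame above and checks where the 1.5 comes from. The DataFrame constructor call and the names first_ranks and tie_average are assumptions made for this illustration.

```python
import pandas as pd

# Rebuild the example DataFrame (constructor arguments assumed from the printed output).
df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})

ranks = df.rank()  # axis=0, method="average"

# Break ties by position to see the ranks the two tied 3s would get individually.
first_ranks = df["A"].rank(method="first")

# Averaging those individual ranks (1 and 2) gives the shared rank.
tie_average = first_ranks[df["A"] == 3].mean()
print(tie_average)  # 1.5
```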
Ranking row-wise
Consider the following DataFrame:
df
   A  B  C
0  3  1  5
1  4  2  6
To rank the values for each row, set axis=1:
df.rank(axis=1)
     A    B    C
0  2.0  1.0  3.0
1  2.0  1.0  3.0
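The row-wise case can be reproduced with the sketch below. The constructor call is assumed from the printed DataFrame, and the string form axis="columns" is shown as an equivalent spelling of axis=1.

```python
import pandas as pd

# Rebuild the example DataFrame (constructor arguments assumed from the printed output).
df = pd.DataFrame({"A": [3, 4], "B": [1, 2], "C": [5, 6]})

# Rank the values within each row.
row_ranks = df.rank(axis=1)

# axis="columns" is the string equivalent of axis=1.
row_ranks_by_name = df.rank(axis="columns")
print(row_ranks)
```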
Specifying method
Consider the following DataFrame:
df
   A
0  8
1  6
2  6
3  8
average
By default, method="average", which means that the average rank is computed for duplicate values:
df.rank()
     A
0  3.5
1  1.5
2  1.5
3  3.5
max
To use the largest rank of each group:
df.rank(method="max")
     A
0  4.0
1  2.0
2  2.0
3  4.0
Here's df again for your reference:
df
   A
0  8
1  6
2  6
3  8
min
To use the smallest rank of each group:
df.rank(method="min")
     A
0  3.0
1  1.0
2  1.0
3  3.0
first
To use the ordering of the values in the original DataFrame:
df.rank(method="first")
     A
0  3.0
1  1.0
2  2.0
3  4.0
Here, notice how the first value 8 is assigned a rank of 3, while the last value 8 is assigned a rank of 4. This is because of their ordering in df, that is, the first 8 is assigned a lower rank since it appears earlier in df.
Here's df again for your reference:
df
   A
0  8
1  6
2  6
3  8
dense
This is similar to "min", except that the ranks are incremented by one after each duplicate group:
df.rank(method="dense")
     A
0  2.0
1  1.0
2  1.0
3  2.0
To clarify, in the case of "min", the group of 8s was assigned a rank of 3, but for "dense", the rank only gets incremented by 1 after each group, so we end up with a rank of 2 for the next group.
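To compare the five methods side by side, the following sketch ranks the same column with each of them and collects the results in one DataFrame. The constructor call is assumed from the printed df, and the names methods and comparison are illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame (constructor arguments assumed from the printed output).
df = pd.DataFrame({"A": [8, 6, 6, 8]})

methods = ["average", "min", "max", "first", "dense"]

# One column of ranks per tie-breaking method.
comparison = pd.DataFrame({m: df["A"].rank(method=m) for m in methods})
print(comparison)
```

Each column of comparison should match the corresponding output shown in the subsections above.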
Specifying na_option
Consider the following DataFrame with some missing values:
df
     A
0  NaN
1  6.0
2  NaN
3  5.0
By default, na_option="keep", which means that NaNs are ignored during the ranking and kept in the resulting DataFrame:
df.rank() # na_option="keep"
     A
0  NaN
1  2.0
2  NaN
3  1.0
To assign the lowest ranks (1, 2, ...) to missing values:
df.rank(na_option="top")
     A
0  1.5
1  4.0
2  1.5
3  3.0
Here, you see 1.5 since we have 2 NaN values, and so the average of their ranks (1 and 2) was computed.
To assign the highest ranks to the missing values:
df.rank(na_option="bottom")
     A
0  3.5
1  2.0
2  3.5
3  1.0
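The three na_option settings can be compared in one place with the sketch below. The constructor call is assumed from the printed df, and the names options and comparison are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame (constructor arguments assumed from the printed output).
df = pd.DataFrame({"A": [np.nan, 6.0, np.nan, 5.0]})

options = ["keep", "top", "bottom"]

# One column of ranks per NaN-handling option.
comparison = pd.DataFrame({opt: df["A"].rank(na_option=opt) for opt in options})
print(comparison)
```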
Ranking in descending order
Consider the same DataFrame we had before:
df
   A  B
0  4  b
1  5  a
2  3  c
3  3  d
To rank in descending order (largest value has a rank of 1), set ascending=False:
df.rank(ascending=False)
     A    B
0  2.0  3.0
1  1.0  4.0
2  3.5  2.0
3  3.5  1.0
Ranking using percentiles
Consider the following DataFrame:
df
   A  B
0  4  b
1  5  a
2  3  c
3  3  d
To rank using percentiles, set pct=True:
df.rank(pct=True)
       A     B
0  0.750  0.50
1  1.000  0.25
2  0.375  0.75
3  0.375  1.00
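For this DataFrame, which has no missing values and uses the default method="average", the percentile ranks are the ordinary ranks divided by the number of rows. The sketch below checks this. The constructor call is assumed from the printed df, and the names pct_ranks and manual are illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame (constructor arguments assumed from the printed output).
df = pd.DataFrame({"A": [4, 5, 3, 3], "B": ["b", "a", "c", "d"]})

pct_ranks = df.rank(pct=True)

# With no NaN present, dividing the plain ranks by the row count
# gives the same numbers, e.g. 1.5 / 4 = 0.375 for the tied 3s.
manual = df.rank() / len(df)

max_difference = (pct_ranks - manual).abs().max().max()
print(max_difference)
```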
Ranking by multiple columns
Consider the following DataFrame:
df
   A  B
0  8  7
1  9  6
2  9  5
To rank by column A while using column B as a tie-breaker:
0    1.0
1    3.0
2    2.0
dtype: float64
Note the following:
- the first row is assigned a rank of 1 because its value of A is the lowest.
- the second and third rows both have the same value of A. Therefore, we use their value of B as a tie-breaker; since the third row has a smaller value of B, it is assigned a rank of 2, while the second row is assigned a rank of 3.
Let's now break down the code. We first use the apply(~) method to combine the two columns into a single column of tuples:
0    (8, 7)
1    (9, 6)
2    (9, 5)
dtype: object
We then use the rank(~) method like so:
0    1.0
1    3.0
2    2.0
dtype: float64
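The two steps described above can be sketched as follows. This is one way to write them that is consistent with the outputs shown, not necessarily the exact code used to produce them. The constructor call is assumed from the printed df, and the names tuples and ranks are illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame (constructor arguments assumed from the printed output).
df = pd.DataFrame({"A": [8, 9, 9], "B": [7, 6, 5]})

# Step 1: combine columns A and B into a single Series of (A, B) tuples.
tuples = df[["A", "B"]].apply(tuple, axis=1)

# Step 2: rank the tuples. Tuples compare element by element,
# so A decides the order first and B only breaks ties.
ranks = tuples.rank()
print(ranks)
```

Because tuples are compared lexicographically, (9, 5) sorts before (9, 6), which is why the third row is ranked ahead of the second.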