Pandas DataFrame | sort_values method
Pandas DataFrame.sort_values(~) method sorts the source DataFrame by either column or row values.
Parameters
1. by | string or list<string>
The label(s) of the column(s) or row(s) to sort by.
2. axis | string or int | optional
Whether to sort by row or column:
Axis | Description |
---|---|
0 or "index" | DataFrame will be sorted by column values. |
1 or "columns" | DataFrame will be sorted by row values. |
By default, axis=0.
3. ascending | boolean or list<boolean> | optional
Whether to sort in ascending or descending order. When sorting by multiple labels, a list with one boolean per label may be passed. By default, ascending=True.
4. inplace | boolean | optional
If True, then the source DataFrame will be directly modified, and no new DataFrame will be created. If False, then a new DataFrame will be created, and the source DataFrame will be kept intact.
By default, inplace=False.
5. kind | string | optional
The sorting algorithm to use:
Kind | Speed | Worst case | Memory | Stable |
---|---|---|---|---|
quicksort | 1 (fast) | O(n^2) | 0 | no |
mergesort | 2 | O(n log n) | ~n/2 | yes |
heapsort | 3 (slow) | O(n log n) | 0 | no |
By default, kind="quicksort".
Sorting algorithms that are "stable" retain the relative ordering of duplicate values. For instance, suppose you are sorting the array [(2,3), (2,1), (4,5)] by the first element of each tuple. We have a duplicate value of 2 here, and stable sorting algorithms ensure that (2,3) will always come before (2,1) since that is how they are ordered originally. Unstable sorting algorithms provide no guarantee that such ordering is retained.
6. na_position | string | optional
Where to place NaN values:
Value | Description |
---|---|
"first" | Place NaN values at the beginning. |
"last" | Place NaN values at the end. |
By default, na_position="last".
7. ignore_index | boolean | optional
If True, then the index of the sorted DataFrame will be 0,1,...,n-1, where n is the number of rows of the DataFrame. If False, then the index names will be kept as is.
By default, ignore_index=False.
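To make the notion of stability described for the kind parameter concrete, here is a minimal sketch that uses the same three pairs as in the explanation above. The column names key and value are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical column names; the pairs are (2,3), (2,1), (4,5).
df = pd.DataFrame({"key": [2, 2, 4], "value": [3, 1, 5]})

# mergesort is stable, so among the rows with key == 2,
# the row holding value 3 stays ahead of the row holding value 1.
stable = df.sort_values("key", kind="mergesort")
print(stable)
```

With an input this small, the default algorithm may happen to give the same order, but only a stable algorithm guarantees it.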
Return Value
A DataFrame sorted by row or column values. If inplace=True, then None is returned instead.
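The effect of the inplace parameter on what the method returns can be sketched as follows. The variable names sorted_df, untouched and result are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 3, 1, 3], "B": ["c", "d", "a", "a"]})

# inplace=False (the default): a new DataFrame is returned
# and the source DataFrame keeps its original row order.
sorted_df = df.sort_values("A")
untouched = df["A"].tolist()

# inplace=True: the source DataFrame itself is sorted and None is returned.
result = df.sort_values("A", inplace=True)
print(result)
```

Because the in-place call returns None, assigning its result back to df would discard the DataFrame.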
Examples
Consider the following DataFrame:
df
   A  B
0  5  c
1  3  d
2  1  a
3  3  a
Sorting by column value
To sort by column A:
df.sort_values("A")
   A  B
2  1  a
1  3  d
3  3  a
0  5  c
Sorting by multiple columns
To sort by multiple columns, pass in a list of column labels:
df.sort_values(["A","B"])
   A  B
2  1  a
3  3  a
1  3  d
0  5  c
Here, the rows are sorted first by the values in column A. Where column A has duplicate values (here, the two 3s), the tie is broken by the values in column B. This is why the row with "a" is guaranteed to come before the row with "d" in this case.
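For readers who want to run the example, here is a self-contained sketch. The construction of df is an assumption reconstructed from the printed DataFrame above, since the original does not show how it was built.

```python
import pandas as pd

# Reconstructed from the printed output; default integer index 0..3.
df = pd.DataFrame({"A": [5, 3, 1, 3], "B": ["c", "d", "a", "a"]})

# Sort by A first, then break ties in A using B.
result = df.sort_values(["A", "B"])
print(result)
```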
Sorting by row value
Consider the following DataFrame:
df
   A  B
a  5  1
b  9  3
c  4  2
To sort by values in row "b":
df.sort_values("b", axis=1)
   B  A
a  1  5
b  3  9
c  2  4
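A runnable version of the row-wise sort might look like the following. The construction of df is an assumption inferred from the printed DataFrame.

```python
import pandas as pd

# Reconstructed from the printed output, with row labels a, b, c.
df = pd.DataFrame({"A": [5, 9, 4], "B": [1, 3, 2]}, index=["a", "b", "c"])

# axis=1 reorders the columns so that the values in row "b" are ascending.
result = df.sort_values("b", axis=1)
print(result)
```

Note that with axis=1 the rows stay where they are; only the column order changes.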
Sorting in descending order
Consider the following DataFrame:
df = pd.DataFrame({"A":[5,9,5],"B":[4,3,1]}, index=["a","b","c"])
df
   A  B
a  5  4
b  9  3
c  5  1
By default, values are sorted in ascending order. To sort in descending order instead, set ascending=False:
df.sort_values("A", ascending=False)
   A  B
b  9  3
a  5  4
c  5  1
Passing a list of booleans
If you're sorting by multiple columns, you can pass a list of booleans to specify how each column should be sorted.
df.sort_values(["A","B"], ascending=[False, True])
   A  B
b  9  3
c  5  1
a  5  4
Here, we are sorting by column A values in descending order, while sorting by column B in ascending order.
Specifying na_position
Consider the following DataFrame:
df
     A
a  NaN
b  5.0
c  4.0
By default, na_position="last", which means that NaN values are placed at the end:
df.sort_values(by="A") # na_position="last"
     A
c  4.0
b  5.0
a  NaN
To place NaN values at the beginning instead:
df.sort_values(by="A", na_position="first")
     A
a  NaN
c  4.0
b  5.0
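Both placements can be reproduced with the sketch below. The construction of df is an assumption inferred from the printed DataFrame, and float("nan") is used here simply to avoid an extra import.

```python
import pandas as pd

# Reconstructed from the printed output; row "a" holds the missing value.
df = pd.DataFrame({"A": [float("nan"), 5.0, 4.0]}, index=["a", "b", "c"])

nan_last = df.sort_values(by="A")                        # default: NaN at the end
nan_first = df.sort_values(by="A", na_position="first")  # NaN at the beginning
print(nan_last)
print(nan_first)
```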
Specifying ignore_index
Consider the following DataFrame:
df
   A
a  5
b  9
c  2
By default, the index names are kept:
df.sort_values(by="A")
   A
c  2
a  5
b  9
By setting ignore_index=True, we can reset the index:
df.sort_values(by="A", ignore_index=True)
   A
0  2
1  5
2  9
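The two behaviours can be compared side by side with the following sketch. The construction of df is an assumption inferred from the printed DataFrame, and the code assumes a pandas version that supports ignore_index (1.0 or later, to the best of my knowledge).

```python
import pandas as pd

# Reconstructed from the printed output, with row labels a, b, c.
df = pd.DataFrame({"A": [5, 9, 2]}, index=["a", "b", "c"])

kept = df.sort_values(by="A")                      # original labels travel with the rows
reset = df.sort_values(by="A", ignore_index=True)  # labels replaced by 0, 1, ..., n-1
print(kept)
print(reset)
```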