PySpark DataFrame | sort method
PySpark DataFrame's sort(~) method returns a new DataFrame with the rows sorted based on the specified columns.
Parameters
1. cols | string or list or Column
The columns by which to sort the rows.
2. ascending | boolean or list | optional
Whether to sort in ascending or descending order. Pass a list of booleans to set the order for each column separately. By default, ascending=True.
Return Value
A PySpark DataFrame.
Examples
Consider the following PySpark DataFrame:
+-----+---+
| name|age|
+-----+---+
| Alex| 30|
|  Bob| 20|
|Cathy| 20|
+-----+---+
Sorting a PySpark DataFrame in ascending order by a single column
To sort our PySpark DataFrame by the age column in ascending order:
+-----+---+
| name|age|
+-----+---+
|Cathy| 20|
|  Bob| 20|
| Alex| 30|
+-----+---+
We could also use sql.functions to refer to the column:
import pyspark.sql.functions as F
# df is the DataFrame shown above (variable name assumed)
df.sort(F.col("age")).show()
+-----+---+
| name|age|
+-----+---+
|Cathy| 20|
|  Bob| 20|
| Alex| 30|
+-----+---+
Sorting a PySpark DataFrame in descending order by a single column
To sort a PySpark DataFrame by the age column in descending order:
+-----+---+
| name|age|
+-----+---+
| Alex| 30|
|  Bob| 20|
|Cathy| 20|
+-----+---+
Sorting a PySpark DataFrame by multiple columns
To sort a PySpark DataFrame by the age column first, and then by the name column, both in ascending order:
+-----+---+
| name|age|
+-----+---+
|  Bob| 20|
|Cathy| 20|
| Alex| 30|
+-----+---+
Here, Bob and Cathy appear before Alex because their age (20) is smaller. Bob then comes before Cathy because B comes before C.
We can also pass a list of booleans to specify the desired ordering of each column:
+-----+---+
| name|age|
+-----+---+
|Cathy| 20|
|  Bob| 20|
| Alex| 30|
+-----+---+
Here, we are first sorting by age in ascending order, and then by name in descending order.