PySpark DataFrame | sort method
PySpark DataFrame's sort(~) method returns a new DataFrame with the rows sorted based on the specified columns.
Parameters
1. cols | string or list or Column
The columns by which to sort the rows.
2. ascending | boolean or list | optional
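Whether to sort in ascending or descending order. A single boolean applies to all columns, while a list of booleans sets the order for each column in cols. By default, ascending=True.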
Return Value
A PySpark DataFrame.
Examples
Consider the following PySpark DataFrame:
+-----+---+
| name|age|
+-----+---+
| Alex| 30|
|  Bob| 20|
|Cathy| 20|
+-----+---+
Sorting a PySpark DataFrame in ascending order by a single column
To sort our PySpark DataFrame by the age column in ascending order:
+-----+---+
| name|age|
+-----+---+
|Cathy| 20|
|  Bob| 20|
| Alex| 30|
+-----+---+
We could also use sql.functions to refer to the column:
import pyspark.sql.functions as F
df.sort(F.col("age")).show()
+-----+---+
| name|age|
+-----+---+
|Cathy| 20|
|  Bob| 20|
| Alex| 30|
+-----+---+
Sorting a PySpark DataFrame in descending order by a single column
To sort a PySpark DataFrame by the age column in descending order:
+-----+---+
| name|age|
+-----+---+
| Alex| 30|
|  Bob| 20|
|Cathy| 20|
+-----+---+
Sorting a PySpark DataFrame by multiple columns
To sort a PySpark DataFrame by the age column first, and then by the name column both in ascending order:
+-----+---+
| name|age|
+-----+---+
|  Bob| 20|
|Cathy| 20|
| Alex| 30|
+-----+---+
Here, Bob and Cathy appear before Alex because their age (20) is smaller. Bob then comes before Cathy because B comes before C.
We can also pass a list of booleans to specify the desired ordering of each column:
+-----+---+
| name|age|
+-----+---+
|Cathy| 20|
|  Bob| 20|
| Alex| 30|
+-----+---+
Here, we are first sorting by age in ascending order, and then by name in descending order.