PySpark DataFrame | orderBy method
PySpark DataFrame's orderBy(~) method returns a new DataFrame that is sorted based on the specified columns.
Parameters
1. cols | string or list or Column | optional
A column or columns by which to sort.
2. ascending | boolean or list of boolean | optional
If True, then the sort will be in ascending order. If False, then the sort will be in descending order. If a list of booleans is passed, then the sort will respect this order. For example, if [True, False] is passed and cols=["colA", "colB"], then the DataFrame will first be sorted in ascending order of colA, and then in descending order of colB. Note that the second sort will be relevant only when there are duplicate values in colA.
By default, ascending=True.
Return Value
A PySpark DataFrame (pyspark.sql.dataframe.DataFrame).
Examples
Consider the following PySpark DataFrame:
df = spark.createDataFrame([["Alex", 22, 200], ["Bob", 24, 300], ["Cathy", 22, 100]], ["name", "age", "salary"])
+-----+---+------+
| name|age|salary|
+-----+---+------+
| Alex| 22|   200|
|  Bob| 24|   300|
|Cathy| 22|   100|
+-----+---+------+
Sorting PySpark DataFrame by single column in ascending order
To sort by age in ascending order:
+-----+---+------+
| name|age|salary|
+-----+---+------+
| Alex| 22|   200|
|Cathy| 22|   100|
|  Bob| 24|   300|
+-----+---+------+
Sorting PySpark DataFrame by multiple columns in ascending order
To sort by age, and then by salary (both in ascending order):
+-----+---+------+
| name|age|salary|
+-----+---+------+
|Cathy| 22|   100|
| Alex| 22|   200|
|  Bob| 24|   300|
+-----+---+------+
Sorting PySpark DataFrame in descending order
To sort in descending order, set ascending=False:
+-----+---+------+
| name|age|salary|
+-----+---+------+
|  Bob| 24|   300|
| Alex| 22|   200|
|Cathy| 22|   100|
+-----+---+------+