PySpark DataFrame | orderBy method
PySpark DataFrame's orderBy(~) method returns a new DataFrame that is sorted based on the specified columns.
Parameters
1. cols | string or list or Column | optional
A column or columns by which to sort.
2. ascending | boolean or list of boolean | optional
If True, then the sort will be in ascending order. If False, then the sort will be in descending order. If a list of booleans is passed, then each column is sorted in the direction given by the boolean at the same position. For example, if [True, False] is passed and cols=["colA", "colB"], then the DataFrame will first be sorted in ascending order of colA, and then in descending order of colB. Note that the second sort is relevant only when there are duplicate values in colA.
By default, ascending=True.
Return Value
A PySpark DataFrame (pyspark.sql.dataframe.DataFrame).
Examples
Consider the following PySpark DataFrame:
df = spark.createDataFrame([["Alex", 22, 200], ["Bob", 24, 300], ["Cathy", 22, 100]], ["name", "age", "salary"])
+-----+---+------+
| name|age|salary|
+-----+---+------+
| Alex| 22|   200|
|  Bob| 24|   300|
|Cathy| 22|   100|
+-----+---+------+
Sorting PySpark DataFrame by single column in ascending order
To sort by age in ascending order:
+-----+---+------+
| name|age|salary|
+-----+---+------+
| Alex| 22|   200|
|Cathy| 22|   100|
|  Bob| 24|   300|
+-----+---+------+
Sorting PySpark DataFrame by multiple columns in ascending order
To sort by age, and then by salary (both in ascending order):
+-----+---+------+
| name|age|salary|
+-----+---+------+
|Cathy| 22|   100|
| Alex| 22|   200|
|  Bob| 24|   300|
+-----+---+------+
Sorting PySpark DataFrame by descending order
To sort in descending order, set ascending=False:
+-----+---+------+
| name|age|salary|
+-----+---+------+
|  Bob| 24|   300|
| Alex| 22|   200|
|Cathy| 22|   100|
+-----+---+------+