PySpark DataFrame | sort method
PySpark DataFrame's sort(~) method returns a new DataFrame with the rows sorted based on the specified columns.
Parameters
1. cols | string or list or Column
The columns by which to sort the rows.
2. ascending | boolean or list | optional
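Whether to sort in ascending (True) or descending (False) order. If a list of booleans is passed, each value applies to the corresponding column in cols. By default, ascending=True.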
Return Value
A PySpark DataFrame.
Examples
Consider the following PySpark DataFrame:
+-----+---+
| name|age|
+-----+---+
| Alex| 30|
|  Bob| 20|
|Cathy| 20|
+-----+---+

Sorting a PySpark DataFrame in ascending order by a single column
To sort our PySpark DataFrame by the age column in ascending order:
+-----+---+
| name|age|
+-----+---+
|Cathy| 20|
|  Bob| 20|
| Alex| 30|
+-----+---+

We could also use sql.functions to refer to the column:
import pyspark.sql.functions as F
df.sort(F.col("age")).show()  # F.col("age") refers to the age column

+-----+---+
| name|age|
+-----+---+
|Cathy| 20|
|  Bob| 20|
| Alex| 30|
+-----+---+

Sorting a PySpark DataFrame in descending order by a single column
To sort a PySpark DataFrame by the age column in descending order:
+-----+---+
| name|age|
+-----+---+
| Alex| 30|
|  Bob| 20|
|Cathy| 20|
+-----+---+

Sorting a PySpark DataFrame by multiple columns
To sort a PySpark DataFrame by the age column first and then by the name column, both in ascending order:
+-----+---+
| name|age|
+-----+---+
|  Bob| 20|
|Cathy| 20|
| Alex| 30|
+-----+---+

Here, Bob and Cathy appear before Alex because their age (20) is smaller than Alex's (30). Bob then comes before Cathy because B comes before C alphabetically.
We can also pass a list of booleans to specify the desired ordering of each column:
+-----+---+
| name|age|
+-----+---+
|Cathy| 20|
|  Bob| 20|
| Alex| 30|
+-----+---+

Here, we are first sorting by age in ascending order, and then by name in descending order.
