 PySpark SQL Functions | max method
Last updated: Aug 12, 2023
Tags: PySpark
PySpark SQL Functions' max(~) method returns the maximum value in the specified column.
Parameters
1. col | string or Column
The column in which to obtain the maximum value.
Return Value
A PySpark Column (pyspark.sql.column.Column).
Examples
Consider the following PySpark DataFrame:
        
        
            
                
                
                    
                
            
+----+---+
|name|age|
+----+---+
|Alex| 25|
| Bob| 30|
+----+---+

Getting the maximum value of a PySpark column
To obtain the maximum age:
        
        
    To obtain the maximum age as an integer:
        
        
Here, the collect() method returns a list of Row objects, which in this case has length one because the PySpark DataFrame returned by select(~) has only one row. The value held in the Row object can then be accessed via [0].
Getting the maximum value of each group in PySpark
Consider the following PySpark DataFrame:
        
        
            
                
                
    df = spark.createDataFrame([["Alex", 20, "A"],
                                ["Bob", 30, "B"],
                                ["Cathy", 50, "A"]],
                               ["name", "age", "class"])
    df.show()

    +-----+---+-----+
    | name|age|class|
    +-----+---+-----+
    | Alex| 20|    A|
    |  Bob| 30|    B|
    |Cathy| 50|    A|
    +-----+---+-----+
        
    To get the maximum age of each class:
        
        
Here, the alias(~) method assigns a label to the aggregated age column. Without it, the column would take the default name max(age).
Published by Isshin Inada
Official PySpark Documentation: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.functions.max.html