 PySpark DataFrame | tail method
Last updated: Aug 12, 2023
Tags: PySpark
PySpark DataFrame's tail(~) method returns the last num rows of the DataFrame as a list of Row objects.
WARNING
This method transfers the requested rows to the application's driver process, so if the specified num is too large, an OutOfMemoryError may occur.
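One way to guard against pulling too many rows to the driver is to cap num before calling tail(~). The sketch below is only an illustration: safe_tail and its max_rows threshold are hypothetical names that are not part of PySpark, and a sensible limit depends on your row size and driver memory.

```python
def safe_tail(df, num, max_rows=10_000):
    """Hypothetical helper (not part of PySpark): call df.tail(num) only
    when num does not exceed an assumed, arbitrary cap of max_rows."""
    if num > max_rows:
        raise ValueError(
            f"Refusing to collect {num} rows to the driver (limit: {max_rows})"
        )
    # df is expected to be a PySpark DataFrame, or any object with tail(num)
    return df.tail(num)
```

With a DataFrame df, safe_tail(df, 2) behaves like df.tail(2), while a request above max_rows raises a ValueError before any data is moved to the driver.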
Parameters
1. num | int
The number of rows to return.
Return Value
A list of Row objects.
Examples
Consider the following PySpark DataFrame:
        
        
            
                
                
    columns = ["name", "age"]
    data = [("Alex", 15), ("Bob", 20), ("Cathy", 25)]
    df = spark.createDataFrame(data, columns)
    df.show()

    +-----+---+
    | name|age|
    +-----+---+
    | Alex| 15|
    |  Bob| 20|
    |Cathy| 25|
    +-----+---+
        
Getting the last row of a PySpark DataFrame
To get the last row:
        
        
            
                
                
    df.tail(num=1)

    [Row(name='Cathy', age=25)]
        
Getting the last n rows of a PySpark DataFrame
To get the last two rows:
        
        
            
                
                
    df.tail(num=2)

    [Row(name='Bob', age=20), Row(name='Cathy', age=25)]
        
    Published by Isshin Inada
Official PySpark Documentation: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.tail.html