PySpark SQL Functions | last method
PySpark's SQL function last(~) returns the last value of the specified column.
Parameters
1. col | string or Column
The column label or Column object of interest.
2. ignorenulls | boolean | optional
Whether or not to ignore null values. By default, ignorenulls=False.
Return Value
A PySpark SQL Column object (pyspark.sql.column.Column).
Examples
Consider the following PySpark DataFrame:
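One way to construct it (a minimal sketch, assuming an active SparkSession named spark):

# Build a small DataFrame of names and ages
df = spark.createDataFrame([("Alex", 15), ("Bob", 20), ("Cathy", 25)], ["name", "age"])
df.show()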
+-----+---+
| name|age|
+-----+---+
| Alex| 15|
|  Bob| 20|
|Cathy| 25|
+-----+---+
Getting the last value of a PySpark column
To get the last value of the name column:
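One possible approach, importing PySpark's SQL functions under the conventional alias F:

from pyspark.sql import functions as F

# last(~) aggregates the name column down to its final value ("Cathy")
df.select(F.last("name")).show()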
Note that we can also pass a Column object instead of a column label.
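For instance, using the df.name attribute (reusing the F alias imported above):

# A Column object behaves the same as the string label "name"
df.select(F.last(df.name)).show()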
Getting the last non-null value in PySpark column
Consider the following PySpark DataFrame with null values:
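It could be constructed like so (again assuming the SparkSession spark):

# Cathy's age is left as None, which becomes null in the DataFrame
df = spark.createDataFrame([("Alex", 15), ("Bob", 20), ("Cathy", None)], ["name", "age"])
df.show()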
+-----+----+
| name| age|
+-----+----+
| Alex|  15|
|  Bob|  20|
|Cathy|null|
+-----+----+
By default, ignorenulls=False, which means that the last value is returned regardless of whether or not it is null.
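For example, reusing the F alias from earlier:

# Cathy's age is null, so the aggregation returns null
df.select(F.last("age")).show()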
To return the last non-null value instead, set ignorenulls=True.
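A sketch of the same aggregation with the flag enabled:

# Bob's 20 is the last non-null age, so 20 is returned
df.select(F.last("age", ignorenulls=True)).show()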
Getting the last value of each group in PySpark
The last(~) method is also useful in aggregations. Consider the following PySpark DataFrame:
data = [("Alex", "A"), ("Alex", "B"), ("Bob", None), ("Bob", "A"), ("Cathy", "C")]
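This list could be turned into a DataFrame as follows (assuming the same SparkSession spark):

df = spark.createDataFrame(data, ["name", "class"])
df.show()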
+-----+-----+
| name|class|
+-----+-----+
| Alex|    A|
| Alex|    B|
|  Bob| null|
|  Bob|    A|
|Cathy|    C|
+-----+-----+
To get the last value of the class column for each group:
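A possible sketch, combining groupBy(~) with agg(~) and reusing the F alias:

# For each name, keep the last class value that appears within the group
df.groupBy("name").agg(F.last("class")).show()

Note that last(~) is non-deterministic when the ordering of rows within each group is not guaranteed.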
Here, we are grouping by name, and then for each of these groups, we obtain the last value that occurred in the class column.