PySpark Column | isNull method
PySpark Column's isNull() method returns a boolean Column that flags the rows where the value is null.
Return Value
A PySpark Column (pyspark.sql.column.Column).
Examples
Consider the following PySpark DataFrame:
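One way this DataFrame could be constructed, assuming an active SparkSession bound to spark and a DataFrame variable named df (both names are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# None becomes a null value in the resulting DataFrame
df = spark.createDataFrame(
    [("Alex", 25), ("Bob", 30), ("Cathy", None)],
    ["name", "age"],
)
df.show()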
+-----+----+
| name| age|
+-----+----+
| Alex|  25|
|  Bob|  30|
|Cathy|null|
+-----+----+
Identifying rows where a certain value is null in PySpark DataFrame
To identify rows where the value for age is null:
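A minimal sketch, assuming the DataFrame above is bound to df; the exact column label in the output may vary by Spark version:

df.select(df.age.isNull()).show()

+-------------+
|(age IS NULL)|
+-------------+
|        false|
|        false|
|         true|
+-------------+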
Getting rows where a certain value is null in PySpark DataFrame
To get rows where the value for age is null:
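A sketch, again assuming the DataFrame is bound to df:

df.where(df.age.isNull()).show()

+-----+----+
| name| age|
+-----+----+
|Cathy|null|
+-----+----+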
Here, the where(~) method fetches the rows that correspond to True in the boolean column returned by the isNull() method.
Warning - using equality to compare null values
One common mistake is to use equality to compare null values. For example, consider the following DataFrame:
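This DataFrame could be created along the lines of the following sketch (the literal values are taken from the output shown below; the construction itself is assumed):

# floats infer to a double column; None becomes null
df = spark.createDataFrame(
    [("Alex", 25.0), ("Bob", 30.0), ("Cathy", None)],
    ["name", "age"],
)
df.show()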
+-----+----+
| name| age|
+-----+----+
| Alex|25.0|
|  Bob|30.0|
|Cathy|null|
+-----+----+
Let's get the rows where age is equal to None:
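A sketch of this equality comparison, assuming df as above; the result comes back empty:

df.where(df.age == None).show()

+----+---+
|name|age|
+----+---+
+----+---+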
Notice how Cathy's row where the age
is null
is not picked up. When comparing null
values, we should always use isNull()
instead.
Null values and NaN are treated differently
Consider the following PySpark DataFrame:
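A sketch of how such a DataFrame could be built, using numpy's np.nan for the NaN entry (the createDataFrame call and the variable name df are assumptions):

import numpy as np

# np.nan produces a NaN entry, while None produces a null entry
df = spark.createDataFrame(
    [("Alex", 25.0), ("Bob", np.nan), ("Cathy", None)],
    ["name", "age"],
)
df.show()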
+-----+----+
| name| age|
+-----+----+
| Alex|25.0|
|  Bob| NaN|
|Cathy|null|
+-----+----+
Here, the age column contains both NaN and null. In PySpark, NaN and null are treated as different entities, as demonstrated below:
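For instance, filtering with isNull() returns only the row where age is null, leaving the NaN row out (a sketch, with df built as above):

df.where(df.age.isNull()).show()

+-----+----+
| name| age|
+-----+----+
|Cathy|null|
+-----+----+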
Here, notice how Bob's row, whose age is NaN, is not picked up. To get rows with NaN, use the isnan(~) function like so:
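A sketch using the isnan(~) function from pyspark.sql.functions (the F alias is only a common convention):

from pyspark.sql import functions as F

df.where(F.isnan("age")).show()

+----+---+
|name|age|
+----+---+
| Bob|NaN|
+----+---+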