PySpark SQL Functions | when method
Start your free 7-days trial now!
PySpark SQL Functions' when(~)
method is used to update values of a PySpark DataFrame column to other values based on the given conditions.
The when(~)
method is often used in conjunction with the otherwise(~)
method to implement an if-else logic. See examples below for clarification.
Parameters
1. condition
| Column
| optional
A boolean Column
expression. See examples below for clarification.
2. value
| any
| optional
The value to map to if the condition is true.
Return Value
A PySpark Column (pyspark.sql.column.Column
).
Examples
Consider the following PySpark DataFrame:
+-----+---+| name|age|+-----+---+| Alex| 20|| Bob| 24||Cathy| 22|+-----+---+
Implementing if-else logic using when and otherwise
To rename the name Alex
to Doge
, and others to Eric
:
+-----------------------------------------------+|CASE WHEN (name = Alex) THEN Doge ELSE Eric END|+-----------------------------------------------+| Doge|| Eric|| Eric|+-----------------------------------------------+
Notice how we used the method otherwise(~)
to set values for cases when the conditions are not met.
Case when otherwise method is not used
Note that if you do not include the otherwise(~)
method, then any value that does not fulfil the if condition will be assigned null
:
Specifying multiple conditions
Using pipeline and ampersand operator
We can combine conditions using &
(and) and |
(or) like so:
+----+---+|name|age|+----+---+|Doge| 20||Eric| 24||Eric| 22|+----+---+
Chaining the when method
The when(~)
method can be chained like so:
+----------------------------------------------------------------------------+|CASE WHEN (name = Alex) THEN Doge WHEN (name = Bob) THEN Zebra ELSE Eric END|+----------------------------------------------------------------------------+| Doge|| Zebra|| Eric|+----------------------------------------------------------------------------+
Setting a new value based on original value
To set a new value based on the original value:
Using an alias
By default, the new column label is convoluted:
+-----------------------------------------------+|CASE WHEN (name = Alex) THEN Doge ELSE Eric END|+-----------------------------------------------+| Doge|| Eric|| Eric|+-----------------------------------------------+
To assign a new column, simply use the alias(~)
method: