PySpark SQL Functions | expr method
PySpark SQL Functions' expr(~) method parses the given SQL expression.
Parameters
1. str
| string
The SQL expression to parse.
Return Value
A PySpark Column.
Examples
Consider the following PySpark DataFrame:
+----+---+
|name|age|
+----+---+
|Alex| 30|
| Bob| 50|
+----+---+
Using the expr method to convert column values to uppercase
The expr(~) method takes a SQL expression as its argument, so we can use SQL functions such as upper(~):
Calls to the expr(~) method can often be written more succinctly using the PySpark DataFrame's selectExpr(~) method. For instance, the above case can be rewritten as:
+-----------+
|upper(name)|
+-----------+
|       ALEX|
|        BOB|
+-----------+
I recommend that you use selectExpr(~) whenever possible because:

- you won't have to import the SQL functions library (pyspark.sql.functions)
- the syntax is shorter
Parsing complex SQL expressions using expr method
Here's a more complex SQL expression using clauses like AND and LIKE:
Note the following:

- we are checking for rows where age is larger than 40 and name starts with B.
- we are assigning the label 'result' to the Column returned by expr(~) using the alias(~) method.
Practical applications of boolean masks returned by expr method
As we can see in the above example, the expr(~) method can return a boolean mask depending on the SQL expression you supply. This allows us to check for the existence of rows that satisfy a given condition using any(~):
Here, we get True because there exists at least one True value in the boolean mask.
Mapping column values using expr method
We can map column values using CASE WHEN in the expr(~) method like so:
+----+---+------+
|name|age|status|
+----+---+------+
|Alex| 30|JUNIOR|
| Bob| 50|SENIOR|
+----+---+------+
Here, note that we are using the DataFrame's withColumn(~) method to obtain a new PySpark DataFrame that includes the column returned by expr(~).