PySpark SQL Functions | regexp_replace method
PySpark SQL Functions' regexp_replace(~) method replaces every substring that matches the given regular expression with the specified string.
Parameters
1. str | string or Column
The column whose values will be replaced.
2. pattern | string
The regular expression to match. Every match is replaced.
3. replacement | string
The string value to replace each match of pattern.
Return Value
A new PySpark Column.
Examples
Consider the following PySpark DataFrame:
+----+---+
|name|age|
+----+---+
|Alex| 10|
|Mile| 30|
+----+---+
Replacing a specific substring
To replace the substring 'le' with 'LE', use regexp_replace(~):
The second argument is a regular expression, so characters such as $ and [ carry special meaning. To treat these special characters as literal characters, escape them using the \ character (e.g. \$).
Passing in a Column object
Instead of referring to the column by its name, we can also pass in a Column object:
Getting a new PySpark DataFrame
We can use the PySpark DataFrame's withColumn(~) method to obtain a new PySpark DataFrame with the updated column like so:
+----+---+
|name|age|
+----+---+
|ALEx| 10|
|MiLE| 30|
+----+---+
Replacing a specific substring using a regular expression
To replace the substring 'le' with 'LE' only when it occurs at the end of the string, use regexp_replace(~):
Here, we are using the special regular expression character '$', which matches only at the end of the string. This is why the 'le' in Alex was not replaced.