PySpark Column | substr method
PySpark Column's substr(~) method returns a Column of substrings extracted from string column values.
Parameters
1. startPos | int or Column
The starting position. This position is inclusive and 1-based, meaning the first character is at position 1. A negative position is also allowed, in which case the position is counted from the end of the string; please consult the example below for clarification.
2. length | int or Column
The length of the substring to extract.
Return Value
A Column object.
Examples
Consider the following PySpark DataFrame:
+-----+---+
| name|age|
+-----+---+
| Alex| 20|
|  Bob| 30|
|Cathy| 40|
+-----+---+
Extracting substrings from column values in PySpark DataFrame
To extract substrings from column values:
Note the following:
F.col("name").substr(2,3) means that we are extracting a substring starting from the 2nd character and up to a length of 3.
Even if the string is too short (e.g. "Bob"), no error will be thrown; the substring simply ends where the string ends.
The alias(~) method is used to assign a label to our column.
Note that you could also specify a negative starting position like so:
Here, we are starting from the third character from the end (inclusive).