PySpark DataFrame | colRegex method
PySpark DataFrame's colRegex(~) method returns a Column object whose label matches the specified regular expression. If the regular expression matches several labels, all of the matching columns are selected.
Parameters
1. colName | string
The regex to match the label of the columns.
Return Value
A PySpark Column.
Examples
Selecting columns using regular expression in PySpark
Consider the following PySpark DataFrame:
+-----+----+
| col1|col2|
+-----+----+
| Alex|  20|
|  Bob|  30|
|Cathy|  40|
+-----+----+
To select columns using a regular expression, use the colRegex(~) method:
Here, note the following:
- We wrapped the regular expression in backticks (`). This is required; otherwise PySpark will throw an error.
- The regular expression col[123] matches columns with the label col1, col2 or col3.
- The select(~) method is used to convert the Column object into a PySpark DataFrame.
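To see which labels a pattern such as col[123] would pick up, the matching can be approximated in plain Python. This is only a sketch: it assumes the pattern must match the whole label, and it uses Python's re module, whereas Spark evaluates the pattern with Java's regex engine. The helper name matching_labels is hypothetical.

```python
import re

def matching_labels(labels, pattern):
    """Return the labels that the pattern matches in full (an approximation)."""
    return [label for label in labels if re.fullmatch(pattern, label)]

# Labels of the example DataFrame.
print(matching_labels(["col1", "col2"], "col[123]"))

# col4 and col10 are not matched by col[123].
print(matching_labels(["col1", "col2", "col4", "col10"], "col[123]"))
```

In both calls only col1 and col2 are returned, because the character class [123] accepts exactly one of the digits 1, 2 or 3 after the prefix col.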
Getting column labels that match regular expression as list of strings in PySpark
To get column labels as a list of strings instead of PySpark Column objects:
Here, we are using the columns property of the PySpark DataFrame returned by select(~).