PySpark DataFrame | colRegex method
PySpark DataFrame's colRegex(~) method returns a Column object that represents the columns whose labels match the specified regular expression. Since a single regular expression can match several labels, this method can be used to select multiple columns at once.
Parameters
1. colName | string
The regular expression, wrapped in backticks, used to match the column labels.
Return Value
A PySpark Column.
Examples
Selecting columns using regular expression in PySpark
Consider the following PySpark DataFrame:
+-----+----+
| col1|col2|
+-----+----+
| Alex|  20|
|  Bob|  30|
|Cathy|  40|
+-----+----+
To select columns using regular expression, use the colRegex(~) method:
Here, note the following:
We wrapped the regular expression in backticks (`). This is required; otherwise PySpark treats the string as an ordinary column name and throws an error.
The regular expression col[123] matches columns with label col1, col2 or col3.
The select(~) method takes the Column object and returns a new PySpark DataFrame containing only the matched columns.
Getting column labels that match regular expression as list of strings in PySpark
To get column labels as a list of strings instead of PySpark Column objects:
Here, we are using the columns property of the PySpark DataFrame returned by select(~).