Removing rows that contain a specific substring in a PySpark DataFrame
Consider the following PySpark DataFrame:
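It can be constructed with a snippet along these lines (a sketch; an active SparkSession named spark is assumed):

df = spark.createDataFrame([('A', 'a'), ('B#A', 'b'), ('C##', 'c')], ['col1', 'col2'])
df.show()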
+----+----+
|col1|col2|
+----+----+
|   A|   a|
| B#A|   b|
| C##|   c|
+----+----+
Using the contains method to remove rows with certain substrings
To remove rows that contain a specific substring (e.g. '#') in a PySpark DataFrame, use the contains(~) method:
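A minimal sketch of this approach, assuming df is the DataFrame above and that F refers to pyspark.sql.functions:

from pyspark.sql import functions as F

# Keep only the rows whose col1 does NOT contain the substring '#'
df.filter(~F.col('col1').contains('#')).show()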
Here, we first obtain a boolean mask using F.col('col1').contains('#'):
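Selecting this expression on its own shows the mask (the alias has_hash is added here just for readability and is not part of the original example):

df.select(F.col('col1').contains('#').alias('has_hash')).show()

+--------+
|has_hash|
+--------+
|   false|
|    true|
|    true|
+--------+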
We then negate the booleans using the ~ operator:
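For example (again with an illustrative alias):

df.select((~F.col('col1').contains('#')).alias('keep')).show()

+-----+
| keep|
+-----+
| true|
|false|
|false|
+-----+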
Finally, we use the filter(~) method to extract the rows that correspond to True in this boolean mask:
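Running the snippet above on the sample data leaves only the first row:

+----+----+
|col1|col2|
+----+----+
|   A|   a|
+----+----+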
Using the rlike method to remove rows with values that match some regular expression
Once again, consider the same PySpark DataFrame as above:
+----+----+
|col1|col2|
+----+----+
|   A|   a|
| B#A|   b|
| C##|   c|
+----+----+
To remove rows whose string values match a regular expression, use the rlike(~) method:
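A sketch, using the regex '#$' discussed below to drop rows whose col1 ends with '#':

df.filter(~F.col('col1').rlike('#$')).show()

+----+----+
|col1|col2|
+----+----+
|   A|   a|
| B#A|   b|
+----+----+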
Here, the rlike(~) method takes a regular expression (regex) as its argument. The $ in the regex #$ is a special character that matches the end of the string; that is, #$ matches a # that occurs at the end of the string.
Note that, just like the contains(~) method, rlike(~) also returns a boolean mask:
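For instance (with an illustrative alias for readability):

df.select(F.col('col1').rlike('#$').alias('ends_with_hash')).show()

+--------------+
|ends_with_hash|
+--------------+
|         false|
|         false|
|          true|
+--------------+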
The rlike(~) method is equivalent to SQL's RLIKE clause.
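As a sketch, the same filter could be written in Spark SQL (the view name my_table is just an illustrative choice):

df.createOrReplaceTempView('my_table')
spark.sql("SELECT * FROM my_table WHERE NOT (col1 RLIKE '#$')").show()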
Using the like method to remove rows that contain string values matching some pattern
Again, consider the same PySpark DataFrame as before:
+----+----+
|col1|col2|
+----+----+
|   A|   a|
| B#A|   b|
| C##|   c|
+----+----+
We could use the like(~) method to remove rows that contain string values matching some pattern:
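A sketch using like(~) with the pattern '%#' explained below:

# Drop rows whose col1 matches the SQL LIKE pattern '%#' (i.e. ends with '#')
df.filter(~F.col('col1').like('%#')).show()

+----+----+
|col1|col2|
+----+----+
|   A|   a|
| B#A|   b|
+----+----+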
Here, the special character % is a wildcard that matches zero or more characters. The pattern %# therefore matches all strings that end with #.
The like(~) method is equivalent to SQL's LIKE clause.
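Equivalently, as a sketch, the pattern can be embedded in a SQL expression via F.expr(~):

df.filter(~F.expr("col1 LIKE '%#'")).show()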