    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([["André", 20], ["Bob", 30], ["Cathy", 30]], ["name", "age"])
    df.show()

    +-----+---+
    | name|age|
    +-----+---+
    |André| 20|
    |  Bob| 30|
    |Cathy| 30|
    +-----+---+

Converting the first row of a PySpark DataFrame into a dictionary

DataFrame.toJSON() converts each row into a JSON string and returns these strings as an RDD. To get the JSON string of the first row, call first() on that RDD:

    df.toJSON().first()

    '{"name":"André","age":20}'
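The JSON string is compact (no spaces after separators) and keeps non-ASCII characters such as "é" as-is. As a plain-Python illustration of that format only (this mimics the output; it is not how Spark serializes rows), json.dumps produces the same string when given compact separators and ensure_ascii=False. The dict below is a hypothetical stand-in for the first row:

```python
import json

# Hypothetical stand-in for the first row; Spark builds its JSON from the
# Row itself, not from a Python dict.
first_row = {"name": "André", "age": 20}

# Compact separators (no spaces) and ensure_ascii=False (keep "é" as-is)
# reproduce the format returned by df.toJSON().first().
row_json = json.dumps(first_row, separators=(",", ":"), ensure_ascii=False)
print(row_json)   # {"name":"André","age":20}

# With json.dumps' default ensure_ascii=True, "é" is escaped instead.
print(json.dumps(first_row, separators=(",", ":")))   # {"name":"Andr\u00e9","age":20}
```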

To convert this JSON string into a native Python dictionary, parse it with json.loads(~):

    import json
    json.loads(df.toJSON().first())

    {'name': 'André', 'age': 20}
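json.loads(~) maps JSON types to Python types, so the age comes back as an int and the name as a str rather than everything staying a string. A minimal plain-Python check (no Spark required), using the first row's JSON string from above:

```python
import json

# JSON string for the first row, as returned by df.toJSON().first()
first_row_json = '{"name":"André","age":20}'

row = json.loads(first_row_json)

# JSON numbers without a fractional part become Python ints,
# and JSON strings become Python strs.
print(type(row["age"]).__name__)   # int
print(row["name"])                 # André
```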

Converting a PySpark DataFrame into a list of dictionaries

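To convert a PySpark DataFrame into a list of JSON strings, call collect() on the RDD returned by toJSON():

    df.toJSON().collect()

    ['{"name":"André","age":20}',
     '{"name":"Bob","age":30}',
     '{"name":"Cathy","age":30}']

Note that collect() sends every row to the driver, so use it only when the result fits in the driver's memory.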

To convert a PySpark DataFrame into a list of native Python dictionaries, parse each JSON string before collecting:

    df.toJSON().map(lambda str_json: json.loads(str_json)).collect()

    [{'name': 'André', 'age': 20},
     {'name': 'Bob', 'age': 30},
     {'name': 'Cathy', 'age': 30}]

Here:

- We use the RDD.map(~) method to apply a custom function to each element of the RDD.
- The custom function parses each JSON string into a dict.
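The same element-wise idea can be sketched in plain Python with the built-in map, using a hypothetical list that stands in for the output of df.toJSON().collect(). Since json.loads already takes a single string, it can be passed directly instead of being wrapped in a lambda; the PySpark equivalent is df.toJSON().map(json.loads).collect().

```python
import json

# Hypothetical stand-in for df.toJSON().collect(): a plain list of JSON strings
json_rows = [
    '{"name":"André","age":20}',
    '{"name":"Bob","age":30}',
    '{"name":"Cathy","age":30}',
]

# The built-in map applies json.loads to each element, analogous to
# RDD.map(~); no lambda wrapper is needed.
dicts = list(map(json.loads, json_rows))
print(dicts)
# [{'name': 'André', 'age': 20}, {'name': 'Bob', 'age': 30}, {'name': 'Cathy', 'age': 30}]
```

The difference is where the work happens: RDD.map(~) parses the strings on the executors before collect(), whereas this version parses them on the driver after the data has been collected.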

Disabling Unicode when converting PySpark DataFrame rows into JSON strings

By default, toJSON(~) uses use_unicode=True, so each JSON string is returned as a Python str:

    df.toJSON().first()   # use_unicode=True

    '{"name":"André","age":20}'

To disable Unicode decoding, set use_unicode=False. Each JSON string is then returned as UTF-8 encoded bytes:

    df.toJSON(use_unicode=False).first()

    b'{"name":"Andr\xc3\xa9","age":20}'
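These bytes are simply the UTF-8 encoding of the default str output: "é" becomes the two bytes 0xC3 0xA9. A plain-Python check (no Spark required), using the first row's JSON string from above:

```python
import json

# The str returned by df.toJSON().first() (use_unicode=True)
as_str = '{"name":"André","age":20}'

# The bytes returned with use_unicode=False are its UTF-8 encoding.
as_bytes = as_str.encode("utf-8")
print(as_bytes)                             # b'{"name":"Andr\xc3\xa9","age":20}'

# Decoding restores the original string, and json.loads also accepts
# UTF-8 bytes directly (Python 3.6+).
print(as_bytes.decode("utf-8") == as_str)   # True
print(json.loads(as_bytes))                 # {'name': 'André', 'age': 20}
```

So even with use_unicode=False, each element can still be parsed with json.loads(~) or turned back into a str with .decode("utf-8").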

Published by Isshin Inada
