PySpark DataFrame | toJSON method
PySpark DataFrame's toJSON(~) method converts the DataFrame into an RDD of strings. When the RDD's data is extracted, each row of the DataFrame appears as a JSON string. See the examples below for clarification.
Parameters
1. use_unicode | boolean
Whether to use unicode during the conversion. By default, use_unicode=True.
Return Value
A MapPartitionsRDD object whose elements are JSON strings.
Examples
Consider the following PySpark DataFrame:
+-----+---+
| name|age|
+-----+---+
|André| 20|
|  Bob| 30|
|Cathy| 30|
+-----+---+
Converting the first row of PySpark DataFrame into a dictionary
To convert the first row of a PySpark DataFrame into a string-encoded JSON:
df.toJSON().first()
'{"name":"André","age":20}'
To convert a string-encoded JSON into a native dict:
import json
json.loads(df.toJSON().first())
{'name': 'André', 'age': 20}
Converting PySpark DataFrame into a list of row objects (dictionaries)
To convert a PySpark DataFrame into a list of string-encoded JSON:
df.toJSON().collect()
['{"name":"André","age":20}', '{"name":"Bob","age":30}', '{"name":"Cathy","age":30}']
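To convert a PySpark DataFrame into a list of native dict:
import json
df.toJSON().map(lambda str_json: json.loads(str_json)).collect()
[{'name': 'André', 'age': 20}, {'name': 'Bob', 'age': 30}, {'name': 'Cathy', 'age': 30}]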
Here:
- we are using the RDD.map(~) method to apply a custom function to each element of the RDD.
- our custom function converts each string-encoded JSON into a dict.
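The per-element conversion can be tried without a Spark session. The sketch below is only an illustration: Python's built-in map() stands in for RDD.map(~), the input strings are copied from the collected output shown earlier, and the helper name to_dict is hypothetical.

```python
import json

# Sample strings copied from the collected toJSON() output shown earlier.
json_rows = [
    '{"name":"André","age":20}',
    '{"name":"Bob","age":30}',
    '{"name":"Cathy","age":30}',
]

def to_dict(str_json):
    """Parse one string-encoded JSON row into a native dict (hypothetical helper)."""
    return json.loads(str_json)

# Built-in map() stands in for RDD.map(~) here; on the real RDD the
# corresponding call would be df.toJSON().map(to_dict).collect().
dict_rows = list(map(to_dict, json_rows))
print(dict_rows)
```

Because json.loads already takes a single string, it could also be passed to map directly instead of being wrapped in a helper.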
Disabling unicode when converting PySpark DataFrame rows into string JSON
By default, unicode is enabled:
df.toJSON().first()
'{"name":"André","age":20}'
To disable unicode, set use_unicode=False:
df.toJSON(use_unicode=False).first()
b'{"name":"Andr\xc3\xa9","age":20}'
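The value returned with use_unicode=False is a bytes object rather than a str. As a minimal sketch, assuming the bytes are UTF-8 encoded (which the \xc3\xa9 sequence for é suggests), they can be decoded back into the unicode form or parsed directly. The byte string below is copied from the output above; no Spark session is needed to run it.

```python
import json

# Byte string copied from the use_unicode=False output above.
raw = b'{"name":"Andr\xc3\xa9","age":20}'

# Assuming UTF-8, decoding recovers the unicode JSON string.
text = raw.decode("utf-8")
print(text)

# json.loads also accepts UTF-8 encoded bytes directly (Python 3.6+).
row = json.loads(raw)
print(row)
```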