Pandas DataFrame | to_json method
Start your free 7-days trial now!
Pandas DataFrame.to_json(~)
method either converts a DataFrame to a JSON string, or outputs a JSON file.
Parameters
1. path_or_buf
| string
or file handle
| optional
The path to where you want to save the JSON. By default, the method will return a JSON string without writing to a file.
2. orient
link | string
How you want convert the source DataFrame into a JSON.
For "split"
:
{ "index": [list of index], "columns": [list of labels], "data": [list of values]}
For "records"
:
[{column_label: values}, ... , {column_label: values}]
Here, we call each item a record.
For "index"
:
{ index: column_label: value ... index: ...}
For "columns"
(default):
{ column_label: index: value ... column_label: ...}
For "values"
, we just get a list of values:
[value_one, ..., ]
For "table"
, we get a comprehensive representation of the source DataFrame:
{ "schema": {schema} "data": {data}}
By default, orient="columns"
.
3. date_format
link | string
| optional
Whether or not to convert the dates into epoch milliseconds or iso8601 format. The allowed value are as follows:
"epoch"
: the time passed since1970-01-01
in milliseconds."iso"
: the global standard for date representation.
By default, if orient="table"
, then date_format="iso"
, otherwise "epoch"
.
4. double_precision
| int
| optional
The number of decimal places to store for floating numbers. By default, double_precision=10
.
5. force_ascii
| boolean
| optional
Whether to use ASCII for encoding strings. By default, force_ascii=True
.
6. date_unit
link | string
| optional
The time unit to use. The allowed values are as follows:
"s"
: seconds"ms"
: milliseconds"us"
: microseconds"ns"
: nanoseconds
By default, date_unit="ms"
.
7. default_handler
| callable
| optional
The callback that fires if the JSON conversion is not successful. The callback takes in as argument the source DataFrame and returns a serialisable object (e.g. maps and lists). By default, default_handler=None
.
8. lines
link | boolean
| optional
If orient="records"
, then write out each {column_label:values}
in a new line. Passing in True
when for values other than "records"
will result in an error. By default, lines=False
.
9. compression
| string
| optional
The compression algorithm to use when outputting to a file. The available algorithms are:
"infer", "gzip", "bz2", "zip", "xz"
This is only relevant when we are outputting to a file, that is, the path_or_buf
is specified. By default, compression="infer"
.
10. index
| boolean
| optional
Whether to include the index in the resulting JSON string. This is only relevant if orient
is either "split"
or "table"
. By default, index=True
.
11. indent
link | int
| optional
The number of whitespaces to indent each record.
Return Value
If path_or_buf
is specified, then None
is returned. Otherwise, a string
in JSON format is returned.
Examples
Outputting to a file
Consider the following DataFrame:
df
A Ba 2 4b 3 5
Instead of obtaining a JSON string, we can output the JSON to a file by passing in path_or_buf
like so:
df.to_json(path_or_buf="my_json")
This will create a new file called my_json
in the same directory as your Python script:
{"A":{"a":2,"b":3},"B":{"a":4,"b":5}}
Specifying orient
Consider the following DataFrame:
df
A Ba 2 4b 3 5
Default
By default, orient="columns"
:
df.to_json()
'{"A":{"a":2,"b":3},"B":{"a":4,"b":5}}'
split
df.to_json(orient="split")
'{"columns":["A","B"],"index":["a","b"],"data":[[2,4],[3,5]]}'
records
df.to_json(orient="records")
'[{"A":2,"B":4},{"A":3,"B":5}]'
index
df.to_json(orient="index")
'{"a":{"A":2,"B":4},"b":{"A":3,"B":5}}'
columns
df.to_json(orient="columns")
'{"A":{"a":2,"b":3},"B":{"a":4,"b":5}}'
values
df.to_json(orient="values")
'[[2,4],[3,5]]'
table
df.to_json(orient="table")
'{"schema":{"fields":[{"name":"index","type":"string"}, {"name":"A","type":"integer"}, {"name":"B","type":"integer"}], "primaryKey":["index"], "pandas_version":"0.20.0"} "data":[{"index":"a","A":2,"B":4}, {"index":"b","A":3,"B":5}]}'
Here, we've prettified the output for clarity, but the actual output is all in one line.
Specifying date_format
Consider the following DataFrame:
df
A B0 2020-12-25 2020-12-20
By default, date_format="epoch"
(unless orient="table"
):
df.to_json() # date_format="epoch"
'{"A":{"0":1608854400000},"B":{"0":1608422400000}}'
Here, the large numbers represent the time passed since 1970-01-01
in milliseconds.
On the other hand, the we can change the date formatting to iso8601 by passing in "iso"
like so:
df.to_json(date_format="iso")
'{"A":{"0":"2020-12-25T00:00:00.000Z"},"B":{"0":"2020-12-20T00:00:00.000Z"}}'
Specifying date_unit
Consider the following DataFrame:
df
A B0 2020-12-25 2020-12-20
By default, when date_format="epoch"
, the units will be in milliseconds:
df.to_json() # date_format="epoch"
'{"A":{"0":1608854400000},"B":{"0":1608422400000}}'
Here, the large numbers represent the time passed since 1970-01-01
in milliseconds.
We can change the units to seconds like so:
df.to_json(date_unit="s") # date_format="epoch"
'{"A":{"0":1608854400},"B":{"0":1608422400}}'
Specifying default_handler
Consider the following DataFrame:
df
A Ba 3.000000+4.000000j 4
Here, our df
has a complex number.
By default, when we try to convert df
into a JSON string, we get the following:
df.to_json(orient="records")
'[{"A":{"imag":4.0},"B":4}]'
Notice how the complex number is represented erroneously as "imag"
, which happens because JSON does not know how to internally parse complex numbers. For situations like this when the conversion is improper, we can use the default_handler
parameter to control what is returned:
def my_handler(my_df):
df.to_json(orient="records", default_handler=my_handler)
'[{"A":[{"A":3,"B":4}],"B":4}]'
The handler takes as argument the source DataFrame and returns a serialisable object like a map, Series, DataFrame and so on. Now, instead of the malformed JSON that we had before, we can return another JSON of our liking.
Specifying lines
Consider the following DataFrame:
df
A Ba 2 4b 3 5
When orient="records"
, we can make each item appear in a new line by passing in lines=True
like so:
my_json = df.to_json(orient="records", lines=True)my_json
'{"A":2,"B":4}\n{"A":3,"B":5}'
When we print out our string my_json
, then we see the \n
take effect:
print(my_json)
{"A":2,"B":4}{"A":3,"B":5}
Specifying indent
We can add whitespace indents by passing in the indent
parameter like so:
my_json = df.to_json(orient="records", indent=3)print(my_json)
[ { "A":2, "B":4 }, { "A":3, "B":5 }]
Here, the second line has 3 whitespaces, while the third has 6 whitespaces, and so on.