search
Search
Login
Unlock 100+ guides
menu
menu
web
search toc
close
Comments
Log in or sign up
Cancel
Post
account_circle
Profile
exit_to_app
Sign out
What does this mean?
Why is this true?
Give me some examples!
search
keyboard_voice
close
Searching Tips
Search for a recipe:
"Creating a table in MySQL"
Search for an API documentation: "@append"
Search for code: "!dataframe"
Apply a tag filter: "#python"
Useful Shortcuts
/ to open search panel
Esc to close search panel
to navigate between search results
d to clear all current filters
Enter to expand content preview
icon_star
Doc Search
icon_star
Code Search Beta
SORRY NOTHING FOUND!
mic
Start speaking...
Voice search is only supported in Safari and Chrome.
Navigate to

Pandas DataFrame | to_json method

schedule Aug 12, 2023
Last updated
local_offer
PythonPandas
Tags
mode_heat
Master the mathematics behind data science with 100+ top-tier guides
Start your free 7-days trial now!

Pandas DataFrame.to_json(~) method either converts a DataFrame to a JSON string, or outputs a JSON file.

Parameters

1. path_or_buf | string or file handle | optional

The path to where you want to save the JSON. By default, the method will return a JSON string without writing to a file.

2. orientlink | string

How you want convert the source DataFrame into a JSON.

For "split":

{
   "index": [list of index],
   "columns": [list of labels],
   "data": [list of values]
}

For "records":

[{column_label: values}, ... , {column_label: values}]

Here, we call each item a record.

For "index":

{
   index:
      column_label: value
      ...
   index:
      ...
}

For "columns" (default):

{
   column_label:
      index: value
      ...
   column_label:
      ...
}

For "values", we just get a list of values:

[value_one, ..., ]

For "table", we get a comprehensive representation of the source DataFrame:

{
   "schema": {schema}
   "data": {data}
}

By default, orient="columns".

3. date_formatlink | string | optional

Whether or not to convert the dates into epoch milliseconds or iso8601 format. The allowed value are as follows:

  • "epoch": the time passed since 1970-01-01 in milliseconds.

  • "iso": the global standard for date representation.

By default, if orient="table", then date_format="iso", otherwise "epoch".

4. double_precision | int | optional

The number of decimal places to store for floating numbers. By default, double_precision=10.

5. force_ascii | boolean | optional

Whether to use ASCII for encoding strings. By default, force_ascii=True.

6. date_unitlink | string | optional

The time unit to use. The allowed values are as follows:

  • "s": seconds

  • "ms": milliseconds

  • "us": microseconds

  • "ns": nanoseconds

By default, date_unit="ms".

7. default_handler | callable | optional

The callback that fires if the JSON conversion is not successful. The callback takes in as argument the source DataFrame and returns a serialisable object (e.g. maps and lists). By default, default_handler=None.

8. lineslink | boolean | optional

If orient="records", then write out each {column_label:values} in a new line. Passing in True when for values other than "records" will result in an error. By default, lines=False.

9. compression | string | optional

The compression algorithm to use when outputting to a file. The available algorithms are:

"infer", "gzip", "bz2", "zip", "xz"

This is only relevant when we are outputting to a file, that is, the path_or_buf is specified. By default, compression="infer".

10. index | boolean | optional

Whether to include the index in the resulting JSON string. This is only relevant if orient is either "split" or "table". By default, index=True.

11. indentlink | int | optional

The number of whitespaces to indent each record.

Return Value

If path_or_buf is specified, then None is returned. Otherwise, a string in JSON format is returned.

Examples

Outputting to a file

Consider the following DataFrame:

df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["a","b"])
df
   A  B
a  2  4
b  3  5

Instead of obtaining a JSON string, we can output the JSON to a file by passing in path_or_buf like so:

df.to_json(path_or_buf="my_json")

This will create a new file called my_json in the same directory as your Python script:

{"A":{"a":2,"b":3},"B":{"a":4,"b":5}}

Specifying orient

Consider the following DataFrame:

df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["a","b"])
df
   A  B
a  2  4
b  3  5

Default

By default, orient="columns":

df.to_json()
'{"A":{"a":2,"b":3},"B":{"a":4,"b":5}}'

split

df.to_json(orient="split")
'{"columns":["A","B"],"index":["a","b"],"data":[[2,4],[3,5]]}'

records

df.to_json(orient="records")
'[{"A":2,"B":4},{"A":3,"B":5}]'

index

df.to_json(orient="index")
'{"a":{"A":2,"B":4},"b":{"A":3,"B":5}}'

columns

df.to_json(orient="columns")
'{"A":{"a":2,"b":3},"B":{"a":4,"b":5}}'

values

df.to_json(orient="values")
'[[2,4],[3,5]]'

table

df.to_json(orient="table")
'{"schema":{"fields":[{"name":"index","type":"string"},
                      {"name":"A","type":"integer"},
                      {"name":"B","type":"integer"}],
            "primaryKey":["index"],
            "pandas_version":"0.20.0"}
  "data":[{"index":"a","A":2,"B":4},
          {"index":"b","A":3,"B":5}]}'

Here, we've prettified the output for clarity, but the actual output is all in one line.

Specifying date_format

Consider the following DataFrame:

df = pd.DataFrame({"A":["2020-12-25"], "B":["2020-12-20"]}, dtype="datetime64[ns]")
df
   A           B
0  2020-12-25  2020-12-20

By default, date_format="epoch" (unless orient="table"):

df.to_json()    # date_format="epoch"
'{"A":{"0":1608854400000},"B":{"0":1608422400000}}'

Here, the large numbers represent the time passed since 1970-01-01 in milliseconds.

On the other hand, the we can change the date formatting to iso8601 by passing in "iso" like so:

df.to_json(date_format="iso")
'{"A":{"0":"2020-12-25T00:00:00.000Z"},"B":{"0":"2020-12-20T00:00:00.000Z"}}'

Specifying date_unit

Consider the following DataFrame:

df = pd.DataFrame({"A":["2020-12-25"], "B":["2020-12-20"]}, dtype="datetime64[ns]")
df
   A           B
0  2020-12-25  2020-12-20

By default, when date_format="epoch", the units will be in milliseconds:

df.to_json()    # date_format="epoch"
'{"A":{"0":1608854400000},"B":{"0":1608422400000}}'

Here, the large numbers represent the time passed since 1970-01-01 in milliseconds.

We can change the units to seconds like so:

df.to_json(date_unit="s")    # date_format="epoch"
'{"A":{"0":1608854400},"B":{"0":1608422400}}'

Specifying default_handler

Consider the following DataFrame:

df = pd.DataFrame({"A":[complex(3,4)], "B":[4]}, index=["a"])
df
   A                   B
a  3.000000+4.000000j  4

Here, our df has a complex number.

By default, when we try to convert df into a JSON string, we get the following:

df.to_json(orient="records")
'[{"A":{"imag":4.0},"B":4}]'

Notice how the complex number is represented erroneously as "imag", which happens because JSON does not know how to internally parse complex numbers. For situations like this when the conversion is improper, we can use the default_handler parameter to control what is returned:

def my_handler(my_df):
    return pd.DataFrame({"A":[3], "B":[4]}, index=["a"])

df.to_json(orient="records", default_handler=my_handler)
'[{"A":[{"A":3,"B":4}],"B":4}]'

The handler takes as argument the source DataFrame and returns a serialisable object like a map, Series, DataFrame and so on. Now, instead of the malformed JSON that we had before, we can return another JSON of our liking.

Specifying lines

Consider the following DataFrame:

df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["a","b"])
df
   A  B
a  2  4
b  3  5

When orient="records", we can make each item appear in a new line by passing in lines=True like so:

my_json = df.to_json(orient="records", lines=True)
my_json
'{"A":2,"B":4}\n{"A":3,"B":5}'

When we print out our string my_json, then we see the \n take effect:

print(my_json)
{"A":2,"B":4}
{"A":3,"B":5}

Specifying indent

We can add whitespace indents by passing in the indent parameter like so:

my_json = df.to_json(orient="records", indent=3)
print(my_json)
[
   {
      "A":2,
      "B":4
   },
   {
      "A":3,
      "B":5
   }
]

Here, the second line has 3 whitespaces, while the third has 6 whitespaces, and so on.

robocat
Published by Isshin Inada
Edited by 0 others
Did you find this page useful?
thumb_up
thumb_down
Comment
Citation
Ask a question or leave a feedback...