Pandas DataFrame | to_csv method
Start your free 7-days trial now!
Pandas DataFrame.to_csv(~)
method converts the source DataFrame into comma-separated value format.
Parameters
1. path_or_buf
| string
or file handle
| optional
The path to write the csv. By default, the csv is returned as a string.
2. sep
| string
of length one | optional
The separator to use. By default, sep=","
.
3. na_rep
| string
| optional
The value to replace NaN
in the source DataFrame.
4. float_format
link | string
| optional
The format string for floats.
5. columns
link | sequence
| optional
The label of the columns to include. All other columns will be excluded from the resulting csv. By default, all columns are included in the resulting csv.
6. header
link | boolean
or list
of string
| optional
Whether or not to include the column labels in the csv. If a list of strings is passed, then those strings will overwrite the existing column labels. By default, header=True
.
7. index
| boolean
| optional
Whether or not to include row labels in the csv. By default, index=True
.
8. index_label
link | string
or sequence
or False
or None
| optional
The column labels to use. By default, index_label=None
.
9. mode
| string
| optional
The mode to open the file. By default, mode="w"
, which stands for write mode.
10. encoding
| string
| optional
The encoding to use when writing to a file. By default, encoding="utf-8"
.
11. compression
| string
or dict
| optional
The compression algorithm to use. The allowed values are as follows:
"infer" "gzip" "bz2" "zip" "xz"
By default, compression="infer"
, which means that if a path is supplied for path_or_buf
, then the compression algorithm will be inferred from the extension you appended. For instance, if path_or_buf
is "my_data.zip"
, then the "zip"
compression will be used. If an extension is not supplied, then no compression will take place.
12. quoting
| optional
By default, quoting=csv.QUOTE_MINIMAL
.
13. quotechar
link | string
of length one | optional
When a value happens to contain a delimiter, then the value will erroneously be divided up. You can wrap your value using quotechar
to prevent such unwanted splits from happening. By default, '"'
.
14. line_terminator
link | string
| optional
The character used to indicate a line break. Defaults to os.linesep
.
15. chunksize
| int
or None
| optional
The number of rows to write at one time. If your DataFrame is large, using a large chunksize (e.g. 10000
) may lead to better performance. By default, chunksize=None
.
16. date_format
| string
| optional
The format string for datetime objects.
17. doublequote
link | boolean
| optional
Whether or not to parse quotechar
. By default, doublequote=True
.
18. escapechar
link | string
| optional
The character to escape the double quotation marks. By default, escapechar=None
.
19. decimal
| string
| optional
The character denoting a decimal point. By default, decimal="."
. This parameter exists since some European countries like France use a ,
to denote a decimal point instead.
Return Value
If path_or_buf
is specified, then None
is returned. Otherwise, a string is returned.
Examples
Writing csv to a file
Consider the following DataFrame:
df
A Ba 3 5b 4 6
To save our df
as a CSV file:
df.to_csv(path_or_buf="my_data")
The outputted file my_data
, which is located in the same directory as the Python script, is as follows:
,A,Ba,3,5b,4,6
Writing csv as a string
Consider the following DataFrame:
df
A Ba 3 5b 4 6
To save our df
as a csv string, don't specify path_or_buf
:
df.to_csv()
',A,B\na,3,5\nb,4,6\n'
Note that printing this out will render the \n
to take effect:
,A,Ba,3,5b,4,6
Specifying float_format
Consider the following DataFrame that contains floating point numbers:
df
A Ba 3.00005 5b 4.00000 6
In order to format the floating point numbers (e.g. how many decimal places to include), we use the float_format
parameter. The syntax follows that of Python's standard string formatter, which we cover here in detail.
As an example, to include up to 3 decimal places:
df.to_csv(float_format="%.3f")
',A,B\na,3.000,5\nb,4.000,6\n'
Specifying columns
Consider the following DataFrame:
df
A Ba 3 5b 4 6
By default, all columns are included in the resulting csv. To include only specific columns, specify their column labels like so:
,Aa,3b,4
Specifying header
Consider the following DataFrame:
df
A Ba 3 5b 4 6
By default, header=True
, which means that the header is include in the resulting csv:
,A,Ba,3,5b,4,6
To exclude the headers, set header=False
like so:
a,3,5b,4,6
We can also pass a list of new column labels like so:
,C,Da,3,5b,4,6
Specifying index_label
Consider the following DataFrame:
df
A Ba 3 5b 4 6
By default, index_label=None
, which means that an empty index label will be included:
,A,Ba,3,5b,4,6
Notice how we begin with a comma here - the index label is empty, but it is still included.
To remove the index names, set index_label=False
like so:
A,Ba,3,5b,4,6
Some statistical software like R may find this format easier to parse.
Specifying quotechar
Consider the following DataFrame:
df
A Ba 3,9 5b 4 6
Here, notice how one of our values is "3,9"
, which unfortunately contains the default separator ,
. If we were to turn this into a csv, we would end up with 3,9,5
in the first row, which is incorrect since it indicates that we have 3 values in this row instead of 2.
In order to indicate that 3,9
is one single value, the to_csv(~)
method wraps a quotation mark ("
) around it by default:
df.to_csv(sep=",")
',A,B\na,"3,9",5\nb,4,6\n'
Notice how we have "3,9"
now.
Instead of "
, we can specify a single character as the wrapper by passing in quotechar
like so:
df.to_csv(quotechar="@")
',A,B\na,@3,9@,5\nb,4,6\n'
Specifying line_terminator
Consider the following DataFrame:
df
A Ba 3 5b 4 6
By default, each row is split using a new line character (\n
):
df.to_csv()
',A,B\na,3,5\nb,4,6\n'
We can use a custom character to split the rows by passing in line_terminator
:
df.to_csv(line_terminator="@")
',A,B@a,3,5@b,4,6@'
Specifying doublequote and escapechar
Consider the following DataFrame:
df
A Ba 3"9 5b 4 6
Notice how df
contains a value that has a single double quotation mark "
.
By default, doublequote=True
, which means that the quotechar
is duplicated like so:
,A,Ba,"3""9",5b,4,6
Here, note the following:
originally the output was
"3"9"
, where the outer""
is there to indicate that the value is a string.the problem here is that, the value actually contains
"
so this results in syntax error as"3"9"
is not a valid string.to overcome this problem, the method escapes the middle
"
using thequotechar
, which is (confusingly)"
by default.as a result, we end up with this peculiar-looking
"3""9"
.
We can turn off this behaviour where strings are wrapped in ""
by setting doublequote=False
, and then providing the escapechar
parameter:
,A,Ba,3@"9,5b,4,6
Here, the escapechar
is needed since "39
alone is invalid syntax - "
denotes a start of a string, and there is no closing "
. The role of escapechar
is to indicate that "
is not at all related to a string.