Pandas DataFrame | sum method
Pandas' DataFrame.sum(~) method computes the sum of each row or column of the source DataFrame.
Parameters
1. axis | int or string | optional
Whether to compute the sum row-wise or column-wise:
Axis | Description |
---|---|
0 or "index" | Sum is computed for each column. |
1 or "columns" | Sum is computed for each row. |
By default, axis=0.
2. skipna | boolean | optional
Whether or not to ignore missing values (NaN). By default, skipna=True.
3. level | string or int | optional
The name or the integer index of the level to sum over. This is relevant only if your DataFrame has a MultiIndex.
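As a hedged aside: the level argument of sum was deprecated in pandas 1.3 and removed in pandas 2.0, with groupby as the suggested replacement. The sketch below shows a per-level sum that should work on both older and newer versions. The index values and the level names "group" and "item" are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index; the names "group" and "item" are illustrative only.
index = pd.MultiIndex.from_tuples(
    [("x", "a"), ("x", "b"), ("y", "a")],
    names=["group", "item"],
)
df_multi = pd.DataFrame({"A": [2, 3, 4], "B": [4, 5, 6]}, index=index)

# Sum all rows that share the same value in the "group" level.
per_group = df_multi.groupby(level="group").sum()
print(per_group)
```

The result is a DataFrame with one row per distinct value of the chosen level, which matches the return type described for level in this article.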
4. numeric_only | None or boolean | optional
The allowed values are as follows:
Value | Description |
---|---|
True | Only numeric rows/columns will be considered. |
False | Attempt computation with all types (e.g. strings and dates), and throw an error whenever the summation is invalid. |
None | Attempt computation with all types, and ignore all rows/columns that do not allow for summation without raising an error. |
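For summation to be valid, the + operator must be well-defined between the types.
By default, numeric_only=None. Note that pandas 2.0 and later default to numeric_only=False and no longer silently drop rows/columns that cannot be summed.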
5. min_count | int | optional
The minimum number of non-NaN values that must be present to perform the summation. If there are fewer than min_count non-NaN values, then NaN will be returned. By default, min_count=0, so no minimum is enforced.
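To make min_count concrete, here is a minimal sketch. The DataFrame df_nan is an illustrative example that mirrors the missing-value DataFrame used later in this article.

```python
import numpy as np
import pandas as pd

# Column A holds one valid value and one NaN; column B holds two valid values.
df_nan = pd.DataFrame({"A": [2.0, np.nan], "B": [4, 5]})

# Require at least two non-NaN values per column before summing.
result = df_nan.sum(min_count=2)
print(result)
```

Column A has only one non-NaN value, so its sum is reported as NaN, while column B meets the threshold and is summed as usual.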
Return Value
If the level parameter is specified, then a DataFrame will be returned. Otherwise, a Series will be returned.
Examples
Consider the following DataFrame:
df
   A  B
0  2  4
1  3  5
Column-wise summation
To compute the sum for each column:
df.sum() # axis=0
A    5
B    9
dtype: int64
Here, the return type is Series.
Row-wise summation
To compute the sum for each row, set axis=1:
df.sum(axis=1)
0    6
1    8
dtype: int64
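For readers who want to reproduce the two results above, the following self-contained sketch builds the same DataFrame and uses the string aliases "index" and "columns" in place of 0 and 1. The variable names col_sums and row_sums are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]})

col_sums = df.sum(axis="index")    # same as axis=0: one sum per column
row_sums = df.sum(axis="columns")  # same as axis=1: one sum per row

print(col_sums)
print(row_sums)
```

In both cases the result is a Series: col_sums is indexed by column label, and row_sums by row label.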
Specifying skipna
Consider the following DataFrame with a missing value:
df
     A  B
0  2.0  4
1  NaN  5
By default, skipna=True, which means that NaNs are ignored in the computation:
df.sum()
A    2.0
B    9.0
dtype: float64
Setting skipna=False will take the NaNs into account:
df.sum(skipna=False)
A    NaN
B    9.0
dtype: float64
The reason we get NaN for the sum of column A is that any arithmetic computation involving NaNs will result in NaN.
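The propagation of NaN can be checked directly. This is a small sketch assuming NumPy is available alongside pandas. The names df_nan, skipped and propagated are illustrative.

```python
import numpy as np
import pandas as pd

# Plain arithmetic with NaN yields NaN.
plain = np.nan + 2.0
print(plain)

df_nan = pd.DataFrame({"A": [2.0, np.nan], "B": [4, 5]})

skipped = df_nan.sum()                  # NaN is ignored
propagated = df_nan.sum(skipna=False)   # NaN is kept and propagates
print(skipped)
print(propagated)
```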
Specifying numeric_only
Consider the following DataFrame:
df
   A     B      C
0  4     2    "6"
1  5  True  False
Here, both columns B and C contain mixed types, but the key difference is that summation is defined for B, but not for C. Recall that the boolean True is internally represented as 1, so the operation 2+True actually evaluates to 3:
2 + True
3
On the other hand, "6"+False throws an error:
"6" + False
TypeError: can only concatenate str (not "bool") to str
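The two cases can be compared side by side in plain Python. This is an illustrative sketch: the results dictionary is a helper introduced here, not part of pandas.

```python
# Try the + operator on each pair and record either the value or the error.
results = {}
for left, right in [(2, True), ("6", False)]:
    try:
        results[(left, right)] = left + right
    except TypeError as exc:
        results[(left, right)] = f"TypeError: {exc}"

print(results)
```

The first pair adds cleanly because bool is a subclass of int in Python, while the second pair fails because a string cannot be concatenated with a boolean.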
None
By default, numeric_only=None, which means that rows/columns with mixed types will also be considered:
df.sum(numeric_only=None)
A    9
B    3
dtype: int64
Here, notice how summation was performed on column B, but not on C. By passing in None, rows/columns that result in invalid summations will simply be ignored without throwing an error.
False
By setting numeric_only=False, rows/columns with mixed types will again be considered, but an error will be thrown when summation cannot be performed:
df.sum(numeric_only=False)
TypeError: can only concatenate str (not "bool") to str
Here, we end up with an error because column C contains mixed types for which the + operation is not defined.
True
By setting numeric_only=True, only numeric rows/columns will be considered:
df.sum(numeric_only=True)
A    9
dtype: int64
Notice how columns B and C were ignored since they contain mixed types.
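The numeric_only=True case can be reproduced with the sketch below. It assumes that pandas stores the mixed columns B and C with the generic object dtype, which is why they are excluded. The names df_mixed and numeric_sum are illustrative.

```python
import pandas as pd

df_mixed = pd.DataFrame({"A": [4, 5], "B": [2, True], "C": ["6", False]})

# Mixed columns are expected to be stored as object rather than a numeric dtype.
print(df_mixed.dtypes)

numeric_sum = df_mixed.sum(numeric_only=True)
print(numeric_sum)
```

Only column A appears in the result, because it is the only column with a numeric dtype.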
Case of empty DataFrame
Computing the sum of an empty DataFrame or Series will result in 0:
df.sum()
A    0.0
dtype: float64
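The article does not show how the empty DataFrame was built. One way to reproduce the result, sketched here as an assumption, is to use a single empty float64 column named A. The same sketch also shows how min_count=1 turns the empty sum into NaN instead of 0.

```python
import numpy as np
import pandas as pd

# Assumed construction: one empty column "A" with an explicit float64 dtype.
df_empty = pd.DataFrame({"A": pd.Series([], dtype="float64")})

empty_sum = df_empty.sum()                  # sum over no values gives 0.0
empty_sum_min = df_empty.sum(min_count=1)   # at least one value required, so NaN
print(empty_sum)
print(empty_sum_min)
```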