Pandas DataFrame | median method
Pandas DataFrame.median(~) method computes the median for each row or column of the DataFrame.
Parameters
1. axis | int or string | optional
Whether to compute the median row-wise or column-wise:
Axis | Description |
---|---|
0 or "index" | Median is computed for each column. |
1 or "columns" | Median is computed for each row. |
By default, axis=0.
2. skipna | boolean | optional
Whether or not to skip NaN. If skipna=False, then even a single NaN in a row/column makes the median for that row/column NaN. By default, skipna=True.
3. level | string or int | optional
The name or the integer index of the level to consider. This is only relevant if your DataFrame has a MultiIndex.
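The level parameter has no example on this page, so here is a minimal sketch of a per-level median. As far as we are aware, the level argument was deprecated in pandas 1.3 and removed in 2.0, with groupby(level=...) as the suggested replacement, so the sketch uses groupby. The index names "group" and "item" and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index: an outer "group" level and an inner "item" level.
index = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1), ("y", 2)],
    names=["group", "item"],
)
df_multi = pd.DataFrame({"A": [1, 3, 5, 9], "B": [2, 4, 6, 8]}, index=index)

# Median of each column within each value of the outer level.
per_group = df_multi.groupby(level="group").median()
print(per_group)
```

The result is a DataFrame with one row per value of the chosen level, which matches the return type described for level below.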
4. numeric_only | None or boolean | optional
The allowed values are as follows:
Value | Description |
---|---|
True | Only numeric rows/columns will be considered. |
False | Attempt computation with all types (e.g. strings and dates), and throw an error whenever the median cannot be computed. |
None | Attempt computation with all types, and ignore all rows/columns whose median cannot be computed without raising an error. |
Note that medians can only be computed when we can perform summation between the types.
By default, numeric_only=None.
Return Value
If the level parameter is specified, then a DataFrame will be returned. Otherwise, a Series will be returned.
Examples
Consider the following DataFrame:
import pandas as pd
df = pd.DataFrame({"A":[2,3], "B":[4,5], "C":["6",7]})
df
   A  B  C
0  2  4  "6"
1  3  5  7
Column-wise median
To compute the median for each column:
df.median() # axis=0
A    2.5
B    4.5
C    6.5
dtype: float64
Notice how "6" has been automatically cast to a float in order to compute the median.
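Implicit casting of strings may behave differently across pandas versions, so it can be safer to make the conversion explicit. The following is a minimal sketch using pd.to_numeric(~), which is our suggestion rather than something the method requires:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5], "C": ["6", 7]})

# Convert column C explicitly; pd.to_numeric parses the string "6" as a number.
df["C"] = pd.to_numeric(df["C"])

medians = df.median()
print(medians)
```

After the conversion, column C holds only numbers, so its median no longer depends on how median(~) treats strings.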
Row-wise median
To compute the median for each row, set axis=1:
df.median(axis=1)
0    4.0
1    5.0
dtype: float64
Specifying skipna
Consider the following DataFrame:
import numpy as np
df = pd.DataFrame({"A":[3,4,6], "B":[7,9,np.nan]})
df
   A    B
0  3  7.0
1  4  9.0
2  6  NaN
By default, skipna=True, which means that all missing values are skipped when computing the median:
df.median() # skipna=True
A    4.0
B    8.0
dtype: float64
To take into account the missing values:
df.median(skipna=False)
A    4.0
B    NaN
dtype: float64
Note that if a row/column contains one or more missing values, the median for that row/column will be NaN.
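As a quick check of the two behaviours, the sketch below compares the default skipna=True against dropping NaN by hand, and also shows skipna=False applied row-wise. The variable names are ours.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, 4, 6], "B": [7, 9, np.nan]})

skipped = df["B"].median()           # skipna=True by default
dropped = df["B"].dropna().median()  # remove NaN explicitly, then take the median

# Row-wise with skipna=False: any row containing NaN yields NaN.
row_medians = df.median(axis=1, skipna=False)

print(skipped, dropped)
print(row_medians)
```

Here skipped and dropped are the same value, and only the last row of row_medians is NaN because it is the only row with a missing value.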
Specifying numeric_only
Consider the following DataFrame:
df = pd.DataFrame({"A":[3,4], "B":[5,True], "C":[6,"7@8"]})
df
   A     B    C
0  3     5    6
1  4  True  7@8
Here, both columns B and C contain mixed types, but the key difference is that the median can be computed for B, but not for C. When the sample size is even, which is the case here, the median is computed by taking the average of the middle two numbers, which means that the summation operation between the types must be well-defined.
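The even-sample rule can be reproduced by hand. This is a minimal sketch using the two values of column B, with True written as 1; it illustrates the definition of the median, not how pandas implements it internally.

```python
import pandas as pd

values = [5, 1]  # column B from above, with True written as 1

# Sort, then average the two middle values (the sample size is even).
ordered = sorted(values)
mid = len(ordered) // 2
manual_median = (ordered[mid - 1] + ordered[mid]) / 2

print(manual_median)               # 3.0
print(pd.Series(values).median())  # 3.0
```

The addition in the averaging step is exactly the operation that must be defined for the types involved.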
Recall that the internal representation of a True boolean is 1, so the operation 5+True actually evaluates to 6:
5 + True
6
On the other hand, 6+"7@8" throws an error:
6 + "7@8"
TypeError: unsupported operand type(s) for +: 'int' and 'str'
None
By default, numeric_only=None, which means that rows/columns with mixed types will also be considered:
df.median(numeric_only=None)
A    3.5
B    3.0
dtype: float64
Here, notice how the median was computed for column B, but not for C. By passing in None, rows/columns where the median cannot be computed (due to invalid summation) will simply be ignored without raising an error.
False
By setting numeric_only=False, rows/columns with mixed types will again be considered, but an error will be thrown when the median cannot be computed:
df.median(numeric_only=False)
TypeError: could not convert string to float: '7@8'
Here, we end up with an error because column C contains mixed types where summation is not defined.
True
By setting numeric_only=True, only numeric rows/columns will be considered:
df.median(numeric_only=True)
A    3.5
dtype: float64
Notice how columns B and C were ignored since they contain mixed types.
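To see which columns count as numeric, one option is to filter them explicitly before taking the median. The sketch below uses select_dtypes(~) for this; it is our own illustration of the same idea and is not necessarily what numeric_only=True does internally.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [5, True], "C": [6, "7@8"]})

# Columns B and C hold mixed types, so pandas stores them with dtype object.
print(df.dtypes)

# Keep only the columns with a numeric dtype, then compute the median.
numeric_part = df.select_dtypes(include="number")
result = numeric_part.median()
print(result)
```

Only column A survives the filter, so the result has a single entry for A.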