Pandas DataFrame | quantile method
Pandas DataFrame.quantile(~) method returns the interpolated value at the specified quantile.
Parameters
1. q | float or array-like of float | optional
The desired quantile(s) to compute, each of which must be between 0 (inclusive) and 1 (inclusive). By default, q=0.5, that is, the value at the 50th percentile is computed.
2. axis | int or string | optional
Whether to compute the quantile row-wise or column-wise:
Axis | Description |
---|---|
0 or "index" | Compute the quantile for each column. |
1 or "columns" | Compute the quantile for each row. |
By default, axis=0.
3. numeric_only | boolean | optional
Whether to compute the quantiles only for rows/columns of numeric type. If set to False, then quantiles of rows/columns of type datetime and timedelta will also be computed. In pandas versions before 2.0, the default is numeric_only=True; from pandas 2.0 onward, the default is numeric_only=False.
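As a minimal sketch of what numeric_only changes, the snippet below uses a small made-up DataFrame (the column names and values are illustrative assumptions) and passes numeric_only explicitly, so the result does not depend on the default of the installed pandas version:

```python
import pandas as pd

# Illustrative data: one numeric, one datetime and one timedelta column
df = pd.DataFrame({
    "A": [1, 2],
    "B": [pd.Timestamp("2010"), pd.Timestamp("2011")],
    "C": [pd.Timedelta("1 days"), pd.Timedelta("2 days")],
})

# Only the numeric column A is considered
numeric = df.quantile(0.5, numeric_only=True)

# The datetime and timedelta columns are included as well
everything = df.quantile(0.5, numeric_only=False)

print(numeric)
print(everything)
```

With numeric_only=False, the quantile of a datetime column is itself a timestamp, and the quantile of a timedelta column is a timedelta.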
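4. interpolation | string | optional
How the value is interpolated when the given quantile sits between two data points, say i and j where i<j:
Value | Description |
---|---|
"linear" | Standard linear interpolation: i + (j - i) * fraction, where fraction is the fractional part of the quantile's position between i and j. |
"lower" | Returns i. |
"higher" | Returns j. |
"nearest" | Returns whichever of i or j is nearest. |
"midpoint" | Returns (i + j) / 2. |
By default, interpolation="linear".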
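To make the five options concrete, here is a rough sketch that recomputes each one by hand for a single sorted column and compares it with what quantile(~) returns. The data, the choice of q=0.4 and the helper names (pos, fraction, manual) are assumptions made for illustration, and the hand-written "nearest" rule deliberately ignores ties:

```python
import math
import pandas as pd

values = [2, 4, 6, 8]   # already sorted, illustrative data
q = 0.4

# Fractional position of the quantile among the sorted values (about 1.2)
pos = q * (len(values) - 1)
lo, hi = math.floor(pos), math.ceil(pos)
i, j = values[lo], values[hi]   # neighbouring data points, here 4 and 6
fraction = pos - lo             # how far the position lies from i towards j

manual = {
    "linear": i + (j - i) * fraction,
    "lower": i,
    "higher": j,
    "nearest": i if fraction < 0.5 else j,   # ties are not handled in this sketch
    "midpoint": (i + j) / 2,
}

s = pd.DataFrame({"A": values})
for method, expected in manual.items():
    got = s.quantile(q, interpolation=method)["A"]
    print(method, expected, got)
```

The position is computed as q * (n - 1) because the smallest value sits at position 0 and the largest at position n - 1.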
Return Value
If q is a scalar, then a Series is returned. Otherwise, a DataFrame is returned.
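A quick way to check this, sketched with a small illustrative DataFrame, is to compare a scalar q against a one-element list:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8], "B": [5, 6, 7, 8]})

scalar_result = df.quantile(0.5)    # scalar q
list_result = df.quantile([0.5])    # array-like q, even with a single element

print(type(scalar_result).__name__)   # Series
print(type(list_result).__name__)     # DataFrame
```

Note that wrapping a single quantile in a list is enough to get a DataFrame back.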
Examples
Consider the following DataFrame:
df
   A  B
0  2  5
1  4  6
2  6  7
3  8  8
Computing percentile column-wise
To compute the 50th percentile of each column:
df.quantile() # q=0.5
A    5.0
B    6.5
Name: 0.5, dtype: float64
Here, the return type is Series. To interpret the output, exactly 50% of the values in column A are smaller than 5.0.
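This interpretation can be verified directly. The sketch below rebuilds the DataFrame shown above and measures the share of values in column A that fall below the computed quantile (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8], "B": [5, 6, 7, 8]})

median_a = df.quantile(0.5)["A"]

# Fraction of the values in column A that are strictly smaller than the quantile
share_below = (df["A"] < median_a).mean()

print(median_a, share_below)
```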
Computing percentile row-wise
To compute the 30th percentile of each row:
df.quantile(q=0.3, axis=1)
0    2.9
1    4.6
2    6.3
3    8.0
Name: 0.3, dtype: float64
Computing multiple percentiles
To get the values at the 50th and 75th percentiles for each column:
df.quantile([0.5, 0.75]) # returns a DataFrame
        A     B
0.50  5.0  6.50
0.75  6.5  7.25
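The rows of the returned DataFrame are labelled by quantile, so individual values can be looked up with loc. The following is a small sketch using the same data; the names b_75 and spread are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8], "B": [5, 6, 7, 8]})
result = df.quantile([0.5, 0.75])

# Rows are labelled by quantile, columns by the original column names
b_75 = result.loc[0.75, "B"]

# Per-column difference between the 75th and 50th percentiles
spread = result.loc[0.75] - result.loc[0.5]

print(b_75)
print(spread)
```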
Changing interpolation methods
Consider the following DataFrame:
df
   A
0  2
1  4
2  6
3  8
linear
Consider the case when the value corresponding to the specified quantile does not exist:
df.quantile(0.5) # interpolation="linear"
A    5.0
Name: 0.5, dtype: float64
Here, since the value corresponding to the 50th percentile does not exist in column A, the value was linearly interpolated between 4 and 6.
lower
df.quantile(0.5, interpolation="lower")
A    4
Name: 0.5, dtype: int64
Again, since the 50% quantile does not exist, we need to perform interpolation. We know it is between the values 4 and 6. By passing in "lower", we select the lower value, that is, 4 in this case.
higher
df.quantile(0.5, interpolation="higher")
A    6
Name: 0.5, dtype: int64
Same logic as "lower", but we take the upper value.
Here's the same df for your reference:
df
   A
0  2
1  4
2  6
3  8
nearest
df.quantile(0.5, interpolation="nearest")
A    6
Name: 0.5, dtype: int64
By passing in "nearest", instead of always selecting the lower or upper value, we take whichever is nearest. In this case, the 50% quantile is 5, which happens to lie exactly halfway between 4 and 6. For this tie, the upper value, 6, is returned. How ties are broken is an implementation detail, so it is best not to rely on it.
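To see "nearest" at work away from a tie, we can pick quantiles that fall clearly closer to one neighbour. This is a small sketch on the same column; the quantiles 0.4 and 0.6 are chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8]})

# q=0.4 falls at position 1.2, between 4 and 6 but closer to 4
near_low = df.quantile(0.4, interpolation="nearest")["A"]

# q=0.6 falls at position 1.8, between 4 and 6 but closer to 6
near_high = df.quantile(0.6, interpolation="nearest")["A"]

print(near_low, near_high)
```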
midpoint
df.quantile(0.5, interpolation="midpoint")
A    5.0
Name: 0.5, dtype: float64
Here, we just take the midpoint of the lower and upper values, so (4+6)/2=5.