
Pandas DataFrame | quantile method

Last updated: Aug 10, 2023
Tags: Python, Pandas

Pandas' DataFrame.quantile(~) method returns the value at the specified quantile, interpolating between neighboring data points when the quantile falls between them.

Parameters

1. q | float or array-like of floats | optional

The desired quantile(s) to compute. Each value must be between 0 and 1 (both inclusive). By default, q=0.5, that is, the value at the 50th percentile (the median) is computed.

2. axis | int or string | optional

Whether to compute the quantile row-wise or column-wise:

Axis | Description
0 or "index" | Compute the quantile for each column.
1 or "columns" | Compute the quantile for each row.

By default, axis=0.

3. numeric_only | boolean | optional

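Whether or not to compute the quantiles only for rows/columns of numeric type. If set to False, then quantiles of rows/columns with datetime and timedelta will also be computed. The default depends on the pandas version: it was numeric_only=True before pandas 2.0 and is numeric_only=False from pandas 2.0 onward.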
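As a rough sketch of how numeric_only behaves, the snippet below uses a made-up frame (df_mixed and its column names are assumptions for illustration, not part of the original examples). Passing numeric_only explicitly avoids relying on the version-dependent default:

```python
import pandas as pd

# Hypothetical frame with one numeric and one datetime column
df_mixed = pd.DataFrame({
    "A": [2, 4, 6],
    "D": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
})

# Only the numeric column "A" takes part in the computation
numeric_result = df_mixed.quantile(0.5, numeric_only=True)
print(numeric_result)

# Quantile of the datetime column on its own
date_result = df_mixed[["D"]].quantile(0.5, numeric_only=False)
print(date_result)
```

Here the median of column A is 4.0 and the median date is 2020-01-02.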

4. interpolation | string | optional

How the values are interpolated when the given percentile sits between two data points, say i and j where i<j:

Value | Description
"linear" | Standard linear interpolation between i and j
"lower" | Returns i
"higher" | Returns j
"midpoint" | Returns (i+j)/2
"nearest" | Returns i or j, whichever is closer

By default, interpolation="linear".

Return Value

If q is a scalar, then a Series is returned. Otherwise, a DataFrame is returned.
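A quick way to confirm the return types is to inspect them directly. This is a minimal sketch using the same DataFrame as the examples; the variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8], "B": [5, 6, 7, 8]})

scalar_result = df.quantile(0.5)    # q is a scalar
list_result = df.quantile([0.5])    # q is array-like, even with a single element

print(type(scalar_result).__name__)   # Series
print(type(list_result).__name__)     # DataFrame
```

Note that wrapping a single quantile in a list is enough to get a DataFrame back, with the quantile as the row label.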

Examples

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({"A":[2,4,6,8],"B":[5,6,7,8]})
df
   A  B
0  2  5
1  4  6
2  6  7
3  8  8

Computing percentile column-wise

To compute the 50th percentile of each column:

df.quantile()   # q=0.5
A 5.0
B 6.5
Name: 0.5, dtype: float64

Here, the return type is Series. To interpret the output, half of the values in column A (2 and 4) are smaller than 5.0, and the other half (6 and 8) are larger.
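Since the 50th percentile is the median, the default call should agree with DataFrame.median(). The following sketch checks this on the same DataFrame; the variable names are chosen just for this illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8], "B": [5, 6, 7, 8]})

median_via_quantile = df.quantile(0.5)
median_direct = df.median()

# Element-wise comparison of the two Series (both are indexed by column name)
same = bool((median_via_quantile == median_direct).all())
print(same)   # True
```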

Computing percentile row-wise

To compute the 30th percentile of each row:

df.quantile(q=0.3, axis=1)
0 2.9
1 4.6
2 6.3
3 8.0
Name: 0.3, dtype: float64
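To see where a value such as 2.9 comes from, the first row can be worked out by hand: the row holds 2 and 5, and the 30th percentile sits 30% of the way from 2 to 5. The sketch below also compares the result against NumPy's np.quantile, which is used here only as an independent cross-check:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8], "B": [5, 6, 7, 8]})

row_quantiles = df.quantile(q=0.3, axis=1)

# First row by hand: 30% of the way from 2 to 5
first_row_manual = 2 + 0.3 * (5 - 2)

# The same row-wise computation with NumPy on the underlying array
numpy_quantiles = np.quantile(df.to_numpy(), 0.3, axis=1)

print(first_row_manual)
print(numpy_quantiles)
```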

Computing multiple percentiles

To get the values at the 50th and 75th percentiles for each column:

df.quantile([0.5, 0.75])   # returns a DataFrame
        A     B
0.50  5.0  6.50
0.75  6.5  7.25
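Because the quantiles become the row labels of the returned DataFrame, individual results can be looked up with loc. A small sketch, with variable names chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8], "B": [5, 6, 7, 8]})
result = df.quantile([0.5, 0.75])

# Single value: 75th percentile of column A
a_75 = result.loc[0.75, "A"]
print(a_75)     # 6.5

# Whole row: 50th percentile of every column, returned as a Series
row_50 = result.loc[0.5]
print(row_50)
```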

Changing interpolation methods

Consider the following DataFrame:

df = pd.DataFrame({"A":[2,4,6,8]})
df
   A  
0  2
1  4
2  6
3  8

linear

Consider the case when the value corresponding to the specified quantile does not exist:

df.quantile(0.5)   # interpolation="linear"
A 5.0
Name: 0.5, dtype: float64

Here, since the value corresponding to the 50th percentile does not exist in column A, the value was linearly interpolated between 4 and 6.
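The linear rule can be reproduced by hand. With n sorted values, the quantile q corresponds to the fractional position q*(n-1), and the result is interpolated between the values on either side of that position. The helper linear_quantile below is a hypothetical function written for this illustration, not part of Pandas:

```python
import math
import pandas as pd

def linear_quantile(values, q):
    """Hypothetical helper: quantile via linear interpolation on sorted values."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)      # fractional index into the sorted data
    lower_idx = math.floor(position)
    upper_idx = math.ceil(position)
    fraction = position - lower_idx
    return ordered[lower_idx] + fraction * (ordered[upper_idx] - ordered[lower_idx])

df = pd.DataFrame({"A": [2, 4, 6, 8]})

manual = linear_quantile(df["A"].tolist(), 0.5)   # position 1.5, between 4 and 6
builtin = df.quantile(0.5)["A"]

print(manual, builtin)   # 5.0 5.0
```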

lower

df.quantile(0.5, interpolation="lower")
A 4
Name: 0.5, dtype: int64

Again, since no value in column A sits exactly at the 50th percentile, interpolation is needed. The quantile lies between the values 4 and 6, and passing in "lower" selects the lower of the two, that is, 4 in this case.

higher

df.quantile(0.5, interpolation="higher")
A 6
Name: 0.5, dtype: int64

The logic is the same as for "lower", except that the upper value, 6, is returned.

Here's the same df for your reference:

df
   A  
0  2
1  4
2  6
3  8

nearest

df.quantile(0.5, interpolation="nearest")
A 6
Name: 0.5, dtype: int64

By passing in "nearest", instead of always selecting the lower or upper value, we take whichever is nearest. In this case, the 50% quantile is 5, which is coincidentally right in the middle of 4 and 6. Such ties are resolved by the rounding rule of the underlying implementation, and in this example the upper value, 6, is returned.
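The tie above is a special case. When the quantile is clearly closer to one neighbor, "nearest" picks that neighbor. A short sketch on the same column, with quantiles 0.4 and 0.6 chosen purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8]})

# 0.4 * (4 - 1) = 1.2, which is closer to position 1 (value 4)
near_lower = df.quantile(0.4, interpolation="nearest")["A"]

# 0.6 * (4 - 1) = 1.8, which is closer to position 2 (value 6)
near_upper = df.quantile(0.6, interpolation="nearest")["A"]

print(near_lower, near_upper)   # 4 6
```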

midpoint

df.quantile(0.5, interpolation="midpoint")
A 5.0
Name: 0.5, dtype: float64

Here, we just take the midpoint of the lower and upper value, so (4+6)/2=5.
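To compare the five options side by side, it helps to pick a quantile that does not fall exactly halfway between two values. The sketch below uses q=0.25, which corresponds to position 0.75 between the values 2 and 4; the dictionary name results is just for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8]})

methods = ["linear", "lower", "higher", "midpoint", "nearest"]

# Collect the 25th percentile of column A under each interpolation method
results = {
    method: float(df.quantile(0.25, interpolation=method)["A"])
    for method in methods
}

for method, value in results.items():
    print(method, value)
```

With this data, "linear" gives 3.5, "lower" gives 2, "higher" gives 4, "midpoint" gives 3, and "nearest" gives 4 because position 0.75 is closer to the value 4.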

Published by Isshin Inada