Pandas DataFrame | describe method
Pandas DataFrame.describe(~) method returns a DataFrame containing some descriptive statistics (e.g. mean and min) of the columns of the source DataFrame. This is most commonly used to numerically summarise a given dataset.
Parameters
1. percentiles | array-like of numbers | optional
The percentiles to include as part of the descriptive statistics. By default, percentiles=[0.25, 0.50, 0.75].
2. include | "all" or array-like of dtypes or None | optional
The columns in the source DataFrame to consider:
Value | Description |
---|---|
"all" | All columns of the source DataFrame will be included. |
array-like of dtypes | Only columns with the data-types specified in the list will be included. |
None | Only columns of numeric type will be considered. |
By default, include=None.
3. exclude | list-like of dtypes or None | optional
Similar to include, but exclude specifies the column data-types to ignore. By default, exclude=None.
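As a rough illustration of how include and exclude select columns, here is a minimal sketch. The DataFrame below, including its score column, is hypothetical and is not taken from the examples on this page.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one text column, one integer column, one float column
df = pd.DataFrame({
    "name": ["alex", "bob", "cathy"],
    "age": [20, 30, 40],
    "score": [1.5, 2.5, 3.5],
})

# include=None (the default): only the numeric columns are summarised
default_cols = list(df.describe().columns)

# include="all": every column is summarised, numeric or not
all_cols = list(df.describe(include="all").columns)

# exclude=[np.number]: numeric columns are ignored, leaving the text column
non_numeric_cols = list(df.describe(exclude=[np.number]).columns)

# exclude=[float]: only the float column is ignored
no_float_cols = list(df.describe(exclude=[float]).columns)

print(default_cols)      # ['age', 'score']
print(all_cols)          # ['name', 'age', 'score']
print(non_numeric_cols)  # ['name']
print(no_float_cols)     # ['name', 'age']
```

Only the column labels are compared here, since the statistics reported for each column depend on its type.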
Return Value
A DataFrame holding the descriptive statistics of the column values in the source DataFrame.
Examples
Basic usage
Consider the following DataFrame:
df
    name  age  grade
0   alex   20     60
1    bob   30     60
2  cathy   40     70
We can obtain some descriptive statistics using the describe(~) method:
df.describe()
        age      grade
count   3.0   3.000000
mean   30.0  63.333333
std    10.0   5.773503
min    20.0  60.000000
25%    25.0  60.000000
50%    30.0  60.000000
75%    35.0  65.000000
max    40.0  70.000000
Here, the 50% row (the 50th percentile) represents the median. Since name is not a numeric column, it is left out of the summary.
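To reproduce this example end to end, the DataFrame can be built from the values shown above. The following is a minimal sketch; the variable names summary and median_age are only illustrative.

```python
import pandas as pd

# Same values as the DataFrame displayed above
df = pd.DataFrame({
    "name": ["alex", "bob", "cathy"],
    "age": [20, 30, 40],
    "grade": [60, 60, 70],
})

# Only the numeric columns (age and grade) are summarised by default
summary = df.describe()
print(summary)

# The 50% row holds the same value as the median of the column
median_age = summary.loc["50%", "age"]
print(median_age == df["age"].median())
```

Individual statistics can be read from the result with loc, using the row label (e.g. "mean") and the column label.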
Specifying percentiles
Instead of the default 25th and 75th percentiles, we can specify which percentiles to include by passing in percentiles:
df.describe(percentiles=[0.3, 0.6, 0.9])
        age      grade
count   3.0   3.000000
mean   30.0  63.333333
std    10.0   5.773503
min    20.0  60.000000
30%    26.0  60.000000
50%    30.0  60.000000
60%    32.0  62.000000
90%    38.0  68.000000
max    40.0  70.000000
Notice how the 50% row is still there even though 0.5 was not passed in; this is because it represents the median.
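The percentile rows can also be read programmatically. The sketch below rebuilds the numeric columns of the DataFrame above and looks up the requested percentiles by their row labels. It deliberately does not rely on the 50% row being present, on the assumption that the automatic inclusion of the median may differ between pandas versions.

```python
import pandas as pd

# Numeric columns of the DataFrame shown above
df = pd.DataFrame({
    "age": [20, 30, 40],
    "grade": [60, 60, 70],
})

summary = df.describe(percentiles=[0.3, 0.6, 0.9])

# Row labels are formatted as percentages, e.g. 0.3 becomes "30%"
age_30 = summary.loc["30%", "age"]
grade_60 = summary.loc["60%", "grade"]
age_90 = summary.loc["90%", "age"]
print(age_30, grade_60, age_90)
```

Values between data points are linearly interpolated, which is why the 30th percentile of age falls between 20 and 30.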
Specifying include
Consider a DataFrame df with a gender column of type category and an age column of type int.
To compute descriptive statistics of columns with type category and int only:
df.describe(include=["category",int])
       gender        age
count       3   3.000000
unique      2        NaN
top      male        NaN
freq        2        NaN
mean      NaN  23.333333
std       NaN   5.773503
min       NaN  20.000000
25%       NaN  20.000000
50%       NaN  20.000000
75%       NaN  25.000000
max       NaN  30.000000
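The following sketch shows one DataFrame that is consistent with the summary above. The data is an assumption: the name column, the row order, and the exact values are hypothetical, chosen only so that the statistics match.

```python
import pandas as pd

# Hypothetical data chosen to be consistent with the summary shown above
df = pd.DataFrame({
    "name": ["alex", "bob", "cathy"],                       # assumed column
    "gender": pd.Categorical(["male", "male", "female"]),   # category dtype
    "age": [20, 20, 30],                                    # int dtype
})

# Only the category and int columns are summarised; name is left out
summary = df.describe(include=["category", int])
print(summary)

top_gender = summary.loc["top", "gender"]   # most frequent category
mean_age = summary.loc["mean", "age"]
```

Statistics that do not apply to a column's type, such as the mean of gender or the number of unique values of age, are reported as NaN.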