Pandas DataFrame | describe method
Pandas DataFrame.describe(~) method returns a DataFrame containing some descriptive statistics (e.g. mean and min) of the columns of the source DataFrame. This is most commonly used to numerically summarise a given dataset.
Parameters
1. percentiles | array-like of numbers | optional
The percentiles to include as part of the descriptive statistics, given as fractions between 0 and 1. By default, percentiles=[0.25, 0.50, 0.75].
2. include | "all" or array-like of dtypes or None | optional
The columns in the source DataFrame to consider:
| Value | Description |
|---|---|
| "all" | All columns of the source DataFrame will be included. |
| array-like of dtypes | Only columns with the data-types specified in the list will be included. |
| None | Only columns of numeric type will be considered. |
By default, include=None.
3. exclude | list-like of dtypes or None | optional
Similar to include, but exclude specifies the column data-types to ignore. By default, exclude=None.
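As a minimal sketch of exclude, the snippet below uses a small DataFrame with one integer column and one float column. The column names age and height are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Hypothetical data: one integer column and one float column
df = pd.DataFrame({
    "age": [20, 30, 40],
    "height": [1.6, 1.7, 1.8],
})

# Ignore float columns, so only the integer column is summarised
summary = df.describe(exclude=[float])
print(summary)
```

Here only the age column should appear in the result, since height is of type float.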
Return Value
A DataFrame holding the descriptive statistics of the column values in the source DataFrame.
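Since the return value is an ordinary DataFrame, individual statistics can be looked up by row label and column label. The following is a small sketch that assumes the same name, age and grade data used in the examples on this page:

```python
import pandas as pd

# Assumed to match the example data on this page
df = pd.DataFrame({
    "name": ["alex", "bob", "cathy"],
    "age": [20, 30, 40],
    "grade": [60, 60, 70],
})

stats = df.describe()

# Rows are labelled by statistic, columns by the source column name
mean_age = stats.loc["mean", "age"]
max_grade = stats.loc["max", "grade"]
print(mean_age, max_grade)
```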
Examples
Basic usage
Consider the following DataFrame:
df
    name   age  grade
0   alex    20     60
1   bob     30     60
2   cathy   40     70
We can obtain some descriptive statistics using the describe(~) method:
df.describe()
        age      grade
count   3.0   3.000000
mean   30.0  63.333333
std    10.0   5.773503
min    20.0  60.000000
25%    25.0  60.000000
50%    30.0  60.000000
75%    35.0  65.000000
max    40.0  70.000000
Here, the 50% row (the 50th percentile) is the median of each column.
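One way to confirm this is to compare the 50% row against the result of DataFrame.median(~). This is a quick check, assuming the same example data as above:

```python
import pandas as pd

# Assumed to match the example data above
df = pd.DataFrame({
    "name": ["alex", "bob", "cathy"],
    "age": [20, 30, 40],
    "grade": [60, 60, 70],
})

# The "50%" row of the summary
row_50 = df.describe().loc["50%"]

# Median of each numeric column, computed directly
medians = df.median(numeric_only=True)

# True if both agree for every numeric column
print((row_50 == medians).all())
```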
Specifying percentiles
Instead of the default 25th and 75th percentiles, we can specify which percentiles to include by passing in percentiles:
df.describe(percentiles=[0.3, 0.6, 0.9])
        age      grade
count   3.0   3.000000
mean   30.0  63.333333
std    10.0   5.773503
min    20.0  60.000000
30%    26.0  60.000000
50%    30.0  60.000000
60%    32.0  62.000000
90%    38.0  68.000000
max    40.0  70.000000
Notice that the 50th percentile still appears in this output even though we did not request it. This is because it represents the median.
Specifying include
Consider a DataFrame df that has a gender column of type category and an age column of type int.
To compute descriptive statistics of columns with type category and int only:
df.describe(include=["category",int])
       gender        age
count       3   3.000000
unique      2        NaN
top      male        NaN
freq        2        NaN
mean      NaN  23.333333
std       NaN   5.773503
min       NaN  20.000000
25%       NaN  20.000000
50%       NaN  20.000000
75%       NaN  25.000000
max       NaN  30.000000
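The DataFrame behind this output is not shown, so the sketch below builds a hypothetical one that is consistent with the statistics above. The name column and the exact row values are assumptions; only the gender and age summaries are taken from the output:

```python
import pandas as pd

# Hypothetical data chosen to be consistent with the output above
df = pd.DataFrame({
    "name": ["alex", "bob", "cathy"],                        # assumed extra column
    "gender": pd.Categorical(["male", "male", "female"]),   # category dtype
    "age": [20, 20, 30],                                     # integer dtype
})

# Keep only the category and int columns; name is left out
summary = df.describe(include=["category", int])
print(summary)
```

The categorical column reports count, unique, top and freq, while the numeric column reports the usual numeric statistics. Statistics that do not apply to a column are filled with NaN.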