Value	Description
`"all"`	All columns of the source DataFrame will be included.
`list-like` of `dtypes`	Only columns with the data-types specified in the list will be included.
`None`	Only columns of numeric type will be considered.

By default, include=None.

3. exclude | list-like of dtypes or None | optional

Similar to include, but exclude specifies the column data-types to ignore. By default, exclude=None.
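As a minimal sketch of exclude, the snippet below drops every numeric column from the summary. The sample data and the variable name summary are assumptions made for illustration only. Here the numeric columns age and grade are ignored, so only name is summarised.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for this sketch
df = pd.DataFrame({
    "name": ["alex", "bob", "cathy"],
    "age": [20, 30, 40],
    "grade": [60, 60, 70],
})

# Ignore every numeric column; only the text column "name" remains
summary = df.describe(exclude=[np.number])
print(summary)
```

Because name holds text, its summary uses count, unique, top and freq rather than mean or std.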

Return Value

A DataFrame holding the descriptive statistics of the column values in the source DataFrame.

Examples

Basic usage

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({"name": ["alex", "bob", "cathy"], "age": [20, 30, 40], "grade": [60, 60, 70]})
df
    name  age  grade
0   alex   20     60
1    bob   30     60
2  cathy   40     70

We can obtain some descriptive statistics using the describe(~) method:

df.describe()
        age      grade
count   3.0   3.000000
mean   30.0  63.333333
std    10.0   5.773503
min    20.0  60.000000
25%    25.0  60.000000
50%    30.0  60.000000
75%    35.0  65.000000
max    40.0  70.000000

Here, the 50% row (the 50th percentile) is the median. Only the numeric columns age and grade appear, since include is None by default.
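One way to confirm that the 50% row is the median is to compare it with the result of median(). This is only a sketch that rebuilds the DataFrame from above; the variable names stats and medians are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["alex", "bob", "cathy"],
    "age": [20, 30, 40],
    "grade": [60, 60, 70],
})

stats = df.describe()

# Median of each numeric column, computed independently of describe()
medians = df[["age", "grade"]].median()

print(stats.loc["50%"])
print(medians)
```

Both printouts should show the same values for age and grade.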

Specifying percentiles

Instead of the default 25th and 75th percentiles, we can choose which percentiles to compute by passing a list of values between 0 and 1 to percentiles:

df.describe(percentiles=[0.3, 0.6, 0.9])
        age      grade
count   3.0   3.000000
mean   30.0  63.333333
std    10.0   5.773503
min    20.0  60.000000
30%    26.0  60.000000
50%    30.0  60.000000
60%    32.0  62.000000
90%    38.0  68.000000
max    40.0  70.000000

Notice that the 50% row is still present even though we did not request it, because it represents the median.
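The requested percentile rows can be cross-checked against DataFrame.quantile, which uses linear interpolation by default. The sketch below rebuilds the DataFrame from the basic usage example; the names levels, stats, quantiles and matches are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["alex", "bob", "cathy"],
    "age": [20, 30, 40],
    "grade": [60, 60, 70],
})

levels = [0.3, 0.6, 0.9]
stats = df.describe(percentiles=levels)

# Same quantile levels, computed directly on the numeric columns
quantiles = df[["age", "grade"]].quantile(levels)

# Compare the requested percentile rows with the quantile() result
matches = np.allclose(
    stats.loc[["30%", "60%", "90%"]].to_numpy(),
    quantiles.to_numpy(),
)
print(matches)
```

For example, the 60% value for age falls between 30 and 40 at position 0.6 × 2 = 1.2, giving 30 + 0.2 × 10 = 32.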

Specifying include

Consider the following DataFrame:

names = pd.Series(["alex", "bob", "cathy"], dtype="string")
gender = pd.Series(["male", "male", "female"], dtype="category")
age = pd.Series([20, 30, 20], dtype="int")
df = pd.DataFrame({"names": names, "gender": gender, "age": age})
df
   names  gender  age
0   alex    male   20
1    bob    male   30
2  cathy  female   20

To compute descriptive statistics of columns with type category and int only:

df.describe(include=["category", int])
       gender        age
count       3   3.000000
unique      2        NaN
top      male        NaN
freq        2        NaN
mean      NaN  23.333333
std       NaN   5.773503
min       NaN  20.000000
25%       NaN  20.000000
50%       NaN  20.000000
75%       NaN  25.000000
max       NaN  30.000000
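To summarise every column regardless of type, include also accepts "all". The sketch below rebuilds the DataFrame from this section; the variable name summary is chosen for illustration. Statistics that do not apply to a column's type show up as NaN.

```python
import pandas as pd

names = pd.Series(["alex", "bob", "cathy"], dtype="string")
gender = pd.Series(["male", "male", "female"], dtype="category")
age = pd.Series([20, 30, 20], dtype="int")
df = pd.DataFrame({"names": names, "gender": gender, "age": age})

# Describe all three columns, not just the numeric one
summary = df.describe(include="all")
print(summary)
```

In the result, rows such as unique and top are filled only for names and gender, while rows such as mean and std are filled only for age.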

Published by Isshin Inada


Official Pandas Documentation

https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.DataFrame.describe.html
