Type	Description
`int`	The number of equal-width bins. The range of `x` is increased by 0.1% to ensure that all values fall in some bin.
`sequence<scalar>`	The desired bin edges. Values that do not fall in any bin will be set to `NaN`.
`IntervalIndex`	The exact bins to use.
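As a minimal sketch of the three accepted types for bins, the snippet below bins the same values in each way. The sample values and variable names are made up for illustration.

```python
import pandas as pd

values = [3, 6, 8, 7, 4, 6]

# 1. int: four equal-width bins spanning the range of the data
by_count = pd.cut(values, bins=4)

# 2. sequence of scalars: explicit bin edges
by_edges = pd.cut(values, bins=[0, 4, 6, 10])

# 3. IntervalIndex: the intervals are used exactly as given
intervals = pd.IntervalIndex.from_tuples([(0, 4), (4, 6), (6, 10)])
by_intervals = pd.cut(values, bins=intervals)

print(len(by_count.categories))      # 4
print(by_edges.codes.tolist())       # [0, 1, 2, 2, 0, 1]
print(by_intervals.codes.tolist())   # [0, 1, 2, 2, 0, 1]
```

Here, codes holds the position of the bin that each value fell into, so the explicit edges and the equivalent IntervalIndex give the same assignment.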

3. right | boolean | optional

Whether to make the left bin edge exclusive and the right bin edge inclusive. By default, right=True.

4. labels | array or False | optional

The desired labels of the bins. By default, labels=None.

5. retbins | boolean | optional

Whether or not to return bins. By default, retbins=False.

6. precision | int | optional

The number of decimal places to display in the bin labels. By default, precision=3.

7. include_lowest | boolean | optional

Whether to make the left edge of the first bin inclusive. By default, include_lowest=False.

8. duplicates | string | optional

How to deal with duplicate bin edges:

Value	Description
`"raise"`	Throw an error if any duplicate bin edges are set.
`"drop"`	Remove the duplicate bin edge and just keep one.

By default, duplicates="raise".

9. ordered | boolean | optional | v1.1.0~

Whether or not to embed ordering information. This is only relevant if the return type is Categorical or Series of data-type Categorical. ordered can only be set to False if labels is provided. By default, ordered=True.

Return Value

The return type depends on the type of the labels parameter:

if labels is unspecified:
- if x is a Series, then a Series that encodes the bin for each value is returned. Each bin interval is represented by an Interval.
- else, a Categorical is returned. Each bin interval is represented by an Interval.
if labels is an array of scalars:
- if x is a Series, then a Series is returned. The type of the values stored within this Series matches the type of the values stored in labels.
- else, a Categorical is returned. The type of the values stored within the Categorical matches the type of the values stored in labels.
if labels is a boolean False, then a NumPy array of integers is returned.

If retbins=True, then in addition to the above, the bin edges are returned as a NumPy array. If bins is an IntervalIndex, then bins is returned instead.
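The rules above can be checked with a short sketch. The sample values and variable names are illustrative only.

```python
import numpy as np
import pandas as pd

values = [3, 6, 8, 7, 4, 6]

# Series in, Series (of category dtype) out
as_series = pd.cut(pd.Series(values), bins=3)

# list in, Categorical out
as_categorical = pd.cut(values, bins=3)

# labels=False gives integer bin indices in a NumPy array
as_codes = pd.cut(values, bins=3, labels=False)

# retbins=True returns a (result, bin_edges) tuple
result, edges = pd.cut(values, bins=3, retbins=True)

print(type(as_series).__name__)        # Series
print(type(as_categorical).__name__)   # Categorical
print(type(as_codes).__name__)         # ndarray
print(type(edges).__name__)            # ndarray
```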

Examples

Consider the following DataFrame about students and their grades:

import pandas as pd

raw_grades = [3,6,8,7,4,6]
students = ["alex", "bob", "cathy", "doge", "eric", "fred"]
df = pd.DataFrame({"name":students,"raw_grade":raw_grades})
df

   name  raw_grade
0  alex     3
1  bob      6
2  cathy    8
3  doge     7
4  eric     4
5  fred     6

Basic Usage

To categorise the raw grades into four bins (segments):

df["grade"] = pd.cut(df["raw_grade"], bins=4)   # returns a Series
df

   name  raw_grade    grade
0  alex     3       (2.995, 4.25]
1  bob      6       (5.5, 6.75]
2  cathy    8       (6.75, 8.0]
3  doge     7       (6.75, 8.0]
4  eric     4       (2.995, 4.25]
5  fred     6       (5.5, 6.75]

The grade column now contains the bin assigned to each raw grade. Four bins are created in total, although no raw grade falls in (4.25, 5.5] here. Note that (2.995, 4.25] means 2.995 < raw_grade <= 4.25.
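The edge 2.995 in (2.995, 4.25] can be reproduced by hand. The sketch below assumes that the edges are evenly spaced between the minimum and maximum, and that only the lowest edge is lowered by 0.1% of the range. That assumption matches the edges returned with retbins=True, but treat it as an illustration rather than a description of the internals.

```python
import numpy as np
import pandas as pd

raw_grades = [3, 6, 8, 7, 4, 6]
lo, hi = min(raw_grades), max(raw_grades)

# Four equal-width bins need five evenly spaced edges: 3, 4.25, 5.5, 6.75, 8
edges = np.linspace(lo, hi, 5)

# Lower the first edge by 0.1% of the range (3 - 0.005 = 2.995) so that the
# minimum value still falls inside the first, left-exclusive bin
edges[0] -= 0.001 * (hi - lo)

# Compare against the edges that pandas itself reports
_, pandas_edges = pd.cut(raw_grades, bins=4, retbins=True)
print(np.allclose(edges, pandas_edges))   # True
```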

Specifying custom bin edges

To specify custom bin edges, we can pass in an array of bin edges instead of an int:

df["grade"] = pd.cut(df["raw_grade"], bins=[0,4,6,10])
df

   name  raw_grade  grade
0  alex     3       (0, 4]
1  bob      6       (4, 6]
2  cathy    8       (6, 10]
3  doge     7       (6, 10]
4  eric     4       (0, 4]
5  fred     6       (4, 6]

Specifying right

To make the left bin edge inclusive and the right bin edge exclusive, set right=False:

df["grade"] = pd.cut(df["raw_grade"], bins=[0,4,6,10], right=False)
df

   name  raw_grade  grade
0  alex      3      [0, 4)
1  bob       6      [6, 10)
2  cathy     8      [6, 10)
3  doge      7      [6, 10)
4  eric      4      [4, 6)
5  fred      6      [6, 10)

Notice how we have [0, 4) instead of the default (0, 4].
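The difference matters most for values that sit exactly on a bin edge. A minimal sketch, using a single illustrative grade of 4 and the same edges as above:

```python
import pandas as pd

edges = [0, 4, 6, 10]
grade = 4   # sits exactly on a bin edge

right_closed = pd.cut([grade], bins=edges)              # default right=True
left_closed = pd.cut([grade], bins=edges, right=False)

print(right_closed[0])   # (0, 4]
print(left_closed[0])    # [4, 6)

# With right=False the last bin is [6, 10), so the top edge itself is left out
top = pd.cut([10], bins=edges, right=False)
print(top.isna().tolist())   # [True]
```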

Specifying labels

We can give labels to our bins by setting the labels parameter:

df["grade"] = pd.cut(df["raw_grade"], bins=3, labels=["C","B","A"])
df

   name  raw_grade  grade
0  alex      3        C
1  bob       6        B
2  cathy     8        A
3  doge      7        A
4  eric      4        C
5  fred      6        B

This is an extremely practical feature of the cut(~) method. The length of the labels array must equal the specified number of bins.
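As a quick check of the length requirement, the sketch below passes two labels for three bins and catches the resulting ValueError. The variable names are illustrative.

```python
import pandas as pd

raw_grades = [3, 6, 8, 7, 4, 6]

error_message = None
try:
    # Three bins but only two labels
    pd.cut(raw_grades, bins=3, labels=["C", "B"])
except ValueError as err:
    error_message = str(err)

print(error_message)
```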

Setting labels=False returns a NumPy array of integer bin indices instead:

raw_grades = [3,6,8,7,4,5]
pd.cut(raw_grades, bins=3, labels=False)

array([0, 1, 2, 2, 0, 1])

Here, the output tells us that:

- the raw grade 3 belongs to bin 0 (first bin).
- the raw grade 6 belongs to bin 1 (second bin).
- and so on.
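Integer bin indices are convenient as input to other array operations. As one possible use, the sketch below counts how many grades fall into each bin with np.bincount.

```python
import numpy as np
import pandas as pd

raw_grades = [3, 6, 8, 7, 4, 5]
bin_ids = pd.cut(raw_grades, bins=3, labels=False)

# Count how many grades landed in each of the three bins
counts = np.bincount(bin_ids, minlength=3)
print(counts.tolist())   # [2, 2, 2]
```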

Specifying retbins

To get the computed bin edges as well, set retbins=True:

raw_grades = [3,6,8,7,4,5]
res = pd.cut(raw_grades, bins=2, retbins=True)
print("Categories: ", res[0])
print("Bin edges: ", res[1])

Categories:  [(2.995, 5.5], (5.5, 8.0], (5.5, 8.0], (5.5, 8.0], (2.995, 5.5], (2.995, 5.5]]
Categories (2, interval[float64]): [(2.995, 5.5] < (5.5, 8.0]]
Bin edges:  [2.995 5.5   8.   ]
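One common reason to keep the edges is to bin new data with exactly the same boundaries. A minimal sketch, where new_grades is made-up data:

```python
import pandas as pd

raw_grades = [3, 6, 8, 7, 4, 5]
_, edges = pd.cut(raw_grades, bins=2, retbins=True)   # edges: 2.995, 5.5, 8.0

# Apply the same edges to new (hypothetical) data
new_grades = [5, 8, 9]
binned = pd.cut(new_grades, bins=edges)

# A code of -1 marks a value that fell outside every bin (NaN)
print(binned.codes.tolist())   # [0, 1, -1]
```

The grade 9 lies above the highest edge of 8.0, so it is not assigned to any bin.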

Specifying precision

To control how many decimal places are displayed, set the precision parameter:

res = pd.cut(df["raw_grade"], bins=[0,4.33333,6.6,10], precision=2)
print(res)

0    (0.0, 4.33]
1    (4.33, 6.6]
2    (6.6, 10.0]
3    (6.6, 10.0]
4    (0.0, 4.33]
5    (4.33, 6.6]
Name: raw_grade, dtype: category
Categories (3, interval[float64]): [(0.0, 4.33] < (4.33, 6.6] < (6.6, 10.0]]

Here, notice how 4.33333 got rounded to 4.33 in the bin labels, as specified by the precision value of 2.
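To see the effect of precision in isolation, the sketch below bins the same values twice with different precision values and compares the results. The variable names are illustrative.

```python
import pandas as pd

values = [3, 6, 8, 7, 4, 6]
edges = [0, 4.33333, 6.6, 10]

coarse = pd.cut(values, bins=edges, precision=2)
fine = pd.cut(values, bins=edges, precision=4)

# The edge stored in the label is rounded to the requested precision
print(coarse.categories[0].right)   # 4.33
print(fine.categories[0].right)     # 4.3333

# Each value is still assigned to the same bin position either way
print(coarse.codes.tolist() == fine.codes.tolist())   # True
```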

Specifying include_lowest

Consider the following:

df["grade"] = pd.cut(df["raw_grade"], bins=[3,6,10])
df

   name  raw_grade     grade
0  alex      3          NaN
1  bob       6       (3.0, 6.0]
2  ...

By default, include_lowest=False, which means that the first bin interval is left-exclusive. This is why the raw_grade of 3 does not fall in any bin here.

We can make the first bin interval left-inclusive by setting include_lowest=True:

df["grade"] = pd.cut(df["raw_grade"], bins=[3,6,10], include_lowest=True)
df

   name  raw_grade       grade
0  alex      3        (2.999, 6.0]
1  bob       6        (2.999, 6.0]
...

We now see that the raw_grade of 3 has been included in the first bin.
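A quick way to confirm the difference is to count the values left without a bin. A minimal sketch, using a standalone Series with the same raw grades:

```python
import pandas as pd

raw_grades = pd.Series([3, 6, 8, 7, 4, 6])

default = pd.cut(raw_grades, bins=[3, 6, 10])
inclusive = pd.cut(raw_grades, bins=[3, 6, 10], include_lowest=True)

# Number of grades that did not fall in any bin
print(int(default.isna().sum()))     # 1
print(int(inclusive.isna().sum()))   # 0
```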

Specifying duplicates

By default, the bin edges must be unique; otherwise, an error is thrown. For instance:

x = [3,7,8,7,4,5]
pd.cut(x, bins=[2,6,6,10])   # duplicates="raise"

ValueError: Bin edges must be unique: array([ 2,  6,  6, 10]).

Here, we have two bin edges of value 6, so that's why we get an error.

In order to drop (remove) redundant bin edges, set duplicates="drop", like so:

x = [3,7,8,7,4,5]
pd.cut(x, bins=[2,6,6,10], duplicates="drop")

[(2, 6], (6, 10], (6, 10], (6, 10], (2, 6], (2, 6]]
Categories (2, interval[int64]): [(2, 6] < (6, 10]]

We see that one of the two bin edges of value 6 got dropped.
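Duplicate edges tend to appear when the edges are computed rather than typed by hand. As one hypothetical scenario, the sketch below derives edges from quantiles of skewed, made-up data, which repeats the value 1 several times.

```python
import numpy as np
import pandas as pd

data = [1, 1, 1, 1, 5, 9]   # skewed, made-up data

# Quartile-based edges: 1, 1, 1, 4, 9 (the value 1 appears three times)
edges = np.quantile(data, [0, 0.25, 0.5, 0.75, 1])

# duplicates="drop" collapses the repeated edges, leaving 1, 4, 9
binned = pd.cut(data, bins=edges, duplicates="drop", include_lowest=True)

print(len(binned.categories))   # 2
print(binned.codes.tolist())    # [0, 0, 0, 0, 1, 1]
```

Here, include_lowest=True is set so that the minimum value, which equals the first edge, still falls in the first bin.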

Specifying ordered

By default, ordered=True, which means that the resulting Categorical will be ordered:

grades = [3,6,8,7,4,5]
pd.cut(grades, bins=2, labels=["B","A"])   # ordered=True

['B', 'A', 'A', 'A', 'B', 'B']
Categories (2, object): ['B' < 'A']

Notice how the information about ordering is embedded as ['B'<'A'].

By setting ordered=False, such ordering information is omitted:

grades = [3,6,8,7,4,5]
pd.cut(grades, bins=2, labels=["B","A"], ordered=False)

['B', 'A', 'A', 'A', 'B', 'B']
Categories (2, object): ['B', 'A']

To set ordered=False, make sure to have specified labels.
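The ordering information determines which operations are allowed on the result. A minimal sketch of the difference, with illustrative variable names:

```python
import pandas as pd

grades = [3, 6, 8, 7, 4, 5]

ordered = pd.cut(grades, bins=2, labels=["B", "A"])
unordered = pd.cut(grades, bins=2, labels=["B", "A"], ordered=False)

# Ordered categories follow the bin order ('B' < 'A'), not alphabetical order
print(ordered.max())               # A
print((ordered > "B").tolist())    # [False, True, True, True, False, False]

# Operations that rely on ordering are rejected for unordered categories
max_failed = False
try:
    unordered.max()
except TypeError:
    max_failed = True
print(max_failed)                  # True
```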

Published by Isshin Inada

Official Pandas Documentation

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html
