Pandas

655 guides

keyboard_arrow_down

Other math topics

Dagster

Pandas

NumPy

Matplotlib

PySpark

MySQL

chevron_leftTime Series

Method asfreq Method asof Method first_valid_index Method last_valid_index Method resample Method shift Method slice_shift Method to_period Method to_timestamp Method tshift Method tz_convert Method tz_localize

check_circle

Mark as learned

thumb_up

thumb_down

chat_bubble_outline

Comment

auto_stories Bi-column layout

settings

Pandas DataFrame | resample method

schedule Aug 11, 2023

Last updated

local_offer

Python●Pandas

Parameters

1. rulelink | DateOffset or Timedelta or string

The length of time of each group.

2. axis | int or string | optional

Whether to resample each row or column:

Axis	Description
`0` or `"index"`	Resample each row.
`1` or `"columns"`	Resample each column.

By default, axis=0.

3. closedlink | string | optional

Which side of the bin interval is closed:

Value	Description
`"right"`	Right side of bin interval is closed (inclusive)
`"left"`	Left side of bin interval is closed

By default, closed="left", except for the following frequency offsets, which have closed="right" by default:


        
        
            
                
                
                     ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’

4. labellink | string | optional

Which side of the bin interval is labeled:

Value	Description
`"right"`	Right side of bin interval is labelled
`"left"`	Left side of bin interval is labelled

By default, label=None, except for the following frequency offsets, which have label="right" by default:


        
        
            
                
                
                     ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’

5. convention | string | optional

This is relevant only when index of the source DataFrame is PeriodIndex. Whether to use the start time or end time of PeriodIndex:

Value	Description
`"s"` or `"start"`	Use the start time of `PeriodIndex`
`"e"` or `"end"`	Use the end time of `PeriodIndex`

By default, convention="start".

6. kindlink | None or string | optional

The data type of the resulting index:

Value	Description
`None`	Leave the resulting index as is.
`"timestamp"`	Convert resulting index to `DatetimeIndex`.
`"period"`	Convert resulting index to `PeriodIndex`.

By default, kind=None.

7. loffsetlink | timedelta | optional

The offset to apply on the time labels. By default, loffset=None.

8. baselink | int | optional

The number of rows to include in the first group. Check out examples for clarification. By default, base=0.

9. onlink | string | optional

The column to use for resampling instead of the index. The column must be of type datetime-like (e.g. datetime64). By default, the index will be used for resampling.

10. level | string or int | optional

The level used for resampling. This is only relevant if DataFrame has a multi-index. level must also be of type datetime-like (e.g. datetime64).

Returns Value

A Resampler object.

Examples

Consider the following DataFrame:


        
        
            
                
                
                    date_index = pd.date_range("2020/12/25", periods=4)
df = pd.DataFrame({"A":[2,3,4,5],"B":[6,7,8,9]}, index=date_index)
df
                
            
                         A   B
2020-12-25   2   6
2020-12-26   3   7
2020-12-27   4   8
2020-12-28   5   9

Here, the index of df is of type DatetimeIndex.

Basic usage

To compute the sum of each group of 2 consecutive days:


        
        
            
                
                
                    df.resample(rule="2D").sum()   # returns a DataFrame
                
            
                         A   B
2020-12-25   5   13
2020-12-27   9   17

By default, all columns are considered when aggregate functions like sum() are subsequently called. To apply the function on certain columns only, use the [] notation like so:


        
        
            
                
                
                    df.resample(rule="2D")["A"].sum()   # returns a Series since we only selected 1 column
                
            
            2020-12-25    5
2020-12-27    9
Freq: 2D, Name: A, dtype: int64

Specifying closed

Consider the same df as above:


        
        
            
                
                
                    df
                
            
                         A   B
2020-12-25   2   6
2020-12-26   3   7
2020-12-27   4   8
2020-12-28   5   9

By default, closed="left" (check parameter description for exception), which means that:

the left bin interval is inclusive
the right bin interval is exclusive


        
        
            
                
                
                    df.resample(rule="2D", closed="left").sum()
                
            
                         A   B
2020-12-25   5   13      # Sum of 12-25 and 12-26
2020-12-27   9   17      # Sum of 12-27 and 12-28

On the other hand, passing closed="right" would mean that:

the left bin interval is exclusive
the right bin interval is inclusive


        
        
            
                
                
                    df.resample(rule="2D", closed="right").sum()
                
            
                         A   B
2020-12-23   2   6      # Sum of 12-24 and 12-25
2020-12-25   7   15     # Sum of 12-26 and 12-27
2020-12-27   5   9      # Sum of 12-28 and 12-29

For your reference, here's our df again:


        
        
            
                
                
                    df
                
            
                         A   B
2020-12-25   2   6
2020-12-26   3   7
2020-12-27   4   8
2020-12-28   5   9

Specifying label

By default, label="left" (check parameter description for exception), which means that the label of the left interval bin is used:


        
        
            
                
                
                    df.resample(rule="2D", label="left").sum()
                
            
                         A   B
2020-12-25   5   13      # Sum of 12-25 and 12-26
2020-12-27   9   17      # Sum of 12-27 and 12-28

To use the label of the right interval bin, set label="right" like so:


        
        
            
                
                
                    df.resample(rule="2D", label="right").sum()
                
            
                         A   B
2020-12-27   5   13      # Sum of 12-25 and 12-26
2020-12-29   9   17      # Sum of 12-27 and 12-28

Specifying kind

To get DatetimeIndex, set kind="timestamp" like so:


        
        
            
                
                
                    df.resample(rule="2D", kind="timestamp").sum().index
                
            
            DatetimeIndex(['2020-12-25', '2020-12-27'], dtype='datetime64[ns]', freq='2D')

To get PeriodIndex, set kind="period" like so:


        
        
            
                
                
                    df.resample(rule="2D", kind="period").sum().index
                
            
            PeriodIndex(['2020-12-25', '2020-12-27'], dtype='period[2D]', freq='2D')

Specifying loffset

For your reference, here's our df again:


        
        
            
                
                
                    df
                
            
                         A   B
2020-12-25   2   6
2020-12-26   3   7
2020-12-27   4   8
2020-12-28   5   9

To shift the label by 1 day, pass loffset="1D" like so:


        
        
            
                
                
                    df.resample(rule="2D", loffset="1D").sum()
                
            
                         A   B
2020-12-26   5   13
2020-12-28   9   17

Notice how only the labels have shifted - the resulting values of the DataFrame are unaffected.

Just as a comparison, the case when we don't specify loffset is as follows:


        
        
            
                
                
                    df.resample(rule="2D").sum()
                
            
                         A   B
2020-12-25   5   13
2020-12-27   9   17

Specifying base

Consider the following DataFrame:


        
        
            
                
                
                    date_index = pd.date_range("2020/12/25", periods=4)
df = pd.DataFrame({"A":[2,3,4,5],"B":[6,7,8,9]}, index=date_index)
df
                
            
                         A   B
2020-12-25   2   6
2020-12-26   3   7
2020-12-27   4   8
2020-12-28   5   9

Resampling with 4 days yields:


        
        
            
                
                
                    df.resample(rule="4D").sum()
                
            
                         A    B
2020-12-25   14   30

Specifying base ensures that the first group has base number of rows. For instance, base=2 yields:


        
        
            
                
                
                    df.resample(rule="4D", base=2).sum()
                
            
                         A   B
2020-12-23   5   13
2020-12-27   9   17

Here, notice how we're starting from 12-23 instead of 12-25. This is because base=2 ensures only the first two rows of df is placed in the first group. Since we're still grouping by 4 consecutive days, this shifts the starting date to 12-23.

Specifying on

By default, resample(~) method assumes that the index of the DataFrame is datetime-like. The parameter on allows you resample on a column.

Consider the following DataFrame:


        
        
            
                
                
                    date_index = pd.date_range("2020/12/25", periods=4)
df = pd.DataFrame({"A":date_index,"B":[3,4,5,6]})
df
                
            
                A            B
0   2020-12-25   3
1   2020-12-26   4
2   2020-12-27   5
3   2020-12-28   6

To resample on column A:


        
        
            
                
                
                    df.resample(rule="2D", on="A").sum()
                
            
                         B
         A   
2020-12-25   7
2020-12-27   11

Note the following:

column A became the new index.
the name assigned to this new index is the column label (A in this case).

Published by Isshin Inada

Edited by 0 others

Did you find this page useful?

thumb_up

thumb_down

Comment

Citation

Ask a question or leave a feedback...

Official Pandas Documentation

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html

thumb_up

thumb_down

chat_bubble_outline

settings

Enjoy our search

Hit / to insta-search docs and recipes!