Pandas DataFrame | resample method
Start your free 7-days trial now!
Pandas DataFrame.resample(~)
method performs a group-by based on time. The parameters are difficult to explain by themselves, so we suggest looking at our examples for clarification.
Parameters
1. rule
link | DateOffset
or Timedelta
or string
The length of time of each group.
2. axis
| int
or string
| optional
Whether to resample each row or column:
Axis | Description |
---|---|
| Resample each row. |
| Resample each column. |
By default, axis=0
.
3. closed
link | string
| optional
Which side of the bin interval is closed:
Value | Description |
---|---|
| Right side of bin interval is closed (inclusive) |
| Left side of bin interval is closed |
By default, closed="left"
, except for the following frequency offsets, which have closed="right"
by default:
‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’
4. label
link | string
| optional
Which side of the bin interval is labeled:
Value | Description |
---|---|
| Right side of bin interval is labelled |
| Left side of bin interval is labelled |
By default, label=None
, except for the following frequency offsets, which have label="right"
by default:
‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’
5. convention
| string
| optional
This is relevant only when index of the source DataFrame is PeriodIndex
. Whether to use the start time or end time of PeriodIndex
:
Value | Description |
---|---|
| Use the start time of |
| Use the end time of |
By default, convention="start"
.
6. kind
link | None
or string
| optional
The data type of the resulting index:
Value | Description |
---|---|
| Leave the resulting index as is. |
| Convert resulting index to |
| Convert resulting index to |
By default, kind=None
.
7. loffset
link | timedelta
| optional
The offset to apply on the time labels. By default, loffset=None
.
8. base
link | int
| optional
The number of rows to include in the first group. Check out examples for clarification. By default, base=0
.
9. on
link | string
| optional
The column to use for resampling instead of the index. The column must be of type datetime-like (e.g. datetime64
). By default, the index will be used for resampling.
10. level
| string
or int
| optional
The level used for resampling. This is only relevant if DataFrame has a multi-index. level
must also be of type datetime-like (e.g. datetime64
).
Returns Value
A Resampler
object.
Examples
Consider the following DataFrame:
date_index = pd.date_range("2020/12/25", periods=4)df = pd.DataFrame({"A":[2,3,4,5],"B":[6,7,8,9]}, index=date_index)df
A B2020-12-25 2 62020-12-26 3 72020-12-27 4 82020-12-28 5 9
Here, the index of df
is of type DatetimeIndex
.
Basic usage
To compute the sum of each group of 2 consecutive days:
df.resample(rule="2D").sum() # returns a DataFrame
A B2020-12-25 5 132020-12-27 9 17
By default, all columns are considered when aggregate functions like sum()
are subsequently called. To apply the function on certain columns only, use the []
notation like so:
df.resample(rule="2D")["A"].sum() # returns a Series since we only selected 1 column
2020-12-25 52020-12-27 9Freq: 2D, Name: A, dtype: int64
Specifying closed
Consider the same df
as above:
df
A B2020-12-25 2 62020-12-26 3 72020-12-27 4 82020-12-28 5 9
By default, closed="left"
(check parameter description for exception), which means that:
the left bin interval is inclusive
the right bin interval is exclusive
df.resample(rule="2D", closed="left").sum()
A B2020-12-25 5 13 # Sum of 12-25 and 12-262020-12-27 9 17 # Sum of 12-27 and 12-28
On the other hand, passing closed="right"
would mean that:
the left bin interval is exclusive
the right bin interval is inclusive
df.resample(rule="2D", closed="right").sum()
A B2020-12-23 2 6 # Sum of 12-24 and 12-252020-12-25 7 15 # Sum of 12-26 and 12-272020-12-27 5 9 # Sum of 12-28 and 12-29
For your reference, here's our df
again:
df
A B2020-12-25 2 62020-12-26 3 72020-12-27 4 82020-12-28 5 9
Specifying label
By default, label="left"
(check parameter description for exception), which means that the label of the left interval bin is used:
df.resample(rule="2D", label="left").sum()
A B2020-12-25 5 13 # Sum of 12-25 and 12-262020-12-27 9 17 # Sum of 12-27 and 12-28
To use the label of the right interval bin, set label="right"
like so:
df.resample(rule="2D", label="right").sum()
A B2020-12-27 5 13 # Sum of 12-25 and 12-262020-12-29 9 17 # Sum of 12-27 and 12-28
Specifying kind
To get DatetimeIndex
, set kind="timestamp"
like so:
df.resample(rule="2D", kind="timestamp").sum().index
DatetimeIndex(['2020-12-25', '2020-12-27'], dtype='datetime64[ns]', freq='2D')
To get PeriodIndex
, set kind="period"
like so:
df.resample(rule="2D", kind="period").sum().index
PeriodIndex(['2020-12-25', '2020-12-27'], dtype='period[2D]', freq='2D')
Specifying loffset
For your reference, here's our df
again:
df
A B2020-12-25 2 62020-12-26 3 72020-12-27 4 82020-12-28 5 9
To shift the label by 1 day, pass loffset="1D"
like so:
df.resample(rule="2D", loffset="1D").sum()
A B2020-12-26 5 132020-12-28 9 17
Notice how only the labels have shifted - the resulting values of the DataFrame are unaffected.
Just as a comparison, the case when we don't specify loffset
is as follows:
df.resample(rule="2D").sum()
A B2020-12-25 5 132020-12-27 9 17
Specifying base
Consider the following DataFrame:
date_index = pd.date_range("2020/12/25", periods=4)df = pd.DataFrame({"A":[2,3,4,5],"B":[6,7,8,9]}, index=date_index)df
A B2020-12-25 2 62020-12-26 3 72020-12-27 4 82020-12-28 5 9
Resampling with 4 days yields:
df.resample(rule="4D").sum()
A B2020-12-25 14 30
Specifying base
ensures that the first group has base
number of rows. For instance, base=2
yields:
df.resample(rule="4D", base=2).sum()
A B2020-12-23 5 132020-12-27 9 17
Here, notice how we're starting from 12-23
instead of 12-25
. This is because base=2
ensures only the first two rows of df
is placed in the first group. Since we're still grouping by 4 consecutive days, this shifts the starting date to 12-23
.
Specifying on
By default, resample(~)
method assumes that the index of the DataFrame is datetime-like. The parameter on
allows you resample on a column.
Consider the following DataFrame:
date_index = pd.date_range("2020/12/25", periods=4)df = pd.DataFrame({"A":date_index,"B":[3,4,5,6]})df
A B0 2020-12-25 31 2020-12-26 42 2020-12-27 53 2020-12-28 6
To resample on column A
:
df.resample(rule="2D", on="A").sum()
B A 2020-12-25 72020-12-27 11
Note the following:
column
A
became the new index.the name assigned to this new index is the column label (
A
in this case).