Pandas DataFrame | rolling method
Pandas DataFrame.rolling(~) method computes statistics over moving windows. A window is simply a sequence of consecutive values used to compute a statistic such as the mean.
Parameters
1. window | int or offset or BaseIndexer subclass
The size of the moving window. When dealing with time-series, that is, when the index of the source DataFrame is a DatetimeIndex, an offset represents the time interval covered by each window.
2. min_periods | int | optional
The minimum number of values required in a window. If a window contains fewer than min_periods observations, then NaN is returned for the computed statistic of that window. The default value depends on the window type:
- if window is offset-based, then min_periods=1.
- otherwise, min_periods=window.
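The effect of min_periods on a fixed-size window can be seen in a minimal sketch like the one below. The DataFrame and the names default_sum and relaxed_sum are illustrative and not part of the original examples.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 8, 10]})

# Fixed-size window: min_periods defaults to the window size (3 here),
# so the first two rows do not have enough observations and yield NaN.
default_sum = df.rolling(window=3).sum()

# Lowering min_periods lets partially filled windows produce a value.
relaxed_sum = df.rolling(window=3, min_periods=1).sum()

print(default_sum)
print(relaxed_sum)
```

With min_periods=1, the first two rows are computed from the one and two values available so far instead of being reported as NaN.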
3. center | boolean | optional
If True, then the observation is set to the center of the window. If False, then the observation is set to the right of the window. By default, center=False. Consult the examples below for clarification.
4. win_type | string | optional
The type of the window (e.g. boxcar, triang). For more information, consult the official documentation.
5. on | string | optional
The label of the datetime-like column to use instead of the DatetimeIndex. This is only relevant when dealing with time-series.
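As a rough illustration of on, the sketch below keeps the timestamps in an ordinary column rather than in the index. The column name "time" and the variable names are assumptions made for this example, and the lowercase "2s" alias is used for the 2-second offset.

```python
import pandas as pd

df_on = pd.DataFrame({
    "time": pd.to_datetime([
        "2020-12-20 15:00:00",
        "2020-12-20 15:00:01",
        "2020-12-20 15:00:02",
        "2020-12-20 15:00:04",
    ]),
    "A": [1, 10, 100, 1000],
})

# The index is a plain integer range, so the offset-based window
# is measured along the "time" column instead.
rolled = df_on.rolling(window="2s", on="time").sum()

print(rolled)
```

The column passed to on must hold datetime-like values in sorted order, just as a DatetimeIndex would.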
6. axis | int or string | optional
Whether to compute statistics for each column or each row. By default, axis=0, that is, the statistic is computed for each column.
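7. closed | string | optional
Whether the endpoints are inclusive or exclusive:
Value | Description |
---|---|
"right" | The right endpoint is inclusive, the left endpoint is exclusive. |
"left" | The left endpoint is inclusive, the right endpoint is exclusive. |
"both" | Both endpoints are inclusive. |
"neither" | Both endpoints are exclusive. |
By default, closed="right" for offset-based windows. For fixed-size windows, older pandas releases document the default as closed="both", whereas recent releases document "right" as the default for every window type.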
Return Value
A Window or Rolling object that will be used to compute some statistic.
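Because the returned object is separate from the statistic computed on it, one window definition can be reused for several statistics. A minimal sketch, where the name roller and the sample data are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 8, 10], "B": [4, 5, 6, 7]})

# rolling() only describes the windows; nothing is computed yet.
roller = df.rolling(window=2)
print(type(roller).__name__)

# The same window definition can then drive different statistics.
means = roller.mean()
maxes = roller.max()

print(means)
print(maxes)
```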
Examples
Basic usage
Consider the following DataFrame:
import pandas as pd

df = pd.DataFrame({"A":[2,4,8,10],"B":[4,5,6,7]}, index=["a","b","c","d"])
df
    A  B
a   2  4
b   4  5
c   8  6
d  10  7
To compute the sum of values with a moving window of size 2:
df.rolling(window=2).sum()
      A     B
a   NaN   NaN
b   6.0   9.0
c  12.0  11.0
d  18.0  13.0
Here, note the following:
- since axis=0 (default), we are computing the statistic (sum) down each column.
- window=2 means that the sum is computed using two consecutive observations. In the first column:
  - we get 6.0 because 2+4=6.
  - we get 12.0 because 4+8=12.
  - we get 18.0 because 8+10=18.
- we get NaN for the first row because min_periods defaults to the value of window when the window is not offset-based. The minimum number of observations required to compute the statistic is therefore 2, but the very first row has only one number in its window, so NaN is returned.
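As a sanity check on the arithmetic above, a window of size 2 can be reproduced by adding each column to a copy of itself shifted down by one row. This is only a sketch for illustration; the names rolled and manual are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [2, 4, 8, 10], "B": [4, 5, 6, 7]},
    index=["a", "b", "c", "d"],
)

rolled = df.rolling(window=2).sum()

# shift(1) moves every value down one row, so each row is paired with
# its predecessor. The first row has no predecessor and becomes NaN,
# which mirrors the NaN caused by the default min_periods=2.
manual = df + df.shift(1)

print(rolled)
print(manual)
```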
Specifying center
Consider the following DataFrame:
df = pd.DataFrame({"A":[2,4,8,10]}, index=["a","b","c","d"])
df
    A
a   2
b   4
c   8
d  10
By default, center=False, which means that the window will not be centered around an observation:
df.rolling(window=3, min_periods=0).sum() # center=False
      A
a   2.0
b   6.0
c  14.0
d  22.0
Here, the numbers are computed like so:
A[a]: 2 = 2
A[b]: 2 + 4 = 6          # the observation is 4 (see how 4 is right-aligned)
A[c]: 2 + 4 + 8 = 14     # the observation is 8
A[d]: 4 + 8 + 10 = 22    # the observation is 10
Compare this with the output of center=True:
df.rolling(window=3, min_periods=0, center=True).sum()
      A
a   6.0
b  14.0
c  22.0
d  18.0
Here, the numbers are computed like so:
A[a]: 2 + 4 = 6
A[b]: 2 + 4 + 8 = 14     # the observation is 4 (see how 4 is centered here)
A[c]: 4 + 8 + 10 = 22    # the observation is 8
A[d]: 8 + 10 = 18
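One way to read center=True is as the right-aligned result moved up by one row. The sketch below checks this for a window of size 3 with the default min_periods, so incomplete windows give NaN in both results. The names shifted and centered are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 8, 10]}, index=["a", "b", "c", "d"])

# Right-aligned windows (center=False), with the result moved up one row.
shifted = df.rolling(window=3).sum().shift(-1)

# Centered windows of the same size.
centered = df.rolling(window=3, center=True).sum()

print(shifted)
print(centered)
```

Both frames hold the sums 14 and 22 in rows b and c, with NaN in rows a and d where a full window of three values is not available.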
Time-series case
Consider the following time-series DataFrame:
idx = [pd.Timestamp('20201220 15:00:00'), pd.Timestamp('20201220 15:00:01'), pd.Timestamp('20201220 15:00:02'), pd.Timestamp('20201220 15:00:04'), pd.Timestamp('20201220 15:00:05')]
df = pd.DataFrame({"A":[1,10,100,1000,10000]}, index=idx)
df
                         A
2020-12-20 15:00:00      1
2020-12-20 15:00:01     10
2020-12-20 15:00:02    100
2020-12-20 15:00:04   1000
2020-12-20 15:00:05  10000
Summing a window with a period of 2 seconds:
df.rolling(window="2S").sum()   # "S" means seconds; pandas 2.2+ prefers the lowercase alias "2s"
                           A
2020-12-20 15:00:00      1.0
2020-12-20 15:00:01     11.0
2020-12-20 15:00:02    110.0
2020-12-20 15:00:04   1000.0
2020-12-20 15:00:05  11000.0
Note that since window is offset-based, min_periods=1 by default.
You can specify the closed parameter to indicate whether the endpoints should be inclusive or exclusive:
df.rolling(window="2S", closed="both").sum() # both endpoints are inclusive
                           A
2020-12-20 15:00:00      1.0
2020-12-20 15:00:01     11.0
2020-12-20 15:00:02    111.0
2020-12-20 15:00:04   1100.0
2020-12-20 15:00:05  11000.0
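The two remaining options, closed="left" and closed="neither", can be tried on the same time-series. The sketch below rebuilds the DataFrame so that it runs on its own, uses the lowercase "2s" alias, and introduces the illustrative names left and neither.

```python
import pandas as pd

idx = pd.to_datetime([
    "2020-12-20 15:00:00",
    "2020-12-20 15:00:01",
    "2020-12-20 15:00:02",
    "2020-12-20 15:00:04",
    "2020-12-20 15:00:05",
])
df = pd.DataFrame({"A": [1, 10, 100, 1000, 10000]}, index=idx)

# closed="left": the window start is included, the current timestamp is not.
left = df.rolling(window="2s", closed="left").sum()

# closed="neither": both the window start and the current timestamp are excluded.
neither = df.rolling(window="2s", closed="neither").sum()

print(left)
print(neither)
```

A window that ends up with no observations at all, such as the one for the very first timestamp here, yields NaN even though min_periods=1.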