Pandas DataFrame | rolling method

Last updated: Aug 12, 2023
Tags: Python, Pandas

Pandas' DataFrame.rolling(~) method computes statistics over moving windows. A window is simply a sequence of consecutive values over which a statistic, such as the mean, is computed.

Parameters

1. window | int or offset or BaseIndexer subclass

The size of the moving window. An int gives a fixed number of observations per window.

When dealing with time-series, that is, when the index of the source DataFrame is a DatetimeIndex, an offset (e.g. "2s") gives the time interval covered by each window.

2. min_periods | int | optional

The minimum number of observations required in a window. If a window contains fewer than min_periods observations, then NaN is returned for the computed statistic of that window. The default value depends on the following:

  • if window is offset-based, then min_periods=1.

  • otherwise, min_periods=window.
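
As a rough illustration of the fixed-size default described above, the sketch below compares the default min_periods with an explicit min_periods=1. The DataFrame and the variable names (df_mp, strict, relaxed) are made up for this example.

```python
import pandas as pd

# Hypothetical data used only for this illustration
df_mp = pd.DataFrame({"A": [2, 4, 8, 10]})

# Fixed-size window: min_periods defaults to window (3 here),
# so the first two rows do not have enough observations and yield NaN.
strict = df_mp.rolling(window=3).sum()

# With min_periods=1, partially filled windows still produce a value.
relaxed = df_mp.rolling(window=3, min_periods=1).sum()

print(strict)
print(relaxed)
```

The first two rows of strict are NaN, whereas relaxed sums whatever observations are available in each window.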

3. center | boolean | optional

  • If True, then the observation is set to the center of the window.

  • If False, then the observation is set to the right of the window.

By default, center=False. Consult examples below for clarification.

4. win_type | string | optional

The type of the window (e.g. boxcar, triang). For more information, consult the official documentation.

5. on | string | optional

The label of the datetime-like column to use instead of the DatetimeIndex. This is only relevant when dealing with time-series.
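
A minimal sketch of the on parameter, assuming the timestamps live in a regular column rather than in the index. The column name "time" and the variable names are hypothetical.

```python
import pandas as pd

# Timestamps stored in a column ("time" is a made-up name), not in the index
df_on = pd.DataFrame({
    "time": pd.to_datetime([
        "2020-12-20 15:00:00",
        "2020-12-20 15:00:01",
        "2020-12-20 15:00:02",
        "2020-12-20 15:00:04",
    ]),
    "A": [1, 10, 100, 1000],
})

# The 2-second windows are built from the "time" column instead of the index
result_on = df_on.rolling(window="2s", on="time").sum()
print(result_on)
```

The "time" column is passed through unchanged, and only the remaining column is aggregated.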

6. axis | int or string | optional

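Whether to compute statistics for each column or each row. By default, axis=0, that is, the statistic is computed down each column. Note that recent pandas releases (2.1 onwards) deprecate this parameter; for row-wise windows, transpose the DataFrame first.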

7. closed | string | optional

Whether the endpoints are inclusive or exclusive:

Value | Description

"left" | Left endpoint is inclusive; right endpoint is exclusive.

"right" | Left endpoint is exclusive; right endpoint is inclusive.

"both" | Both endpoints are inclusive.

"neither" | Both endpoints are exclusive.

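By default, closed="right" for both offset-based and fixed-size windows. Older pandas releases documented closed="both" as the default for fixed-size windows, so check the documentation of the version you are using.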

Return Value

A Window or Rolling object that will be used to compute some statistic.
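
To make the return value concrete, here is a small sketch showing that rolling(~) on its own computes nothing; the statistic is chosen afterwards by calling a method on the returned object. The variable names are made up for illustration.

```python
import pandas as pd

df_r = pd.DataFrame({"A": [2, 4, 8, 10], "B": [4, 5, 6, 7]})

# rolling(~) only describes the windows; no statistic is computed yet
roller = df_r.rolling(window=2)
print(type(roller).__name__)

# The same object can then be reused for different statistics
means = roller.mean()
mixed = roller.agg({"A": "sum", "B": "min"})  # a different statistic per column

print(means)
print(mixed)
```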

Examples

Basic usage

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({"A":[2,4,8,10],"B":[4,5,6,7]}, index=["a","b","c","d"])
df
A B
a 2 4
b 4 5
c 8 6
d 10 7

To compute the sum of values with a moving window of size 2:

df.rolling(window=2).sum()
A B
a NaN NaN
b 6.0 9.0
c 12.0 11.0
d 18.0 13.0

Here, note the following:

  • since axis=0 (default), we are computing the statistic (sum) down each column.

  • window=2 means that the sum is computed using two consecutive observations:

    • we get 6.0 in the first column because 2+4=6.

    • we get 12.0 because 4+8=12.

    • we get 18.0 because 8+10=18.

  • we get NaN for the first row because, for windows that are not offset-based, min_periods defaults to the value of window. The statistic therefore requires at least 2 observations, but the window for the very first row contains only one, so NaN is returned.
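
One way to sanity-check the reasoning above is to rebuild the size-2 rolling sum by hand: each row is added to the row above it using shift(1). This is only a sketch for verification, not how pandas computes the result internally.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 8, 10], "B": [4, 5, 6, 7]},
                  index=["a", "b", "c", "d"])

rolled = df.rolling(window=2).sum()

# Current row plus the previous row; shift(1) leaves NaN in the first row,
# which mirrors the NaN produced by the default min_periods=2.
manual = df + df.shift(1)

# Raises an AssertionError if the two results differ
pd.testing.assert_frame_equal(rolled, manual)
print(manual)
```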

Specifying center

Consider the following DataFrame:

df = pd.DataFrame({"A":[2,4,8,10]}, index=["a","b","c","d"])
df
A
a 2
b 4
c 8
d 10

By default, center=False, which means that the window will not be centered around an observation:

df.rolling(window=3, min_periods=0).sum() # center=False
A
a 2.0
b 6.0
c 14.0
d 22.0

Here, the numbers are computed like so:

A[a]: 2 = 2
A[b]: 2 + 4 = 6 # the observation is 4 (see how 4 is right-aligned)
A[c]: 2 + 4 + 8 = 14 # the observation is 8
A[d]: 4 + 8 + 10 = 22 # the observation is 10

Compare this with the output of center=True:

df.rolling(window=3, min_periods=0, center=True).sum()
A
a 6.0
b 14.0
c 22.0
d 18.0

Here, the numbers are computed like so:

A[a]: 2 + 4 = 6
A[b]: 2 + 4 + 8 = 14 # the observation is 4 (see how 4 is centered here)
A[c]: 4 + 8 + 10 = 22 # the observation is 8
A[d]: 8 + 10 = 18
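
The two alignments can also be reproduced with plain list slicing. The helper window_sum below is hypothetical and written only for this comparison; it assumes an odd window size (3 here) and has not been checked against pandas for even window sizes.

```python
import pandas as pd

values = [2, 4, 8, 10]
df_c = pd.DataFrame({"A": values}, index=["a", "b", "c", "d"])

def window_sum(values, i, window, center):
    """Sum the values covered by the window for the observation at position i.

    Hypothetical helper for illustration; assumes an odd window size.
    """
    # With center=True the window extends (window - 1) // 2 positions to the right
    offset = (window - 1) // 2 if center else 0
    end = i + offset + 1
    start = end - window
    # Clip the window to the bounds of the data
    return sum(values[max(start, 0):min(end, len(values))])

by_hand_right = [window_sum(values, i, 3, center=False) for i in range(len(values))]
by_hand_center = [window_sum(values, i, 3, center=True) for i in range(len(values))]

pandas_right = df_c.rolling(window=3, min_periods=0).sum()["A"].tolist()
pandas_center = df_c.rolling(window=3, min_periods=0, center=True).sum()["A"].tolist()

print(by_hand_right, pandas_right)
print(by_hand_center, pandas_center)
```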

Time-series case

Consider the following time-series DataFrame:

idx = [pd.Timestamp('20201220 15:00:00'),
       pd.Timestamp('20201220 15:00:01'),
       pd.Timestamp('20201220 15:00:02'),
       pd.Timestamp('20201220 15:00:04'),
       pd.Timestamp('20201220 15:00:05')]
df = pd.DataFrame({"A":[1,10,100,1000,10000]}, index=idx)
df
A
2020-12-20 15:00:00 1
2020-12-20 15:00:01 10
2020-12-20 15:00:02 100
2020-12-20 15:00:04 1000
2020-12-20 15:00:05 10000

Summing a window with a period of 2 seconds:

df.rolling(window="2s").sum() # lowercase "s"; the uppercase "S" alias is deprecated in recent pandas
A
2020-12-20 15:00:00 1.0
2020-12-20 15:00:01 11.0
2020-12-20 15:00:02 110.0
2020-12-20 15:00:04 1000.0
2020-12-20 15:00:05 11000.0

Note that since window is offset-based, min_periods=1 by default.

You can specify the closed parameter to indicate whether the endpoints should be inclusive/exclusive:

df.rolling(window="2s", closed="both").sum() # both endpoints are inclusive
A
2020-12-20 15:00:00 1.0
2020-12-20 15:00:01 11.0
2020-12-20 15:00:02 111.0
2020-12-20 15:00:04 1100.0
2020-12-20 15:00:05 11000.0
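
To see how the four closed options differ on the same data, the sketch below places the results side by side. The variable names are made up for illustration, and the lowercase "2s" alias is used for the 2-second window.

```python
import pandas as pd

idx = pd.to_datetime([
    "2020-12-20 15:00:00",
    "2020-12-20 15:00:01",
    "2020-12-20 15:00:02",
    "2020-12-20 15:00:04",
    "2020-12-20 15:00:05",
])
df_ts = pd.DataFrame({"A": [1, 10, 100, 1000, 10000]}, index=idx)

# One column per closed option, each computed over a 2-second window
comparison = pd.DataFrame({
    option: df_ts["A"].rolling(window="2s", closed=option).sum()
    for option in ["right", "both", "left", "neither"]
})
print(comparison)
```

With "left" and "neither", the current observation is excluded from its own window, so a window can end up empty, in which case NaN is returned.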
Published by Isshin Inada