Pandas DataFrame | expanding method
Pandas' DataFrame.expanding(~) method is used to compute cumulative statistics.
Parameters
1. min_periods | int | optional
The minimum number of values a window must contain for the statistic to be computed. If the number of observations in a window is less than min_periods, then NaN is returned for that window. By default, min_periods=1.
A window is essentially a sequence of numbers (observations) from which a statistic is computed.
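To make the idea of a window concrete, here is a minimal sketch in plain Python of how the windows grow and how min_periods gates the result. The helper name expanding_sum is hypothetical and exists only for illustration; it is not how Pandas implements the method.

```python
import math

def expanding_sum(values, min_periods=1):
    """Illustrative expanding sum: the window at position i holds values[0..i]."""
    results = []
    for i in range(len(values)):
        window = values[: i + 1]        # the window grows by one observation per step
        if len(window) < min_periods:   # too few observations, so report NaN
            results.append(math.nan)
        else:
            results.append(float(sum(window)))
    return results

windows_default = expanding_sum([2, 4, 8])
windows_min2 = expanding_sum([2, 4, 8], min_periods=2)
print(windows_default)  # [2.0, 6.0, 14.0]
print(windows_min2)     # [nan, 6.0, 14.0]
```

The first window contains a single value, which is why raising min_periods to 2 turns only the first result into NaN.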
2. center | boolean | optional
If True, then the observation is set to the center of the window. If False, then the observation is set to the right of the window. By default, center=False. Consult the examples below for clarification.
3. axis | int or string | optional
Whether to perform expansion row-wise or column-wise:
Axis | Description |
---|---|
0 or "index" | Expand column-wise. |
1 or "columns" | Expand row-wise. |
Return Value
A Window object that is subsequently used to compute cumulative statistics.
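As a small sketch of what "subsequently used" means, the object returned by expanding(~) can be stored and then asked for several statistics. The variable names below are our own, and mean() and max() are just two of the aggregations that the returned object offers.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 8], "B": [4, 5, 6]})

window = df.expanding()     # no statistic has been computed yet
cum_mean = window.mean()    # running mean of each column
cum_max = window.max()      # running maximum of each column

print(cum_mean)
print(cum_max)
```

Each call returns a new DataFrame with the same shape as df, so the same window object can be reused for as many statistics as needed.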
Examples
Computing the cumulative sum
Consider the following DataFrame:
import pandas as pd
df = pd.DataFrame({"A":[2,4,8],"B":[4,5,6]})
df
   A  B
0  2  4
1  4  5
2  8  6
Column-wise
To compute the cumulative sum column-wise:
df.expanding().sum()
      A     B
0   2.0   4.0
1   6.0   9.0
2  14.0  15.0
Row-wise
To compute the cumulative sum row-wise:
df.expanding(axis=1).sum()
     A     B
0  2.0   6.0
1  4.0   9.0
2  8.0  14.0
This example only demonstrates how to use the method. Pandas has a method DataFrame.cumsum(~) that computes cumulative sums directly.
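The relationship with DataFrame.cumsum(~) can be checked with a short sketch. For the row-wise case we transpose the DataFrame instead of passing axis=1, on the assumption (worth verifying against your installed version) that newer Pandas releases deprecate the axis argument of expanding(~).

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 8], "B": [4, 5, 6]})

# Column-wise: expanding().sum() returns floats, while cumsum() keeps integers.
col_expanding = df.expanding().sum()
col_cumsum = df.cumsum()

# Row-wise: transpose, expand down the (former) rows, then transpose back.
row_expanding = df.T.expanding().sum().T
row_cumsum = df.cumsum(axis=1)

print((col_expanding == col_cumsum).all().all())
print((row_expanding == row_cumsum).all().all())
```

The values agree in both directions; the only visible difference is that the expanding results are floats.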
Specifying min_periods
Consider the same DataFrame as above:
df = pd.DataFrame({"A":[2,4,8],"B":[4,5,6]})
df
   A  B
0  2  4
1  4  5
2  8  6
Calling expanding(~) with min_periods=2:
df.expanding(2).sum()
      A     B
0   NaN   NaN
1   6.0   9.0
2  14.0  15.0
Here, we get NaN for the first row because the very first cumulative sum is computed using only 1 value, which is lower than the specified minimum of 2.
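A related detail is how missing values interact with min_periods. The sketch below assumes, as we understand Pandas' behavior, that only non-missing values count as observations. The DataFrame df_nan is our own illustrative data and does not come from the examples above.

```python
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({"A": [1.0, np.nan, 3.0]})

# Row 1's window holds two entries but only one non-missing observation,
# so it still falls short of min_periods=2.
result = df_nan.expanding(min_periods=2).sum()
print(result)
```

Only the last row has at least two non-missing observations in its window, so it is the only row with a numeric result.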
Specifying the center parameter
Consider the following DataFrame:
df = pd.DataFrame({"A": [2,4,8,16,25]})
df
    A
0   2
1   4
2   8
3  16
4  25
Calling expanding(~) with the center parameter set to True:
df.expanding(center=True).sum()
      A
0  14.0
1  30.0
2  55.0
3  53.0
4  49.0
The size of the initial selection of values is determined as follows:
ceil(num_of_rows/2) = 3
Here's a breakdown of how the numbers are computed:
A[0]: sum(2, 4, 8) = 14
A[1]: sum(2, 4, 8, 16) = 30
A[2]: sum(2, 4, 8, 16, 25) = 55
A[3]: sum(4, 8, 16, 25) = 53
A[4]: sum(8, 16, 25) = 49
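The breakdown above can be reproduced without the center argument of expanding(~), which, to our knowledge, newer Pandas releases no longer accept. The sketch below swaps in a different technique: a centered rolling window as wide as the DataFrame itself. This is our own substitution rather than something the method documents, and we have only checked it against this odd-length example.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 8, 16, 25]})

n = len(df)  # 5 rows, so each centered window reaches 2 rows to either side

# min_periods=1 lets the truncated windows at both edges still yield a sum.
centered = df.rolling(window=n, center=True, min_periods=1).sum()
print(centered)
```

Because the window is centered, the first row sees rows 0 to 2, the middle row sees all five rows, and the last row sees rows 2 to 4, matching the sums listed above.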