Pandas DataFrame | stack method
Pandas' DataFrame.stack(~) method converts the specified column levels to row levels. This is the reverse of unstack(~).
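As a quick illustration of the relationship with unstack(~), here is a minimal sketch using a small made-up DataFrame (the data and variable names are only for illustration). Stacking and then unstacking is expected to give back the original frame:

```python
import pandas as pd

# Made-up data, used only for illustration
df = pd.DataFrame([[2, 3], [4, 5]], columns=["alice", "bob"], index=["age", "height"])

stacked = df.stack()          # column labels become the inner-most row level
restored = stacked.unstack()  # inner-most row level goes back to the columns

print(restored.equals(df))
```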
Parameters
1. level | int or string | optional
The integer index or name(s) of the column level to convert into a row level. By default, level=-1, which means that the inner-most column level is converted.
2. dropna | boolean | optional
Whether or not to drop resulting rows that contain just NaN. By default, dropna=True.
Return Value
A Series or a DataFrame.
Examples
Stacking single-level DataFrames
Consider the following single-level DataFrame:
import pandas as pd

df = pd.DataFrame([[2,3],[4,5]], columns=["alice","bob"], index=["age","height"])
df

        alice  bob
age         2    3
height      4    5
Calling stack() on df gives:

df.stack()

age     alice    2
        bob      3
height  alice    4
        bob      5
dtype: int64
Here, note the following:
- the return type is Series with a 2-level index.
- the row labels and the column labels in df have merged to form a multi-index.
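Because the result is a Series with a two-level index, individual values can be looked up with a (row label, column label) tuple. A minimal sketch, re-creating the same df so that the snippet is self-contained:

```python
import pandas as pd

df = pd.DataFrame([[2, 3], [4, 5]], columns=["alice", "bob"], index=["age", "height"])
s = df.stack()

print(type(s).__name__)       # type of the stacked result
print(s.index.nlevels)        # number of levels in its index
print(s.loc[("age", "bob")])  # value originally at row "age", column "bob"
print(s.loc["height"])        # all values that came from the "height" row
```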
Stacking DataFrames with multi-level columns
Consider the following DataFrame with multi-level columns:
index = [("A", "alice"), ("A", "bob"), ("B","cathy")]
multi_index = pd.MultiIndex.from_tuples(index)
df = pd.DataFrame([[2,3,4],[5,6,7]], columns=multi_index, index=["age","height"])
df

           A         B
       alice bob cathy
age        2   3     4
height     5   6     7
By default, level=-1, which means that the inner-most column level ([alice,bob,cathy]) will be converted into a row level:

df.stack()

                A    B
age    alice  2.0  NaN
       bob    3.0  NaN
       cathy  NaN  4.0
height alice  5.0  NaN
       bob    6.0  NaN
       cathy  NaN  7.0
Note the following:
- the inner-most column level ([alice, bob, cathy]) became a row index, and is positioned as the inner-most level.
- stacking multi-level columns often yields many NaN since, for instance, no data exists about the age of alice in group B.
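To see where the NaN values come from, the sketch below re-creates the DataFrame above and inspects the stacked result. The variable names are only for illustration.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples([("A", "alice"), ("A", "bob"), ("B", "cathy")])
df = pd.DataFrame([[2, 3, 4], [5, 6, 7]], columns=columns, index=["age", "height"])

stacked = df.stack()  # default level=-1: the inner-most column level moves to the rows

# alice only exists under group A, so her value under group B is missing
print(stacked.loc[("age", "alice"), "A"])
print(pd.isna(stacked.loc[("age", "alice"), "B"]))

# Count the missing cells in the stacked result
print(int(stacked.isna().sum().sum()))
```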
To specify which levels to convert, pass the level parameter like so:

df.stack(level=0)

          alice  bob  cathy
age    A    2.0  3.0    NaN
       B    NaN  NaN    4.0
height A    5.0  6.0    NaN
       B    NaN  NaN    7.0
Here, level=0 means that the outermost column level ([A,B]) is converted into a row level.
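The level parameter also accepts a level name rather than a position. The sketch below assumes the column levels are given the made-up names "group" and "person" (these names are not part of the example above):

```python
import pandas as pd

# Same data as above, but with named column levels
# ("group" and "person" are hypothetical names chosen for this sketch)
columns = pd.MultiIndex.from_tuples(
    [("A", "alice"), ("A", "bob"), ("B", "cathy")],
    names=["group", "person"],
)
df = pd.DataFrame([[2, 3, 4], [5, 6, 7]], columns=columns, index=["age", "height"])

by_position = df.stack(level=0)
by_name = df.stack(level="group")  # refers to the same outermost level

print(by_name)
print(by_position.equals(by_name))
```

Referring to a level by name tends to be easier to read than a position once there are more than two column levels.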
Specifying dropna
Consider the following DataFrame:
index = [("A", "alice"), ("A", "bob"), ("B","cathy")]
multi_index = pd.MultiIndex.from_tuples(index)
df = pd.DataFrame([[2,3,None],[5,6,7]], columns=multi_index, index=["age","height"])
df

           A         B
       alice bob cathy
age        2   3   NaN
height     5   6   7.0
By default, dropna=True, which means that rows that contain just NaN will be removed from the result:

df.stack()

                A    B
age    alice  2.0  NaN
       bob    3.0  NaN
height alice  5.0  NaN
       bob    6.0  NaN
       cathy  NaN  7.0
Notice how cathy's row for the age level is missing. This is because it only contains NaN.
To keep all rows, pass dropna=False like so:

df.stack(dropna=False)

                A    B
age    alice  2.0  NaN
       bob    3.0  NaN
       cathy  NaN  NaN
height alice  5.0  NaN
       bob    6.0  NaN
       cathy  NaN  7.0
Notice how we now have cathy's row under age.
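The same filtering can also be written out explicitly with DataFrame.dropna(how="all"), which removes rows whose values are all NaN. This is a minimal sketch rather than a replacement for the dropna parameter; it may be handy if you are unsure how your installed Pandas version treats that parameter (an assumption worth checking against its documentation).

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples([("A", "alice"), ("A", "bob"), ("B", "cathy")])
df = pd.DataFrame([[2, 3, None], [5, 6, 7]], columns=columns, index=["age", "height"])

stacked = df.stack()

# Explicitly remove rows in which every value is NaN
cleaned = stacked.dropna(how="all")

print(("age", "cathy") in cleaned.index)     # was this all-NaN row kept?
print(("height", "cathy") in cleaned.index)  # this row has a value under B
```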