Pandas DataFrame | wide_to_long method
Start your free 7-days trial now!
Pandas wide_to_long(~)
method converts the format of a DataFrame from wide to long.
Let's go through a quick example. Suppose we have the following DataFrame:
name | age | height |
---|---|---|
alex | 40 | 150 |
bob | 50 | 160 |
This is considered to be a wide DataFrame since each row captures all relevant data about that person. Now, converting this to a long DataFrame:
name | variable | value |
---|---|---|
alex | age | 30 |
alex | height | 150 |
bob | age | 50 |
bob | height | 160 |
Now, each row captures a single variable about that person, which inevitably results in a vertically long DataFrame.
Pandas uses the term "unpivot" to denote the action of elongating the DataFrame based on a variable. In this example, we are unpivoting the variables age
and height
.
Parameters
1. df
link | DataFrame
The input DataFrame.
2. stubnames
link | string
or list-like
The prefix of the column labels that will elongated.
3. i
link | string
or list-like
The columns to use as the identifier.
4. j
link | string
The name assigned to the newly introduced columns.
5. sep
link | string
| optional
If the column you wish to elongate has the label "A_2012"
for instance, then instead of specifying "A_"
as the stubnames
, you can set stubnames="A"
and sep="_"
. By default, sep=""
.
6. suffix
link | string
| optional
A string or regex that matches the suffix of columns to elongate. By default, suffix="\d+"
.
Return Value
A DataFrame whose columns have been unpivoted.
Examples
Consider the following DataFrame:
df = pd.DataFrame({"name":["alex","bob","cathy"], "savings_2019":[10,20,30], "savings_2020":[40,50,60]})df
name savings_2019 savings_20200 alex 10 401 bob 20 502 cathy 30 60
Basic usage
To go from a wide DataFrame to a long one:
pd.wide_to_long(df, stubnames=["savings_"], i="name", j="year")
savings_name year alex 2019 10bob 2019 20cathy 2019 30alex 2020 40bob 2020 50cathy 2020 60
Here, notice how the underscore was included in the stubnames ("savings_"
). This is because without the _
, the method will use suffix like "_2019"
instead of just a number "2019"
to look for column labels to unpivot. This is a problem because the default suffix
parameter ("\d+"
) only allows for numbers.
The downside of this approach is that the new column name becomes savings_
, which is undoubtedly awkward.
Specifying the sep parameter
Instead of "saving_"
, we can just let stubnames
be "saving"
and set sep="_"
to get the same behaviour. The advantage of this is that our column name would now be savings
:
pd.wide_to_long(df, stubnames=["savings"], i="name", j="year", sep="_")
savingsname year alex 2019 10bob 2019 20cathy 2019 30alex 2020 40bob 2020 50cathy 2020 60
Specifying the suffix parameter
Consider the following DataFrame:
df = pd.DataFrame({"name":["alex","bob","cathy"], "savings_2019A":[10,20,30], "savings_2020A":[40,50,60]})df
name savings_2019A savings_2020A0 alex 10 401 bob 20 502 cathy 30 60
Here, the difference between this DataFrame and the previous DataFrame is that each column name has an "A"
character appended to it.
If we call the wide_to_long(~)
just as we did in the previous case, the result will be empty:
pd.wide_to_long(df, ["savings_"], i="name", j="year")
savings_2019A savings_2020A savings_name year
This is because the column name is now "2019A"
instead of a number "2019"
. Since the default value of suffix only allows for numbers ("\d+"
), the column "2019A"
does not get unpivoted. We can fix this by setting the regex "\w+"
, which allows for terms such as "2019A"
:
pd.wide_to_long(df, ["savings"], i="name", j="year", sep="_", suffix="\w+")
savingsname year alex 2019A 10bob 2019A 20cathy 2019A 30alex 2020A 40bob 2020A 50cathy 2020A 60
Just as a side note, you could also use the regex suffix=".*A"
to match all column labels that end with "A"
.