search
Search
Login
Unlock 100+ guides
menu
menu
web
search toc
close
Comments
Log in or sign up
Cancel
Post
account_circle
Profile
exit_to_app
Sign out
What does this mean?
Why is this true?
Give me some examples!
search
keyboard_voice
close
Searching Tips
Search for a recipe:
"Creating a table in MySQL"
Search for an API documentation: "@append"
Search for code: "!dataframe"
Apply a tag filter: "#python"
Useful Shortcuts
/ to open search panel
Esc to close search panel
to navigate between search results
d to clear all current filters
Enter to expand content preview
icon_star
Doc Search
icon_star
Code Search Beta
SORRY NOTHING FOUND!
mic
Start speaking...
Voice search is only supported in Safari and Chrome.
Navigate to

Pandas DataFrame | wide_to_long method

schedule Aug 10, 2023
Last updated
local_offer
PythonPandas
Tags
mode_heat
Master the mathematics behind data science with 100+ top-tier guides
Start your free 7-days trial now!

Pandas wide_to_long(~) method converts the format of a DataFrame from wide to long.

Let's go through a quick example. Suppose we have the following DataFrame:

name

age

height

alex

40

150

bob

50

160

This is considered to be a wide DataFrame since each row captures all relevant data about that person. Now, converting this to a long DataFrame:

name

variable

value

alex

age

30

alex

height

150

bob

age

50

bob

height

160

Now, each row captures a single variable about that person, which inevitably results in a vertically long DataFrame.

Pandas uses the term "unpivot" to denote the action of elongating the DataFrame based on a variable. In this example, we are unpivoting the variables age and height.

Parameters

1. dflink | DataFrame

The input DataFrame.

2. stubnameslink | string or list-like

The prefix of the column labels that will elongated.

3. ilink | string or list-like

The columns to use as the identifier.

4. jlink | string

The name assigned to the newly introduced columns.

5. seplink | string | optional

If the column you wish to elongate has the label "A_2012" for instance, then instead of specifying "A_" as the stubnames, you can set stubnames="A" and sep="_". By default, sep="".

6. suffixlink | string | optional

A string or regex that matches the suffix of columns to elongate. By default, suffix="\d+".

Return Value

A DataFrame whose columns have been unpivoted.

Examples

Consider the following DataFrame:

df = pd.DataFrame({"name":["alex","bob","cathy"], "savings_2019":[10,20,30], "savings_2020":[40,50,60]})
df
   name  savings_2019 savings_2020
0  alex      10            40
1  bob       20            50
2  cathy     30            60

Basic usage

To go from a wide DataFrame to a long one:

pd.wide_to_long(df, stubnames=["savings_"], i="name", j="year")
             savings_
name   year
alex   2019    10
bob    2019    20
cathy  2019    30
alex   2020    40
bob    2020    50
cathy  2020    60

Here, notice how the underscore was included in the stubnames ("savings_"). This is because without the _, the method will use suffix like "_2019" instead of just a number "2019" to look for column labels to unpivot. This is a problem because the default suffix parameter ("\d+") only allows for numbers.

The downside of this approach is that the new column name becomes savings_, which is undoubtedly awkward.

Specifying the sep parameter

Instead of "saving_", we can just let stubnames be "saving" and set sep="_" to get the same behaviour. The advantage of this is that our column name would now be savings:

pd.wide_to_long(df, stubnames=["savings"], i="name", j="year", sep="_")
             savings
name   year
alex   2019    10
bob    2019    20
cathy  2019    30
alex   2020    40
bob    2020    50
cathy  2020    60

Specifying the suffix parameter

Consider the following DataFrame:

df = pd.DataFrame({"name":["alex","bob","cathy"], "savings_2019A":[10,20,30], "savings_2020A":[40,50,60]})
df
   name  savings_2019A  savings_2020A
0  alex       10             40
1  bob        20             50
2  cathy      30             60

Here, the difference between this DataFrame and the previous DataFrame is that each column name has an "A" character appended to it.

If we call the wide_to_long(~) just as we did in the previous case, the result will be empty:

pd.wide_to_long(df, ["savings_"], i="name", j="year")
            savings_2019A  savings_2020A  savings_
name year

This is because the column name is now "2019A" instead of a number "2019". Since the default value of suffix only allows for numbers ("\d+"), the column "2019A" does not get unpivoted. We can fix this by setting the regex "\w+", which allows for terms such as "2019A":

pd.wide_to_long(df, ["savings"], i="name", j="year", sep="_", suffix="\w+")
              savings
name   year
alex   2019A    10
bob    2019A    20
cathy  2019A    30
alex   2020A    40
bob    2020A    50
cathy  2020A    60

Just as a side note, you could also use the regex suffix=".*A" to match all column labels that end with "A".

robocat
Published by Isshin Inada
Edited by 0 others
Did you find this page useful?
thumb_up
thumb_down
Comment
Citation
Ask a question or leave a feedback...
thumb_up
1
thumb_down
1
chat_bubble_outline
0
settings
Enjoy our search
Hit / to insta-search docs and recipes!