Pandas | pivot method
Start your free 7-days trial now!
Pandas pivot(~)
method converts the input DataFrame into what is called a pivot table. Please check the examples below to understand what a pivot table is.
Parameters
1. index
link | string
or object
| optional
The label of the column that will become the index of the resulting pivot table.
2. columns
link | string
or object
The label of the columns whose values will become the new column labels.
3. values
link | string
or object
or list
| optional
The label of the columns whose values will fill the resulting pivot table.
If you wish to aggregate values
(e.g. compute the mean, count and so on), use pivot_table(~)
instead.
Return value
A new DataFrame
that represents a pivot table.
Examples
Basic usage
Consider the following DataFrame about the height of people over time:
df = pd.DataFrame({"name":["alex","bob","alex","bob","cathy"], "year":[2012,2012,2013,2013,2013], "height":[150,160,160,170,160]})df
name year height0 alex 2012 1501 bob 2012 1602 alex 2013 1603 bob 2013 1704 cathy 2013 160
To create a pivot table using this DataFrame:
pd.pivot(df, index="name", columns="year", values="height")
year 2012 2013name alex 150.0 160.0bob 160.0 170.0cathy NaN 160.0
Here, note the following:
the
name
column became the new indexthe values of the
year
column became the new column labelsthe values of the
height
column was used to populate the new DataFramevalues that are not found (e.g. height of Cathy in
2012
) resulted inNaN
.
Multiple values
Consider the following DataFrame:
df = pd.DataFrame({"name":["alex","bob","alex","bob","cathy"], "year":[2012,2012,2013,2013,2013], "height":[150,160,160,170,160], "weight":[50,60,60,70,60]})df
name year height weight0 alex 2012 150 501 bob 2012 160 602 alex 2013 160 603 bob 2013 170 704 cathy 2013 160 60
Here, in addition to the height
column, we have a weight
column.
To generate a pivot table using both of these columns pass a list for values
like so:
pd.pivot(df, index="name", columns="year", values=["height","weight"])
height weightyear 2012 2013 2012 2013name alex 150.0 160.0 50.0 60.0bob 160.0 170.0 60.0 70.0cathy NaN 160.0 NaN 60.0
Dealing with ValueError
Consider the following DataFrame:
df = pd.DataFrame({"name":["alex","alex","bob"], "year":[2012,2012,2013], "height":[150,160,160]})df
name year height0 alex 2012 1501 alex 2012 1602 bob 2013 160
Here, we have two people called alex
whose heights are both taken in 2012
.
Creating a pivot table results in an ValueError
like so:
pd.pivot(df, index="name", columns="year", values="height")
ValueError: Index contains duplicate entries, cannot reshape
To understand why this happens, let us temporarily rename the second alex
to *
, and create the pivot table:
df = pd.DataFrame({"name":["alex","*","bob"], "year":[2012,2012,2013], "height":[150,160,160]})pd.pivot(df, index="name", columns="year", values="height")
year 2012 2013name * 160.0 NaNalex 150.0 NaNbob NaN 160.0
If *
was alex
, then it's easy to see that the resulting index would contain duplicate values, which is undesirable in Pandas. This is the cause of the ValueError
.
The fix for this ValueError
, then, is to ensure that duplicate values in the new index column do not exist. You can use properties like loc
and iloc
to update the individual entries to avoid duplication.