df = pd.DataFrame({"name":["alex","bob","alex","bob","cathy"], "year":[2012,2012,2013,2013,2013], "height":[150,160,160,170,160]})
df

   name   year   height
0  alex   2012   150
1  bob    2012   160
2  alex   2013   160
3  bob    2013   170
4  cathy  2013   160

To create a pivot table using this DataFrame:

pd.pivot(df, index="name", columns="year", values="height")

year   2012   2013
name
alex   150.0  160.0
bob    160.0  170.0
cathy  NaN    160.0

Here, note the following:

the name column became the new index
the values of the year column became the new column labels
the values of the height column were used to populate the new DataFrame
values that are not found (e.g. the height of cathy in 2012) resulted in NaN, which is also why the heights are now shown as floats.
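
The points above can be checked directly on the returned DataFrame. The following is a minimal sketch; the variable name result is only used here for illustration:

```python
import pandas as pd

df = pd.DataFrame({"name":["alex","bob","alex","bob","cathy"], "year":[2012,2012,2013,2013,2013], "height":[150,160,160,170,160]})
result = pd.pivot(df, index="name", columns="year", values="height")

# The unique values of name form the new index
print(list(result.index))                   # ['alex', 'bob', 'cathy']
# The unique values of year form the new column labels
print(list(result.columns))                 # [2012, 2013]
# cathy has no height recorded for 2012, so that cell is NaN
print(pd.isna(result.loc["cathy", 2012]))   # True
```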

Multiple values

Consider the following DataFrame:

df = pd.DataFrame({"name":["alex","bob","alex","bob","cathy"], "year":[2012,2012,2013,2013,2013], "height":[150,160,160,170,160], "weight":[50,60,60,70,60]})
df

   name   year   height   weight
0  alex   2012   150      50
1  bob    2012   160      60
2  alex   2013   160      60
3  bob    2013   170      70
4  cathy  2013   160      60

Here, in addition to the height column, we have a weight column.

To generate a pivot table using both of these columns, pass a list for values like so:

pd.pivot(df, index="name", columns="year", values=["height","weight"])

       height          weight
year   2012    2013    2012    2013
name
alex   150.0   160.0   50.0    60.0
bob    160.0   170.0   60.0    70.0
cathy  NaN     160.0   NaN     60.0
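
When a list is passed for values, the column labels of the result have two levels: the outer level holds the value column (height or weight) and the inner level holds the year. The following is a minimal sketch of how such a result can be queried; the variable name result is only used here for illustration:

```python
import pandas as pd

df = pd.DataFrame({"name":["alex","bob","alex","bob","cathy"], "year":[2012,2012,2013,2013,2013], "height":[150,160,160,170,160], "weight":[50,60,60,70,60]})
result = pd.pivot(df, index="name", columns="year", values=["height","weight"])

# The column labels are (value column, year) pairs
print(result.columns.nlevels)                # 2

# Selecting the outer label returns the sub-table for that value column
print(result["weight"])

# A single cell is addressed by the full (value column, year) pair
print(result.loc["bob", ("weight", 2013)])   # 70.0
```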

Dealing with ValueError

Consider the following DataFrame:

df = pd.DataFrame({"name":["alex","alex","bob"], "year":[2012,2012,2013], "height":[150,160,160]})
df

   name  year  height
0  alex  2012  150
1  alex  2012  160
2  bob   2013  160

Here, we have two people called alex whose heights were both recorded in 2012.

Creating a pivot table results in a ValueError like so:

pd.pivot(df, index="name", columns="year", values="height")

ValueError: Index contains duplicate entries, cannot reshape

To understand why this happens, let us temporarily rename the second alex to *, and create the pivot table:

df = pd.DataFrame({"name":["alex","*","bob"], "year":[2012,2012,2013], "height":[150,160,160]})
pd.pivot(df, index="name", columns="year", values="height")

year   2012    2013
name
*      160.0   NaN
alex   150.0   NaN
bob    NaN     160.0

If * were alex, the table would need two rows labelled alex, or equivalently the single cell (alex, 2012) would have to hold both 150 and 160. The pivot(~) method cannot reshape such data, and this is the cause of the ValueError.

The fix for this ValueError, then, is to ensure that no two rows share the same combination of values in the columns used for the new index and the new column labels (here, name and year). You can use properties like loc and iloc to update the individual entries to avoid duplication.
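
One way to spot the offending rows before pivoting is to look for repeated (name, year) pairs. The following is a minimal sketch using the DataFrame's duplicated(~) method; the variable name mask is only used here for illustration:

```python
import pandas as pd

df = pd.DataFrame({"name":["alex","alex","bob"], "year":[2012,2012,2013], "height":[150,160,160]})

# keep=False marks every row that belongs to a repeated (name, year) pair
mask = df.duplicated(subset=["name","year"], keep=False)

# Show only the rows that would collide in the pivot table
print(df[mask])
```

Here, the rows with labels 0 and 1 are flagged, since both have the name alex and the year 2012.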
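
As a rough sketch of this fix, the second alex can be renamed with loc before pivoting. The replacement name alex2 is a made-up label chosen purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({"name":["alex","alex","bob"], "year":[2012,2012,2013], "height":[150,160,160]})

# Rename the second alex (row label 1) so that every (name, year) pair is unique
df.loc[1, "name"] = "alex2"

# The pivot now succeeds because no cell has to hold more than one value
result = pd.pivot(df, index="name", columns="year", values="height")
print(result)
```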

Pandas | pivot_table method

Pandas pivot_table(~) method converts the input DataFrame into a pivot table.
