The label of the columns whose values will fill the resulting pivot table. These values will be aggregated using the specified aggfunc, which just means that we compute a statistic (e.g. sum, average, max and so on) to summarise these values.

3. indexlink | string or list<string>| optional

The label of the columns that will become the index of the resulting pivot table.

4. aggfunclink | function or list<function> | optional

The function used to aggregate the values. By default, aggfunc=np.mean.

5. fill_valuelink | scalar | optional

The value to replace NaN. By default, fill_value=None.

6. marginslink | boolean | optional

Whether or not to add a new row and column that shows the sum of the values. By default, margins=False.

7. dropnalink | boolean | optional

Whether or not to remove entries with NaN. By default, dropna=True.

8. margins_namelink | string | optional

The label assigned to the new row and column that contain the sum of the values. This is only relevant when margins=True. By default, margins_name="All".

Return Value

A new DataFrame that represents a pivot table.

Examples

Basic usage

Consider the following DataFrame about the bonus received by employees:


        
        
            
                
                
                    df = pd.DataFrame({"name":["alice","alice","bob","alice","bob","bob"], "year":[2012,2012,2012,2013,2013,2013], "bonus":[10,15,15,10,10,20]})
df
                
            
               name   year  bonus
0  alice  2012   10
1  alice  2012   15
2  bob    2012   15
3  alice  2013   10
4  bob    2013   10
5  bob    2013   20

Our goal is to compute the total bonus of each employee per year. We can do this by creating what is called a pivot table using the pivot_table(~) method:


        
        
            
                
                
                    pd.pivot_table(df, values="bonus", index="name", columns="year", aggfunc=np.sum)
                
            
            year   2012  2013
name  
alice   25    10
bob     15    30

Note the following:

the values of column bonus is used to fill the resulting DataFrame.
the name column is assigned as the new index.
the values of the year column (2012 and 2013) became the new column labels.
the aggfunc is summation, as indicated by NumPy's sum(~) method.

Multiple aggregates

We could also compute multiple aggregates at once. In addition to the sum, let's also compute the average bonus of each employee per year:


        
        
            
                
                
                    pd.pivot_table(df, values="bonus", index="name", columns="year", aggfunc=[np.sum, np.mean])
                
            
                   sum          mean
year   2012  2013   2012  2013
name    
alice  25    10     12.5  10.0
bob    15    30     15.0  15.0

Notice how we had to pass in a list of aggregate functions for aggfunc here.

Multiple values

Suppose we want to aggregate two columns this time. To demonstrate this, here's a DataFrame with a new column indicating the number of days absent of each employee per year:


        
        
            
                
                
                    df = pd.DataFrame({"name":["alice","alice","bob","alice","bob","bob"], "year":[2012,2012,2012,2013,2013,2013], "bonus":[10,15,15,10,10,20], "days_absent":[1,2,5,2,3,5]})
df
                
            
               name    year  bonus  days_absent
0  alice   2012   10        1
1  alice   2012   15        2
2  bob     2012   15        5
3  alice   2013   10        2
4  bob     2013   10        3
5  bob     2013   20        5

To compute the total bonus as well as the days absent per year for each employee:


        
        
            
                
                
                    pd.pivot_table(df, values=["bonus","days_absent"], index="name", columns="year", aggfunc=np.sum)
                
            
                   bonus         days_absent
year   2012   2013   2012  2013
name    
alice  25     10      3     2
bob    15     30      5     8

Notice how we passed in an array of column labels for values here.

Multiple indices

Suppose we want to use two columns as the index of the resulting DataFrame. To demonstrate this, here's a DataFrame with employees now identified by their first and last names:


        
        
            
                
                
                    df = pd.DataFrame({"first_name":["alice","alice","bob","alice","bob","bob"], "last_name":["A","B","A","A","A","B"], "year":[2012,2012,2012,2013,2013,2013], "bonus":[10,15,15,10,10,20]})
df
                
            
               first_name  last_name  year  bonus
0  alice          A       2012  10
1  alice          B       2012  15
2  bob            A       2012  15
3  alice          A       2013  10
4  bob            A       2013  10
5  bob            B       2013  20

Now, alice A and alice B represent two different employees, so it would make sense to differentiate them when computing the total bonus per year for each employee. We do this by passing in a list for the index parameter:


        
        
            
                
                
                    pd.pivot_table(df, values=["bonus"], index=["first_name", "last_name"], columns="year", aggfunc=np.sum)
                
            
                                    bonus
            year        2012    2013
first_name  last_name  
alice          A        10.0    10.0
               B        15.0    NaN
bob            A        15.0    10.0
               B        NaN     20.0

We get NaN for the total bonus of alice B in 2013 since it does not exist in df.

Filling missing values

Consider the following DataFrame:


        
        
            
                
                
                    df = pd.DataFrame({"name":["alice","bob","bob"], "year":[2012,2012,2013], "bonus":[10,15,20]})
                
            
               name   year   bonus
0  alice  2012   10
1  bob    2012   15
2  bob    2013   20

Notice how we don't have data about Alice's bonus in 2013.

By default, missing values will be indicated by NaN in the resulting pivot table:


        
        
            
                
                
                    pd.pivot_table(df, values="bonus", index="name", columns="year", aggfunc=np.sum)
                
            
            year   2012  2013
name  
alice  10.0  NaN
bob    15.0  20.0

We can choose to fill these NaN with our own values by passing in the fill_value parameter:


        
        
            
                
                
                    pd.pivot_table(df, values="bonus", index="name", columns="year", aggfunc=np.sum, fill_value=-1)
                
            
            year   2012  2013
name  
alice   10   -1
bob     15   20

Specifying margin and margins_name

Consider the following DataFrame:


        
        
            
                
                
                    df = pd.DataFrame({"name":["alice","bob","alice","bob"], "year":[2012,2012,2013,2013], "bonus":[10,15,20,30]})
df
                
            
               name   year  bonus
0  alice  2012  10
1  bob    2012  15
2  alice  2013  20
3  bob    2013  30

By setting margins=True, we obtain an additional row and column that holds the total sum of the values:


        
        
            
                
                
                    pd.pivot_table(df, values="bonus", index="name", columns="year", aggfunc=np.sum, margins=True)
                
            
            year   2012  2013  All
name   
alice  10    20    30
bob    15    30    45
All    25    50    75

Note the following:

the row sum tells us the total bonus earned by each employee
the column sum is the total bonus earned per year of all employees
the bottom-right entry tells us the total bonus earned over the 2 years of all employees

By default, these newly introduced row and column are assigned the label "All". We can assign our own label using the margins_name parameter:


        
        
            
                
                
                    pd.pivot_table(df, ~~~, margins=True, margins_name="Total")
                
            
            year   2012  2013  Total
name   
alice  10    20    30
bob    15    30    45
Total  25    50    75

Specifying dropna

Consider the following DataFrame:


        
        
            
                
                
                    df = pd.DataFrame({"name":[np.NaN,np.NaN,"alice","bob"], "year":[2012,2012,2013,2013], "bonus":[10,15,20,25]})
df
                
            
               name   year  bonus
0  NaN    2012   10
1  NaN    2012   15
2  alice  2013   20
3  bob    2013   25

Here, we're missing the names of the employee for the year 2012. This means that the resulting pivot table would have column corresponding to 2012 filled with NaN.

By default, dropna=True, which means that columns whose entries are all NaN in the resulting pivot table will be removed:


        
        
            
                
                
                    pd.pivot_table(df, values="bonus", index="name", columns="year", aggfunc=np.sum)
                
            
            year   2013
name 
alice  20
bob    25

We can choose to keep the NaN by setting dropna=False like so:


        
        
            
                
                
                    pd.pivot_table(df, values="bonus", index="name", columns="year", aggfunc=np.sum, dropna=False)
                
            
            year   2012  2013
name  
alice  NaN   20
bob    NaN   25