import pandas as pd

df = pd.DataFrame({"A":[1,2], "B":[3,4], "C":[5,6]})
df

   A  B  C
0  1  3  5
1  2  4  6

Setting a single column as the index

To set column A as the index of df:

df.set_index("A")      # Returns a DataFrame

   B  C
A
1  3  5
2  4  6

Here, the name assigned to the index is the column label, that is, "A".
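The two points above, that set_index returns a new DataFrame and that the index takes its name from the column, can be checked directly. This is a minimal sketch; the variable names indexed and row are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# set_index returns a new DataFrame; df itself is left untouched
indexed = df.set_index("A")

print(indexed.index.name)      # the index is named after the column
print(list(indexed.columns))   # A is no longer a regular column
print(list(df.columns))        # the original still has A, B and C

# Rows can now be looked up by their value of A
row = indexed.loc[2]
print(row["B"], row["C"])
```

Because the original df is unchanged, the result has to be assigned to a variable (or back to df) to be kept.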

Setting multiple columns as the index

To set columns A and B as the index of df:

df.set_index(["A","B"])

     C
A B
1 3  5
2 4  6

Here, the DataFrame ends up with a single MultiIndex that has two levels, A and B.
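The structure of the resulting index can be inspected directly. The sketch below assumes the same df as above; the names multi and value are hypothetical and used only for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
multi = df.set_index(["A", "B"])

print(type(multi.index).__name__)   # class of the index object
print(multi.index.nlevels)          # number of levels in the index
print(list(multi.index.names))      # one name per level

# Look up a cell with a tuple holding one value per level
value = multi.loc[(1, 3), "C"]
print(value)
```

Supplying a full tuple such as (1, 3) selects a single row, which is why the lookup returns one value from column C.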

Keeping the column used for the index

To keep the column that will be used as the index, set drop=False:

df.set_index("A", drop=False)

   A  B  C
A
1  1  3  5
2  2  4  6

Notice how the column A is still there.
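A side-by-side comparison makes the effect of drop easier to see. This is an illustrative sketch; dropped and kept are hypothetical variable names.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

dropped = df.set_index("A")              # default: drop=True
kept = df.set_index("A", drop=False)     # A stays as a column too

print(list(dropped.columns))
print(list(kept.columns))

# With drop=False the index and the column hold the same values
same_values = list(kept.index) == list(kept["A"])
print(same_values)
```

Keeping the column can be convenient when later code still refers to df["A"], at the cost of storing the values twice.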

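For reference, df is still the DataFrame defined at the start, with columns A, B and C and the default index [0,1], since none of the calls above modified it.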

Appending to the current index

To append a column to the existing index, set append=True:

df.set_index("A", append=True)

     B  C
  A
0 1  3  5
1 2  4  6

Notice how column A has been appended to the original index [0,1] as a second level, rather than replacing it.
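The difference between replacing and appending shows up in the index values themselves. A minimal sketch, with replaced and appended as hypothetical variable names:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

replaced = df.set_index("A")                 # old index is discarded
appended = df.set_index("A", append=True)    # old index is kept as a level

print(replaced.index.tolist())
print(appended.index.tolist())

# The original index had no name, so its level is unnamed
print(appended.index.nlevels)
print(list(appended.index.names))
```

Each entry of the appended index is a pair made of the original row label and the value of A.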

Setting an index in-place

To set an index in-place, supply inplace=True:

df.set_index("A", inplace=True)
df

   B  C
A
1  3  5
2  4  6

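As shown in the output above, setting inplace=True modifies the source DataFrame directly, and the call itself returns None. Use inplace=True only when you are sure that you no longer need the source DataFrame. The memory saving is not guaranteed, since pandas may still build a copy internally, so reassigning with df = df.set_index("A") is usually an equally good choice.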
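One practical consequence of inplace=True is that the call returns None, so its result should not be assigned. The sketch below contrasts it with the reassignment form; df2 and result are hypothetical names for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# With inplace=True the method mutates df and returns None
result = df.set_index("A", inplace=True)
print(result)
print(df.index.name)
print(list(df.columns))

# The reassignment form reaches the same end state
df2 = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df2 = df2.set_index("A")
print(df.equals(df2))
```

Writing df = df.set_index("A", inplace=True) is a common slip: it would rebind df to None.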

Verifying integrity

Consider the following DataFrame:

df = pd.DataFrame({"A":[1,1],"B":[3,4]})
df

   A  B
0  1  3
1  1  4

By default, verify_integrity=False, which means that no error will be thrown if the resulting index contains duplicates:

df.set_index("A")   # verify_integrity=False by default

   B
A
1  3
1  4

Notice how the new index contains duplicate values (two 1s), but no error was thrown.

To throw an error in such cases, pass verify_integrity=True like so:

df.set_index("A", verify_integrity=True)

ValueError: Index has duplicate keys: Int64Index([1], dtype='int64', name='A')
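In a script, the ValueError can be caught, or the column can be checked for duplicates beforehand. This is a sketch of both options rather than a recommended pattern; the names raised and dup are hypothetical, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1], "B": [3, 4]})

# Option 1: let set_index validate and handle the error
try:
    df.set_index("A", verify_integrity=True)
    raised = False
except ValueError as err:
    raised = True
    print(f"Could not set index: {err}")

# Option 2: check uniqueness of the column up front
print(df["A"].is_unique)

# Without the check, a duplicated label selects several rows at once
dup = df.set_index("A")
print(len(dup.loc[1]))
```

Duplicate labels are allowed in pandas, but lookups such as dup.loc[1] then return multiple rows, which is often the reason for enabling the check.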

Published by Isshin Inada


Official Pandas Documentation

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html
