import pandas as pd

df = pd.DataFrame({"A":[3,4],"B":[5,6],"C":[7,8]})
df = df.rename(columns={"C":"A"})
df
   A  B  A
0  3  5  7
1  4  6  8

Here, we had to use the rename(~) method because a Python dictionary cannot hold duplicate keys: if we passed a dictionary with two "A" keys to the DataFrame(~) constructor, only the last value given for "A" would survive.
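To make the reason for the rename(~) step concrete, here is a minimal sketch. The names df_from_dict and df_from_rows are made up for illustration. It shows that the repeated key is collapsed by Python before Pandas sees it, and that passing the labels separately through the columns parameter is another way to end up with duplicate labels:

```python
import pandas as pd

# A dict literal keeps one entry per key: the last value wins,
# in the position where the key first appeared.
data = {"A": [3, 4], "B": [5, 6], "A": [7, 8]}
df_from_dict = pd.DataFrame(data)
print(list(df_from_dict.columns))   # ['A', 'B']
print(df_from_dict["A"].tolist())   # [7, 8]

# Passing the labels separately via `columns` keeps the duplicate label.
df_from_rows = pd.DataFrame([[3, 5, 7], [4, 6, 8]], columns=["A", "B", "A"])
print(list(df_from_rows.columns))   # ['A', 'B', 'A']
```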

Keeping first occurrence of duplicates

To drop columns with duplicate labels except the first occurrence, use duplicated(~) like so:

df.loc[:, ~df.columns.duplicated()]
   A  B
0  3  5
1  4  6

Explanation

Here, we first fetch the column labels (Index object) using the columns property. We then call duplicated(~), which returns a NumPy array of booleans where True indicates the presence of a duplicate label:

df.columns.duplicated()
array([False, False,  True])

By default, keep="first" for duplicated(~) and so the first occurrence of a non-unique value will be marked as False.

Dropping duplicate columns is equivalent to keeping all non-duplicate columns, and so we invert the booleans using ~:

~df.columns.duplicated()
array([ True,  True, False])

Finally, we pass this boolean mask into loc to get the columns that correspond to True in the mask:

df.loc[:, ~df.columns.duplicated()]
   A  B
0  3  5
1  4  6

Here, the : before the comma indicates that we want to fetch all the rows.
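The steps above can be wrapped into a small reusable function. This is only a sketch: drop_duplicate_labels and df_demo are hypothetical names, not part of Pandas, and the keep argument is simply forwarded to duplicated(~):

```python
import pandas as pd

def drop_duplicate_labels(frame, keep="first"):
    """Hypothetical helper: drop columns whose label is repeated.

    keep is forwarded to Index.duplicated ("first", "last" or False).
    """
    # True marks the columns we want to keep
    mask = ~frame.columns.duplicated(keep=keep)
    # The colon selects all rows; the mask selects the columns
    return frame.loc[:, mask]

df_demo = pd.DataFrame([[3, 5, 7], [4, 6, 8]], columns=["A", "B", "A"])
result = drop_duplicate_labels(df_demo)
print(result)
```

Calling drop_duplicate_labels(df_demo, keep="last") would keep the last column labelled A instead of the first.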

Dropping all occurrences of duplicates

Consider the same df as above, in which two columns share the label A. To drop all occurrences of duplicates:

df.loc[:, ~df.columns.duplicated(keep=False)]
   B
0  5
1  6

Explanation

The only difference between this and the previous case is that we set the parameter keep=False, and so duplicated(~) here returns True for all occurrences of non-unique values (as opposed to the default behaviour of returning False for the first occurrence):

df.columns.duplicated(keep=False)
array([ True, False,  True])
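As a side-by-side comparison, the sketch below applies the three accepted values of keep to the same labels. The variable names are illustrative only:

```python
import pandas as pd

labels = pd.Index(["A", "B", "A"])

# keep="first" (default): the first "A" is not flagged
flag_first = labels.duplicated(keep="first")  # array([False, False,  True])

# keep="last": the last "A" is not flagged
flag_last = labels.duplicated(keep="last")    # array([ True, False, False])

# keep=False: every occurrence of a repeated label is flagged
flag_all = labels.duplicated(keep=False)      # array([ True, False,  True])
```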

Dropping columns with same values

Consider the following DataFrame:

df = pd.DataFrame({"A":[3,4],"B":[5,6],"C":[3,4]})
df
   A  B  C
0  3  5  3
1  4  6  4

Here, columns A and C contain the same values.

To drop duplicate columns:

df.T.drop_duplicates().T
   A  B
0  3  5
1  4  6

By default, keep="first" for drop_duplicates(~), which means that the first occurrence of the duplicates (column A) is kept. To remove all occurrences instead, set keep=False.

Explanation

Pandas has no dedicated method for removing columns with duplicate values, but it does offer drop_duplicates(~), which removes duplicate rows. Therefore, we first take the transpose of df using the T property, so that each column becomes a row.
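For reference, this is what the transposed DataFrame looks like. The example rebuilds the same data under the illustrative name df_values so that it runs on its own:

```python
import pandas as pd

df_values = pd.DataFrame({"A": [3, 4], "B": [5, 6], "C": [3, 4]})

# After transposing, the column labels become the row labels,
# so the duplicate columns A and C are now duplicate rows.
transposed = df_values.T
print(transposed)
#    0  1
# A  3  4
# B  5  6
# C  3  4
```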

We then call drop_duplicates() to remove the duplicate rows:

df.T.drop_duplicates()
   0  1
A  3  4
B  5  6

Finally, we take the transpose again so that the remaining rows become columns once more.
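One caveat worth checking on your own data: when the columns hold different types, the transposed frame is typically object-typed, and the round trip does not restore the original column dtypes. The sketch below illustrates this, along with one possible workaround that uses the transpose only to build a boolean mask. It assumes the column labels are unique, and mixed, roundtrip and kept are illustrative names:

```python
import pandas as pd

mixed = pd.DataFrame({"A": [3, 4], "B": ["x", "y"], "C": [3, 4]})

# Round trip through the transpose: the values survive, but with mixed
# column types the intermediate frame is object-typed and stays that way.
roundtrip = mixed.T.drop_duplicates().T

# Alternative sketch: use the transpose only to find duplicate columns,
# then select from the original frame so its dtypes are left untouched.
mask = ~mixed.T.duplicated().to_numpy()
kept = mixed.loc[:, mask]

print(roundtrip.dtypes)
print(kept.dtypes)
```

Both results contain columns A and B. The difference lies only in the dtypes printed at the end.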
