import pandas as pd

df = pd.DataFrame({"A":[3,4,3,3],"B":[6,7,6,9]})
df
   A  B
0  3  6
1  4  7
2  3  6
3  3  9

Keeping the first occurrence

To remove rows that have a duplicate value in column A:

df.drop_duplicates(subset=["A"])   # keep="first"
   A  B
0  3  6
1  4  7

By default, keep="first", which means that the first occurrence of the duplicate row will be kept. This is why row 0 was kept while rows 2 and 3 were removed.
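To check which rows are treated as duplicates before dropping them, one option is DataFrame.duplicated, which accepts the same subset and keep arguments and returns a boolean mask. The following is a minimal sketch on the same DataFrame; the names mask and deduped are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4, 3, 3], "B": [6, 7, 6, 9]})

# True marks a row whose value in A already appeared in an earlier row
mask = df.duplicated(subset=["A"], keep="first")
print(mask.tolist())  # [False, False, True, True]

# Dropping the flagged rows gives the same result as drop_duplicates
deduped = df[~mask]
print(deduped.equals(df.drop_duplicates(subset=["A"])))  # True
```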

NOTE

By default, inplace=False, which means that the method returns a new DataFrame and the original DataFrame is kept intact. To directly modify the original DataFrame, set inplace=True.
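The difference between the two modes can be sketched as follows, using the same DataFrame. The names result and returned are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4, 3, 3], "B": [6, 7, 6, 9]})

# Default (inplace=False): a new DataFrame is returned, df is unchanged
result = df.drop_duplicates(subset=["A"])
print(len(result), len(df))  # 2 4

# inplace=True: df itself is modified and the call returns None
returned = df.drop_duplicates(subset=["A"], inplace=True)
print(returned is None, len(df))  # True 2
```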

Keeping the last occurrence

To keep only the last occurrence of duplicate rows, set keep="last":

df.drop_duplicates(subset=["A"], keep="last")
   A  B
1  4  7
3  3  9
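Note that the surviving rows keep their original index labels (1 and 3 here). If a fresh 0-based index is preferred, one possibility is the ignore_index parameter. The sketch below assumes pandas 1.0 or later, where this parameter is available; the name last_kept is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4, 3, 3], "B": [6, 7, 6, 9]})

# Keep the last occurrence per value of A, then renumber the rows from 0
# (ignore_index is assumed available, i.e. pandas 1.0 or later)
last_kept = df.drop_duplicates(subset=["A"], keep="last", ignore_index=True)
print(last_kept.index.tolist())  # [0, 1]
print(last_kept["B"].tolist())   # [7, 9]
```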

Removing all occurrences

To remove all occurrences of duplicate rows, set keep=False:

df.drop_duplicates(subset=["A"], keep=False)
   A  B
1  4  7
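Since keep=False discards every row involved in a duplicate, it can be useful to inspect those rows first. A minimal sketch, assuming the same DataFrame and using DataFrame.duplicated with keep=False; the names repeated and unique_only are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4, 3, 3], "B": [6, 7, 6, 9]})

# keep=False flags every row whose value in A occurs more than once
repeated = df[df.duplicated(subset=["A"], keep=False)]
print(repeated.index.tolist())  # [0, 2, 3]

# The rows that remain are exactly those that were not flagged
unique_only = df.drop_duplicates(subset=["A"], keep=False)
print(unique_only.index.tolist())  # [1]
```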

Removing duplicate rows where all column values are duplicate

Consider the same DataFrame as before:

df = pd.DataFrame({"A":[3,4,3,3],"B":[6,7,6,9]})
df
   A  B
0  3  6
1  4  7
2  3  6
3  3  9

Keeping the first occurrence

To remove duplicate rows where the values in all columns match:

df.drop_duplicates()   # keep="first"
   A  B
0  3  6
1  4  7
3  3  9

By default, keep="first", which means that the first occurrence of the duplicate row will be kept. This is why row 0 was kept while row 2 was removed.
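Rows 0 and 3 share the value 3 in column A but differ in column B, so only row 2 counts as a full-row duplicate. The sketch below illustrates this on the same DataFrame and shows that omitting subset behaves like passing every column; the name all_cols is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4, 3, 3], "B": [6, 7, 6, 9]})

# Only row 2 repeats an earlier row in every column
print(df.duplicated().tolist())  # [False, False, True, False]

# Omitting subset behaves like listing every column
all_cols = df.drop_duplicates(subset=["A", "B"])
print(all_cols.equals(df.drop_duplicates()))  # True
```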

Keeping the last occurrence

To remove all occurrences of duplicate rows except the last, set keep="last":

df.drop_duplicates(keep="last")
   A  B
1  4  7
2  3  6
3  3  9

Removing all occurrences

To remove all occurrences of duplicate rows, set keep=False:

df.drop_duplicates(keep=False)
   A  B
1  4  7
3  3  9
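As a recap, the three keep options for full-row duplicates can be compared side by side. This is a small sketch on the same DataFrame; the dictionary kept is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4, 3, 3], "B": [6, 7, 6, 9]})

# Record which index labels survive under each keep option
kept = {}
for keep in ("first", "last", False):
    kept[keep] = df.drop_duplicates(keep=keep).index.tolist()
    print(keep, kept[keep])
# first [0, 1, 3]
# last [1, 2, 3]
# False [1, 3]
```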

Pandas DataFrame | drop_duplicates method

Returns a DataFrame with duplicate rows removed.


Published by Isshin Inada

