Pandas DataFrame | duplicated method
Pandas' DataFrame.duplicated(~) method returns a Series of booleans where True represents duplicate rows.
Parameters
1. subset | string or array-like of strings | optional
The labels of the columns to consider when identifying duplicates. By default, all columns are considered.
2. keep | boolean or string | optional
The marking rule for duplicates:
Value | Description |
---|---|
"first" | All duplicates except the first occurrence are marked as True |
"last" | All duplicates except the last occurrence are marked as True |
False | All duplicates are marked as True |
By default, keep="first".
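As a minimal sketch of how the subset parameter narrows the comparison, the snippet below uses a small DataFrame invented for illustration (it is not the one from the Examples section) in which no two rows match in full, but column A contains a repeated value:

```python
import pandas as pd

# Illustrative data: column "A" repeats the value 1, column "B" has no repeats
df = pd.DataFrame({"A": [1, 2, 1], "B": [3, 4, 5]})

# All columns considered (the default): no row is a full duplicate of another
all_cols = df.duplicated()

# Only column "A" considered: the 3rd row repeats the value in the 1st row
only_a = df.duplicated(subset="A")

# subset also accepts a list of labels; naming every column matches the default
a_and_b = df.duplicated(subset=["A", "B"])

print(only_a.tolist())
```

Passing a single label and passing a one-element list should behave the same way here; the list form is mainly useful when duplicates are defined by a combination of columns.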
Return value
A Series of booleans where True represents duplicate rows.
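To make the return value concrete, here is a short sketch that inspects the result and counts the flagged rows. The variable names mask and n_duplicates are chosen for illustration only:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1], "B": [3, 4, 3]})

# With the default keep="first", only the later repeat is flagged
mask = df.duplicated()

# The result is a boolean Series that shares the DataFrame's index
print(type(mask).__name__, mask.dtype)

# True counts as 1 when summed, so this is the number of flagged rows
n_duplicates = int(mask.sum())
print(n_duplicates)
```

Because the Series shares the DataFrame's index, it can be used directly as a boolean mask for row selection.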
Examples
Consider the following DataFrame:
import pandas as pd

df = pd.DataFrame({"A":[1,2,1], "B":[3,4,3]})
df

   A  B
0  1  3
1  2  4
2  1  3
Here, the 1st and 3rd rows are duplicates of each other.
Specifying the keep parameter
first
To mark all duplicate rows except the first occurrence:
df.duplicated() # or explicitly set keep="first"
0    False
1    False
2     True
dtype: bool
last
To mark all duplicate rows except the last occurrence:
df.duplicated(keep="last")
0     True
1    False
2    False
dtype: bool
False
To mark all duplicate rows as True:
df.duplicated(keep=False)
0     True
1    False
2     True
dtype: bool
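A common follow-up is to use the returned Series as a mask to select rows. The sketch below is one way to do this on the same DataFrame; the variable names are illustrative, and the comparison with drop_duplicates() is included only as a sanity check:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1], "B": [3, 4, 3]})

# Every row that has at least one duplicate elsewhere in the DataFrame
repeated_rows = df[df.duplicated(keep=False)]

# Inverting the default mask keeps the first occurrence of each row
unique_rows = df[~df.duplicated()]

# Sanity check: this selection should match what drop_duplicates() returns
same_as_drop = unique_rows.equals(df.drop_duplicates())

print(repeated_rows.index.tolist())
print(unique_rows.index.tolist())
```

Filtering with the mask leaves the original index labels in place, so the selected rows can still be traced back to their positions in df.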