Pandas DataFrame | set_index method
Pandas's DataFrame.set_index(~) sets the index of the DataFrame using one or more of its columns.
Parameters
1. keys | string or array-like or list<string>
The names of the column(s) used for the index.
2. drop | boolean | optional
If True, then the column used for the index will be deleted. If False, then the column will be retained. By default, drop=True.
3. append | boolean | optional
If True, then the columns will be appended to the current index. If False, then the columns will replace the current index. By default, append=False.
4. inplace | boolean | optional
If True, then the source DataFrame will be modified directly and None is returned. If False, then a new DataFrame will be returned. By default, inplace=False.
5. verify_integrity | boolean | optional
If True, then an error is raised if the new index has duplicates. If False, then duplicate index values are allowed. By default, verify_integrity=False.
Return Value
A DataFrame with a new index, or None if inplace=True.
Examples
Consider the following DataFrame:
df
   A  B  C
0  1  3  5
1  2  4  6
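The page shows only the printed DataFrame, not how it was built. A minimal construction that reproduces the output above, assuming plain integer columns, might look like this:

```python
import pandas as pd

# Assumed construction: the source only shows the printed result,
# so the dict below is inferred from that output.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

print(df)
```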
Setting a single column as the index
To set column A as the index of df:
df.set_index("A") # Returns a DataFrame
   B  C
A
1  3  5
2  4  6
Here, the name assigned to the index is the column label, that is, "A".
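As a quick check of this behaviour, the sketch below inspects the index name and the remaining columns. The variable name indexed is only illustrative, and df is rebuilt here on the assumption that it holds the integer values shown earlier.

```python
import pandas as pd

# Assumed reconstruction of the example DataFrame.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

indexed = df.set_index("A")

print(indexed.index.name)     # A
print(list(indexed.columns))  # ['B', 'C']
print(list(df.columns))       # ['A', 'B', 'C'] (df itself is unchanged)
```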
Setting multiple columns as the index
To set columns A and B as the index of df:
df.set_index(["A","B"])
     C
A B
1 3  5
2 4  6
Here, the DataFrame ends up with a single two-level index (a MultiIndex) whose levels are named A and B.
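To see what the multi-column case produces, the following sketch inspects the resulting index and looks up a row by a tuple of level values. It again assumes the integer data shown earlier; indexed is an illustrative name.

```python
import pandas as pd

# Assumed reconstruction of the example DataFrame.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

indexed = df.set_index(["A", "B"])

print(type(indexed.index).__name__)  # MultiIndex
print(indexed.index.nlevels)         # 2
print(list(indexed.index.names))     # ['A', 'B']

# Rows are addressed with one value per index level.
print(indexed.loc[(1, 3), "C"])      # 5
```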
Keeping the column used for the index
To keep the column that will be used as the index, set drop=False:
df.set_index("A", drop=False)
   A  B  C
A
1  1  3  5
2  2  4  6
Notice how the column A is still there.
For reference, here's df again:
df
   A  B  C
0  1  3  5
1  2  4  6
Appending to the current index
To append a column to the existing index, set append=True:
df.set_index("A", append=True)
     B  C
  A
0 1  3  5
1 2  4  6
Notice how column A has been appended to the original index [0,1] as a second level.
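The effect of append=True can be checked by looking at the index levels directly. This is a small sketch under the same assumption about df's contents; appended is an illustrative name.

```python
import pandas as pd

# Assumed reconstruction of the example DataFrame.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

appended = df.set_index("A", append=True)

# The original (unnamed) index is kept as the first level,
# and column A becomes the second level.
print(appended.index.nlevels)      # 2
print(list(appended.index.names))  # [None, 'A']
```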
Setting an index in-place
To set an index in-place, supply inplace=True:
df.set_index("A", inplace=True)
df
   B
A
1  3
2  4
As shown in the output above, setting inplace=True modifies the source DataFrame directly. Opt for inplace=True only when you're sure that you won't need the source DataFrame afterwards, since this can avoid keeping a second copy in memory.
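One practical consequence of inplace=True is that the call returns None, so its result should not be assigned back to df. The sketch below assumes a two-column DataFrame, which matches the output printed for this example; returned is an illustrative name.

```python
import pandas as pd

# Assumed two-column DataFrame, inferred from the printed output.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

returned = df.set_index("A", inplace=True)

print(returned)          # None
print(df.index.name)     # A
print(list(df.columns))  # ['B']
```

Writing df = df.set_index("A", inplace=True) would therefore replace df with None.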
Verifying integrity
Consider the following DataFrame:
df
   A  B
0  1  3
1  1  4
By default, verify_integrity=False, which means that no error will be thrown if the resulting index contains duplicates:
df.set_index("A") # verify_integrity=False
   B
A
1  3
1  4
Notice how the new index contains duplicate values (two 1s), but no error was thrown.
To throw an error in such cases, pass verify_integrity=True like so:
df.set_index("A", verify_integrity=True)
ValueError: Index has duplicate keys: Int64Index([1], dtype='int64', name='A')
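In a script, the error can be caught instead of letting it stop execution. The following is a minimal sketch, assuming the duplicated DataFrame shown above. The names raised and has_duplicates are illustrative, and the exact error text may differ between pandas versions.

```python
import pandas as pd

# Assumed reconstruction of the DataFrame with a duplicated value in A.
df = pd.DataFrame({"A": [1, 1], "B": [3, 4]})

raised = False
try:
    df.set_index("A", verify_integrity=True)
except ValueError as err:
    raised = True
    print(f"Could not set index: {err}")

# Alternatively, check for duplicates before setting the index.
has_duplicates = bool(df["A"].duplicated().any())
print(has_duplicates)  # True
```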