Pandas DataFrame | reindex method
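Pandas DataFrame.reindex(~) method conforms the source DataFrame to a new set of row and/or column labels, and returns the result as a new DataFrame. Values whose labels already exist in the source DataFrame are kept. Values whose row or column label is new are set to NaN, unless a fill method or fill value is specified. Check the examples for clarification.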
Parameters
1. labels | array-like | optional
The new labels. The axis parameter determines whether they are applied to the rows or to the columns.
2. index | array-like | optional
The new row labels.
3. columns | array-like | optional
The new column labels.
4. axis | int or string | optional
Whether to apply labels to the index or to the columns:
Value | Description |
---|---|
0 or "index" | The labels will be applied to the index (i.e. row labels). |
1 or "columns" | The labels will become column labels. |
You can change the labels of the index or the columns in two ways: either specify index and/or columns, or specify labels together with axis. It is better to use the index or columns parameters than labels and axis, since the intent is clearer and the syntax is shorter.
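To compare the two styles side by side, the following minimal sketch reindexes the columns once with columns and once with labels plus axis. It assumes pandas is installed and imported as pd, as in the examples further down.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# Style 1: dedicated keyword for the target axis
via_columns = df.reindex(columns=["B", "D"])

# Style 2: labels passed positionally, with axis saying where they go
via_labels = df.reindex(["B", "D"], axis="columns")

print(via_columns.equals(via_labels))  # True
```

Both calls produce the same DataFrame. The first call, however, states its intent without needing a second argument.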
5. method | None or string | optional
The logic to use when filling missing values:
Value | Description |
---|---|
None | Leave missing values as is. |
"pad" or "ffill" | Use the values of the previous row/column. |
"backfill" or "bfill" | Use the values of the next row/column. |
"nearest" | Use the values of the nearest row/column. |
By default, method=None. Check out our examples for clarification.
The method parameter only works when the row or column labels of the source DataFrame are monotonically increasing or decreasing. Otherwise, pandas raises a ValueError.
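The following minimal sketch shows what happens when the source labels are not monotonic. It uses a hypothetical DataFrame with row labels 30, 10 and 20 and then shows that sorting the index first makes forward-filling possible. The exact wording of the error message may vary between pandas versions, so only the exception type is relied on here.

```python
import pandas as pd

# Row labels 30, 10, 20 are neither increasing nor decreasing
df = pd.DataFrame({"A": [1, 2, 3]}, index=[30, 10, 20])

raised = False
try:
    df.reindex(index=[15], method="ffill")
except ValueError as err:
    raised = True
    print("ValueError:", err)

# After sorting, the index is 10, 20, 30 (monotonically increasing),
# so label 15 is forward-filled from label 10
filled = df.sort_index().reindex(index=[15], method="ffill")
print(filled)
```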
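6. copy | boolean | optional
Whether to return a new DataFrame even if the new labels are identical to the existing ones. If False, the returned DataFrame may share its underlying data with the source DataFrame. In either case, reindex(~) does not modify the source DataFrame in place. By default, copy=True.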
7. level | int or string | optional
The level to target. This is only relevant if the source DataFrame has a MultiIndex.
8. fill_value | scalar | optional
The value used to fill missing values. By default, fill_value=NaN.
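As an illustration of fill_value, this minimal sketch fills the new row with 0 instead of NaN. It uses the same two-column DataFrame as the examples below. No NaN is introduced, so the integer columns stay integers.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# Row "c" is new; fill it with 0 rather than NaN
result = df.reindex(index=["a", "c"], fill_value=0)
print(result)
#    A  B
# a  2  4
# c  0  0
```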
9. limit | int | optional
The maximum number of consecutive missing values to forward/backward fill. By default, limit=None.
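To see limit in action, the following sketch forward-fills a gap of four new labels but allows at most two consecutive fills. It assumes a hypothetical single-column DataFrame with row labels 0 and 5.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2]}, index=[0, 5])

# Labels 1 to 4 are new; forward-fill at most 2 of them in a row
result = df.reindex(index=[0, 1, 2, 3, 4, 5], method="ffill", limit=2)
print(result)
```

Rows 1 and 2 receive the value of row 0. Rows 3 and 4 stay NaN because the limit of two consecutive fills has already been reached. Row 5 already exists in the source DataFrame.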
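10. tolerance | scalar or list | optional
The maximum distance between the original and the new labels for filling to take place. A value is only filled if the following holds:
abs(index[indexer] - target) <= tolerance
A list specifies a separate tolerance for each new label. Specifying tolerance without method will result in an error. By default, tolerance=None.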
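The requirement that tolerance be paired with a method can be checked directly. In this minimal sketch, tolerance is passed without method, and pandas rejects the call with a ValueError. The exact error message is not relied on.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=[3, 6])

raised = False
try:
    df.reindex(index=[5, 7], tolerance=1)  # no method given
except ValueError as err:
    raised = True
    print("ValueError:", err)
```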
Return Value
A DataFrame with the row labels or column labels updated.
Examples
Consider the following DataFrame:
df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["a","b"])
df
   A  B
a  2  4
b  3  5
Changing the row labels
To change the index (i.e. the row labels) to "a" and "c":
df.reindex(index=["a","c"])
     A    B
a  2.0  4.0
c  NaN  NaN
Here, note the following:
- the values at [a, A] and [a, B] are left as is. This is because [a, A] and [a, B] both existed in the source DataFrame.
- the values at [c, A] and [c, B] are NaN. This is because [c, A] and [c, B] did not exist in the source DataFrame.
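The values 2 and 4 appear as 2.0 and 4.0 in the output above because NaN is a floating-point value. Any integer column that receives NaN is upcast to float64. The following sketch makes this visible by printing the dtypes before and after reindexing.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])
print(df.dtypes)      # integer columns

result = df.reindex(index=["a", "c"])
print(result.dtypes)  # float64 columns, since row "c" holds NaN
```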
Changing the column labels
Here's the same df we had before:
df
   A  B
a  2  4
b  3  5
To set new column labels:
df.reindex(columns=["B","D"])
   B    D
a  4  NaN
b  5  NaN
Here, note the following:
- the values in column B (i.e. [a, B] and [b, B]) are left as is. This is because column B existed in the source DataFrame.
- the values in column D (i.e. [a, D] and [b, D]) are NaN. This is because column D did not exist in the source DataFrame.
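Because the index and columns parameters can be used together, both axes can be changed in a single call. This minimal sketch adds the new row label "c" and the new column label "D" at the same time. Only [a, B] existed in the source DataFrame, so it is the only value that is not NaN.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# New row label "c" and new column label "D" in one call
result = df.reindex(index=["a", "c"], columns=["B", "D"])
print(result)
```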
Specifying method
Consider the following DataFrame:
df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["b","d"])
df
   A  B
b  2  4
d  3  5
None
By default, method=None. This means that no filling is performed, so values that have a new row label or column label will be NaN:
df.reindex(index=["a","c"])
     A    B
a  NaN  NaN
c  NaN  NaN
ffill
To fill using the previous values, pass in method="ffill" like so:
df.reindex(index=["a","c"], method="ffill")
     A    B
a  NaN  NaN
c  2.0  4.0
Here, note the following:
- we still have NaN for index "a" because there is no index smaller than "a". The source DataFrame contains the indexes "b" and "d", which are both greater than "a".
- the values in index "c" are filled using the values in index "b" of the source DataFrame. This is because index "b" is the last index that is smaller than index "c".
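The forward-fill rule above amounts to a lookup: for each new label, find the last source label that is less than or equal to it. The following pure-Python sketch expresses that lookup with bisect. The helper ffill_positions is hypothetical and is not how pandas implements filling internally. It assumes the source labels are already sorted in increasing order.

```python
from bisect import bisect_right


def ffill_positions(source_labels, target_labels):
    """Hypothetical helper: for each target label, return the position of the
    last source label <= target, or None if no such label exists.
    Assumes source_labels is sorted in increasing order."""
    positions = []
    for label in target_labels:
        # bisect_right counts how many source labels are <= label
        i = bisect_right(source_labels, label)
        positions.append(i - 1 if i > 0 else None)
    return positions


print(ffill_positions(["b", "d"], ["a", "c"]))  # [None, 0]
```

The result matches the example: "a" has nothing to fill from, and "c" takes its values from position 0 (label "b").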
bfill
Just as a reference, here's df again:
df
   A  B
b  2  4
d  3  5
To fill using the next values, pass in method="bfill" like so:
df.reindex(index=["a","c"], method="bfill")
   A  B
a  2  4
c  3  5
Here, note the following:
- the values in row "a" are filled with those in row "b" of the source DataFrame. This is because the next index that is larger than index "a" is index "b".
- the same reasoning applies to index "c", which is filled with the values in row "d", the next index larger than "c".
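Filling with method during reindexing differs from calling .bfill() after reindexing. The method parameter looks up the source labels ("b" and "d"). A plain reindex, by contrast, first produces an all-NaN DataFrame, which leaves .bfill() no values to propagate. The following sketch compares the two approaches on the same df.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["b", "d"])

# Filling during reindexing draws on the source labels "b" and "d"
during = df.reindex(index=["a", "c"], method="bfill")

# Reindexing first yields only NaN rows, so back-filling has no values to use
after = df.reindex(index=["a", "c"]).bfill()

print(during)
print(after)
```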
nearest
Although this is not officially documented, method="nearest" does not seem to work with string labels, presumably because it needs to measure the distance between labels. Hence, we'll use a DataFrame with an integer index to demonstrate how it works:
df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=[6,9])
df
   A  B
6  2  4
9  3  5
To fill using the values of the nearest index, pass in method="nearest":
df.reindex(index=[7,8], method="nearest")
   A  B
7  2  4
8  3  5
Here, note the following:
- index 7 is filled with the values of index 6, since 6 is the index in the source DataFrame closest to 7 (a distance of 1, versus 2 for index 9).
- index 8 is filled with the values of index 9, since 9 is the index in the source DataFrame closest to 8 (a distance of 1, versus 2 for index 6).
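The "nearest" rule combines the forward and backward lookups: take the closest source label on each side of the new label and keep whichever is closer. The following pure-Python sketch uses a hypothetical helper, nearest_positions, rather than the pandas internals. It assumes the source labels are numeric, non-empty, and sorted in increasing order. In this sketch, ties go to the larger label.

```python
from bisect import bisect_left, bisect_right


def nearest_positions(source_labels, target_labels):
    """Hypothetical helper: for each numeric target label, return the position
    of the closest source label. Assumes source_labels is non-empty and sorted
    in increasing order. Ties go to the larger label in this sketch."""
    positions = []
    for t in target_labels:
        left = bisect_right(source_labels, t) - 1  # last label <= t, or -1
        right = bisect_left(source_labels, t)      # first label >= t, or len
        if left < 0:
            positions.append(right)
        elif right == len(source_labels):
            positions.append(left)
        elif t - source_labels[left] < source_labels[right] - t:
            positions.append(left)
        else:
            positions.append(right)
    return positions


print(nearest_positions([6, 9], [7, 8]))  # [0, 1]
```

This matches the example above: 7 maps to position 0 (label 6), and 8 maps to position 1 (label 9).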
Specifying tolerance
Consider the following DataFrame:
df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=[3,6])
df
   A  B
3  2  4
6  3  5
Suppose we wanted to set a new index [5,7] with forward-fill. We can specify a tolerance to dictate whether the forward fill should take effect:
df.reindex(index=[5,7], method="ffill", tolerance=1)
     A    B
5  NaN  NaN
7  3.0  5.0
Here, note the following:
- the row with index 5 has NaN. Forward-filling would use index 3, but abs(3-5)=2, which is greater than the specified tolerance.
- the row with index 7 has been forward-filled using index 6 of the source DataFrame. This is because abs(6-7)=1, which is less than or equal to the specified tolerance.
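To see how the tolerance threshold changes the result, the following sketch repeats the forward fill twice. The first call uses a looser scalar tolerance of 2. The second uses a list giving one tolerance per new label, [2, 0] for labels 5 and 7. The list values are illustrative choices, not taken from the example above.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=[3, 6])

# tolerance=2: label 5 may now use label 3 (distance 2), label 7 uses label 6
loose = df.reindex(index=[5, 7], method="ffill", tolerance=2)

# One tolerance per new label: 2 for label 5, 0 for label 7
per_label = df.reindex(index=[5, 7], method="ffill", tolerance=[2, 0])

print(loose)
print(per_label)
```

With tolerance=2, both rows are filled. With the list, label 7 stays NaN because its distance of 1 to label 6 exceeds its own tolerance of 0.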