Pandas DataFrame | reindex method
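Pandas DataFrame.reindex(~) method returns a DataFrame conformed to a new set of row and/or column labels. Values whose labels exist in the source DataFrame are carried over, while values whose row or column label is new are set to NaN by default. Check the examples below for clarification.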
Parameters
1. labels | array-like | optional
The new labels to set as the index. Whether to set new labels for the rows or columns is indicated using axis.
2. index | array-like | optional
The new row labels.
3. columns | array-like | optional
The new column labels.
4. axis | int or str | optional
Whether to apply the labels to the index or the columns:
| Value | Description |
|---|---|
| 0 or "index" | The labels will be applied to the index (i.e. row labels) |
| 1 or "columns" | The labels will become column labels |
You can change the labels of the index or the columns in two ways:
- specify index and/or columns
- specify labels and axis
It is better to use the index or columns parameters rather than labels and axis, since the intent is clearer and the syntax is shorter.
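As a quick check of the two calling styles described above, the following sketch reindexes a small, made-up DataFrame both ways and compares the results. The variable names are purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# Row labels: keyword style versus labels + axis style
rows_kw = df.reindex(index=["a", "c"])
rows_axis = df.reindex(labels=["a", "c"], axis=0)

# Column labels: keyword style versus labels + axis style
cols_kw = df.reindex(columns=["B", "D"])
cols_axis = df.reindex(labels=["B", "D"], axis="columns")

# DataFrame.equals treats NaNs in the same position as equal
print(rows_kw.equals(rows_axis), cols_kw.equals(cols_axis))
```

Both comparisons should agree, which is why the choice between the two styles comes down to readability.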
5. method | None or string | optional
The logic to use when filling missing values:
| Value | Description |
|---|---|
| None | Leave missing values as is. |
| "pad" or "ffill" | Use the values of the previous row/column. |
| "backfill" or "bfill" | Use the values of the next row/column. |
| "nearest" | Use the values of the nearest row/column. |
By default, method=None. Check out our examples for clarification.
The method parameter is only applicable when the row or column labels of the source DataFrame are monotonically increasing or decreasing. Otherwise, pandas raises a ValueError instead of filling.
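The monotonicity requirement above can be illustrated with a small sketch. To the best of our knowledge, recent pandas versions raise a ValueError when method is used on labels that are neither increasing nor decreasing; the DataFrame and the names unsorted_df, raised and sorted_result are made up for this illustration.

```python
import pandas as pd

# Hypothetical data: the index [3, 1, 2] is neither increasing nor decreasing
unsorted_df = pd.DataFrame({"A": [10, 20, 30]}, index=[3, 1, 2])

try:
    unsorted_df.reindex(index=[0, 4], method="ffill")
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Sorting the index first makes "previous row" well defined
sorted_result = unsorted_df.sort_index().reindex(index=[0, 4], method="ffill")
print(sorted_result)
```

Sorting with sort_index() before reindexing is one simple way to satisfy the requirement.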
6. copy | boolean | optional
Whether or not to return a new DataFrame even when the new labels are identical to the current ones. Note that reindex(~) does not modify the source DataFrame in place. By default, copy=True.
7. level | int or string | optional
The level to target. This is only relevant if the source DataFrame has a MultiIndex.
8. fill_value | scalar | optional
The value to use in place of missing values. By default, fill_value=NaN.
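A minimal sketch of fill_value, using a small made-up DataFrame: the new row label receives the given scalar instead of NaN.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# Row "c" is new, so its cells take fill_value instead of NaN
filled = df.reindex(index=["a", "c"], fill_value=0)
print(filled)
```

Row "a" keeps its original values, while every cell in row "c" is 0.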
9. limit | int | optional
The maximum number of consecutive missing values to forward/backward fill. By default, limit=None.
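To illustrate limit, the sketch below uses a made-up DataFrame with a gap between labels 1 and 5 and forward-fills with limit=1. Only the first new label after each source row should be filled.

```python
import pandas as pd

# Hypothetical data with a gap between labels 1 and 5
df = pd.DataFrame({"A": [2, 3]}, index=[1, 5])

# Forward-fill, but carry each source row forward by at most one new label
limited = df.reindex(index=[1, 2, 3, 4, 5], method="ffill", limit=1)
print(limited)
```

Here label 2 is filled from label 1, while labels 3 and 4 remain NaN because the limit of one consecutive fill has been reached.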
10. tolerance | scalar or list | optional
The maximum distance allowed between a new label and the source label used to fill it. Filling only takes place when the following criterion holds:
abs(index[indexer] - target) <= tolerance.
Specifying tolerance without method will result in an error. By default, tolerance=None.
Return Value
A DataFrame conformed to the new row labels and/or column labels. The source DataFrame is left unchanged.
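As a small illustration of the return value, the sketch below reindexes a made-up DataFrame and then inspects the labels of both the source and the result.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

result = df.reindex(index=["a", "c"])

print(list(df.index))      # labels of the source DataFrame
print(list(result.index))  # labels of the returned DataFrame
```

If you want to keep the new labels, assign the result to a variable, since the source df still has its original labels.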
Examples
Consider the following DataFrame:
df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["a","b"])
df
   A  B
a  2  4
b  3  5
Changing the row labels
Changing the index (i.e. the row labels) to "a" and "c":
df.reindex(index=["a","c"])
     A    B
a  2.0  4.0
c  NaN  NaN
Here, notice the following:
- the values at [aA] and [aB] are left as is. This is because [aA] and [aB] both existed in the source DataFrame.
- the values at [cA] and [cB] are NaN. This is because [cA] and [cB] did not exist in the source DataFrame.
Changing the column labels
Here's the same df we had before:
df
   A  B
a  2  4
b  3  5
To set new column labels:
df.reindex(columns=["B","D"])
   B   D
a  4 NaN
b  5 NaN
Here, note the following:
- the values at [aB] and [bB] are left as is. This is because [aB] and [bB] both existed in the source DataFrame.
- the values at [aD] and [bD] are NaN. This is because column D did not exist in the source DataFrame.
Specifying method
Consider the following DataFrame:
df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["b","d"])
df
   A  B
b  2  4
d  3  5
None
By default, method=None, which means no filling will be performed so values that have a new row label or column label will be NaN:
df.reindex(index=["a","c"])
    A   B
a NaN NaN
c NaN NaN
ffill
To fill using the previous values, pass in method="ffill" like so:
df.reindex(index=["a","c"], method="ffill")
     A    B
a  NaN  NaN
c  2.0  4.0
Here, note the following:
- we still have NaN for index "a" because there is no index that is smaller than "a", that is, the source DataFrame contains index "b" and "d", which are both greater than "a".
- the values in index "c" are filled using the values in index "b" of the source DataFrame. This is because index "b" is the last index that is smaller than index "c".
bfill
Just as reference, here's df again:
df
   A  B
b  2  4
d  3  5
To fill using the next values, pass in method="bfill" like so:
df.reindex(index=["a","c"], method="bfill")
   A  B
a  2  4
c  3  5
Here, note the following:
- the values in row "a" are filled with those in row "b" of the source DataFrame. This is because the next index that is larger than index "a" is index "b".
- the same reasoning applies to index "c", which is filled with the values in row "d" since "d" is the next index that is larger than "c".
nearest
Although not officially documented, method="nearest" does not seem to work for string labels. Hence, we'll use a DataFrame with an integer index to demonstrate how it works:
df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=[6,9])
df
   A  B
6  2  4
9  3  5
To fill using the nearest values:
df.reindex(index=[7,8], method="nearest")
   A  B
7  2  4
8  3  5
Here, note the following:
- index 7 is filled with the values of index 6, since index 6 of the source DataFrame is the closest to 7.
- index 8 is filled with the values of index 9, since index 9 of the source DataFrame is the closest to 8.
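One difference from forward-filling is that "nearest" can also fill labels that fall outside the original range. The sketch below, which reuses the same DataFrame with a few extra made-up labels, compares the two methods side by side.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=[6, 9])

# Labels 5 and 10 fall outside the original range of 6 to 9
new_labels = [5, 7, 8, 10]
nearest = df.reindex(index=new_labels, method="nearest")
ffilled = df.reindex(index=new_labels, method="ffill")

print(nearest)
print(ffilled)
```

With "nearest", label 5 borrows from index 6 and label 10 borrows from index 9. With "ffill", label 5 stays NaN because there is no earlier index to carry forward.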
Specifying tolerance
Consider the following DataFrame:
df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=[3,6])
df
   A  B
3  2  4
6  3  5
Suppose we wanted to set a new index [5,7] with forward-fill. We can specify a tolerance to dictate whether the forward fill should take effect:
df.reindex(index=[5,7], method="ffill", tolerance=1)
     A    B
5  NaN  NaN
7  3.0  5.0
Here, note the following:
- the row with index 5 has NaN. This is because abs(3-5)=2, which is greater than the specified tolerance.
- the row with index 7 has been forward-filled using index 6 of the source DataFrame. This is because abs(6-7)=1, which is less than or equal to the specified tolerance.
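Since tolerance also accepts a list, a separate tolerance can be given for each new label. The following is a minimal sketch on the same DataFrame, assuming the list has the same length as the new index; the tolerance values are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=[3, 6])

# One tolerance per new label: allow a gap of 2 for label 5, and 0 for label 7
per_label = df.reindex(index=[5, 7], method="ffill", tolerance=[2, 0])
print(per_label)
```

This time the outcome is reversed: label 5 is filled from index 3 because abs(3-5)=2 is within its tolerance of 2, while label 7 is left as NaN because abs(6-7)=1 exceeds its tolerance of 0.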