Pandas DataFrame | reindex method

Last updated: Aug 12, 2023
Tags: Python, Pandas

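Pandas DataFrame.reindex(~) method returns a DataFrame conformed to a new set of row and/or column labels. Values whose row or column label already exists in the source DataFrame are kept. Values whose label is new are set to NaN, unless method or fill_value specifies otherwise. Check the examples for clarification.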

Parameters

1. labels | array-like | optional

The new labels to set as the index. Whether to set new labels for the rows or columns is indicated using axis.

2. index | array-like | optional

The new row labels.

3. columns | array-like | optional

The new column labels.

4. axis | int or str | optional

Whether to apply the labels to the index or the columns:

Value | Description
0 or "index" | The labels will be applied to the index (i.e. row labels)
1 or "columns" | The labels will become column labels

NOTE

You can change the labels of the index or the columns in two ways:

  • specify index and/or columns

  • specify labels and axis

Prefer the index and columns parameters over labels and axis: the intent is clearer and the syntax is shorter.
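The two styles in the NOTE above should produce identical results. The sketch below checks this on a small, made-up DataFrame (the same one used in the examples further down) by comparing the outputs with DataFrame.equals:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# Rows: labels + axis versus the index keyword
rows_axis = df.reindex(["a", "c"], axis="index")
rows_kw = df.reindex(index=["a", "c"])

# Columns: labels + axis versus the columns keyword
cols_axis = df.reindex(["B", "D"], axis=1)
cols_kw = df.reindex(columns=["B", "D"])

print(rows_axis.equals(rows_kw), cols_axis.equals(cols_kw))  # True True
```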

5. method | None or string | optional

The logic to use when filling missing values:

Value | Description
None | Leave missing values as is.
"pad" or "ffill" | Use the values of the previous row/column.
"backfill" or "bfill" | Use the values of the next row/column.
"nearest" | Use the values of the nearest row/column.

By default, method=None. Check out our examples for clarification.

WARNING

The method parameter requires the row or column labels of the source DataFrame to be monotonically increasing or decreasing. Otherwise, a ValueError is raised.
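Here is a minimal sketch of the WARNING above, using a hypothetical DataFrame whose row labels are out of order. On the pandas versions we are aware of, the unsorted labels trigger a ValueError. Calling sort_index() first makes the fill work. The exact error message may differ between versions, so the code only prints it:

```python
import pandas as pd

# Row labels 3, 1, 2 are neither increasing nor decreasing
df = pd.DataFrame({"A": [30, 10, 20]}, index=[3, 1, 2])

try:
    df.reindex(index=[0, 4], method="ffill")
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# After sorting, label 4 is forward-filled from label 3; label 0 stays NaN
filled = df.sort_index().reindex(index=[0, 4], method="ffill")
print(filled)
```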

6. copy | boolean | optional

Whether to return a new DataFrame even if the new labels are identical to the existing ones. Note that reindex never modifies the source DataFrame in place. By default, copy=True.

7. level | int or string | optional

The MultiIndex level to broadcast across when matching labels. This is only relevant if the source DataFrame has a MultiIndex.

8. fill_value | scalar | optional

The value to use for missing values. By default, fill_value=NaN.
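For instance, passing fill_value=0 (an arbitrary choice for illustration) fills the new row with zeros instead of NaN. Because no NaN needs to be stored, the integer columns should keep their integer type:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# New row "c" gets 0 instead of NaN
result = df.reindex(index=["a", "c"], fill_value=0)
print(result)
#    A  B
# a  2  4
# c  0  0
```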

9. limit | int | optional

The maximum number of consecutive missing values to forward/backward fill. By default, limit=None.
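To illustrate limit, the sketch below uses a made-up DataFrame with row labels 0 and 3 and reindexes it to labels 0 through 4 with forward-fill. Labels 1 and 2 form a run of two consecutive new labels, so with limit=1 only label 1 should be filled:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20]}, index=[0, 3])

# At most one consecutive new label is forward-filled after each source label
result = df.reindex(index=[0, 1, 2, 3, 4], method="ffill", limit=1)
print(result["A"].tolist())  # [10.0, 10.0, nan, 20.0, 20.0]
```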

10. tolerance | scalar or list | optional

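The maximum distance between the source labels and the new labels for a fill to take place. A new label is filled only if it satisfies:

abs(index[indexer] - target) <= tolerance

Here, index holds the source labels, target holds the new labels, and indexer gives the position of the source label chosen by method.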

Specifying tolerance without method will result in an error. By default, tolerance=None.
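The sketch below, built on a made-up DataFrame, shows the error case described above: passing tolerance while leaving method at its default of None. We only print the message, since its wording may vary across pandas versions:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3]}, index=[3, 6])

try:
    # tolerance is given, but method is left as None
    df.reindex(index=[5, 7], tolerance=1)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```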

Return Value

A DataFrame with the row labels or column labels updated.

Examples

Consider the following DataFrame:

import pandas as pd
df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["a","b"])
df
A B
a 2 4
b 3 5

Changing the row labels

Changing the index (i.e. the row labels) to "a" and "c":

df.reindex(index=["a","c"])
A B
a 2.0 4.0
c NaN NaN

Here, notice the following:

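  • the values in row "a" are left as is. This is because row "a" existed in the source DataFrame.

  • the values in row "c" are NaN. This is because row "c" did not exist in the source DataFrame. Columns A and B are cast to float so that they can hold NaN, which is why 2 and 4 now appear as 2.0 and 4.0.

  • row "b" of the source DataFrame is dropped, since "b" is not among the new labels.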
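Since reindex keeps only the labels passed in, it can also be used to reorder rows. When every new label already exists in the source DataFrame, no NaN is introduced and the integer values are kept as is. Here is a short sketch on the same df:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# Only existing labels, in a different order: rows are rearranged, no NaN appears
reordered = df.reindex(index=["b", "a"])
print(reordered)
#    A  B
# b  3  5
# a  2  4
```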

Changing the column labels

Here's the same df we had before:

df
A B
a 2 4
b 3 5

To set new column labels:

df.reindex(columns=["B","D"])
B D
a 4 NaN
b 5 NaN

Here, note the following:

  • the values in column B are left as is. This is because column B existed in the source DataFrame.

  • the values in column D are NaN. This is because column D did not exist in the source DataFrame. Column A is dropped, since it is not among the new labels.
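The index and columns parameters can also be combined in a single call. The sketch below reuses the same df. We expect row "a" of column B to keep its value 4 (stored as a float, since NaN now appears in the column) and every other cell to be NaN:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# New row labels and new column labels in one call
result = df.reindex(index=["a", "c"], columns=["B", "D"])
print(result)
```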

Specifying method

Consider the following DataFrame:

df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["b","d"])
df
A B
b 2 4
d 3 5

None

By default, method=None, which means no filling will be performed so values that have a new row label or column label will be NaN:

df.reindex(index=["a","c"])
A B
a NaN NaN
c NaN NaN

ffill

To fill using the previous values, pass in method="ffill" like so:

df.reindex(index=["a","c"], method="ffill")
A B
a NaN NaN
c 2.0 4.0

Here, note the following:

  • row "a" is still NaN because the source DataFrame has no label before "a": its labels "b" and "d" both come after "a".

  • the values in index "c" are filled using the values in index "b" of the source DataFrame. This is because index "b" is the last index that is smaller than index "c".
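Note that method fills from the rows of the source DataFrame. This differs from reindexing first and calling DataFrame.ffill() afterwards, because by then the source rows "b" and "d" are gone and only NaN rows remain. Here is a small sketch comparing the two on the same df:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["b", "d"])

# Filling happens against the source labels "b" and "d"...
with_method = df.reindex(index=["a", "c"], method="ffill")

# ...whereas ffill() after reindexing only sees the new rows, which are all NaN
fill_after = df.reindex(index=["a", "c"]).ffill()

print(with_method.loc["c"].tolist())  # [2.0, 4.0]
print(fill_after.loc["c"].tolist())   # [nan, nan]
```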

bfill

Just for reference, here's df again:

df
A B
b 2 4
d 3 5

To fill using the next values, pass in method="bfill" like so:

df.reindex(index=["a","c"], method="bfill")
A B
a 2 4
c 3 5

Here, note the following:

  • the values in row "a" are filled with those in row "b" of the source DataFrame. This is because the next index that is larger than index "a" is index "b".

  • similarly, row "c" is filled with the values in row "d" of the source DataFrame, since "d" is the next label after "c".
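To see which source row each new label is filled from, Index.get_indexer accepts the same method values and returns row positions, with -1 meaning that no row was found. We are assuming this mirrors the lookup reindex performs. The sketch only uses the public get_indexer method on the row labels of df:

```python
import pandas as pd

# Row labels of the source DataFrame from the ffill/bfill examples
source_labels = pd.Index(["b", "d"])
new_labels = ["a", "c"]

ffill_pos = source_labels.get_indexer(new_labels, method="ffill")
bfill_pos = source_labels.get_indexer(new_labels, method="bfill")

print(ffill_pos)  # [-1  0] -> "a" has no previous label, "c" uses row "b"
print(bfill_pos)  # [0 1]   -> "a" uses row "b", "c" uses row "d"
```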

nearest

Although this is not officially documented, method="nearest" does not seem to work with string labels, presumably because it needs to compute distances between labels. Hence, we'll use a DataFrame with an integer index to demonstrate how it works:

df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=[6,9])
df
A B
6 2 4
9 3 5

To fill using the values of the nearest label, pass in method="nearest":

df.reindex(index=[7,8], method="nearest")
A B
7 2 4
8 3 5

Here, note the following:

  • index 7 is filled with the values of index 6, since 6 is the source label closest to 7.

  • index 8 is filled with the values of index 9, since 9 is the source label closest to 8.
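The "closest label" reasoning above can be checked by hand. The sketch below uses NumPy to compute the distance between every new label and every source label, takes the smallest distance for each new label, and compares the result with Index.get_indexer(..., method="nearest"). It deliberately avoids ties, whose tie-breaking rule we do not cover here:

```python
import numpy as np
import pandas as pd

source_labels = pd.Index([6, 9])
new_labels = np.array([7, 8])

# Distance from each new label (rows) to each source label (columns)
distances = np.abs(new_labels[:, None] - source_labels.to_numpy()[None, :])
print(distances)
# [[1 2]
#  [2 1]]

# Position of the closest source label for each new label
print(distances.argmin(axis=1))  # [0 1]
print(source_labels.get_indexer(new_labels, method="nearest"))  # [0 1]
```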

Specifying tolerance

Consider the following DataFrame:

df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=[3,6])
df
A B
3 2 4
6 3 5

Suppose we wanted to set a new index [5,7] with forward-fill. We can specify a tolerance to dictate whether the forward fill should take effect:

df.reindex(index=[5,7], method="ffill", tolerance=1)
A B
5 NaN NaN
7 3.0 5.0

Here, note the following:

  • the row with index 5 has NaN. This is because the forward fill would use index 3 of the source DataFrame, and abs(3-5)=2, which is greater than the specified tolerance.

  • the row with index 7 has been forward-filled using index 6 of the source DataFrame. This is because abs(6-7)=1, which is less than or equal to the specified tolerance.
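The criterion abs(index[indexer] - target) <= tolerance from the parameter description can be worked through step by step for this example. The sketch below first finds the forward-fill positions without tolerance. It then applies the check manually and finally lets Index.get_indexer apply it with tolerance=1:

```python
import numpy as np
import pandas as pd

# Row labels of the source DataFrame and the new labels from the example
source_labels = pd.Index([3, 6])
target = np.array([5, 7])

# Step 1: forward-fill positions, ignoring tolerance (5 -> label 3, 7 -> label 6)
indexer = source_labels.get_indexer(target, method="ffill")
print(indexer)  # [0 1]

# Step 2: distance between each chosen source label and its new label
distance = np.abs(source_labels.to_numpy()[indexer] - target)
print(distance)  # [2 1]
print(distance <= 1)  # [False  True]

# Step 3: with tolerance=1, positions failing the check become -1 (i.e. NaN)
filtered = source_labels.get_indexer(target, method="ffill", tolerance=1)
print(filtered)  # [-1  1]
```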

Published by Isshin Inada