Pandas DataFrame | update method

Last updated Aug 12, 2023
Tags: Python, Pandas

The Pandas DataFrame.update(~) method replaces the values in the source DataFrame using the non-NaN values of another DataFrame or Series, matching values by their row and column labels.

WARNING

The update is done in-place, which means that the source DataFrame will be directly modified.
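Because update(~) mutates the DataFrame it is called on, one common way to keep the original intact is to update a copy instead. A minimal sketch, reusing the small DataFrames from the examples below (the name updated is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_other = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# Work on a deep copy so that df itself is not modified
updated = df.copy()
updated.update(df_other)

print(df["B"].tolist())       # [3, 4]  (original untouched)
print(updated["B"].tolist())  # [5, 6]  (values taken from df_other)
```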

Parameters

1. other | Series or DataFrame

The Series or DataFrame that holds the values to update the source DataFrame.

  • If a Series is provided, then its name attribute must match the label of the column you wish to update.

  • If a DataFrame is provided, then only the columns (and rows) whose labels match those of the source DataFrame are used; any other columns are ignored.
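To illustrate the Series case, here is a small sketch in which the Series name "B" decides which column of the source DataFrame receives the new values (the values 10 and 20 are made up for the example):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# The name "B" is used as the column label when aligning with df
s = pd.Series([10, 20], name="B")
df.update(s)
print(df)
```

Column A is untouched, and column B now holds 10 and 20.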

2. overwrite | boolean | optional

  • If True, then the values in the source DataFrame will be overwritten by the corresponding non-NaN values of other.

  • If False, then only NaN values in the source DataFrame will be updated using other.

By default, overwrite=True.
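The two overwrite modes boil down to a choice of which positions keep their original value. The sketch below mimics that rule for a single column using NumPy; update_column is a hypothetical helper written for illustration (not part of pandas), and it deliberately ignores label alignment, dtypes, filter_func and errors:

```python
import numpy as np

def update_column(this, that, overwrite=True):
    """Hypothetical helper: mimic how update() fills one column.

    this: values of the source column; that: aligned values from other.
    Returns a new float array instead of modifying anything in place.
    """
    this = np.asarray(this, dtype=float)
    that = np.asarray(that, dtype=float)
    if overwrite:
        # Keep the source value only where other is missing
        keep = np.isnan(that)
    else:
        # Keep every non-missing source value; only NaNs may change
        keep = ~np.isnan(this)
    return np.where(keep, this, that)

print(update_column([3, np.nan], [5, 6]))                   # [5. 6.]
print(update_column([3, np.nan], [5, 6], overwrite=False))  # [3. 6.]
print(update_column([3, 4], [5, np.nan]))                   # [5. 4.]
```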

3. filter_func | function | optional

A function that selects which values to update. The function takes in a column of the source DataFrame as a 1D NumPy array, and returns a 1D array of booleans where True indicates that the value should be updated.
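To see which values the function actually receives, the sketch below records every array passed to it. record_and_allow is a hypothetical name chosen for this illustration; it lets every value through, so the result matches a plain update:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_other = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

seen = []  # records each array passed to the filter function

def record_and_allow(vals):
    seen.append(np.asarray(vals).tolist())
    # True for every value, i.e. allow all of them to be updated
    return np.ones(len(vals), dtype=bool)

df.update(df_other, filter_func=record_and_allow)
print(seen)              # [[3, 4]]  (df's column B, not df_other's)
print(df["B"].tolist())  # [5, 6]
```

The function is called once for the shared column B, and it sees the source values, which is why the filter example further below compares the values of df against 3.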

4. errors | string | optional

Whether or not to raise errors:

  • "raise": An error will be raised if a non-NaN value is to be updated by another non-NaN value.

  • "ignore": No error will be raised.

By default, errors="ignore".
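The "raise" condition can be checked ahead of time. The sketch below predicts whether errors="raise" would fail by looking for positions where both DataFrames hold non-NaN values; would_raise_overlap is a hypothetical helper, not a pandas function, and it assumes rows are matched by index label as update(~) does:

```python
import numpy as np
import pandas as pd

def would_raise_overlap(df, other):
    """Hypothetical helper: True if both frames hold non-NaN values
    at the same row/column labels (the case errors="raise" rejects)."""
    cols = df.columns.intersection(other.columns)
    # Align other to df's rows and shared columns
    aligned = other.reindex(index=df.index, columns=cols)
    both = df[cols].notna() & aligned.notna()
    return bool(both.to_numpy().any())

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
print(would_raise_overlap(df, pd.DataFrame({"B": [5, 6]})))            # True
print(would_raise_overlap(df, pd.DataFrame({"B": [np.nan, np.nan]})))  # False
```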

Return value

Nothing is returned since the update is performed in-place. This means that the source DataFrame will be directly modified.
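A quick sketch of what this means in practice: the call itself evaluates to None, so the updated data must be read from the original variable, and reassigning the result (df = df.update(...)) would lose the DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_other = pd.DataFrame({"B": [5, 6]})

result = df.update(df_other)
print(result)            # None
print(df["B"].tolist())  # [5, 6]  (df itself was changed)
```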

Examples

Basic usage

Consider the following DataFrames:

import pandas as pd
df = pd.DataFrame({"A":[1,2], "B":[3,4]})
df_other = pd.DataFrame({"B":[5,6], "C":[7,8]})
  [df]     [df_other]
   A  B        B  C
0  1  3     0  5  7
1  2  4     1  6  8

Notice how the two DataFrames both have a column with label B. Performing the update gives:

df.update(df_other)
df
   A  B
0  1  5
1  2  6

The values in column B of the original DataFrame have been replaced by those in column B of the other DataFrame, while column C of df_other was ignored because df has no column C.
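The matching is label-based in both directions: rows of df that have no counterpart in other are left as they are, and columns that exist only in other are not added. A small check, using a df_other made up for this illustration that only has a row labelled 1:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# other only has a row labelled 1, plus a column C that df lacks
df_other = pd.DataFrame({"B": [60], "C": [70]}, index=[1])

df.update(df_other)
print(df)
```

Only the value of B in row 1 changes (to 60), and df still has just columns A and B. Depending on the pandas version, column B may be shown as float here, since the aligned other contains a NaN for row 0.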

Case when other DataFrame contains missing values

Consider the following DataFrames:

import numpy as np
df = pd.DataFrame({"A":[1,2], "B":[3,4]})
df_other = pd.DataFrame({"B":[5,np.nan], "C":[7,8]})
  [df]     [df_other]
   A  B        B    C 
0  1  3     0  5.0  7
1  2  4     1  NaN  8

Notice how the other DataFrame has a NaN.

Performing the update gives:

df.update(df_other)
df
   A    B
0  1  5.0
1  2  4.0

The takeaway here is that if the new value is a missing value, then no update is performed for that value.

Specifying the overwrite parameter

Consider the following DataFrames:

df = pd.DataFrame({"A":[1,2], "B":[3,np.nan]})
df_other = pd.DataFrame({"B":[5,6], "C":[7,8]})
   [df]         [df_other]
   A    B          B  C
0  1  3.0       0  5  7
1  2  NaN       1  6  8

Performing the update with the default parameter overwrite=True gives:

df.update(df_other)
df
   A    B
0  1  5.0
1  2  6.0

Notice how all the values in column B of the source DataFrame got updated.

Now, let's compare this with overwrite=False, starting again from the original df (the previous call already modified it in-place):

df.update(df_other, overwrite=False)
df
   A    B
0  1  3.0
1  2  6.0

Here, the value 3 was left intact, while the NaN was replaced by the corresponding value of 6. This is because overwrite=False ensures that only NaNs get updated, while non-NaN values remain unchanged.
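Since each call modifies its DataFrame in-place, a side-by-side comparison of the two modes is easiest on separate copies of the same starting data. A minimal sketch (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

original = pd.DataFrame({"A": [1, 2], "B": [3, np.nan]})
df_other = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# Start each comparison from a fresh copy, since update() works in-place
with_overwrite = original.copy()
with_overwrite.update(df_other)                    # overwrite=True (default)

without_overwrite = original.copy()
without_overwrite.update(df_other, overwrite=False)

print(with_overwrite["B"].tolist())     # [5.0, 6.0]
print(without_overwrite["B"].tolist())  # [3.0, 6.0]
```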

Specifying the filter_func parameter

Consider the following DataFrames:

df = pd.DataFrame({"A":[1,2], "B":[3,4]})
df_other = pd.DataFrame({"B":[5,6], "C":[7,8]})
   [df]      [df_other]
   A  B          B  C
0  1  3       0  5  7
1  2  4       1  6  8

Suppose we only wanted to update the values in df that are larger than 3. We could do so by specifying a custom function like so:

def foo(vals):
    # vals holds the values of one column of df (the source DataFrame)
    return vals > 3

df.update(df_other, filter_func=foo)
df
   A  B
0  1  3
1  2  6

Notice how the value 3 was left unchanged.
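The function does not have to be a comparison. As an illustration of our own (not taken from the pandas documentation), passing pd.isna selects only the missing values of the source, which for the data of the overwrite example gives the same result as overwrite=False:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, np.nan]})
df_other = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# Mark only the missing values of df as candidates for updating
df.update(df_other, filter_func=pd.isna)
print(df["B"].tolist())  # [3.0, 6.0]
```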

Specifying the errors parameter

Consider the following DataFrames:

df = pd.DataFrame({"A":[1,2], "B":[3,4]})
df_other = pd.DataFrame({"B":[5,6]})
   [df]    [df_other]
   A  B          B
0  1  3       0  5
1  2  4       1  6

Performing the update with the default parameter errors="ignore" gives:

df.update(df_other)   # errors="ignore"
df
   A  B
0  1  5
1  2  6

The update completes without any error, even though non-NaN values are updated with non-NaN values.

Performing the update with errors="raise" gives:

df.update(df_other, errors="raise")
ValueError: Data overlaps.

We end up with an error because we are trying to update non-NaN values with non-NaN values. Note that if column B in df_other contained only NaN values, then no error would be raised.
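In code that should not stop on such a conflict, the ValueError can be caught. The sketch below shows both cases from the paragraph above: an overlapping update that is refused (leaving df unchanged), and an all-NaN column that passes without error:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

message = None
try:
    df.update(pd.DataFrame({"B": [5, 6]}), errors="raise")
except ValueError as e:
    message = str(e)
print(message)           # Data overlaps.

# Only NaN in other: nothing overlaps, so no error is raised
df.update(pd.DataFrame({"B": [np.nan, np.nan]}), errors="raise")
print(df["B"].tolist())  # [3, 4]
```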

Published by Isshin Inada