Pandas DataFrame | update method
Pandas DataFrame.update(~) method replaces the values in the source DataFrame using non-NaN values from another DataFrame.
The update is done in-place, which means that the source DataFrame will be directly modified.
Parameters
1. other | Series or DataFrame
The Series or DataFrame that holds the values used to update the source DataFrame.
If a Series is provided, then its name attribute must match the name of the column you wish to update. If a DataFrame is provided, then its column names must match those of the columns you wish to update.
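As a minimal sketch of passing a Series as other, assuming a hypothetical source DataFrame with columns A and B (the values are illustrative and not taken from the original examples):

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# The Series name must match the column to update ("B" here)
s = pd.Series([5, 6], name="B")

df.update(s)  # modifies df in-place
print(df)
```

Here the Series is matched to column B through its name attribute, so column A is left untouched.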
2. overwrite | boolean | optional
If True, then values in the source DataFrame will be updated using the non-NaN values of other, regardless of whether the source values are NaN. If False, then only NaN values in the source DataFrame will be updated using other.
By default, overwrite=True.
3. filter_func | function | optional
A function that selects which values to update. The function takes in a column of the source DataFrame as a 1D NumPy array, and returns a 1D array of booleans that indicate whether or not each value should be updated.
4. errors | string | optional
Whether or not to raise errors:
Value | Description |
---|---|
"raise" | An error will be raised if a non-NaN value is to be updated with a non-NaN value. |
"ignore" | No error will be raised. |
By default, errors="ignore".
Return value
Nothing is returned (None) since the update is performed in-place on the source DataFrame.
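Because nothing is returned, assigning the result back to a variable (for example df = df.update(...)) would discard the DataFrame. A minimal sketch of this behaviour, assuming hypothetical data:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_other = pd.DataFrame({"B": [5, 6]})

result = df.update(df_other)

print(result)  # None
print(df)      # the source DataFrame itself was modified
```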
Examples
Basic usage
Consider the following DataFrames:
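A minimal setup sketch: the values below are assumptions chosen to be consistent with the output shown for this example (in particular, the original values of column B in df are illustrative).

```python
import pandas as pd

# Assumed source DataFrame
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
# Assumed other DataFrame holding the new values for column B
df_other = pd.DataFrame({"B": [5, 6]})

print(df)
print(df_other)
```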
Notice how the two DataFrames both have a column with label B. Performing the update gives:
df.update(df_other)
df
   A  B
0  1  5
1  2  6
The values in column B of the original DataFrame have been replaced by those in column B of the other DataFrame.
Case when other DataFrame contains missing values
Consider the following DataFrames:
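A minimal setup sketch, with values assumed to be consistent with the output shown for this example:

```python
import numpy as np
import pandas as pd

# Assumed source DataFrame (column B stored as floats)
df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.0]})
# Assumed other DataFrame, whose second value is missing
df_other = pd.DataFrame({"B": [5.0, np.nan]})

print(df)
print(df_other)
```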
Notice how the other DataFrame has a NaN.
Performing the update gives:
df.update(df_other)
df
   A    B
0  1  5.0
1  2  4.0
The takeaway here is that if the new value is a missing value, then no update is performed for that value.
Specifying the overwrite parameter
Consider the following DataFrames:
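A minimal setup sketch, with values assumed to be consistent with the outputs shown for this example. Since update modifies df in-place, df would need to be re-created before each of the two calls that follow.

```python
import numpy as np
import pandas as pd

# Assumed source DataFrame, with a missing value in column B
df = pd.DataFrame({"A": [1, 2], "B": [3.0, np.nan]})
# Assumed other DataFrame holding the new values for column B
df_other = pd.DataFrame({"B": [5.0, 6.0]})

print(df)
print(df_other)
```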
Performing the update with the default parameter overwrite=True gives:
df.update(df_other)
df
   A    B
0  1  5.0
1  2  6.0
Notice how all the values in column B of the source DataFrame got updated.
Now, let's compare this with overwrite=False:
df.update(df_other, overwrite=False)
df
   A    B
0  1  3.0
1  2  6.0
Here, the value 3 was left intact, while the NaN was replaced by the corresponding value of 6. This is because overwrite=False ensures that only NaNs get updated, while non-NaN values remain unchanged.
Specifying the filter_func parameter
Consider the following DataFrames:
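A minimal setup sketch, with values assumed to be consistent with the output shown for this example:

```python
import pandas as pd

# Assumed source DataFrame
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
# Assumed other DataFrame holding the new values for column B
df_other = pd.DataFrame({"B": [5, 6]})

print(df)
print(df_other)
```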
Suppose we wanted to update only those values in the source DataFrame that are larger than 3. We could do so by specifying a custom function like so:
def foo(vals):
    return vals > 3
df.update(df_other, filter_func=foo)
df
   A  B
0  1  3
1  2  6
Notice how the value 3 was left unchanged.
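Note that the filter is evaluated against the values of the source DataFrame, not the incoming values. A small sketch of this, using hypothetical data in which the replacement for a qualifying value is itself smaller than 3:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_other = pd.DataFrame({"B": [10, 1]})

# Update only where the *source* value is larger than 3
df.update(df_other, filter_func=lambda vals: vals > 3)
print(df)
```

The source value 4 is replaced by 1 because 4 passes the filter, while 3 is kept even though its candidate replacement is 10.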
Specifying the errors parameter
Consider the following DataFrames:
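A minimal setup sketch, with values assumed to be consistent with the output shown for this example. Both DataFrames hold non-NaN values in column B, which is the situation the errors parameter is concerned with.

```python
import pandas as pd

# Assumed source DataFrame (no missing values)
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
# Assumed other DataFrame (no missing values either)
df_other = pd.DataFrame({"B": [5, 6]})

print(df)
print(df_other)
```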
Performing the update with the default parameter errors="ignore" gives:
df.update(df_other)  # errors="ignore"
df
   A  B
0  1  5
1  2  6
The update completes without any error, even though non-NaN values are updated with non-NaN values.
Performing the update with errors="raise" gives:
df.update(df_other, errors="raise")
ValueError: Data overlaps.
We end up with an error because we are trying to update non-NaN values with non-NaN values. Note that if column B in df_other had only NaN as its values, then no error would be raised.
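Both cases can be checked side by side. The following is a minimal sketch using hypothetical data, first with overlapping non-NaN values and then with an other DataFrame that contains only NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Overlapping non-NaN values: a ValueError is raised
try:
    df.update(pd.DataFrame({"B": [5, 6]}), errors="raise")
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")

# Only NaN in the other DataFrame: nothing overlaps, so no error
df.update(pd.DataFrame({"B": [np.nan, np.nan]}), errors="raise")
print(df)
```

In both cases df is left with its original values: the first call fails before anything is written, and the second has no non-NaN values to apply.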