Pandas DataFrame | mask method
Pandas DataFrame.mask(~) replaces all values in the DataFrame that satisfy a given condition with the desired value.
Parameters
1. cond | array-like of booleans
A boolean mask, which is an array-like structure (e.g. Series and DataFrame) that contains either True or False as its entries.
2. other | number or string or Series or DataFrame
The values to replace the entries that have True in cond.
3. inplace | boolean | optional
If True, then the method will directly modify the source DataFrame instead of creating a new DataFrame. If False, then a new DataFrame will be created and returned. By default, inplace=False.
4. axis | int | optional
The axis along which to perform the method. By default, axis=None.
5. level | int | optional
The levels on which to perform the method. This is only relevant if your source DataFrame has a multi-index.
6. errors | string | optional
Whether to raise or suppress errors:
Value | Description |
---|---|
"raise" | Allow for errors to be raised. |
"ignore" | When an error occurs, return the source DataFrame. |
By default, errors="raise". Note that this parameter was deprecated in pandas 1.5 and removed in pandas 2.0.
7. try_cast | boolean | optional
Whether or not to cast the resulting DataFrame into the source DataFrame's type. By default, try_cast=False. Note that this parameter was deprecated in pandas 1.3 and removed in pandas 2.0.
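To make the inplace and axis parameters more concrete, here is a minimal sketch. It assumes pandas is imported as pd; the names cond, row_values and col_values are illustrative only and do not come from the pandas API. When other is a Series, axis tells pandas whether to line it up with the row index (axis=0) or with the column labels (axis=1).

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
cond = df > 2  # True wherever a value should be replaced

# other as a Series aligned with the row index (axis=0):
# row 0 takes 10, row 1 takes 20 wherever cond is True
row_values = pd.Series([10, 20], index=df.index)
by_row = df.mask(cond, row_values, axis=0)

# other as a Series aligned with the column labels (axis=1):
# masked entries in column "B" take 200
col_values = pd.Series({"A": 100, "B": 200})
by_col = df.mask(cond, col_values, axis=1)

# inplace=True modifies df itself and returns None
returned = df.mask(cond, 5, inplace=True)
```

Because inplace=True returns None, chaining further method calls onto it will fail; the default inplace=False is usually the safer choice.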
Return Value
A DataFrame with values replaced according to your parameters. Note that the shape is the same as that of the source DataFrame.
Examples
Applying custom masks
Consider the following DataFrame:
df = pd.DataFrame({"A":[1,2], "B":[3,4]})
df
   A  B
0  1  3
1  2  4
Our goal is to replace all values greater than 2 with the value 5 using the mask(~) method.
In order to use mask(~), we first need to prepare the mask like so:
df_mask = df > 2
df_mask
       A     B
0  False  True
1  False  True
Notice how all the values that fit our condition (value > 2) are flagged as True, and those that don't as False.
Finally, we apply the mask like so:
df.mask(df_mask, 5)
   A  B
0  1  5
1  2  5
We see that all values greater than 2 (values 3 and 4 in this case) have been replaced by 5.
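The walkthrough above can be collected into one runnable sketch, together with a few equivalent ways of writing it. The variable names (with_mask, with_callable and so on) are illustrative only. The sketch relies on three documented behaviours: cond may be a callable that pandas applies to the DataFrame, where(~) is the inverse of mask(~), and leaving out the replacement value fills the masked entries with NaN.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# 1. Precomputed boolean mask, as in the walkthrough
with_mask = df.mask(df > 2, 5)

# 2. Callable condition: pandas calls it on df to build the mask
with_callable = df.mask(lambda frame: frame > 2, 5)

# 3. where(~) keeps values where the condition is True,
#    so inverting the condition gives the same result as mask(~)
with_where = df.where(~(df > 2), 5)

# 4. With no replacement value, masked entries become NaN
with_default = df.mask(df > 2)
```

The callable form is handy in method chains, where the intermediate DataFrame has no name to build a mask from.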
Applying Pandas built-in masks
Consider the following DataFrame:
import numpy as np
df = pd.DataFrame({"A": [np.nan,2], "B":[3,np.nan]})
df
     A    B
0  NaN  3.0
1  2.0  NaN
Our df contains two missing values. Our goal is to replace these missing values with a value of 5.
Instead of creating our own boolean mask like we did before, we can leverage the Pandas DataFrame.isna(~) method:
df.isna()
       A      B
0   True  False
1  False   True
We can perform the masking operation directly like so:
df.mask(df.isna(), 5)
     A    B
0  5.0  3.0
1  2.0  5.0
Note that this is just an example to illustrate the use of mask(~). To fill missing values, use fillna(~) instead.
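As a quick check of that recommendation, the following sketch compares the two approaches on the same data. The names masked and filled are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, 2], "B": [3, np.nan]})

# Replace missing values through an explicit boolean mask
masked = df.mask(df.isna(), 5)

# The dedicated method for the same job
filled = df.fillna(5)

# Both routes should give the same DataFrame here
same_result = masked.equals(filled)
```

For this DataFrame the two results match, and fillna(~) states the intent more directly.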
Using a DataFrame as the replacer
In the previous two examples, we simply replaced all values satisfying a certain criterion with a single number. The mask(~) method can also take a DataFrame as the replacer, which is useful when different entries need different replacement values.
As an example, consider the following DataFrame:
df = pd.DataFrame({"A":[1,2],"B":[3,4]})
df
   A  B
0  1  3
1  2  4
Once again, let's say we want to modify all values that are greater than 2.
We prepare the mask like so:
df_mask = df > 2
df_mask
       A     B
0  False  True
1  False  True
Next, we create the DataFrame to use as our replacer:
df_replacer = pd.DataFrame({"A":[5,6], "B":[7,8]})
df_replacer
   A  B
0  5  7
1  6  8
Finally, use the mask(~) method to apply our mask:
df.mask(df_mask, df_replacer)
   A  B
0  1  7
1  2  8
Notice how values in df that were flagged as True in df_mask were replaced by the corresponding entry in df_replacer.
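Two variations on this idea are sketched below; the names scaled, shuffled_replacer and aligned are illustrative only. The first builds the replacer from df itself, which is a common way to transform only the masked entries. The second assumes, as in current pandas releases, that a DataFrame replacer is matched to df by row and column labels rather than by position.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Replacer computed from df itself: masked entries are multiplied by 10
scaled = df.mask(df > 2, df * 10)

# Replacer whose rows are listed in a different order (label 1, then label 0)
shuffled_replacer = pd.DataFrame({"A": [5, 6], "B": [7, 8]}, index=[1, 0])

# Entries are matched by label: row 0 of df takes the value stored
# under label 0 in shuffled_replacer, not the first row
aligned = df.mask(df > 2, shuffled_replacer)
```

If the replacer's labels do not line up with those of df, it is worth checking the result, since matching happens on labels rather than on row order.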