Updating rows based on column values in Pandas DataFrame
Start your free 7-days trial now!
Filling rows where condition is based on their values with a constant
Consider the following DataFrame:
df = pd.DataFrame({"A":[1,2,3],"B":[4,5,6],"C":[7,8,9]}, index=["a","b","c"])df
A B Ca 1 4 7b 2 5 8c 3 6 9
To fill rows where value for column A
is 1
or value for column C
is greater than or equal to 9
:
df.loc[(df["A"] == 1) | (df["C"] >= 9)] = 0df
A B Ca 0 0 0b 2 5 8c 0 0 0
Here, we are first extracting the following Series
of booleans:
(df["A"] == 1) | (df["C"] >= 9)
a Trueb Falsec Truedtype: bool
Passing in this boolean mask into the loc
property will return the rows that correspond to True
. We then fill these rows with the value 0
using standard assignment (=
).
Filling certain row values where condition is based on their values with a constant
Consider the following DataFrame:
df = pd.DataFrame({"A":[1,2,3],"B":[4,5,6],"C":[7,8,9]}, index=["a","b","c"])df
A B Ca 1 4 7b 2 5 8c 3 6 9
Instead of filling the entire rows with a constant, you can specify which rows to fill like so:
df.loc[(df["A"] == 1) | (df["C"] >= 9), "B"] = 0df
A B Ca 1 0 7b 2 5 8c 3 0 9
Here, the "B"
after the comma indicates that we want to only update column B
, and leave the other column values intact.
Filling rows where condition is based on a function of their values
Consider the following DataFrame:
df = pd.DataFrame({"A":[1,2,3],"B":[4,5,6],"C":[7,8,9]}, index= ["a","b","c"])df
A B Ca 1 4 7b 2 5 8c 3 6 9
To fill rows where the sum of the value for column A
and the value for column B
is greater than 6
:
def criteria(my_df): return my_df["A"] + my_df["B"] > 6
df.loc[criteria] = 0df
A B Ca 1 4 7b 0 0 0c 0 0 0
To clarify, criteria(my_df)
takes in as argument the source DataFrame, and returns a Series
of booleans where True
corresponds to the rows that satisfy the condition:
def criteria(df): print(df["A"] + df["B"] > 6) return df["A"] + df["B"] > 6
df.loc[criteria] = 0
a Falseb Truec Truedtype: bool
The loc
property will then return all the rows that correspond to True
in this boolean mask.
Filling rows using a function of their values
Consider the following DataFrame:
df = pd.DataFrame({"A":[1,2,3],"B":[4,5,6],"C":[7,8,9]}, index= ["a","b","c"])df
A B Ca 1 4 7b 2 5 8c 3 6 9
To double the values of rows where the value for column B
is larger than 4
:
df.loc[df["B"] > 4] = df * 2df
A B Ca 1 4 7b 4 10 16c 6 12 18
Here, loc
returns all the rows where the value for column B
is larger than 4
. These rows are then assigned new values using =
. Note that the assignment only updates the rows returned by loc
, and so the rows that do not satisfy the condition will be kept intact.