Getting rows with missing values (NaNs) in certain columns in Pandas DataFrame
Start your free 7-days trial now!
Example
Consider the following DataFrame:
df = pd.DataFrame({"A":[np.nan,3,np.nan],"B":[4,5,6],"C":[np.nan,7,8]}, index=["a","b","c"])df
A B Ca NaN 4 NaNb 3.0 5 7.0c NaN 6 8.0
Solution - single column case
To get rows with missing values in column C
:
df[df["C"].isna()]
A B Ca NaN 4 NaN
Explanation
We first fetch column C
as a Series
:
df["C"]
a NaNb 7.0c 8.0Name: C, dtype: float64
We then use the isna()
method, which returns a Series
of booleans where True
indicates the presence of a missing value:
df["C"].isna()
a Trueb Falsec FalseName: C, dtype: bool
With this boolean mask, we can then extract rows that correspond to True
using []
syntax:
df[df["C"].isna()]
A B Ca NaN 4 NaN
Solution - multiple columns case (OR)
Consider the same df
as above:
df = pd.DataFrame({"A":[np.nan,3,np.nan],"B":[4,5,6],"C":[np.nan,7,8]}, index=["a","b","c"])df
A B Ca NaN 4 NaNb 3.0 5 7.0c NaN 6 8.0
To get rows with missing values in columns A
or C
:
df[df[["A","C"]].isna().any(axis=1)]
A B Ca NaN 4 NaNc NaN 6 8.0
Explanation
We start off by extracting columns A
and C
:
df[["A","C"]]
A Ca NaN NaNb 3.0 7.0c NaN 8.0
We then use the isna()
method, which returns a Series
of booleans where True
indicates the presence of a missing value:
df[["A","C"]].isna()
A Ca True Trueb False Falsec True False
We then use any(axis=1)
to obtain a Series
where True
represents the presence of at least one True
in each row:
df[["A","C"]].isna().any(axis=1)
a Trueb Falsec Truedtype: bool
The parameter axis=1
is needed here since the default behaviour of any(~)
is to scan through each column (as opposed to each row).
Finally, with this boolean mask, we can then extract rows that correspond to True
using []
syntax:
df[df[["A","C"]].isna().any(axis=1)]
A B Ca NaN 4 NaNc NaN 6 8.0
Solution - multiple columns case (AND)
The solution is identical to the OR
case except that we use all(axis=1)
instead of any(~)
.
For instance, to find rows with missing values in both columns A
and C
:
df[df[["A","C"]].isna().all(axis=1)]
A B Ca NaN 4 NaN