Getting rows with missing values (NaNs) in Pandas DataFrame
Start your free 7-days trial now!
Example
Consider the following DataFrame:
df = pd.DataFrame({"A":[np.nan,3,np.nan],"B":[4,5,6],"C":[np.nan,7,8]}, index=["a","b","c"])df
A B Ca NaN 4 NaNb 3.0 5 7.0c NaN 6 8.0
Rows with at least one missing value
Solution
To get rows with missing values in df
:
df[df.isna().any(axis=1)]
A B Ca NaN 4 NaNc NaN 6 8.0
Explanation
The isna()
method returns a DataFrame of booleans where True
indicates the presence of a missing value:
df.isna()
A B Ca True False Trueb False False Falsec True False False
We then use any(axis=1)
to obtain a Series
where True
represents the presence of at least one True
in each row:
df.isna().any(axis=1)
a Trueb Falsec Truedtype: bool
The parameter axis=1
is needed here since the default behaviour of any(~)
is to scan through each column (as opposed to each row).
With this boolean mask, we can then extract rows that correspond to True
using []
syntax:
df[df.isna().any(axis=1)]
A B Ca NaN 4 NaNc NaN 6 8.0
Rows with missing value for a certain column
We show the same df
here for your reference:
df = pd.DataFrame({"A":[np.nan,3,np.nan],"B":[4,5,6],"C":[np.nan,7,8]}, index=["a","b","c"])df
A B Ca NaN 4 NaNb 3.0 5 7.0c NaN 6 8.0
Solution
To get rows with missing value in column C
:
df[df["C"].isna()]
A B Ca NaN 4 NaN
Explanation
We first begin by extracting column C
as a Series:
df["C"]
a NaNb 7.0c 8.0Name: C, dtype: float64
Next, we use the Series' isna()
method to get a Series of booleans where True
indicates the presence of NaN
:
df["C"].isna()
a Trueb Falsec FalseName: C, dtype: bool
Finally, we pass in this boolean mask to extract the rows corresponding to True
using [~]
syntax:
df[df["C"].isna()]
A B Ca NaN 4 NaN
Rows with missing value for multiple columns
We show the same df
here for your reference:
df = pd.DataFrame({"A":[np.nan,3,np.nan],"B":[4,5,6],"C":[np.nan,7,8]}, index=["a","b","c"])df
A B Ca NaN 4 NaNb 3.0 5 7.0c NaN 6 8.0
Solution
To get rows with missing value in columns A
and C
:
df[df[["A","C"]].isna().all(axis=1)]
A B Ca NaN 4 NaN
Explanation
We first fetch columns A
and C
as a DataFrame using [~]
syntax:
df[["A","C"]]
A Ca NaN NaNb 3.0 7.0c NaN 8.0
We then use the isna()
method to get a DataFrame of booleans where True
indicates the presence of NaN
:
df[["A","C"]].isna()
A Ca True Trueb False Falsec True False
Next, we use all(axis=1)
get a Series of booleans where True
indicates a row with all True
s:
df[["A","C"]].isna().all(axis=1)
a Trueb Falsec Falsedtype: bool
Finally, we use the []
syntax to extract the rows corresponding to True
:
df[df[["A","C"]].isna().all(axis=1)]
A B Ca NaN 4 NaN
Rows with missing values for all columns
Consider the following DataFrame:
df = pd.DataFrame({"A":[np.nan,np.nan],"B":[np.nan,4]}, index=["a","b"])df
A Ba NaN NaNb NaN 4.0
Solution
To get rows with missing values for all columns:
df[df.isna().all(axis=1)]
A Ba NaN NaN
Explanation
The logic is exactly the same as the case for getting rows with at least one missing value, except that we use all(~)
instead of any(~)
. The difference is as follows: