Getting index of rows with missing values (NaNs) in Pandas DataFrame
Start your free 7-days trial now!
Getting index (row label)
Consider the following DataFrame with some missing values:
df = pd.DataFrame({"A":[3,pd.np.NaN,5],"B":[6,7,pd.np.NaN]}, index=["a","b","c"])df
A Ba 3.0 6.0b NaN 7.0c 5.0 NaN
Solution
To get the index of rows with missing values in Pandas optimally:
temp = df.isna().any(axis=1)temp[temp].index
Index(['b', 'c'], dtype='object')
Explanation
We first check for the presence of NaN
s using isna()
, which returns a DataFrame of booleans where True
indicates the presence of a NaN
:
df.isna()
A Ba False Falseb True Falsec False True
Next, we use any(axis=1)
, which scans each row and returns a Series of booleans where True
indicates a row with at least one True
:
df.isna().any(axis=1)
a Falseb Truec Truedtype: bool
We then temporarily store this intermediate result in a variable called temp
. Our goal now is to extract the Index where the corresponding value is True
(b
and c
in this case).
We first exclude indexes where the corresponding values is False
by treating temp
as a boolean mask:
temp[temp]
b Truec Truedtype: bool
Finally, all we need to do is to access the index property of this Series:
temp[temp].index
Index(['b', 'c'], dtype='object')
Getting integer index
Again, consider the same df
as above:
df = pd.DataFrame({"A":[3,pd.np.NaN,5],"B":[6,7,pd.np.NaN]}, index=["a","b","c"])df
A Ba 3.0 6.0b NaN 7.0c 5.0 NaN
Solution
To get the integer indexes of rows with missing values:
np.where(df.isna().any(axis=1))[0] # returns a NumPy array
array([1, 2])
Explanation
Similar to the case above, we start by checking for the presence of NaN
values using isna()
:
df.isna() # returns a DataFrame
A Ba False Falseb True Falsec False True
We then check for rows where there is at least one True
:
df.isna().any(axis=1) # returns a Series
a Falseb Truec Truedtype: bool
To get the integer index of the boolean True
, use np.where(~)
:
np.where(df.isna().any(axis=1)) # returns a tuple of size one
(array([1, 2]),)
Here, np.where(~)
returns a tuple of size one, and so we use [0]
to extract the NumPy array of indexes:
np.where(df.isna().any(axis=1))[0] # returns a NumPy array
array([1, 2])