
Getting rows with missing values (NaNs) in Pandas DataFrame

Last updated: Aug 12, 2023
Tags: Python, Pandas

Example

Consider the following DataFrame:

import numpy as np
import pandas as pd
df = pd.DataFrame({"A":[np.nan,3,np.nan],"B":[4,5,6],"C":[np.nan,7,8]}, index=["a","b","c"])
df
     A  B    C
a  NaN  4  NaN
b  3.0  5  7.0
c  NaN  6  8.0

Rows with at least one missing value

Solution

To get the rows of df that contain at least one missing value:

df[df.isna().any(axis=1)]
     A  B    C
a  NaN  4  NaN
c  NaN  6  8.0

Explanation

The isna() method returns a DataFrame of booleans where True indicates the presence of a missing value:

df.isna()
       A      B      C
a   True  False   True
b  False  False  False
c   True  False  False

We then use any(axis=1) to obtain a Series of booleans where True marks a row that contains at least one True:

df.isna().any(axis=1)
a True
b False
c True
dtype: bool

The axis=1 argument is needed here because any(~) defaults to axis=0, which checks each column rather than each row.
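
To make the effect of the axis argument concrete, here is a minimal sketch that runs any(~) both ways on the same df. The variable names per_column and per_row are ours, chosen only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [np.nan, 3, np.nan], "B": [4, 5, 6], "C": [np.nan, 7, 8]},
    index=["a", "b", "c"],
)

# Default (axis=0): one boolean per column, True if that column has any NaN
per_column = df.isna().any()

# axis=1: one boolean per row, True if that row has any NaN
per_row = df.isna().any(axis=1)

print(list(per_column.index), per_column.tolist())  # ['A', 'B', 'C'] [True, False, True]
print(list(per_row.index), per_row.tolist())        # ['a', 'b', 'c'] [True, False, True]
```

Note that the default result is indexed by column labels, so it cannot serve as a row mask for df.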

With this boolean mask, we can then extract the rows that correspond to True using [~] syntax:

df[df.isna().any(axis=1)]
     A  B    C
a  NaN  4  NaN
c  NaN  6  8.0
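
As a quick sanity check on the mask, the sketch below counts the matching rows and inverts the mask with ~ to get the rows without any missing value. The comparison against dropna() is our own addition to show that the two approaches agree on this df:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [np.nan, 3, np.nan], "B": [4, 5, 6], "C": [np.nan, 7, 8]},
    index=["a", "b", "c"],
)

# Boolean mask: True for rows with at least one missing value
mask = df.isna().any(axis=1)

# True counts as 1, so summing the mask counts the matching rows
n_rows_with_nan = int(mask.sum())

# Inverting the mask keeps only the rows with no missing values
complete_rows = df[~mask]

print(n_rows_with_nan)                      # 2
print(list(complete_rows.index))            # ['b']
print(complete_rows.equals(df.dropna()))    # True
```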

Rows with a missing value in a certain column

We show the same df here for your reference:

df = pd.DataFrame({"A":[np.nan,3,np.nan],"B":[4,5,6],"C":[np.nan,7,8]}, index=["a","b","c"])
df
     A  B    C
a  NaN  4  NaN
b  3.0  5  7.0
c  NaN  6  8.0

Solution

To get the rows with a missing value in column C:

df[df["C"].isna()]
     A  B    C
a  NaN  4  NaN

Explanation

We begin by extracting column C as a Series:

df["C"]
a NaN
b 7.0
c 8.0
Name: C, dtype: float64

Next, we use the Series' isna() method to get a Series of booleans where True indicates the presence of NaN:

df["C"].isna()
a True
b False
c False
Name: C, dtype: bool

Finally, we pass in this boolean mask to extract the rows corresponding to True using [~] syntax:

df[df["C"].isna()]
     A  B    C
a  NaN  4  NaN
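
If only the row labels or the number of matches are needed rather than the rows themselves, the same single-column mask can be reused. The following is a small sketch; the names mask_c, labels and n_missing are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [np.nan, 3, np.nan], "B": [4, 5, 6], "C": [np.nan, 7, 8]},
    index=["a", "b", "c"],
)

# Boolean Series: True where column C is missing
mask_c = df["C"].isna()

# Index labels of the rows where C is missing
labels = df.index[mask_c].tolist()

# Number of rows where C is missing
n_missing = int(mask_c.sum())

print(labels)     # ['a']
print(n_missing)  # 1
```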

Rows with missing values in multiple columns

We show the same df here for your reference:

df = pd.DataFrame({"A":[np.nan,3,np.nan],"B":[4,5,6],"C":[np.nan,7,8]}, index=["a","b","c"])
df
     A  B    C
a  NaN  4  NaN
b  3.0  5  7.0
c  NaN  6  8.0

Solution

To get the rows where both columns A and C have a missing value:

df[df[["A","C"]].isna().all(axis=1)]
     A  B    C
a  NaN  4  NaN

Explanation

We first fetch columns A and C as a DataFrame using [~] syntax:

df[["A","C"]]
     A    C
a  NaN  NaN
b  3.0  7.0
c  NaN  8.0

We then use the isna() method to get a DataFrame of booleans where True indicates the presence of NaN:

df[["A","C"]].isna()
       A      C
a   True   True
b  False  False
c   True  False

Next, we use all(axis=1) to get a Series of booleans where True indicates a row in which every value is True:

df[["A","C"]].isna().all(axis=1)
a True
b False
c False
dtype: bool

Finally, we use the [~] syntax to extract the rows corresponding to True:

df[df[["A","C"]].isna().all(axis=1)]
     A  B    C
a  NaN  4  NaN
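
The choice between all(~) and any(~) decides whether "columns A and C" means both columns or either column. Here is a minimal sketch contrasting the two on the same column subset; the names both_missing and either_missing are ours:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [np.nan, 3, np.nan], "B": [4, 5, 6], "C": [np.nan, 7, 8]},
    index=["a", "b", "c"],
)

# Missing-value flags for the two columns of interest
subset_isna = df[["A", "C"]].isna()

# all(axis=1): rows where A and C are both missing
both_missing = df[subset_isna.all(axis=1)]

# any(axis=1): rows where A or C (or both) is missing
either_missing = df[subset_isna.any(axis=1)]

print(list(both_missing.index))    # ['a']
print(list(either_missing.index))  # ['a', 'c']
```

Row c appears only in the second result because A is missing there while C is not.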

Rows with missing values for all columns

Consider the following DataFrame:

df = pd.DataFrame({"A":[np.nan,np.nan],"B":[np.nan,4]}, index=["a","b"])
df
     A    B
a  NaN  NaN
b  NaN  4.0

Solution

To get rows with missing values for all columns:

df[df.isna().all(axis=1)]
     A    B
a  NaN  NaN

Explanation

The logic is exactly the same as for getting rows with at least one missing value, except that we use all(~) instead of any(~). With axis=1, the difference is as follows:

  • all(axis=1) returns a Series of booleans where True indicates a row in which every column value is missing.

  • any(axis=1) returns a Series of booleans where True indicates a row in which at least one column value is missing.
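
The difference between the two reductions can be seen side by side on the two-column df from this section. This is a short sketch, with all_missing and any_missing as illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, np.nan], "B": [np.nan, 4]}, index=["a", "b"])

# True only for rows where every column is missing
all_missing = df.isna().all(axis=1)

# True for rows where at least one column is missing
any_missing = df.isna().any(axis=1)

print(all_missing.tolist())  # [True, False]
print(any_missing.tolist())  # [True, True]
```

Row b has a value in column B, so it passes the any(~) test but not the all(~) test.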

Published by Isshin Inada