Pandas DataFrame | dropna method

Last updated: Aug 10, 2023
Tags: Python, Pandas

Pandas DataFrame.dropna(~) method removes rows or columns with missing values.

Parameters

1. axis | int or string | optional

Whether to remove rows or columns with missing values:

Axis              Description
0 or "index"      Scans through each row and drops the row if it contains a missing value.
1 or "columns"    Scans through each column and drops the column if it contains a missing value.

By default, axis=0.

2. how | string | optional

The criterion by which to remove a row/column:

How      Description
"any"    If the row or column contains at least one missing value, then remove it.
"all"    If the row or column consists entirely of missing values, then remove it.

By default, how="any".
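The difference between the two settings can be sketched on rows as follows. The DataFrame df_how and its values are made up for illustration: row 0 is partially missing, row 1 is entirely missing, and row 2 is complete.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is partially missing, row 1 is entirely missing.
df_how = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, np.nan, 6.0]})

any_result = df_how.dropna(how="any")   # drops rows 0 and 1
all_result = df_how.dropna(how="all")   # drops only row 1

print(any_result.index.tolist())   # [2]
print(all_result.index.tolist())   # [0, 2]
```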

3. thresh | int | optional

The minimum number of non-missing values a row/column must contain in order to be kept. For instance, if thresh=2, then

  • a column with 1 non-missing value will be dropped.

  • a column with 2 non-missing values will be kept.

  • a column with 3 non-missing values will be kept.

By default, no minimum is set.
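As a minimal sketch of the three cases above, the hypothetical DataFrame df_thresh below has columns holding 1, 2 and 3 non-missing values respectively (the column names are invented for this illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical columns holding 1, 2 and 3 non-missing values respectively.
df_thresh = pd.DataFrame({
    "one":   [1.0, np.nan, np.nan],
    "two":   [1.0, 2.0, np.nan],
    "three": [1.0, 2.0, 3.0],
})

# Keep only the columns with at least 2 non-missing values.
kept = df_thresh.dropna(thresh=2, axis=1)
print(kept.columns.tolist())   # ['two', 'three']
```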

4. subset | array-like of labels | optional

The labels along the other axis to check for missing values. When scans are performed row-wise (axis=0), these are column labels; when scans are performed column-wise (axis=1), these are row labels. By default, all labels are considered. Consult the examples below for clarification.

5. inplace | boolean | optional

  • If True, then the method will directly modify the source DataFrame instead of creating a new DataFrame.

  • If False, then a new DataFrame will be created and returned.

By default, inplace=False.
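The two behaviours can be compared side by side in a short sketch. The names df_src, new_df and result are chosen for illustration only.

```python
import numpy as np
import pandas as pd

df_src = pd.DataFrame({"A": [np.nan, 2.0], "B": [3, 4]})

# inplace=False (the default): a new DataFrame comes back and the source keeps both rows.
new_df = df_src.dropna()
print(len(new_df), len(df_src))   # 1 2

# inplace=True: the source itself loses the row, and the call returns None.
result = df_src.dropna(inplace=True)
print(result, len(df_src))        # None 1
```

Since inplace=True returns None, writing df = df.dropna(inplace=True) would overwrite the DataFrame with None.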

Return Value

A DataFrame with rows or columns that contain missing values removed according to the provided parameters. If inplace=True, then None is returned instead.

Examples

Consider the following DataFrame:

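import numpy as np
import pandas as pd

# pd.np is no longer available in recent versions of Pandas, so use NumPy's np.nan directly
df = pd.DataFrame({"A":[np.nan,2],"B":[3,4],"C":[5,6]})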
df
   A    B  C
0  NaN  3  5
1  2.0  4  6

Removing rows with missing values

To remove rows with missing value(s):

df.dropna()   # or axis=0 or axis="index"
   A    B  C
1  2.0  4  6

Notice how the first row (i.e. index=0) was removed since it contained a missing value.

Removing columns with missing values

To remove columns with missing value(s):

df.dropna(axis="columns")   # or axis=1
   B  C
0  3  5
1  4  6

Notice how column A was removed since it contained a missing value.

Removing column with ALL missing values

Consider the following DataFrame:

df = pd.DataFrame({"A":[np.nan,2], "B":[3,4], "C":[np.nan,np.nan]})
df
   A    B  C
0  NaN  3  NaN
1  2.0  4  NaN

To remove columns whose values are all missing values, set how="all":

df.dropna(how="all", axis="columns")
   A    B
0  NaN  3
1  2.0  4

Notice how only column C was removed as it contained only missing values.

Setting a threshold

Consider the following DataFrame:

import numpy as np
df = pd.DataFrame({"A":["a",np.nan,np.nan],"B":[3,4,np.nan]})
df
     A    B
0    a  3.0
1  NaN  4.0
2  NaN  NaN

To keep only the columns with at least 2 non-missing values (that is, to remove columns with fewer than 2), set thresh=2:

df.dropna(thresh=2, axis=1)
     B
0  3.0
1  4.0
2  NaN

Notice how column A, which only had one non-missing value, was removed, while column B with 2 non-missing values was kept.
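One way to check this reasoning is to count the non-missing values per column and filter on that count directly. The sketch below rebuilds the same data under the name df_t; the boolean-mask approach is an equivalent formulation offered for illustration, not a description of how dropna(~) is implemented internally.

```python
import numpy as np
import pandas as pd

df_t = pd.DataFrame({"A": ["a", np.nan, np.nan], "B": [3, 4, np.nan]})

# Number of non-missing values in each column: A has 1, B has 2.
non_missing = df_t.notna().sum()

# Selecting the columns that meet the minimum count by hand...
by_mask = df_t.loc[:, non_missing >= 2]
# ...gives the same result as passing thresh=2.
by_dropna = df_t.dropna(thresh=2, axis=1)

print(by_mask.equals(by_dropna))   # True
```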

Removing rows with missing values for certain columns only

Consider the following DataFrame:

df = pd.DataFrame({"A":[np.nan,2], "B":[3,4], "C":[np.nan,np.nan]})
df
   A    B  C
0  NaN  3  NaN
1  2.0  4  NaN

To remove rows where the value corresponding to column A is missing:

df.dropna(subset=["A"], axis="index")
   A    B  C
1  2.0  4  NaN

Notice how only the first row (index=0) was removed even though both rows contained missing values. This is because, by specifying subset=["A"], the method only checks for missing values in column A.
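The subset parameter also accepts several labels and can be combined with how. The following sketch uses a made-up DataFrame df_sub to contrast dropping rows where either A or B is missing with dropping rows where both are missing:

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the last row is complete in both A and B.
df_sub = pd.DataFrame({
    "A": [np.nan, 2.0, np.nan, 4.0],
    "B": [3.0, np.nan, np.nan, 8.0],
    "C": [5, 6, 7, 8],
})

# Drop a row if A or B is missing (how="any" is the default).
either = df_sub.dropna(subset=["A", "B"])
# Drop a row only if both A and B are missing.
both = df_sub.dropna(subset=["A", "B"], how="all")

print(either.index.tolist())   # [3]
print(both.index.tolist())     # [0, 1, 3]
```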

Removing columns with missing values for certain rows only

Consider the following DataFrame:

df = pd.DataFrame({"A":[np.nan,2], "B":[3,4], "C":[5,np.nan]})
df
   A    B  C
0  NaN  3  5.0
1  2.0  4  NaN

To remove columns where the value corresponding to row index 1 is missing:

df.dropna(subset=[1], axis=1)   # or axis="columns"
   A    B
0  NaN  3
1  2.0  4

Notice how only column C was removed, despite the fact that column A also contained a missing value. This is because by specifying subset=[1], the method will only check for missing values at row index=1 (i.e. the second row). Since the value corresponding to column C in row index=1 was a missing value, the method removed column C.

Removing rows/columns in-place

To drop row(s) or column(s) in-place, we need to set inplace=True. This will directly modify the source DataFrame instead of creating and returning a new DataFrame.

As an example, consider the following DataFrame:

df = pd.DataFrame({"A":[np.nan,2], "B":[3,4], "C":[5,6]})
df
   A    B  C
0  NaN  3  5
1  2.0  4  6

We remove all rows containing missing value(s) with inplace=True:

df.dropna(inplace=True)
df
   A    B  C
1  2.0  4  6

As shown in the output, the source DataFrame has been modified.

Published by Isshin Inada