search
Search
Login
Unlock 100+ guides
menu
menu
web
search toc
close
Comments
Log in or sign up
Cancel
Post
account_circle
Profile
exit_to_app
Sign out
What does this mean?
Why is this true?
Give me some examples!
search
keyboard_voice
close
Searching Tips
Search for a recipe:
"Creating a table in MySQL"
Search for an API documentation: "@append"
Search for code: "!dataframe"
Apply a tag filter: "#python"
Useful Shortcuts
/ to open search panel
Esc to close search panel
to navigate between search results
d to clear all current filters
Enter to expand content preview
icon_star
Doc Search
icon_star
Code Search Beta
SORRY NOTHING FOUND!
mic
Start speaking...
Voice search is only supported in Safari and Chrome.
Navigate to

Pandas DataFrame | query method

schedule Aug 11, 2023
Last updated
local_offer
PythonPandas
Tags
mode_heat
Master the mathematics behind data science with 100+ top-tier guides
Start your free 7-days trial now!

Pandas' DataFrame.query(~) method filters rows according to the provided boolean expression.

NOTE

The query(~) method returns a copy of the data, so you cannot use the method to update values of the DataFrame. To perform updates, use properties like loc and iloc, which returns references instead of copies.

Parameters

1. expr | string

The criteria by which to filter the rows. The string must be a boolean expression (e.g. "A == 2" and "A > 3" where A is a column name).

2. inplace | boolean | optional

  • If True, then the method will directly modify the source DataFrame instead of creating a new DataFrame.

  • If False, then a new DataFrame will be created and returned.

By default, inplace=False.

Return Value

A DataFrame containing the filtered rows.

Examples

Consider the following DataFrame:

import pandas as pd
df = pd.DataFrame({"A":[1,2,3], "B":[4,5,6], "C":[7,8,9]})
df
A B C
0 1 4 7
1 2 5 8
2 3 6 9

Equality queries

To get all rows where the value for column A is 2:

df.query("A == 2")
A B C
1 2 5 8

For a not-equals query, use != instead.

Queries with string values

When referring to string-typed values, wrap them the values in ' or " like so:

df = pd.DataFrame({"A":["a","b"]})
df.query("A == 'a'")
A
0 a

Queries with boolean values

To query by a boolean value:

df = pd.DataFrame({"A":[True,False]})
df.query("A == True")
A
0 True

Range queries

To get all rows where the value for column A is greater than 1:

df.query("A > 1")
A B C
1 2 5 8
2 3 6 9

Interval queries

To get all rows where the value for column A is between 1 and 3 (both ends exclusive):

df.query("1 < A < 3")
A B C
1 2 5 8

AND queries

To get all rows where column A == 3 and column C == 9:

df.query("A == 3 and C == 9")
A B C
2 3 6 9

OR queries

To get all rows where column A == 1 or column C == 9:

df.query("A == 1 or C == 9")
A B C
0 1 4 7
2 3 6 9

Queries with column-comparisons

To get all rows where the value for column C is greater than that for column A:

df.query("C > A")
A B C
0 1 4 7
1 2 5 8
2 3 6 9

Queries with arithmetic operations

To get all rows where the value for C plus 1 is 10:

df.query("C + 1 == 10")
A B C
2 3 6 9

Queries with object access

You can also access object properties and methods using the dot notation:

df.query("B.values.max() < C")
A B C
0 1 4 7
1 2 5 8
2 3 6 9

Here, note the following:

  • B is of type Series, and so you have access to all the properties and methods available for Series.

  • One such property is values, which is simply a Numpy array representation of the Series.

  • Next, we are calling max(), which is a method available to all Numpy arrays. In this case, max() returns 6 since 6 is the largest value in column B.

IN queries

Consider the following DataFrame:

df = pd.DataFrame({"A":["aa","aa","aA","AA"],"B":["b","b","b","b"]})
df
A B
0 aa b
1 aa b
2 aA b
3 AA b

To get all rows where the value for A is in arr:

arr = ["aa","AA"]
df.query("A in @arr")
A B
0 aa b
1 aa b
3 AA b

Here, the arr is prefixed with @ since it is an external variable.

NOT IN queries

Consider the same DataFrame:

df = pd.DataFrame({"A":["aa","aa","aA","AA"],"B":["b","b","b","b"]})
df
A B
0 aa b
1 aa b
2 aA b
3 AA b

To get all rows where the value for A is not in arr:

arr = ["aa","AA"]
df.query("A not in @arr")
A B
2 aA b

Referencing the index

Consider the following DataFrame:

df = pd.DataFrame({"A":[2,3,4],"B":[5,6,7]})
df
A B
0 2 5
1 3 6
2 4 7

To refer to the index, simply include the word index like so:

df.query("index > 1")
A B
2 4 7

Handling column names with spaces

When your column names contain spaces, you need to add the back-ticks ` around the column name.

Consider the following DataFrame:

df = pd.DataFrame({"A A":[1,2],"B":[3,4]})
df
A A B
0 1 3
1 2 4

Here, the first column (i.e. A A) contains a space in its name. To deal with this, wrap ` like follows:

df.query("`A A` == 2")
A A B
1 2 4
robocat
Published by Isshin Inada
Edited by 0 others
Did you find this page useful?
thumb_up
thumb_down
Comment
Citation
Ask a question or leave a feedback...