The column or index level name of the source DataFrame to perform the join on. The index of the other will be used for the join. If you want to specify a non-index column to join on for other, use merge(~) instead, which has a right_on parameter.

3. howlink | string | optional

The type of join to perform:

Value	Description
`"left"`link	All rows from the source DataFrame will be present in the resulting DataFrame. This is the SQL equivalent of a left-join.
`"right"`link	All rows from the right DataFrame will be present in the resulting DataFrame. This is the SQL equivalent of a right-join.
`"outer"`link	All rows from the source and right DataFrame will be present in the resulting DataFrame. This is the SQL equivalent of an outer-join.
`"inner"`link	All rows that have matching values in both the source and right DataFrame will be present in the resulting DataFrame. This is the SQL equivalent to inner-join.

By default, how="left".

Here's the classic Venn Diagram illustrating the differences:

4. lsuffixlink | string | optional

The suffix to append to the overlapping label of the source DataFrame. This is only relevant if there are duplicate column labels in the result. By default, lsuffix="".

5. rsuffixlink | string | optional

The suffix to append to the overlapping label of other. This is only relevant if there are duplicate column labels in the result. By default, rsuffix="".

6. sortlink | boolean | optional

Whether or not to sort the rows based on the join key. By default, sort=False.

Return Value

A merged DataFrame.

Examples

Basic usage

Consider the following DataFrame about some products of a shop:


        
        
            
                
                
                    df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
                            "bought_by": ["bob", "alex", "bob"]},
                             index=["A","B","C"])
df_products
                
            
               product      bought_by
A  computer     bob
B  smartphone   alex
C  headphones   bob

Here's a DataFrame about some customers of the shop:


        
        
            
                
                
                    df_customers = pd.DataFrame({"age": [10, 20, 30]},
                             index=["alex","bob","cathy"])
df_customers
                
            
                   age
alex   10
bob    20
cathy  30

To perform a left-join on the bought_by column of df_products:


        
        
            
                
                
                    df_products.join(df_customers, on="bought_by")   # how="left"
                
            
               product     bought_by   age
A  computer      bob       20
B  smartphone    alex      10
C  headphones    bob       20

By default, the index of other will be used as the join key. If you want more flexibility as to which columns are used for the join, use the merge(~) method instead.

Comparison of different joins

Consider the following DataFrames about products and customers:


        
        
            
                
                
                    df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
                            "bought_by": ["bob", "alex", "david"]},
                             index=["A","B","C"])
df_customers = pd.DataFrame({"age": [10, 20, 30]}, index=["alex","bob","cathy"])
                
            
                    [df_products]         |   [df_customers]
   product      bought_by     |            age
A  computer     bob           |     alex   10
B  smartphone   alex          |     bob    20
C  headphones   david         |     cathy  30

Left join


        
        
            
                
                
                    df_products.join(df_customers, on="bought_by", how="left")
                
            
               product     bought_by  age
A  computer    bob        20.0
B  smartphone  alex       10.0
C  headphones  david      NaN

Right join


        
        
            
                
                
                    df_products.join(df_customers, on="bought_by", how="right")
                
            
                 product     bought_by  age
B    smartphone  alex       10
A    computer    bob        20
NaN  NaN         cathy      30

Inner join


        
        
            
                
                
                    df_products.join(df_customers, on="bought_by", how="inner")
                
            
               product     bought_by  age
A  computer    bob        20
C  headphones  bob        20
B  smartphone  alex       10

Outer join


        
        
            
                
                
                    df_products.join(df_customers, on="bought_by", how="outer")
                
            
                 product     bought_by  age
A    computer    bob        20
C    headphones  bob        20
B    smartphone  alex       10
NaN  NaN         cathy      30

Specifying lsuffix and rsuffix

Suppose we wanted to join the following DataFrames:


        
        
            
                
                
                    df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
                            "age": [5,6,7],
                            "bought_by": ["bob", "alex", "bob"]},
                             index=["A","B","C"])
df_customers = pd.DataFrame({"age": [10, 20, 30]},
                             index=["alex","bob","cathy"])
                
            
               product     age  bought_by   |         age
A  computer     5   bob         |  alex   10
B  smartphone   6   alex        |  bob    20
C  headphones   7   bob         |  cathy  30

Notice how both DataFrames have the age column.

Due to this overlap in the naming, performing the join results in a ValueError:


        
        
            
                
                
                    df_products.join(df_customers, on="bought_by")
                
            
            ValueError: columns overlap but no suffix specified: Index(['age'], dtype='object')

This error can be resolved by specifying lsuffix or rsuffix:


        
        
            
                
                
                    df_products.join(df_customers, on="bought_by", lsuffix="_product")
                
            
               product     age_product  bought_by   age
A  computer        5         bob        10
B  smartphone      6         alex       10
C  headphones      7         bob        20

Note that lsuffix and rsuffix only take effect when there are duplicate column labels.

Specifying sort

Consider the same example as before:


        
        
            
                
                
                            [df_products]         |   [df_customers]
   product      bought_by     |            age
A  computer     bob           |     alex   10
B  smartphone   alex          |     bob    20
C  headphones   bob           |     cathy  30

By default, sort=False, which means that the resulting DataFrame's rows are not sorted by the join key (bought_by):


        
        
            
                
                
                    df_products.join(df_customers, on="bought_by")   # sort=False
                
            
               product     bought_by  age
A  computer    bob        20
B  smartphone  alex       10
C  headphones  bob        20

Setting sort=True yields:


        
        
            
                
                
                    df_products.join(df_customers, on="bought_by", sort=True)
                
            
               product     bought_by  age
B  smartphone  alex       10
A  computer    bob        20
C  headphones  bob        20

Published by Isshin Inada

Edited by 0 others

Did you find this page useful?

thumb_up

thumb_down

Comment

Citation

Ask a question or leave a feedback...

Official Pandas Documentation

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html

thumb_up

thumb_down

chat_bubble_outline

settings

Enjoy our search

Hit / to insta-search docs and recipes!