The suffix names to append to the duplicate column labels in the resulting DataFrame. You can also pass a single None instead of a string in suffixes to indicate that the left or right column label should be left as is. By default, suffixes=("_x", "_y").

10. how | string | optional

The type of join to perform:

Value	Description
`"left"`	All rows from the left DataFrame will be present in the resulting DataFrame. This is the SQL equivalent of a left join.
`"right"`	All rows from the right DataFrame will be present in the resulting DataFrame. This is the SQL equivalent of a right join.
`"outer"`	All rows from both the left and right DataFrames will be present in the resulting DataFrame. This is the SQL equivalent of a full outer join.
`"inner"`	Only rows whose join keys match in both the left and right DataFrames will be present in the resulting DataFrame. This is the SQL equivalent of an inner join.

By default, how="outer".
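To make the table concrete, here is a small sketch that runs merge_ordered(~) once per value of how on toy data (the same data used in the Examples section below) and counts the rows of each result. The data and variable names are illustrative only.

```python
import pandas as pd

# Toy data: "david" appears only on the left, "cathy" only on the right
df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
                            "bought_by": ["bob", "alex", "david"]})
df_customers = pd.DataFrame({"name": ["alex", "bob", "cathy"], "age": [10, 20, 30]})

row_counts = {}
for how in ["left", "right", "outer", "inner"]:
    merged = pd.merge_ordered(df_products, df_customers,
                              left_on="bought_by", right_on="name", how=how)
    row_counts[how] = len(merged)

print(row_counts)
# {'left': 3, 'right': 3, 'outer': 4, 'inner': 2}
```

Since david only appears on the left and cathy only on the right, the left and right joins each keep 3 rows, the outer join keeps all 4 distinct names, and the inner join keeps only the 2 names present in both DataFrames.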

[Figure: Venn diagram illustrating the differences between the left, right, outer and inner join types]

Return Value

A merged DataFrame.

Examples

Basic usage

Consider a shop with some data about their products and customers:

import pandas as pd

df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
                            "bought_by": ["bob", "alex", "david"]},
                           index=["A", "B", "C"])
df_customers = pd.DataFrame({"name": ["alex", "bob", "cathy"], "age": [10, 20, 30]})

[df_products]                  |   [df_customers]
   product     bought_by       |      name   age
A  computer    bob             |   0  alex   10
B  smartphone  alex            |   1  bob    20
C  headphones  david           |   2  cathy  30

To perform an outer join of the two DataFrames, matching the bought_by column of df_products against the name column of df_customers:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer")

   product     bought_by  name   age
0  smartphone  alex       alex   10.0
1  computer    bob        bob    20.0
2  NaN         NaN        cathy  30.0
3  headphones  david      NaN    NaN
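The output above also shows the "ordered" part of merge_ordered(~): the rows are sorted by the join key (alex, bob, cathy, david) rather than following the row order of df_products, and the original index labels A, B and C are replaced by a fresh integer index. The following sketch confirms this on the same data; combining the two key columns with fillna(~) is just a convenient way to read off the key of each row, not something merge_ordered(~) requires.

```python
import pandas as pd

df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
                            "bought_by": ["bob", "alex", "david"]},
                           index=["A", "B", "C"])
df_customers = pd.DataFrame({"name": ["alex", "bob", "cathy"], "age": [10, 20, 30]})

result = pd.merge_ordered(df_products, df_customers,
                          left_on="bought_by", right_on="name", how="outer")

# Take bought_by as the row's key, falling back to name where bought_by is NaN
keys = result["bought_by"].fillna(result["name"])
print(keys.tolist())          # ['alex', 'bob', 'cathy', 'david']
print(result.index.tolist())  # [0, 1, 2, 3]
```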

Specifying fill_method

Unlike merge(~), merge_ordered(~) can fill in the missing values that arise from the join.

Again, consider the same DataFrames as above:

[df_products]                  |   [df_customers]
   product     bought_by       |      name   age
A  computer    bob             |   0  alex   10
B  smartphone  alex            |   1  bob    20
C  headphones  david           |   2  cathy  30

By default, fill_method=None, which means that the NaN values produced by the join are left as they are:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer")

   product     bought_by  name   age
0  smartphone  alex       alex   10.0
1  computer    bob        bob    20.0
2  NaN         NaN        cathy  30.0
3  headphones  david      NaN    NaN

To fill those NaN, set fill_method="ffill" like so:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer", fill_method="ffill")

   product     bought_by  name   age
0  smartphone  alex       alex   10
1  computer    bob        bob    20
2  computer    bob        cathy  30
3  headphones  david      cathy  30

Notice how all the NaN are filled with the previous non-NaN value.
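For this particular data, the forward fill gives the same values you would get by merging without fill_method and then calling DataFrame.ffill() on the result. The sketch below checks that column by column. It is only a comparison for this example, not a claim that the two approaches always agree; for instance, age is shown as integers in the output above, whereas the merge-then-ffill route goes through NaN and ends up as float64.

```python
import pandas as pd

df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
                            "bought_by": ["bob", "alex", "david"]},
                           index=["A", "B", "C"])
df_customers = pd.DataFrame({"name": ["alex", "bob", "cathy"], "age": [10, 20, 30]})

# Forward fill performed by merge_ordered itself
filled_during = pd.merge_ordered(df_products, df_customers, left_on="bought_by",
                                 right_on="name", how="outer", fill_method="ffill")

# Merge first (NaN left as is), then forward-fill every column afterwards
filled_after = pd.merge_ordered(df_products, df_customers, left_on="bought_by",
                                right_on="name", how="outer").ffill()

# Compare the values column by column (dtypes are not compared here)
same = {col: filled_during[col].tolist() == filled_after[col].tolist()
        for col in ["product", "bought_by", "name", "age"]}
print(same)
# {'product': True, 'bought_by': True, 'name': True, 'age': True}
print(filled_after["age"].dtype)
# float64
```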

WARNING

Note that this example only illustrates how the filling works; forward-filling customer data like this rarely makes sense in practice. The filling logic is mainly useful for time-series data, where it is natural to carry the most recently recorded value forward to later timestamps.
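A minimal sketch of the time-series case, assuming hypothetical data: a full calendar of dates on the left and sparse price updates on the right. With fill_method="ffill", each date without a new price carries the last recorded price forward, so the price column reads 100, 100, 105, 105. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: every calendar day on the left, prices recorded only on some days
calendar = pd.DataFrame({"date": pd.to_datetime(["2024-01-01", "2024-01-02",
                                                 "2024-01-03", "2024-01-04"])})
prices = pd.DataFrame({"date": pd.to_datetime(["2024-01-01", "2024-01-03"]),
                       "price": [100, 105]})

# Outer join (the default) on the shared "date" column;
# dates without a price take the most recent earlier price
daily = pd.merge_ordered(calendar, prices, on="date", fill_method="ffill")
print(daily)
```

Because the join key is a shared column (on="date"), the result keeps a single date column, sorted in ascending order.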

Specifying left_by

Let us use the same example as above:

[df_products]                  |   [df_customers]
   product     bought_by       |      name   age
A  computer    bob             |   0  alex   10
B  smartphone  alex            |   1  bob    20
C  headphones  david           |   2  cathy  30

By default, left_by=None, which means that the resulting DataFrame is constructed using a regular join:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer")

   product     bought_by  name   age
0  smartphone  alex       alex   10.0
1  computer    bob        bob    20.0
2  NaN         NaN        cathy  30.0
3  headphones  david      NaN    NaN

Setting left_by="product" groups df_products by its product column and merges each group, piece by piece, with the whole of df_customers. Each product therefore appears once for every customer name (alex, bob and cathy), plus an extra row for any bought_by value in its group that has no matching name, as with david below:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer", left_by="product")

   product     bought_by  age   name
0  computer    NaN        10.0  alex
1  computer    bob        20.0  bob
2  computer    NaN        30.0  cathy
3  smartphone  alex       10.0  alex
4  smartphone  NaN        20.0  bob
5  smartphone  NaN        30.0  cathy
6  headphones  NaN        10.0  alex
7  headphones  NaN        20.0  bob
8  headphones  NaN        30.0  cathy
9  headphones  david      NaN   NaN
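As a rough mental model of left_by, the sketch below rebuilds the result by hand: it groups df_products by product (keeping groups in order of first appearance), outer-merges each group with the whole of df_customers, writes the product label into every row of that piece, and concatenates the pieces. The loop illustrates the idea and is not pandas' internal implementation, so the check only compares the number of rows and the product labels with the built-in left_by result.

```python
import pandas as pd

df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
                            "bought_by": ["bob", "alex", "david"]},
                           index=["A", "B", "C"])
df_customers = pd.DataFrame({"name": ["alex", "bob", "cathy"], "age": [10, 20, 30]})

pieces = []
# sort=False keeps the groups in order of first appearance
for product, group in df_products.groupby("product", sort=False):
    piece = pd.merge_ordered(group, df_customers, left_on="bought_by",
                             right_on="name", how="outer")
    piece["product"] = product  # label every row of this piece with its group
    pieces.append(piece)
manual = pd.concat(pieces, ignore_index=True)

builtin = pd.merge_ordered(df_products, df_customers, left_on="bought_by",
                           right_on="name", how="outer", left_by="product")

print(len(manual), len(builtin))  # 10 10
```

The computer and smartphone groups each produce 3 rows (one per customer), while the headphones group produces 4 because its bought_by value, david, has no matching name.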

Specifying suffixes

Consider the following DataFrames:

df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
                            "age": [7, 8, 9],
                            "bought_by": ["bob", "alex", "bob"]},
                           index=["A", "B", "C"])
df_customers = pd.DataFrame({"name": ["alex", "bob", "cathy"], "age": [10, 20, 30]})

[df_products]                    |   [df_customers]
   product     age  bought_by    |      name   age
A  computer    7    bob          |   0  alex   10
B  smartphone  8    alex         |   1  bob    20
C  headphones  9    bob          |   2  cathy  30

Notice that the two DataFrames share an overlapping column label: age.

By default, suffixes=("_x","_y"), which means that if the merged DataFrame has overlapping column labels, then the suffix "_x" will be appended to the overlapping column label of the left DataFrame, and "_y" to the right DataFrame:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer")

   product     age_x  bought_by  name  age_y
0  smartphone  8.0    alex       alex  10
1  computer    7.0    bob        bob   20
...

We can specify our own suffixes like so:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer", suffixes=["_A","_B"])

   product     age_A  bought_by  name  age_B
0  smartphone  8.0    alex       alex  10
1  computer    7.0    bob        bob   20
...

You can also pass None instead of a string to leave the left or right column label as is:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer", suffixes=[None,"_B"])

   product     age  bought_by  name  age_B
0  smartphone  8.0  alex       alex  10
1  computer    7.0  bob        bob   20
...
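The suffixes can also be given as a tuple, which is the form the default value uses. However, when labels overlap, at least one of the two suffixes must be a non-empty string: passing None for both would leave two columns named age, and pandas raises a ValueError instead. A small sketch using the same DataFrames (the suffix strings "_product" and "_customer" are arbitrary examples):

```python
import pandas as pd

df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
                            "age": [7, 8, 9],
                            "bought_by": ["bob", "alex", "bob"]},
                           index=["A", "B", "C"])
df_customers = pd.DataFrame({"name": ["alex", "bob", "cathy"], "age": [10, 20, 30]})

# A tuple works just like the list form used above
result = pd.merge_ordered(df_products, df_customers, left_on="bought_by",
                          right_on="name", how="outer",
                          suffixes=("_product", "_customer"))
print(result.columns.tolist())
# ['product', 'age_product', 'bought_by', 'name', 'age_customer']

# Leaving both labels unchanged is rejected because "age" would appear twice
try:
    pd.merge_ordered(df_products, df_customers, left_on="bought_by",
                     right_on="name", how="outer", suffixes=(None, None))
    both_none_ok = True
except ValueError as err:
    both_none_ok = False
    print("ValueError:", err)
```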