Python | merge_asof method
Start your free 7-days trial now!
Pandas merge_asof(~)
method is used to perform a left join on two DataFrames where the join keys are matched not by equality but by proximity.
Both left and right DataFrames must be sorted by the join key.
Parameters
1. left
link | DataFrame
The left DataFrame to perform the join on.
2. right
link | DataFrame
The right DataFrame to perform the join on.
3. on
link | string
The label of the column to join on. The label must be present in both left
and right
.
The on
parameter is there only for convenience. If the column to join on have different labels, then you must use a combination of left_on
, right_on
, left_index
or right_index
.
4. left_on
| string
or array-like
The label of the column in left
to perform join on.
5. right_on
| string
or array-like
The label of the column in right
to perform join on.
6. left_index
| boolean
| optional
Whether or not to perform the join on the index of the left DataFrame. By default, left_index=False
.
7. right_index
| boolean
| optional
Whether or not to perform the join on the index of the right DataFrame. By default, right_index=False
.
Many textbooks and documentation use the words merge keys or join keys to denote the columns by which a join is performed.
8. by
| string
or list<string>
| optional
The label of the columns that must additionally match up for the join to take effect. Just like on
, by
must be present in both left
and right
.
9. left_by
link | string
| optional
The label of the column in left
to perform additional matching on. See examples below for clarification.
10. right_by
| string
| optional
The label of the column in right
to perform additional matching on. See examples below for clarification.
If left_by
is specified, then right_by
must also be specified, and vice versa.
11. suffixes
link | tuple
of (string, string)
| optional
The suffix names to append to the duplicate column labels in the resulting DataFrame. You can also pass a single None
instead of a string in suffixes
to indicate that the left or right column label should be left as is. By default, suffixes=("_x", "_y")
.
12. tolerance
| int
or Timedelta
| optional
The maximum acceptable difference between a pair of join keys. By default, tolerance=None
.
13. allow_exact_matches
| boolean
| optional
Whether or not to allow exact matches between a pair of join keys. By default, allow_exact_matches=True
.
14. direction
| string
| optional
The direction of proximity matching:
Value | Description |
---|---|
| Match if left join key is smaller than (or equal to) the right join key. |
| Match if left join key is larger than (or equal to) the right join key, |
| Match regardless of the relative size of the two keys. |
By default, direction="backward"
. Note that the (or equal to) part depends on allow_exact_matches
.
Return Value
The merged DataFrame
.
Examples
Basic usage
Consider the following two DataFrames:
Performing the join on column B
:
pd.merge_asof(df, df_other, on="B")
A B C0 2 3 71 3 5 82 4 9 8
Note the following:
the original column
B
of the left DataFrame appears in the resulting DataFrame.the values in the pair of column
B
s do not match exactly -[3,5,9]
and[2,5,10]
.since the value
3
does not exist in theright
join key, the method looks for the closest value that is smaller than3
(direction="backward"
), which in this case is2
. The corresponding value for columnC
is7
, so this is why we see the value7
there in the first row.again, the value
9
does not exist in the right join key, so the closest match smaller than9
is5
. The corresponding value for columnC
for this match is8
, so we end up with a8
at the bottom-right entry.on="B"
can actually be omitted here since the method will infer the join key if there is a single pair of overlapping column labels inleft
andright
.
Specifying left_by
Consider the following DataFrames:
By default, without specifying by
, left_by
or right_by
:
pd.merge_asof(df, df_other, on="B")
A B C0 7 3 71 9 5 8
We can restrict the matches by allowing those where the specified column values match up:
pd.merge_asof(df, df_other, on="B", left_by="A", right_by="C")
A B C0 7 3 7.01 9 5 NaN
Note the following:
the value
7.0
is still returned for the first row because the column value forA
matched up with the column value forC
(both were7
).we got
NaN
for the second row because the column value forA
(9
) did not match up with the column value forC
(8
).
Specifying suffixes
Consider the following two DataFrames:
By default, suffixes=("_x","_y")
, which means that if duplicate column labels arise in the resulting DataFrame, "_x"
will be appended as a suffix to the overlapping label for the left, and "_y"
for the right:
pd.merge_asof(df, df_other, on="B")
A_x B A_y0 2 3 71 3 5 8
We can specify our own suffixes like so:
pd.merge_asof(df, df_other, on="B", suffixes=["_X","_Y"])
A_X B A_Y0 2 3 71 3 5 8
You can pass in a single None
instead of a string to keep the original name for either the left or right overlapping label:
pd.merge_asof(df_one, df_two, on="B", suffixes=["_X",None])
A_X B A0 2 3 71 3 5 82 4 9 8