Pandas DataFrame | combine method
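Pandas DataFrame.combine(~) method combines the columns of two DataFrames. Columns are paired by label, not by position: a column is only ever combined with the column of the same label in the other DataFrame. A column that exists in only one of the two DataFrames is paired with a column of NaN (see the overwrite parameter below).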
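To make the pairing concrete, here is a simplified sketch of the column-by-column logic. combine_sketch is a hypothetical helper, not part of pandas: it assumes the defaults fill_value=None and overwrite=True and skips the dtype handling pandas performs, so treat it as an illustration rather than the actual implementation.

```python
import numpy as np
import pandas as pd


def combine_sketch(df, other, func):
    """Hypothetical, simplified stand-in for df.combine(other, func).

    Assumes fill_value=None and overwrite=True; pandas' own dtype
    promotion and downcasting are not reproduced here.
    """
    # Align both frames on the union of their row and column labels.
    # Labels missing from one frame become NaN in that frame.
    this, that = df.align(other)
    result = {}
    for label in this.columns:
        # func receives one pair of same-labelled columns (Series) at a time
        result[label] = func(this[label], that[label])
    return pd.DataFrame(result, index=this.index, columns=this.columns)


df = pd.DataFrame({"A": [3, 4], "B": [5, 6]})
df_other = pd.DataFrame({"A": [1, 8], "C": [2, 9]})
print(combine_sketch(df, df_other, np.maximum))
```

For these inputs the values should match df.combine(df_other, np.maximum): column A holds 3 and 8, while B and C are NaN because each is missing from one of the DataFrames.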
Parameters
1. other | DataFrame
The DataFrame that you want to combine with.
2. func | function
A function that takes two arguments, a column of the source DataFrame and the column with the same label in the other DataFrame, both as type Series. The function must return a Series that represents the resulting column.
3. fill_value | scalar | optional
The value used to fill missing values (NaN) in both columns before they are passed to func. By default, fill_value=None, which means no filling takes place.
4. overwrite | boolean | optional
The meaning of the boolean values is as follows:
Value | Description |
---|---|
True | If a column in one DataFrame does not exist in the other DataFrame, the missing column is treated as all NaN when func is applied. With a NaN-propagating function such as np.maximum, the merged column therefore has its entries filled with NaN. |
False | If a column in the source DataFrame does not exist in the other DataFrame, the column appears in the merged DataFrame with its entries kept intact. However, the reverse is not true: columns that exist only in the other DataFrame still appear in the final DataFrame, but (with NaN-propagating functions) their entries are filled with NaN. |
By default, overwrite=True. See examples below for clarification.
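The table describes overwrite=False in terms of columns that are missing from the other DataFrame. Based on how pandas implements this option (it checks whether the aligned column from the other DataFrame is entirely NaN), a column that exists in the other DataFrame but contains only NaN appears to be kept intact as well. The following sketch lets you check this on your own pandas version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [5, 6]})
# df_other does have a column B, but it contains only NaN
df_other = pd.DataFrame({"A": [1, 8], "B": [np.nan, np.nan]})

# Default overwrite=True: np.maximum propagates the NaN values into B
default = df.combine(df_other, np.maximum)

# overwrite=False: B is expected to be copied unchanged from df
keep = df.combine(df_other, np.maximum, overwrite=False)

print(default)
print(keep)
```

If the behaviour described holds, keep["B"] still contains 5 and 6, while default["B"] is all NaN.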
Return Value
A new DataFrame with the columns combined as per the parameters. Its row and column labels are the union of those of the two DataFrames.
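Rows are aligned by label as well, so the returned DataFrame contains the union of both DataFrames' row labels. In this sketch the two DataFrames share only the row label 1:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, 4]}, index=[0, 1])
df_other = pd.DataFrame({"A": [1, 8]}, index=[1, 2])

# Row labels 0 and 2 exist in only one DataFrame each
result = df.combine(df_other, np.maximum)
print(result)
```

With np.maximum, rows 0 and 2 come out as NaN because one side of each pair is missing, while row 1 holds the larger of 4 and 1, that is 4.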
Examples
Basic usage
Consider the following DataFrames:
import numpy as np
import pandas as pd
df = pd.DataFrame({"A":[3,4], "B":[5,6]})
df_other = pd.DataFrame({"A":[1,8], "B":[2,9]})
   A  B  |     A  B
0  3  5  |  0  1  2
1  4  6  |  1  8  9
To combine the columns of the two DataFrames so that only the larger of each pair of values is kept:
df.combine(df_other, np.maximum)
   A  B
0  3  5
1  8  9
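np.maximum returns NaN whenever either of its inputs is NaN, which is where fill_value comes in: it replaces NaN values before func is called. A minimal sketch with a NaN in df (the data here is made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, np.nan]})
df_other = pd.DataFrame({"A": [1, 8]})

# Without fill_value, the NaN in df propagates through np.maximum
plain = df.combine(df_other, np.maximum)      # column A: 3.0, NaN

# With fill_value=0, the NaN becomes 0 before np.maximum sees it
filled = df.combine(df_other, np.maximum, fill_value=0)  # column A: 3.0, 8.0

print(plain)
print(filled)
```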
Custom function
We can also pass in a custom function for func:
def foo(col, col_other):
    # col and col_other are a pair of Series sharing the same column label
    return col + col_other
df.combine(df_other, foo)
    A   B
0   4   7
1  12  15
Note the following:
foo simply computes and returns the sum of a pair of matching columns in the two DataFrames.
foo is called twice here since there are two matching pairs of column labels (A and B).
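Because func receives entire columns rather than individual values, a custom function can also make decisions at the column level. As a sketch, the hypothetical function take_smaller below (not part of pandas) keeps whichever of the two columns has the smaller sum:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [5, 6]})
df_other = pd.DataFrame({"A": [1, 8], "B": [2, 9]})


def take_smaller(col, col_other):
    # Hypothetical helper: compare whole columns by their sums and
    # return one of them unchanged
    return col if col.sum() < col_other.sum() else col_other


result = df.combine(df_other, take_smaller)
print(result)
```

Column A comes from df (sum 7 versus 9), while column B comes from df_other: the sums tie at 11, and the strict < comparison then picks col_other.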
Specifying overwrite
Consider the following DataFrames that have mismatches in the column labels:
df = pd.DataFrame({"A":[3,4], "B":[5,6]})
df_other = pd.DataFrame({"A":[1,8], "C":[2,9]})
   A  B  |     A  C
0  3  5  |  0  1  2
1  4  6  |  1  8  9
By default, overwrite=True, which means that a column missing from either DataFrame is treated as all NaN before func is applied. Since np.maximum returns NaN whenever one of its inputs is NaN, such columns come out filled with NaN:
df.combine(df_other, np.maximum)
   A   B   C
0  3 NaN NaN
1  8 NaN NaN
Here, column B is NaN because df_other does not have column B, and column C is NaN because df does not have column C.
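If you would rather keep the values of a column that exists in only one DataFrame, one option is to swap np.maximum for np.fmax, which returns the non-NaN value when exactly one of its inputs is NaN. A sketch using the same DataFrames:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [5, 6]})
df_other = pd.DataFrame({"A": [1, 8], "C": [2, 9]})

# np.fmax ignores a NaN operand when the other operand is a number,
# so B keeps df's values and C keeps df_other's values
result = df.combine(df_other, np.fmax)
print(result)
```

Here A holds 3 and 8, B keeps 5 and 6 from df, and C keeps 2 and 9 from df_other. Because an all-NaN column is involved in computing B and C, their values may be returned as floats.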
We can keep the columns of the source DataFrame intact by setting overwrite=False:
df.combine(df_other, np.maximum, overwrite=False)
   A  B   C
0  3  5 NaN
1  8  6 NaN
Here, column B of df is kept intact, whereas column C, which is present only in df_other, still has its entries filled with NaN, since np.maximum is applied to it together with an all-NaN column from df.
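One way to see what overwrite=False changes is to record which column labels func is actually called with. In this sketch, logged_maximum is a hypothetical wrapper around np.maximum that appends each column's label to a list:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, 4], "B": [5, 6]})
df_other = pd.DataFrame({"A": [1, 8], "C": [2, 9]})


def logged_maximum(col, col_other, log):
    # Hypothetical wrapper: record the column label, then take the max
    log.append(col.name)
    return np.maximum(col, col_other)


calls_default, calls_keep = [], []
df.combine(df_other, lambda a, b: logged_maximum(a, b, calls_default))
df.combine(df_other, lambda a, b: logged_maximum(a, b, calls_keep),
           overwrite=False)

print(calls_default)  # ['A', 'B', 'C']
print(calls_keep)     # ['A', 'C']
```

With the default overwrite=True, func runs for all three labels, including B and C, which each exist in only one DataFrame. With overwrite=False, column B is copied straight from df without calling func, whereas C still goes through np.maximum with an all-NaN column from df, which is why it ends up NaN.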