Pandas | concat method
Pandas concat(~) method concatenates a list of Series or DataFrames, either horizontally or vertically.
Parameters
1. objs | list-like or map-like of Series or DataFrame
The Series or DataFrames to stack horizontally or vertically.
2. axis | int or string | optional
Whether to concatenate vertically or horizontally:
Axis | Description |
---|---|
0 or "index" | Concatenate vertically. |
1 or "columns" | Concatenate horizontally. |
By default, axis=0.
3. join | string | optional
Whether to perform an inner or outer (full) join on the labels of the non-concatenation axis:
"inner": performs an inner join
"outer": performs an outer join
By default, join="outer".
4. ignore_index | boolean | optional
If True, then the labels along the concatenation axis will be reset to 0,1,...,n-1, where n is the length of the result along that axis (for axis=0, this resets the row index). By default, ignore_index=False.
5. keys | sequence | optional
Used to construct a hierarchical index, with the passed keys as the outermost level. By default, keys=None.
6. levels | list<sequence> | optional
The specific levels (unique values) used to construct a MultiIndex. By default, the levels are inferred from keys.
7. names | list<string> | optional
The labels assigned to the levels in the resulting hierarchical index. By default, names=None.
8. verify_integrity | boolean | optional
If True, then an error will be thrown if the resulting Series/DataFrame contains duplicate labels along the concatenation axis. This check can be computationally expensive. By default, verify_integrity=False.
9. sort | boolean | optional
Whether or not to sort the non-concatenation axis when it is not already aligned. This only applies when join="outer", not when join="inner". By default, sort=False.
10. copy | boolean | optional
Whether to return a new Series/DataFrame or reuse the provided objs if possible. By default, copy=True.
Return Value
The return type depends on the following parameters:
When axis=0 and all the inputs are Series, a Series is returned.
When the concatenation involves at least one DataFrame, a DataFrame is returned.
When axis=1, a DataFrame is returned.
Examples
Consider the following DataFrames:
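The DataFrame definitions are not shown here, so the following is a reconstruction inferred from the printed outputs in the examples below; treat the exact values as an assumption:

```python
import pandas as pd

# Reconstructed from the outputs below (assumed values)
df = pd.DataFrame({"A": [2, 3], "B": [4, 5]})
df_other = pd.DataFrame({"A": [6, 7], "B": [8, 9]})
```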
Concatenating multiple DataFrames vertically
To concatenate multiple DataFrames vertically:
pd.concat([df, df_other]) # axis=0
   A  B
0  2  4
1  3  5
0  6  8
1  7  9
Concatenating multiple DataFrames horizontally
To concatenate multiple DataFrames horizontally, pass in axis=1 like so:
pd.concat([df, df_other], axis=1)
   A  B  A  B
0  2  4  6  8
1  3  5  7  9
Specifying join
Consider the following DataFrames:
Here, both DataFrames have column B.
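These DataFrames are not shown either; a reconstruction consistent with the outer and inner join outputs below (assumed values) is:

```python
import pandas as pd

# Reconstructed from the outputs below (assumed values)
df = pd.DataFrame({"A": [2], "B": [3]})
df_other = pd.DataFrame({"B": [4], "C": [5]})
```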
Outer join
By default, join="outer", which means that all columns will appear in the resulting DataFrame, and the columns with the same label will be stacked:
pd.concat([df, df_other], join="outer")
     A  B    C
0  2.0  3  NaN
0  NaN  4  5.0
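We get NaN for some entries because column B is shared between the DataFrames, so its values are stacked, whereas columns A and C each exist in only one of the DataFrames, so NaN must be inserted as a filler for the rows coming from the other one. This is also why A and C are displayed as floats: NaN is a floating-point value, so those integer columns are upcast to float64.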
Inner join
To perform an inner join instead, set join="inner" like so:
pd.concat([df, df_other], join="inner")
   B
0  3
0  4
Here, only columns that appear in all the DataFrames will appear in the resulting DataFrame. Since only column B is shared between df and df_other, we only see column B in the output.
Concatenating Series
Concatenating Series works in the same way as concatenating DataFrames.
To concatenate two Series vertically:
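The original example data is not shown, so this minimal sketch uses two hypothetical Series s1 and s2:

```python
import pandas as pd

# Hypothetical example Series
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

# axis=0 (the default) stacks the Series on top of each other
result = pd.concat([s1, s2])
print(result.tolist())        # ['a', 'b', 'c', 'd']
print(result.index.tolist())  # [0, 1, 0, 1]
```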
To concatenate two Series horizontally:
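Again using hypothetical Series, passing axis=1 places them side by side. Since these Series are unnamed, the resulting columns are labeled by position:

```python
import pandas as pd

# Hypothetical example Series (unnamed)
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

# axis=1 turns each Series into a column of a DataFrame
result = pd.concat([s1, s2], axis=1)
print(result.columns.tolist())  # [0, 1]
print(result.shape)             # (2, 2)
```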
Specifying ignore_index
By default, ignore_index=False, which means the original indexes of the inputs will be preserved:
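A minimal sketch with hypothetical Series, showing that each input keeps its own 0-based labels:

```python
import pandas as pd

# Hypothetical example Series
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

result = pd.concat([s1, s2])  # ignore_index=False by default
print(result.index.tolist())  # [0, 1, 0, 1]
```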
To reset the index to the default integer indices:
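With the same hypothetical Series, setting ignore_index=True discards the original labels and numbers the result from 0:

```python
import pandas as pd

# Hypothetical example Series
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

result = pd.concat([s1, s2], ignore_index=True)
print(result.index.tolist())  # [0, 1, 2, 3]
```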
Specifying keys
To form a multi-index, specify the keys parameter:
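A minimal sketch with hypothetical Series; each key labels the block of rows contributed by the corresponding input, and becomes the outer level of the index:

```python
import pandas as pd

# Hypothetical example Series
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

result = pd.concat([s1, s2], keys=["x", "y"])
print(result.index.tolist())    # [('x', 0), ('x', 1), ('y', 0), ('y', 1)]
print(result.loc["x"].tolist()) # ['a', 'b']
```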
To add more levels, pass tuples as the keys like so:
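A sketch with hypothetical Series and keys: each tuple contributes one label per extra level, so two-element tuples plus the original index give a three-level index:

```python
import pandas as pd

# Hypothetical example Series
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

result = pd.concat([s1, s2], keys=[("x", "p"), ("y", "q")])
print(result.index.nlevels)     # 3
print(result.index.tolist()[0]) # ('x', 'p', 0)
```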
Specifying names
The names parameter is used to assign labels to the levels of the index of the resulting Series/DataFrame:
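Since names labels the levels of the hierarchical index that keys creates, this sketch pairs it with keys. The Series and keys are hypothetical; only the label "Groups" comes from the text:

```python
import pandas as pd

# Hypothetical example Series
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

# One name is given, so only the outer (keys) level is labeled
result = pd.concat([s1, s2], keys=["x", "y"], names=["Groups"])
print(result.index.names[0])  # Groups
```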
Here, the label "Groups" is assigned to the index of the Series.
Specifying verify_integrity
By default, verify_integrity=False, which means that duplicate indexes and column labels are allowed:
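A minimal sketch with hypothetical Series whose indexes overlap; pandas accepts the duplicate labels without complaint:

```python
import pandas as pd

# Hypothetical example Series, both indexed 0 and 1
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

result = pd.concat([s1, s2])  # verify_integrity=False by default
print(result.index.tolist())        # [0, 1, 0, 1]
print(result.index.has_duplicates)  # True
```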
Notice how we have overlapping indexes 0 and 1.
Setting verify_integrity=True will throw an error in such cases:
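With the same hypothetical Series, the overlap is reported as a ValueError; this sketch catches it so the message can be inspected:

```python
import pandas as pd

# Hypothetical example Series, both indexed 0 and 1
s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

error_message = None
try:
    pd.concat([s1, s2], verify_integrity=True)
except ValueError as err:
    # The message lists the overlapping index values
    error_message = str(err)

print(error_message)
```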
If you want to ensure that the resulting Series/DataFrame has a unique index, consider setting ignore_index=True.
Specifying sort
By default, sort=False, which means that the resulting column labels or indexes will not be sorted:
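A sketch with hypothetical DataFrames whose columns are not aligned; without sorting, the columns appear in order of first appearance:

```python
import pandas as pd

# Hypothetical DataFrames with unaligned columns
df1 = pd.DataFrame({"B": [1], "A": [2]})
df2 = pd.DataFrame({"A": [3], "C": [4]})

result = pd.concat([df1, df2])  # sort=False by default
print(result.columns.tolist())  # ['B', 'A', 'C']
```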
Notice how the columns are not sorted by column labels.
When axis=0 and sort=True, the columns will be sorted by column labels:
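Using the same hypothetical DataFrames, sort=True orders the union of column labels:

```python
import pandas as pd

# Hypothetical DataFrames with unaligned columns
df1 = pd.DataFrame({"B": [1], "A": [2]})
df2 = pd.DataFrame({"A": [3], "C": [4]})

result = pd.concat([df1, df2], sort=True)
print(result.columns.tolist())  # ['A', 'B', 'C']
```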
When axis=1 and sort=True, the rows will be sorted by row labels:
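A sketch with hypothetical DataFrames whose row labels are not aligned; with axis=1, the row labels form the non-concatenation axis, so those are what get sorted:

```python
import pandas as pd

# Hypothetical DataFrames with unaligned row labels
df1 = pd.DataFrame({"X": [1, 2]}, index=["b", "a"])
df2 = pd.DataFrame({"Y": [3, 4]}, index=["a", "c"])

result = pd.concat([df1, df2], axis=1, sort=True)
print(result.index.tolist())  # ['a', 'b', 'c']
```

Rows missing from one input (here "b" in df2 and "c" in df1) are filled with NaN, as in the outer join example earlier.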