PySpark DataFrame | unionByName method
PySpark DataFrame's unionByName(~) method concatenates PySpark DataFrames vertically by aligning the column labels.
Parameters
1. other | PySpark DataFrame
The other DataFrame with which to concatenate.
2. allowMissingColumns | boolean | optional
If True, then no error will be thrown if the column labels of the two DataFrames do not align; a column that is missing from one DataFrame is filled with null values for that DataFrame's rows. If False, then an error will be thrown if the column labels of the two DataFrames do not align. By default, allowMissingColumns=False.
Return Value
A new PySpark DataFrame.
Examples
Concatenating PySpark DataFrames vertically by aligning columns
Consider the following PySpark DataFrame:
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  1|  2|  3|
+---+---+---+
Here's another PySpark DataFrame:
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  4|  5|  6|
|  7|  8|  9|
+---+---+---+
To concatenate these two DataFrames vertically by aligning the columns:
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  1|  2|  3|
|  4|  5|  6|
|  7|  8|  9|
+---+---+---+
Dealing with cases when column labels mismatch
By default, allowMissingColumns=False, which means that if the two DataFrames do not have exactly matching column labels, then an error will be thrown.
For example, consider the following PySpark DataFrames:
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  1|  2|  3|
+---+---+---+
Here's the other PySpark DataFrame, which has slightly different column labels:
+---+---+---+
|  B|  C|  D|
+---+---+---+
|  4|  5|  6|
|  7|  8|  9|
+---+---+---+
Since the column labels do not match, calling unionByName(~) will result in an error:
AnalysisException: Cannot resolve column name "A" among (B, C, D)
To allow for misaligned columns, set allowMissingColumns=True:
+----+---+---+----+
|   A|  B|  C|   D|
+----+---+---+----+
|   1|  2|  3|null|
|null|  4|  5|   6|
|null|  7|  8|   9|
+----+---+---+----+
Notice how we have null values for the misaligned columns.