PySpark DataFrame | withColumn method
PySpark DataFrame's withColumn(~) method can be used to add a new column or update an existing column.
Parameters
1. colName | string
The label of the new column. If colName already exists, then the supplied col will update the existing column. If colName does not exist, then col will be added as a new column.
2. col | Column
The new column.
Return Value
A PySpark DataFrame (pyspark.sql.dataframe.DataFrame).
Examples
Consider the following PySpark DataFrame:
+-----+---+
| name|age|
+-----+---+
| Alex| 25|
|  Bob| 30|
|Cathy| 50|
+-----+---+
Updating column values based on original column values in PySpark
To update an existing column, supply its column label as the first argument:
+-----+---+
| name|age|
+-----+---+
| Alex| 50|
|  Bob| 60|
|Cathy|100|
+-----+---+
Note that you must pass in a Column object as the second argument, and so you cannot simply use a list as the new column values.
Adding a new column to a PySpark DataFrame
To add a new column AGEE with 0s:
Here, F.lit(0) returns a Column object holding 0s. Note that since Spark treats column labels as case-insensitive by default, if you pass in "AGE" as the first argument, you would end up overwriting the age column.