Creating a new column based on other columns in Pandas DataFrame
To create a new column based on other columns, either:
use column arithmetic for the fastest performance.
use NumPy's where(~) method for creating binary columns.
use the apply(~) method, which is the slowest but offers the most flexibility.
use the Series' replace(~) method for mapping new values from existing columns.
Creating new columns using arithmetic
Consider the following DataFrame:
df
   A  B
a  3  5
b  4  6
The fastest and simplest way of creating a new column is to use simple column-arithmetics:
df["C"] = df["A"] + df["B"]
df
   A  B   C
a  3  5   8
b  4  6  10
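For slightly more complicated operations, use the DataFrame's native methods. For example, starting again from the original two-column DataFrame and assigning df.max(axis=1) to column C gives:
df
   A  B  C
a  3  5  5
b  4  6  6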
Note the following:
we are populating the new column C with the maximum of each row (axis=1).
the return type of df.max(axis=1) is Series.
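The two approaches in this section can be sketched together as follows. The DataFrame construction is an assumption that reproduces the table shown above, and the column label D is hypothetical, chosen only so that both results can sit side by side.

```python
import pandas as pd

# Assumed construction of the example DataFrame (index labels a and b)
df = pd.DataFrame({"A": [3, 4], "B": [5, 6]}, index=["a", "b"])

# Column arithmetic: element-wise sum of A and B
df["C"] = df["A"] + df["B"]

# Native method: row-wise maximum, restricted to A and B so that the
# newly added column C does not take part in the comparison
df["D"] = df[["A", "B"]].max(axis=1)

print(df)
```

Selecting df[["A", "B"]] before calling max(axis=1) matters once extra columns exist, since max(axis=1) otherwise considers every numeric column in the row.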
Creating binary column values
Consider the following Pandas DataFrame:
To create a new column of binary values that are based on the age column, use NumPy's where(~) method:
Here, the first argument of the where(~) method is a boolean mask. If the boolean value is True, then the resulting value will be 'JUNIOR', otherwise the value will be 'SENIOR'.
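A minimal sketch of this pattern is given below. The data is assumed to match the name and age table that appears later in this article, and both the threshold of 30 and the column label status are hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd

# Assumed data: three people with their ages
df = pd.DataFrame({"name": ["Alex", "Bob", "Cathy"], "age": [20, 30, 40]})

# Boolean mask: True where age is below the (hypothetical) threshold of 30
is_junior = df["age"] < 30

# np.where picks 'JUNIOR' where the mask is True and 'SENIOR' elsewhere
df["status"] = np.where(is_junior, "JUNIOR", "SENIOR")

print(df)
```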
Creating column with multiple values
Once again, consider the following Pandas DataFrame:
To create a new column with multiple values based on the age column, use the apply(~) function:
Here, the function passed to apply(~) is called once for each row (axis=1), and takes as its argument a Series representing that row.
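One possible way to write this is sketched below. The data is assumed to match the name and age table shown later in this article; the helper age_group, its cut-off ages, the label 'MID', and the column label status are all hypothetical.

```python
import pandas as pd

# Assumed data: three people with their ages
df = pd.DataFrame({"name": ["Alex", "Bob", "Cathy"], "age": [20, 30, 40]})

def age_group(row):
    # row is a Series holding the values of a single row
    if row["age"] < 25:
        return "JUNIOR"
    elif row["age"] < 35:
        return "MID"
    return "SENIOR"

# axis=1 makes apply pass each row (rather than each column) to age_group
df["status"] = df.apply(age_group, axis=1)

print(df)
```

Because age_group is ordinary Python, it can contain any branching logic, which is where the flexibility comes from; the cost is a Python-level call per row.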
Creating column via mapping
Consider the same Pandas DataFrame as before:
To create a new column that is based on some mapping of an existing column:
mapping = {'Alex': 'ALEX', 'Bob': 'BOB', 'Cathy': 'CATHY'}
df['upper_name'] = df['name'].replace(mapping)
df
    name  age upper_name
0   Alex   20       ALEX
1    Bob   30        BOB
2  Cathy   40      CATHY
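A self-contained sketch of the same mapping follows. The DataFrame construction is assumed from the output above, and the extra row for 'Dave' is hypothetical, added only to show what happens to a value that has no entry in the mapping.

```python
import pandas as pd

# Assumed data, plus one hypothetical name that is absent from the mapping
df = pd.DataFrame({
    "name": ["Alex", "Bob", "Cathy", "Dave"],
    "age": [20, 30, 40, 50],
})

mapping = {"Alex": "ALEX", "Bob": "BOB", "Cathy": "CATHY"}

# replace substitutes values found in the mapping and leaves the rest as-is
df["upper_name"] = df["name"].replace(mapping)

print(df)
```

Values without a matching key pass through unchanged, so an incomplete mapping does not produce missing values in the new column.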
Creating column using the assign method
Consider the following Pandas DataFrame:
df
   A  B
a  3  5
b  4  6
We could also use the DataFrame's assign(~) method, which accepts a function that takes the DataFrame as input and returns the new column values:
   A  B  C
0  3  5  0
1  4  6  0
Note the following:
if the sum of column A is larger than that of column B, then [-1,-1] will be used as the new column, otherwise [0,0] will be used.
the keyword argument (C) became the new column label.
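The behaviour described in these notes can be reproduced with the sketch below. The DataFrame construction and the helper name new_column are assumptions; the original may well have used a lambda instead.

```python
import pandas as pd

# Assumed construction of the example DataFrame
df = pd.DataFrame({"A": [3, 4], "B": [5, 6]}, index=["a", "b"])

def new_column(frame):
    # frame is the DataFrame that assign was called on
    if frame["A"].sum() > frame["B"].sum():
        return [-1, -1]
    return [0, 0]

# The keyword name C becomes the label of the new column
df_new = df.assign(C=new_column)

print(df_new)
```

Here the sum of A is 7 and the sum of B is 11, so [0, 0] is used. Note that assign returns a new DataFrame and leaves df itself without column C.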