Encoding categorical variables in Pandas
Last updated: Aug 11, 2023
Tags: Python, Pandas
To encode categorical variables using either one-hot encoding or dummy coding, use Pandas' get_dummies(~) function.
As an example, consider the following DataFrame:
import pandas as pd

df = pd.DataFrame({"name": ["alice", "bob", "cathy", "david"], "major": ["math", "physics", "math", "english"]})
df
    name    major
0  alice     math
1    bob  physics
2  cathy     math
3  david  english
One-hot encoding
To perform one-hot encoding on the variable major:
pd.get_dummies(df, columns=["major"])
    name  major_english  major_math  major_physics
0  alice              0           1              0
1    bob              0           0              1
2  cathy              0           1              0
3  david              1           0              0
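The output above shows 0 and 1. From pandas 2.0 onward, get_dummies(~) returns boolean columns (True/False) by default, so the following is a minimal sketch of how to request integer indicators explicitly with the dtype parameter. The variable names one_hot and row_sums are illustrative and not part of the original guide.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["alice", "bob", "cathy", "david"],
    "major": ["math", "physics", "math", "english"],
})

# dtype=int asks for 0/1 integers instead of the True/False
# values that pandas 2.0+ produces by default.
one_hot = pd.get_dummies(df, columns=["major"], dtype=int)

# In one-hot encoding, exactly one indicator is set per row,
# so the indicator columns of every row sum to 1.
indicator_cols = ["major_english", "major_math", "major_physics"]
row_sums = one_hot[indicator_cols].sum(axis=1)
```

Here every row of row_sums should be 1, since each person has exactly one major.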
Dummy coding
To perform dummy coding on major, pass drop_first=True like so:
pd.get_dummies(df, columns=["major"], drop_first=True)
    name  major_math  major_physics
0  alice           1              0
1    bob           0              1
2  cathy           1              0
3  david           0              0
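With drop_first=True, the first category (here english, the first in sorted order) is dropped and becomes the implicit baseline: a row with zeros in all remaining indicator columns belongs to it. The sketch below illustrates this, assuming the same example DataFrame; dtype=int is added only to get 0/1 output on pandas 2.0+, and the names dummy and baseline_rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["alice", "bob", "cathy", "david"],
    "major": ["math", "physics", "math", "english"],
})

# Dummy coding: k categories are encoded with k - 1 indicator columns.
dummy = pd.get_dummies(df, columns=["major"], drop_first=True, dtype=int)

# Rows where every remaining indicator is 0 belong to the
# dropped baseline category ("english" in this example).
indicator_cols = ["major_math", "major_physics"]
baseline_rows = dummy[(dummy[indicator_cols] == 0).all(axis=1)]
```

In this example baseline_rows contains only david, whose major is english.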
Published by Isshin Inada