
Pandas | factorize method

Last updated: Jul 1, 2022
Tags: Python, Pandas

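Pandas' factorize(~) method encodes the values of an array as integers and returns the following:

  • an array of integer codes, where each code is the position of the corresponding input value in the array of unique values.

  • an array of the unique values in the input array.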

Parameters

1. values | sequence

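A 1D sequence of values. Newer pandas releases deprecate passing a plain Python list here, so a NumPy array, Series, or Index is the safer input.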

2. sort | boolean | optional

Whether or not to sort the resulting array of unique values. By default, sort=False.

3. na_sentinel | int | optional

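The value used to mark NaN in the array of integer codes. By default, na_sentinel=-1. Note that this parameter was deprecated in pandas 1.5 and removed in pandas 2.0 in favor of the boolean use_na_sentinel parameter.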

Return Value

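A tuple of two arrays, (codes, uniques), is returned. For list or NumPy array input both are NumPy arrays; for a Series or Index input, uniques is returned as an Index: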

  • an array of integer indices that maps the input array to the array of unique values.

  • an array containing the unique values of the input array.
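The return types can be checked directly. The following is a minimal sketch, not part of the original page; it assumes a reasonably recent pandas, and it wraps the values in an object-dtype NumPy array (an assumption made here so the call also works on versions that no longer accept a plain list).

```python
import numpy as np
import pandas as pd

# NumPy array input: both codes and uniques come back as NumPy arrays.
values = np.array(["B", "A", "A", "C", "B"], dtype=object)
codes, uniques = pd.factorize(values)
print(type(codes).__name__, type(uniques).__name__)

# Series input: codes is still a NumPy array, but uniques is a pandas Index.
s_codes, s_uniques = pd.factorize(pd.Series(["B", "A", "A", "C", "B"]))
print(type(s_codes).__name__, type(s_uniques).__name__)
```

Either way, the result unpacks into the same (codes, uniques) pair.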

Examples

Basic usage

import pandas as pd

codes, uniques = pd.factorize(["B", "A", "A", "C", "B"])
print("codes:", codes)
print("uniques:", uniques)
codes: [0 1 1 2 0]
uniques: ['B' 'A' 'C']

Note the following:

  • the codes array maps the values in the input array to the uniques array.

  • the unique values are ordered as they appear in the input array.

You can recreate the input array using codes and uniques like so:

uniques[codes]
array(['B', 'A', 'A', 'C', 'B'], dtype=object)
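To make the mapping concrete, here is a small sketch that spells out the lookup one element at a time and confirms the round trip. The variable names are illustrative only, and the input is wrapped in an object-dtype NumPy array as an assumption for compatibility with newer pandas versions.

```python
import numpy as np
import pandas as pd

values = np.array(["B", "A", "A", "C", "B"], dtype=object)
codes, uniques = pd.factorize(values)

# Each code is a position in uniques, so indexing uniques by codes
# rebuilds the original values in order.
rebuilt = uniques[codes]
print(np.array_equal(rebuilt, values))

# The same lookup written out element by element.
for value, code in zip(values, codes):
    print(value, "->", code, "->", uniques[code])
```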

Specifying sort

By default, sort=False, which means that the returned array of unique values is not sorted.

To have the array of unique values sorted, set sort=True like so:

codes, uniques = pd.factorize(["B", "A", "A", "C", "B"], sort=True)
print("codes:", codes)
print("uniques:", uniques)
codes: [1 0 0 2 1]
uniques: ['A' 'B' 'C']

Notice how the uniques are sorted, and the codes array also reflects this.
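For input without missing values, sort=True should give the same result as NumPy's np.unique(~) with return_inverse=True. The sketch below is only a sanity check under that assumption, not a statement about how pandas implements sorting internally.

```python
import numpy as np
import pandas as pd

values = np.array(["B", "A", "A", "C", "B"], dtype=object)

# pandas: sorted uniques plus codes that point into them.
codes, uniques = pd.factorize(values, sort=True)

# NumPy: np.unique always sorts; return_inverse gives the matching codes.
np_uniques, np_inverse = np.unique(values, return_inverse=True)

print(list(uniques) == list(np_uniques))
print(list(codes) == list(np_inverse))
```

The practical difference is that factorize(~) can also keep first-appearance order (sort=False) and has dedicated handling for NaN.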

Specifying na_sentinel

By default, NaN values are marked as -1 in the codes array:

import numpy as np

codes, uniques = pd.factorize(["B", np.nan, "A", "C", "B"])
print("codes:", codes)
print("uniques:", uniques)
codes: [ 0 -1  1  2  0]
uniques: ['B' 'A' 'C']

In pandas versions before 2.0, we can choose our own value by passing in na_sentinel like so (the parameter was removed in pandas 2.0):

codes, uniques = pd.factorize(["B", np.nan, "A", "C", "B"], na_sentinel=50)
print("codes:", codes)
print("uniques:", uniques)
codes: [ 0 50  1  2  0]
uniques: ['B' 'A' 'C']
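One consequence of the default sentinel is worth illustrating: -1 is a valid NumPy index (it refers to the last element), so rebuilding the input with uniques[codes] silently puts the last unique value where the NaN was. The sketch below shows one way to guard against this; the masking step is a suggestion made here, not something the original page prescribes, and the object-dtype NumPy input is an assumption for compatibility with newer pandas.

```python
import numpy as np
import pandas as pd

values = np.array(["B", np.nan, "A", "C", "B"], dtype=object)
codes, uniques = pd.factorize(values)  # NaN is coded with the default -1

# Naive lookup: the -1 code picks the last unique value instead of NaN.
naive = uniques[codes]
print(naive)

# Masked lookup: restore NaN wherever the sentinel appears.
rebuilt = uniques[codes]
rebuilt[codes == -1] = np.nan
print(rebuilt)
```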
Published by Isshin Inada