
Pandas | factorize method

Last updated: Aug 12, 2023
Tags: Python, Pandas

Pandas factorize(~) method encodes the values of the input array as integers and returns the following:

  • an array of integer indices to map the input array to the unique values.

  • all the unique values of the input array.

Parameters

1. values | sequence

A 1D sequence of values.

2. sort | boolean | optional

Whether or not to sort the resulting array of unique values. By default, sort=False.

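3. na_sentinel | int | optional

The value to mark NaN in the array of integer indices. By default, na_sentinel=-1. Note that this parameter was deprecated in Pandas 1.5 in favor of use_na_sentinel and removed in Pandas 2.0.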

Return Value

For a list or NumPy array input, the following two NumPy arrays are returned:

  • an array of integer indices that maps the input array to the array of unique values.

  • an array containing the unique values of the input array.
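
The container type of the second return value depends on what is passed in. The sketch below illustrates this for a list and for a Series; the variable names are only illustrative, and the printed type names may vary slightly across Pandas versions:

```python
import numpy as np
import pandas as pd

labels = ["B", "A", "A", "C", "B"]

# Passing a plain list: codes is a NumPy array of integers
codes_list, uniques_list = pd.factorize(labels)

# Passing a Series: codes is still a NumPy array,
# but the unique values come back as a Pandas Index
codes_ser, uniques_ser = pd.factorize(pd.Series(labels))

print(type(codes_list).__name__, type(uniques_list).__name__)
print(type(codes_ser).__name__, type(uniques_ser).__name__)
```

In both cases the codes are identical; only the wrapper around the unique values differs.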

Examples

Basic usage

import pandas as pd
codes, uniques = pd.factorize(["B", "A", "A", "C", "B"])
print("codes:", codes)
print("uniques:", uniques)
codes: [0 1 1 2 0]
uniques: ['B' 'A' 'C']

Note the following:

  • the codes array maps the values in the input array to the uniques array.

  • the unique values are ordered as they appear in the input array.

You can recreate the input array using codes and uniques like so:

uniques[codes]
array(['B', 'A', 'A', 'C', 'B'], dtype=object)
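
Because each code is simply a position in uniques, the two arrays can also be turned into an explicit lookup table. The following is a minimal sketch of that idea; label_to_code and restored are names chosen here for illustration and are not part of Pandas:

```python
import pandas as pd

values = ["B", "A", "A", "C", "B"]
codes, uniques = pd.factorize(values)

# Each code is the position of the label inside uniques,
# so enumerating uniques gives a label -> code lookup table
label_to_code = {label: code for code, label in enumerate(uniques)}

# Indexing uniques with codes rebuilds the original sequence
restored = list(uniques[codes])

print(label_to_code)
print(restored)
```

A lookup table like this is handy when the same encoding has to be applied to new data later on.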

Specifying sort

By default, sort=False, which means that the returned array of unique values is not sorted.

To have the array of unique values sorted, set sort=True like so:

codes, uniques = pd.factorize(["B", "A", "A", "C", "B"], sort=True)
print("codes:", codes)
print("uniques:", uniques)
codes: [1 0 0 2 1]
uniques: ['A' 'B' 'C']

Notice how the uniques are sorted, and the codes array also reflects this.
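
For data without missing values, the sorted result can be cross-checked against NumPy's np.unique(~) with return_inverse=True, which also returns sorted unique values together with integer positions. This comparison is an illustration added here, not something factorize(~) relies on:

```python
import numpy as np
import pandas as pd

values = ["B", "A", "A", "C", "B"]

# Sorted factorization with Pandas
codes, uniques = pd.factorize(values, sort=True)

# NumPy equivalent: sorted unique values plus the inverse indices
np_uniques, np_inverse = np.unique(values, return_inverse=True)

# Both approaches agree on the labels and on the integer codes
print(list(uniques) == list(np_uniques))
print(list(codes) == list(np_inverse))

# The round trip still holds when sort=True
print(list(uniques[codes]) == values)
```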

Specifying na_sentinel

By default, NaN values are marked as -1 in the codes array and are excluded from the unique values:

import numpy as np
codes, uniques = pd.factorize(["B", np.nan, "A", "C", "B"])
print("codes:", codes)
print("uniques:", uniques)
codes: [ 0 -1  1  2  0]
uniques: ['B' 'A' 'C']

In Pandas versions before 2.0, we can choose our own value by passing in na_sentinel like so:

codes, uniques = pd.factorize(["B", np.nan, "A", "C", "B"], na_sentinel=50)
print("codes:", codes)
print("uniques:", uniques)
codes: [ 0 50  1  2  0]
uniques: ['B' 'A' 'C']
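
On newer Pandas versions where na_sentinel is no longer accepted, a custom marker can be emulated by remapping the default -1 afterwards. The sketch below assumes Pandas 1.5 or later (for the use_na_sentinel parameter); the value 50 and the variable names are only illustrative:

```python
import numpy as np
import pandas as pd

values = ["B", np.nan, "A", "C", "B"]

# Default behaviour: NaN is coded as -1 and left out of uniques
codes, uniques = pd.factorize(values)

# Emulate na_sentinel=50 by replacing -1 after the fact
custom_codes = np.where(codes == -1, 50, codes)
print("custom codes:", custom_codes)

# Assumes Pandas >= 1.5: treat NaN as a regular value instead,
# so it receives a non-negative code and appears in uniques
codes_nan, uniques_nan = pd.factorize(values, use_na_sentinel=False)
print("codes:", codes_nan)
print("uniques:", uniques_nan)
```

Remapping after the call leaves the array of unique values unchanged, whereas use_na_sentinel=False adds NaN to it.
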
Published by Isshin Inada