Pandas | factorize method
Pandas factorize(~) method returns the following:
an array of integer indices to map the input array to the unique values.
all the unique values of the input array.
Parameters
1. values
sequence
A 1D sequence of values.
2. sort
boolean | optional
Whether or not to sort the resulting array of unique values. By default, sort=False.
3. na_sentinel
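int | optional
The value used to mark NaN in the array of integer indices. By default, na_sentinel=-1. Note that na_sentinel was deprecated in pandas 1.5 and removed in pandas 2.0 in favor of the boolean use_na_sentinel parameter.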
Return Value
The following two NumPy arrays are returned:
an array of integer indices that maps the input array to the array of unique values.
an array containing the unique values of the input array.
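To make the relationship between the two returned arrays concrete, here is a minimal pure-Python sketch of the idea. It is not how Pandas implements factorize(~) internally; the helper name factorize_sketch and its internals are hypothetical, and it returns plain lists rather than NumPy arrays.

```python
def factorize_sketch(values, sort=False, na_sentinel=-1):
    """Hypothetical helper illustrating the idea behind pd.factorize."""

    def is_missing(v):
        # NaN is the only value that is not equal to itself
        return v is None or v != v

    # Record each non-missing value the first time it appears
    position = {}
    for v in values:
        if not is_missing(v):
            position.setdefault(v, len(position))

    uniques = list(position)
    if sort:
        # Re-number the codes so that they follow the sorted order
        uniques = sorted(uniques)
        position = {v: i for i, v in enumerate(uniques)}

    codes = [na_sentinel if is_missing(v) else position[v] for v in values]
    return codes, uniques


codes, uniques = factorize_sketch(["B", "A", "A", "C", "B"])
print("codes:", codes)      # codes: [0, 1, 1, 2, 0]
print("uniques:", uniques)  # uniques: ['B', 'A', 'C']
```

Each code is simply the position of the corresponding input value in the list of unique values.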
Examples
Basic usage
import pandas as pd

codes, uniques = pd.factorize(["B", "A", "A", "C", "B"])
print("codes:", codes)
print("uniques:", uniques)
codes: [0 1 1 2 0]
uniques: ['B' 'A' 'C']
Note the following:
the codes array maps the values in the input array to the uniques array.
the unique values are ordered as they appear in the input array.
You can recreate the input array using codes and uniques like so:
uniques[codes]
array(['B', 'A', 'A', 'C', 'B'], dtype=object)
Specifying sort
By default, sort=False, which means that the returned array of unique values is not sorted.
To have the array of unique values sorted, set sort=True like so:
codes, uniques = pd.factorize(["B", "A", "A", "C", "B"], sort=True)
print("codes:", codes)
print("uniques:", uniques)
codes: [1 0 0 2 1]
uniques: ['A' 'B' 'C']
Notice how the uniques are sorted, and the codes array also reflects this.
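For input without missing values, the sorted result lines up with what NumPy's np.unique(~) gives when asked for the inverse indices. The comparison below is only a sketch to illustrate the sorted behaviour; it assumes a plain NumPy array of strings as input and is not a statement about how Pandas computes the result.

```python
import numpy as np
import pandas as pd

values = np.array(["B", "A", "A", "C", "B"])

# Sorted factorization with Pandas
codes, uniques = pd.factorize(values, sort=True)

# np.unique always returns sorted unique values; return_inverse=True
# additionally returns the index of each input value in that sorted array
np_uniques, np_inverse = np.unique(values, return_inverse=True)

print(list(codes) == list(np_inverse))
print(list(uniques) == list(np_uniques))
```

Both comparisons should hold for this input. The two functions differ once missing values are involved, since only factorize(~) marks them with a sentinel.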
Specifying na_sentinel
By default, NaN values are marked as -1 in the codes array:
import numpy as np

codes, uniques = pd.factorize(["B", np.nan, "A", "C", "B"])
print("codes:", codes)
print("uniques:", uniques)
codes: [ 0 -1  1  2  0]
uniques: ['B' 'A' 'C']
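One caveat when missing values are present: indexing uniques with codes no longer recreates the input, because NumPy treats -1 as "the last element". The sketch below shows the problem and one possible way around it using np.where(~); the variable names naive and restored are just for illustration, and the input is wrapped in an object array as an assumption to keep the example explicit.

```python
import numpy as np
import pandas as pd

values = np.array(["B", np.nan, "A", "C", "B"], dtype=object)
codes, uniques = pd.factorize(values)

# Naive reconstruction: the -1 code silently picks the last unique value
naive = uniques[codes]
print(naive[1])

# Put NaN back wherever the sentinel appears
restored = np.where(codes == -1, np.nan, uniques[codes])
print(restored)
```

Here naive[1] ends up being 'C' instead of NaN, while restored holds NaN in that position.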
We can choose our own value by passing in na_sentinel like so:
# na_sentinel is only accepted by pandas versions before 2.0
codes, uniques = pd.factorize(["B", np.nan, "A", "C", "B"], na_sentinel=50)
print("codes:", codes)
print("uniques:", uniques)
codes: [ 0 50  1  2  0]
uniques: ['B' 'A' 'C']
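Since na_sentinel is not available in pandas 2.0 and later, a custom marker can instead be applied after the fact. The following is a minimal sketch that should work regardless of the pandas version, assuming the default -1 sentinel is in use; the variable name custom_codes is just for illustration.

```python
import numpy as np
import pandas as pd

values = np.array(["B", np.nan, "A", "C", "B"], dtype=object)

# Factorize with the default sentinel (-1 for missing values)
codes, uniques = pd.factorize(values)

# Swap the default sentinel for our own marker
custom_codes = np.where(codes == -1, 50, codes)

print("codes:", custom_codes)
print("uniques:", uniques)
```

In pandas 1.5 and later, passing use_na_sentinel=False is another option: missing values then receive their own non-negative code and appear in the uniques array rather than being marked with a sentinel.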