Getting the longest string in a column of a Pandas DataFrame
Consider the following DataFrame:
     A
0    a
1  abc
2  def
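A DataFrame matching the one shown above can be built as follows. This is a minimal sketch, assuming pandas is imported as pd and that the default integer index is intended:

```python
import pandas as pd

# Single column "A" holding strings of different lengths
df = pd.DataFrame({"A": ["a", "abc", "def"]})
print(df)
```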
Solution
To get the longest string in column A:
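A minimal sketch of the approach explained later in this article, assuming pandas and NumPy are imported as pd and np. The variable names lengths and argmax follow the ones used in the explanation, while longest is only an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "abc", "def"]})

# Length of each string in column A
lengths = df["A"].str.len()
# Integer positions of every row whose length equals the maximum
argmax = np.where(lengths == lengths.max())[0]
# Rows holding the longest strings (ties are all kept)
longest = df.iloc[argmax]
print(longest)
```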
Here, the returned value is a DataFrame. If you wanted a list of the longest strings instead:
Note the following:
to_numpy() converts the DataFrame into a 2D NumPy array.
ravel() is used to flatten this 2D array into a 1D array.
tolist() is used to convert this NumPy array into a standard Python list.
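The chain of calls described above can be sketched as follows. This assumes the same df and the argmax positions computed as in the solution; longest_list is an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "abc", "def"]})
lengths = df["A"].str.len()
argmax = np.where(lengths == lengths.max())[0]

# DataFrame -> 2D array -> 1D array -> plain Python list
longest_list = df.iloc[argmax].to_numpy().ravel().tolist()
print(longest_list)  # ['abc', 'def']
```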
Explanation
We start by computing the length of each string in column A using the Series' str.len() method:
lengths
0    1
1    3
2    3
Name: A, dtype: int64
We then get a Series of booleans where True indicates the positions of the longest strings:
0    False
1     True
2     True
Name: A, dtype: bool
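The two steps so far can be reproduced with the sketch below. It assumes the same df as before; is_longest is a name introduced here purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "abc", "def"]})

# Step 1: length of each string in column A
lengths = df["A"].str.len()
# Step 2: True wherever the length equals the maximum length
is_longest = lengths == lengths.max()
print(is_longest)
```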
We then use NumPy's where(~) to get all the integer indexes of True:
argmax
array([1, 2])
The trailing [0] is needed because where(~) returns a tuple whose first element is the array of integer indexes.
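To make the role of [0] concrete, the sketch below keeps the tuple returned by where(~) in a separate variable before indexing it. The name result is an assumption made for illustration only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "abc", "def"]})
lengths = df["A"].str.len()

# With a single condition, np.where returns a tuple with one array per dimension
result = np.where(lengths == lengths.max())
# The Series is 1D, so the tuple has exactly one element
argmax = result[0]
print(argmax)  # [1 2]
```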
Finally, we use the iloc property to extract the rows in df given these integer indexes:
     A
1  abc
2  def
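The final extraction step can be sketched on its own as below. For brevity the integer indexes are written out directly rather than recomputed, which is an assumption of this example; longest is an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "abc", "def"]})
# Integer positions of the longest strings, as found in the previous step
argmax = np.array([1, 2])

# iloc selects rows by integer position
longest = df.iloc[argmax]
print(longest)
```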
Why the argmax method does not work
You may be tempted to directly use the Series' argmax(~) method like so:
However, the problem with argmax(~) is that it only returns the integer index of the first occurrence of the maximum. For the DataFrame above, that is 1, so the equally long string at index 2 is missed.
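This limitation can be checked with a short sketch, assuming the same df as before. The name first_only is introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "abc", "def"]})

# argmax returns a single integer position, even when the maximum is tied
first_only = df["A"].str.len().argmax()
print(first_only)  # 1

# Only "abc" is recovered; "def" has the same length but is ignored
print(df.iloc[[first_only]])
```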