Getting shortest and longest strings in Pandas DataFrame
Consider the following Pandas DataFrame:
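The DataFrame can be rebuilt from the outputs shown later in this article. This is a minimal sketch: rows 0, 1, 3 and 4 appear in those outputs, but only the length (3) of row 2 is shown, so the value `"ccc"` is an assumption.

```python
import pandas as pd

# Rows 0, 1, 3 and 4 are taken from the outputs shown in this article.
# Row 2 is an assumption: only its length (3) appears in the outputs.
df = pd.DataFrame({"vals": ["aa", "bbbb", "ccc", "dddd", "ee"]})
print(df)
```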
Getting the shortest strings in a Pandas column
To get the shortest strings in the vals column:
s_length = df['vals'].str.len()
bool_mask = (s_length == s_length.min())
df['vals'][bool_mask]

0    aa
4    ee
Name: vals, dtype: object
Here, we first obtain a Series holding the length of each string using the str.len() method:
s_length = df['vals'].str.len()
s_length

0    2
1    4
2    3
3    4
4    2
Name: vals, dtype: int64
We then compute the minimum string length using s_length.min(), and create a boolean mask where True corresponds to strings that are shortest:
bool_mask = (s_length == s_length.min())
bool_mask

0     True
1    False
2    False
3    False
4     True
Name: vals, dtype: bool
Finally, we pass the mask to the [] indexing notation to fetch the values in the vals column that correspond to True:
df['vals'][bool_mask] # Returns a Series
0    aa
4    ee
Name: vals, dtype: object
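The same selection can also be written with `.loc`, which takes the row mask and the column label in a single indexing step instead of chaining two `[]` lookups. This is a minimal sketch on assumed sample data; the row 2 value `"ccc"` is a guess, since only its length is shown in the outputs.

```python
import pandas as pd

# Sample data; the row 2 value "ccc" is an assumption.
df = pd.DataFrame({"vals": ["aa", "bbbb", "ccc", "dddd", "ee"]})

s_length = df["vals"].str.len()
bool_mask = s_length == s_length.min()

# Row selection (mask) and column selection (label) in one step.
shortest = df.loc[bool_mask, "vals"]  # Returns a Series
print(shortest)
```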
We could also fetch the rows that have the shortest vals value as a DataFrame like so:
df[bool_mask] # Returns a DataFrame
   vals
0    aa
4    ee
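If the column contains missing values, the same approach should still work: str.len() returns NaN for a missing entry, min() skips NaN by default, and NaN never compares equal to the minimum, so missing rows drop out of the mask. A small sketch with a hypothetical Series (not the article's data) that includes None:

```python
import pandas as pd

# Hypothetical data with one missing value, for illustration only.
s = pd.Series(["aa", None, "bbbb", "ee"], name="vals")

s_length = s.str.len()                  # the missing entry has length NaN
bool_mask = s_length == s_length.min()  # min() ignores NaN by default
shortest = s[bool_mask]                 # rows 0 and 3; the missing row is excluded
print(shortest)
```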
Getting the longest strings in a Pandas column
The logic for getting the longest strings is very similar to getting the shortest strings:
s_length = df['vals'].str.len()
bool_mask = (s_length == s_length.max())
df['vals'][bool_mask] # Returns a Series

1    bbbb
3    dddd
Name: vals, dtype: object
Here, we use max() instead of min() to compute the length of the longest string.
Again, to get the rows whose vals string value is longest as a DataFrame instead of a Series:
df[bool_mask] # Returns a DataFrame
   vals
1  bbbb
3  dddd
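Since the two cases differ only in min() versus max(), they can be folded into one function. The sketch below is one possible way to do that; the helper name `rows_with_extreme_length` and its `longest` parameter are hypothetical, and the row 2 value `"ccc"` in the sample data is an assumption.

```python
import pandas as pd


def rows_with_extreme_length(df, column, longest=False):
    """Hypothetical helper: return the rows whose string in `column`
    is shortest (default) or longest (longest=True). Ties are all kept."""
    s_length = df[column].str.len()
    target = s_length.max() if longest else s_length.min()
    return df[s_length == target]  # Returns a DataFrame


# Sample data; the row 2 value "ccc" is an assumption.
df = pd.DataFrame({"vals": ["aa", "bbbb", "ccc", "dddd", "ee"]})

shortest_rows = rows_with_extreme_length(df, "vals")
longest_rows = rows_with_extreme_length(df, "vals", longest=True)
print(shortest_rows)
print(longest_rows)
```

Comparing against the minimum or maximum length keeps every tied row, which matches the outputs above where two strings share the shortest length and two share the longest.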