Pandas Series str | extractall method
Pandas Series' str.extractall(~) extracts every substring that matches the capturing groups of a regular expression.
To extract only the first match instead of all matches, use str.extract(~).
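As a quick illustration of the difference between the two methods, here is a minimal sketch. The Series `s` and its values are made up for this example and are not part of the method's documentation.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
s = pd.Series(["k23", "45k", "67k89"], index=["a", "b", "c"])

# str.extract keeps one row per input row and returns only the first match
first_only = s.str.extract(r"(\d+)")

# str.extractall returns one row per match, so "67k89" contributes two rows
all_matches = s.str.extractall(r"(\d+)")

print(first_only)
print(all_matches)
```

Here `first_only` keeps the original index, while `all_matches` gains an extra index level that numbers the matches within each row.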
Parameters
1. pat | str
Regular expression to match. The pattern must contain at least one capturing group.
2. flags | int | optional
The flags to set from the re library (e.g. re.IGNORECASE). Multiple flags can be set by combining them with the bitwise | operator (e.g. re.IGNORECASE | re.MULTILINE).
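The following sketch shows one way the flags parameter might be used. The sample string is an assumption made for this example. It contains a newline so that the effect of re.MULTILINE is visible.

```python
import re
import pandas as pd

# Hypothetical sample data: a single two-line string
s = pd.Series(["ab\nAB"], index=["a"])

# Without flags, ^ only matches at the very start and the match is case-sensitive
no_flags = s.str.extractall(r"^(ab)")

# Combine two flags with the bitwise | operator:
# IGNORECASE lets "AB" match, MULTILINE lets ^ match at the start of each line
both_flags = s.str.extractall(r"^(ab)", flags=re.IGNORECASE | re.MULTILINE)

print(no_flags)
print(both_flags)
```

With both flags set, the second line of the string also produces a match.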
Return Value
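A DataFrame with a MultiIndex. The first index level is the original index, and the second level, named match, numbers the matches found within each row. There is one column per capturing group.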
Examples
Basic usage
Consider the following DataFrame:
   A
a  k23
b  45k
c  67k89
To extract all substrings that match a given regex:
df['A'].str.extractall(r'(\d+)') # returns a DataFrame with a MultiIndex
          0
  match
a 0      23
b 0      45
c 0      67
  1      89
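Here, the input string is a regex: \d+ matches one or more consecutive digits, while the parentheses () mark the capturing group, that is, the portion we want to extract. Index c yields two rows because its string contains two separate runs of digits, 67 and 89.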
Since the resulting DataFrame has a MultiIndex, we can obtain the matches for a specific index label like so:
df_result = df['A'].str.extractall(r'(\d+)')
df_result.loc['c']
        0
match
0      67
1      89
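For readers who want to run the basic example end to end, here is a self-contained sketch. The DataFrame is rebuilt from the values shown above. The variable names `matches_for_c`, `first_matches` and `numbers` are introduced only for this illustration, and the use of xs(~) and astype(~) is one possible way to work with the result, not the only one.

```python
import pandas as pd

# Rebuild the DataFrame used in this section
df = pd.DataFrame({"A": ["k23", "45k", "67k89"]}, index=["a", "b", "c"])

df_result = df["A"].str.extractall(r"(\d+)")

# The second index level is named "match"; the first keeps the original (unnamed) index
print(df_result.index.names)

# All matches for the row labelled "c"
matches_for_c = df_result.loc["c"]

# The first match of every row, selected by cross-section on the "match" level
first_matches = df_result.xs(0, level="match")

# Extracted values are strings, so convert them if numbers are needed
numbers = df_result[0].astype(int)

print(matches_for_c)
print(first_matches)
print(numbers)
```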
Multiple capturing groups
Consider the following DataFrame:
   A
a  k23
b  45y
c  67k89
We can capture multiple groups by using multiple pairs of parentheses, with each group becoming its own column:
df['A'].str.extractall(r'(\d+)([ky])')
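          0  1
  match
b 0      45  y
c 0      67  k
Index a is absent from the result because k23 contains no digits that are immediately followed by k or y. For the same reason, the trailing 89 in index c is not matched.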
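The sketch below repeats this example with named capturing groups, which pandas uses as column labels. The group names `number` and `letter` are hypothetical and chosen only for readability. It also shows one possible way to list the rows that produced no match.

```python
import pandas as pd

# Rebuild the DataFrame used in this section
df = pd.DataFrame({"A": ["k23", "45y", "67k89"]}, index=["a", "b", "c"])

# Named groups (?P<name>...) become the column labels of the result
result = df["A"].str.extractall(r"(?P<number>\d+)(?P<letter>[ky])")

# Rows with no match are omitted from the result; find them by comparing indexes
unmatched = df.index.difference(result.index.get_level_values(0))

print(result)
print(unmatched)
```

Naming the groups avoids having to remember which positional column (0 or 1) holds which part of the match.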