Pandas Series str | extract method
Pandas Series str.extract(~) extracts the capture groups of the first regular expression match in each string.
To extract all matches instead of just the first one, use str.extractall(~).
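The difference between the two methods can be sketched as follows. The sample strings here are made up for illustration and are not part of the examples further down.

```python
import pandas as pd

# Hypothetical sample data: the first string contains two digits
s = pd.Series(["a1b2", "c3", "xyz"])

# extract: capture groups from the first match in each string only
first = s.str.extract(r"(\d)")

# extractall: one row per match, indexed by (original index, match number)
every = s.str.extractall(r"(\d)")

print(first)
print(every)
```

Note that a string with no match yields a missing value in extract, whereas extractall simply has no row for it.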
Parameters
1. pat | str
Regular expression to match. The pattern must contain at least one capturing group.
2. flags | int | optional
The flags to set from the re library (e.g. re.IGNORECASE). Multiple flags can be set by combining them with the bitwise | (e.g. re.IGNORECASE | re.MULTILINE).
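As a rough illustration of the flags parameter, the sketch below compares a case-sensitive match with one that uses re.IGNORECASE, and then combines two flags. The sample Series are assumptions made for this example.

```python
import re
import pandas as pd

# Hypothetical sample data with mixed case
s = pd.Series(["a1", "B2", "c3"])

# Without flags, the uppercase "B" does not match [ab]
no_flags = s.str.extract(r"[ab](\d+)", expand=False)

# With re.IGNORECASE, "B2" matches as well
ignore_case = s.str.extract(r"[ab](\d+)", flags=re.IGNORECASE, expand=False)

# Combining flags: re.MULTILINE lets ^ and $ match at line boundaries
multi = pd.Series(["x\nB2"])
both = multi.str.extract(
    r"^[ab](\d+)$", flags=re.IGNORECASE | re.MULTILINE, expand=False
)

print(no_flags)
print(ignore_case)
print(both)
```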
3. expand | boolean | optional
If True, then a pattern with one group will return a DataFrame. If False, then a pattern with one group will return a Series or Index.
By default, expand=True.
Return Value
If expand=True, then a DataFrame is returned. If expand=False, then a pattern with one group will return a Series or Index. In case of multiple capturing groups, a DataFrame is returned regardless of expand.
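These rules can be checked with a quick sketch on a Series. The variable names are chosen for illustration only.

```python
import pandas as pd

s = pd.Series(["a1", "b2", "c3"])

# One group, expand=True (the default)
one_group_expanded = s.str.extract(r"[ab](\d+)")

# One group, expand=False
one_group_flat = s.str.extract(r"[ab](\d+)", expand=False)

# Two groups, expand=False
two_groups = s.str.extract(r"([ab])(\d+)", expand=False)

print(type(one_group_expanded).__name__)  # DataFrame
print(type(one_group_flat).__name__)      # Series
print(type(two_groups).__name__)          # DataFrame
```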
Examples
Basic usage
Consider the following DataFrame:
    A
0  a1
1  b2
2  c3
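To follow along, a DataFrame like this one could be built as below. This is a minimal sketch that assumes the default integer index.

```python
import pandas as pd

# Single column "A" holding the three strings used in the examples
df = pd.DataFrame({"A": ["a1", "b2", "c3"]})
print(df)
```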
To extract substrings that match a given regex:
df['A'].str.extract(r'[ab](\d+)')
     0
0    1
1    2
2  NaN
Here, [ab] means either a or b, and \d+ denotes one or more digits. We use () to indicate the part we want to extract.
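One way to see what the capturing group picks out is to run the same pattern through Python's re module directly. The sketch below assumes the three strings from the DataFrame and is meant only to illustrate the pattern, not how pandas is implemented internally.

```python
import re

pattern = re.compile(r"[ab](\d+)")

results = []
for text in ["a1", "b2", "c3"]:
    match = pattern.search(text)
    # group(1) is the part inside the parentheses; no match gives None
    results.append(match.group(1) if match else None)

print(results)  # ['1', '2', None]
```

The string c3 does not start with a or b, so there is no match, which corresponds to the NaN in the pandas output.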
Multiple capturing groups
We can capture multiple groups using multiple brackets like so:
df['A'].str.extract(r'([ab])(\d+)') # returns a DataFrame
     0    1
0    a    1
1    b    2
2  NaN  NaN
Setting expand
Consider the following DataFrame:
    A
0  a1
1  b2
2  c3
By default, expand=True, which means that even if there is only one capturing group, a DataFrame will be returned:
df['A'].str.extract(r'[ab](\d+)') # expand=True
     0
0    1
1    2
2  NaN
To get a Series (or Index) instead, set expand=False:
df['A'].str.extract(r'[ab](\d+)', expand=False)
0      1
1      2
2    NaN
Name: A, dtype: object
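The Index case can be sketched in the same way. The example below assumes an Index holding the same three strings. With expand=False and a single group an Index comes back, and with expand=True a DataFrame.

```python
import pandas as pd

# Hypothetical Index with the same values as column A
idx = pd.Index(["a1", "b2", "c3"])

as_index = idx.str.extract(r"[ab](\d+)", expand=False)
as_frame = idx.str.extract(r"[ab](\d+)", expand=True)

print(type(as_index).__name__)  # Index
print(type(as_frame).__name__)  # DataFrame
```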