Extracting all text from an element in Beautiful Soup
Last updated Aug 12, 2023
Tags: Python, Beautiful Soup
To extract all text from an element in Beautiful Soup, use its get_text() method.
Examples
Consider the following HTML document:
from bs4 import BeautifulSoup

my_html = """<div>
<p>I like tea.</p>
<p>I like <b>soup</b>.</p>
I like soda.
</div>"""
soup = BeautifulSoup(my_html, "html.parser")
Extracting raw text
To extract all text:
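A minimal, self-contained sketch of this step is shown here. It assumes the beautifulsoup4 package with Python's built-in "html.parser" backend, and it lays the HTML out one tag per line; that exact whitespace is an assumption, and it is what determines the exact output.

```python
from bs4 import BeautifulSoup

# The example document, one tag per line (layout assumed for illustration)
my_html = """<div>
<p>I like tea.</p>
<p>I like <b>soup</b>.</p>
I like soda.
</div>"""

soup = BeautifulSoup(my_html, "html.parser")

# get_text() concatenates every string inside the element,
# including the whitespace-only strings that sit between tags
text = soup.div.get_text()
print(repr(text))  # '\nI like tea.\nI like soup.\nI like soda.\n'
```

Printing with repr() makes the leading and trailing newlines visible; they come from the line breaks between the tags, not from the text content itself.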
Notice how the output has an awkward structure: the line breaks and spaces between the tags in the HTML source are kept as part of the text.
Extracting stripped text
To get rid of the awkward whitespace, pass the strip=True parameter:
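A sketch of the strip=True call, again assuming the "html.parser" backend and the one-tag-per-line layout of the example HTML (an assumption about the original whitespace):

```python
from bs4 import BeautifulSoup

my_html = """<div>
<p>I like tea.</p>
<p>I like <b>soup</b>.</p>
I like soda.
</div>"""

soup = BeautifulSoup(my_html, "html.parser")

# strip=True strips each individual string and drops the ones that
# become empty; the survivors are joined with the default separator ""
text = soup.div.get_text(strip=True)
print(text)  # I like tea.I likesoup.I like soda.
```

Because every string is stripped separately, the trailing space of "I like " is removed too, which is why "like" and "soup" run together in the result.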
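This looks much cleaner. Note, however, that strip=True trims the whitespace at both ends of every individual string, including the trailing space in "I like ", so "like" and "soup" end up joined without a space unless you also specify a separator.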
Specifying a separator
To join the individual pieces of text using "**" as the separator:
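A sketch of the separator argument, shown both on its own and combined with strip=True. As before, the "html.parser" backend and the one-tag-per-line HTML layout are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

my_html = """<div>
<p>I like tea.</p>
<p>I like <b>soup</b>.</p>
I like soda.
</div>"""

soup = BeautifulSoup(my_html, "html.parser")

# The separator is placed between consecutive strings,
# including the whitespace-only ones between tags
joined = soup.div.get_text(separator="**")
print(repr(joined))  # '\n**I like tea.**\n**I like **soup**.**\nI like soda.\n'

# With strip=True, empty strings are dropped before joining
joined_stripped = soup.div.get_text(separator="**", strip=True)
print(joined_stripped)  # I like tea.**I like**soup**.**I like soda.
```

In practice, combining a readable separator such as " " or "\n" with strip=True is a common way to get clean text while keeping words apart.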
To explain the output, recall that our HTML document's middle line was as follows:
<p>I like <b>soup</b>.</p>
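The separator is inserted between consecutive text strings, not around tags as such. The <b> tag splits this paragraph into three separate strings, "I like ", "soup" and ".", so the separator appears twice within this one line.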
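To see the individual strings that get_text() joins, you can list them with the .strings generator. This small sketch uses only the middle paragraph and the "html.parser" backend:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>I like <b>soup</b>.</p>", "html.parser")

# .strings yields each text node inside the element, in document order
pieces = list(soup.p.strings)
print(pieces)  # ['I like ', 'soup', '.']

# Here, get_text(separator) gives the same result as joining those pieces
assert soup.p.get_text("**") == "**".join(pieces)
print(soup.p.get_text("**"))  # I like **soup**.
```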
Published by Isshin Inada