

Print tags

Learn how to extract text from the soup!

Text extraction

In this example we will learn how to extract text using BS4 (Beautiful Soup 4). We will use the following HTML file:

[Image: contents of the index.html file]

Extract the head tag, a heading and list items

We use the BS4 module to get three HTML tags (head, h6 and li). The HTML file above is given as the standard input. Lines 5-6 extract the data from stdin and save it into the list variable data.
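The steps described above can be sketched roughly as follows. This is a minimal illustration, not the course's original listing, so its line numbers will not match the ones mentioned in the text. The function name extract_tags and the SAMPLE_HTML string are hypothetical stand-ins (the real index.html is not reproduced here), and an in-memory stream is used in place of standard input so the example runs on its own.

```python
import io

from bs4 import BeautifulSoup


def extract_tags(stream):
    """Read HTML from a file-like object and return the head tag,
    the first h6 tag, and a list of all li tags."""
    # Read every line of the input into the list variable data
    data = stream.readlines()
    # Join the lines back into one string and parse it
    soup = BeautifulSoup("".join(data), "html.parser")
    return soup.head, soup.h6, soup.find_all("li")


# Hypothetical sample input; an assumption standing in for index.html
SAMPLE_HTML = """<html>
<head><title>Sample page</title></head>
<body>
<h6>Small heading</h6>
<ul>
<li>First item</li>
<li>Second item</li>
</ul>
</body>
</html>
"""

head, h6, items = extract_tags(io.StringIO(SAMPLE_HTML))
print(head)
print(h6)
for li in items:
    print(li)
```

To read from real standard input instead, import sys and call extract_tags(sys.stdin). Printing a tag shows its full markup; calling get_text() on a tag returns only the text inside it.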

Python (3.6.9)

Data download

You can download the zipped index.html file below:

zip
