Problem
In this starter project, we are given a task to extract all outgoing hyperlinks from the Scientific Programming School homepage.

Solution
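The solution below uses the findAll method of a BeautifulSoup (BS4) object, named soup here. It first downloads the raw HTML with the line below (in Python 3, urlopen lives in the urllib.request module, which must be imported first):
html_page = urllib.request.urlopen("https://scientificprogramming.io")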
Then a BeautifulSoup object, soup, is created (the parser is named explicitly so that BS4 does not have to guess one):
soup = BeautifulSoup(html_page, "html.parser")
Finally, we use this object to find all links:
for link in soup.findAll('a', attrs={'href': re.compile("^https://")}):
    print(link.get('href'))
Note that we use a Python regular expression (re.compile) and only search for the SSL-enabled links, that is, those whose href starts with https://. Let's now execute the code:
Python (3.7.3)
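Putting the steps above together, a minimal sketch of the whole program might look like the following. It assumes Python 3 with the beautifulsoup4 package installed. The helper names extract_https_links and fetch_https_links and the sample HTML are illustrative and not part of the original solution. The sketch uses find_all, the current BS4 spelling of findAll.

```python
import re
import urllib.request

from bs4 import BeautifulSoup

# Matches only href values that begin with "https://".
HTTPS_LINK = re.compile(r"^https://")


def extract_https_links(html):
    """Return the href of every <a> tag whose link starts with https://."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get("href") for a in soup.find_all("a", attrs={"href": HTTPS_LINK})]


def fetch_https_links(url):
    """Download a page and return its outgoing https:// links (needs network)."""
    with urllib.request.urlopen(url) as response:
        return extract_https_links(response.read())


# Offline check on a small, made-up HTML snippet (no network needed).
sample = """
<a href="https://example.com/a">A</a>
<a href="http://example.com/b">B</a>
<a href="/relative">C</a>
<a>no href</a>
"""
print(extract_https_links(sample))  # ['https://example.com/a']
```

To run it against the live page, call fetch_https_links("https://scientificprogramming.io") and print each entry of the returned list. Separating the parsing step from the download is only a convenience here, so that the filter can be tried on a fixed string first.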