Do you want to get the HTML source code of a webpage with Python selenium? In this article you will learn how to do that.
Selenium is a Python module for browser automation. You can use it to grab the HTML code that webpages are made of. HTML stands for HyperText Markup Language.
What is HTML source? This is the code that is used to construct a web page. It is a markup language.
To get it, you first need to have selenium and the web driver installed. You can then let Python launch the web browser, open the web page URL and grab the HTML source.
To start, install the selenium module for Python.
pip install selenium
For Windows users, do this instead:
pip.exe install selenium
It’s recommended that you do that in a virtual environment using virtualenv.
If you use the PyCharm IDE, you can install the module from inside the IDE.
Make sure you have the web driver installed, or it will not work.
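Before going further, it can help to confirm that both pieces are in place. The following is a minimal sketch, assuming Firefox with its geckodriver driver; the helper name check_setup is made up for illustration. Selenium 4.6 and later include Selenium Manager, which can fetch a driver automatically, so a missing driver on the PATH is not necessarily a problem there.

```python
import importlib.util
import shutil


def check_setup(driver_name="geckodriver"):
    """Report whether the selenium package and a driver executable can be found.

    driver_name is an assumption: geckodriver is the driver used by Firefox.
    """
    return {
        # True if "import selenium" would succeed in this environment
        "selenium": importlib.util.find_spec("selenium") is not None,
        # True if an executable with this name is on the PATH
        "driver": shutil.which(driver_name) is not None,
    }


if __name__ == "__main__":
    for name, found in check_setup().items():
        print(f"{name}: {'found' if found else 'missing'}")
```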
You can retrieve the HTML source of a URL with the code shown below.
It starts the web browser (Firefox), opens a webpage with the get() method, stores the page's HTML using browser.page_source and then outputs it.
This is done in a few steps, the first being to import selenium and the time module.
from selenium import webdriver
import time
It starts the web browser with a single line of code. In this example we use Firefox, but any of the supported browsers will do (Chrome, Edge; older Selenium versions also supported PhantomJS).
# start web browser
browser = webdriver.Firefox()
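If you want to switch browsers without editing the code each time, one option is to look up the driver class by name. This is only a sketch: the names SUPPORTED, driver_class_name and start_browser are invented here, and the mapping covers an illustrative subset (Firefox, Chrome and Edge). Each browser still needs its own driver installed.

```python
# Map of user-facing names to webdriver class names (an illustrative subset).
SUPPORTED = {"firefox": "Firefox", "chrome": "Chrome", "edge": "Edge"}


def driver_class_name(name):
    """Return the webdriver class name for a browser name, e.g. 'chrome' -> 'Chrome'."""
    key = name.strip().lower()
    if key not in SUPPORTED:
        raise ValueError(f"unsupported browser: {name!r}")
    return SUPPORTED[key]


def start_browser(name="firefox"):
    """Start the requested browser; needs selenium and the matching driver."""
    # Imported here so the name lookup above works even without selenium.
    from selenium import webdriver

    return getattr(webdriver, driver_class_name(name))()
```

Calling start_browser("chrome") would then be equivalent to webdriver.Chrome().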
The URL you want to get is then opened; this simply loads the link in the browser.
# get source code
Then you can use the attribute .page_source to get the HTML code.
html = browser.page_source
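With default settings, get() normally returns once the browser reports the page as loaded, but pages that build their content with JavaScript may still be changing at that point. A simple approach is to pause with the time module before reading page_source. The sketch below polls document.readyState instead of sleeping for a fixed time; wait_until_loaded is a hypothetical helper, and a "complete" state still does not guarantee that every script has finished. Selenium also provides WebDriverWait for condition-based waits.

```python
import time


def wait_until_loaded(browser, timeout=10.0, poll=0.5):
    """Poll document.readyState until it is 'complete' or the timeout expires.

    Returns True if the page reported 'complete' in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if browser.execute_script("return document.readyState") == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

You would call wait_until_loaded(browser) after browser.get(...) and read browser.page_source once it returns True.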
You can then optionally output the HTML source (or do something else with it).
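As one example of doing something else with it, the HTML string can be parsed with Python's standard library. The sketch below pulls out the page title; TitleParser and extract_title are illustrative names, and sample_html is a stand-in for browser.page_source so the code runs without a browser.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the page title found in an HTML string, or '' if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


# Stand-in for browser.page_source (an assumption for this sketch).
sample_html = "<html><head><title>Example page</title></head><body><p>Hi</p></body></html>"
print(extract_title(sample_html))
```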
Don’t forget to close the web browser.
# close web browser
browser.close()
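Putting the steps together, the whole flow can be wrapped in one function. This is a sketch under a few assumptions: fetch_page_source is a made-up name, the optional browser parameter exists only so the function can be tried with any object that has get() and page_source, and quit() is used instead of close() so the driver session is ended even if loading the page raises an error.

```python
def fetch_page_source(url, browser=None):
    """Open url in a browser and return the page's HTML as a string.

    If no browser object is passed in, a Firefox instance is started (this
    needs selenium and geckodriver) and shut down again afterwards.
    """
    owns_browser = browser is None
    if owns_browser:
        from selenium import webdriver

        browser = webdriver.Firefox()
    try:
        browser.get(url)            # load the page
        return browser.page_source  # HTML as the browser currently sees it
    finally:
        if owns_browser:
            browser.quit()          # end the session we started
```

For example, html = fetch_page_source("https://example.com") would start Firefox, load that placeholder URL and return its HTML.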