Selenium get HTML source in Python

Do you want to get the HTML source code of a webpage with Python selenium? In this article you will learn how to do that. Selenium is a Python module for browser automation. You can use it to grab HTML code, what webpages are made of: HyperText Markup Language (HTML).

What is HTML source? This is the code that is used to construct a web page. It is a markup language.

To get it, first you need to have selenium and the web driver install. You can let Python fire the web browser, open the web page URL and grab the HTML source.

Practice now: Test your Python skills with interactive challenges

Install Selenium

To start, install the selenium module for Python.

pip install selenium

For windows users, do this instead:

pip.exe install selenium

It's recommended that you do that in a virtual environment using virtualenv. If you use the PyCharm IDE, you can install the module from inside the IDE.

Make sure you have the web driver installed, or it will not work.

Selenium get HTML

You can retrieve the HTML source of an URL with the code shown below. It first starts the web browser (Firefox), loads the page and then outputs the HTML code.

The code below starts the Firefox web rbowser, opens a webpage with the get() method and finally stores the webpage html with browser.page_source.

#_*_coding: utf-8_*_

from selenium import webdriver
import time

# start web browser
browser=webdriver.Firefox()

# get source code
browser.get("https://en.wikipedia.org")
html = browser.page_source
time.sleep(2)
print(html)

# close web browser
browser.close()

selenium get html

This is done in a few steps first importing selenium and the time module.

from selenium import webdriver
import time

It starts the web browser with a single line of code. In this example we use Firefox, but any of the supported browsers. will do (Chrome, Edge, PhantomJS).

# start web browser
browser=webdriver.Firefox()

The URL you want to get is opened, this just opens the link in the browser.

# get source code
browser.get("https://en.wikipedia.org")

Then you can use the attribute .page_source to get the HTML code.

html = browser.page_source
time.sleep(2)
print(html)

You can then optionally output the HTML source (or do something else with it).

time.sleep(2)
print(html)

Don't forget to close the web browser.

# close web browser
browser.close()

Practice now: Test your Python skills with interactive challenges

Download examples