Read Excel files (extensions: .xlsx, .xls) with Python pandas. To read an Excel file as a DataFrame, use the pandas read_excel() function.
You can read the first sheet, specific sheets, multiple sheets or all sheets. pandas converts each sheet into a DataFrame, a table-like structure of rows and columns.
Related course: Data Analysis with Python Pandas
Excel
In this article we use an example Excel file. The programs we'll write read Excel data into Python.
Create an Excel file with two sheets, sheet1 and sheet2. You can use any program that supports the Excel format, such as Microsoft Excel or Google Sheets.
The contents of each are as follows:
sheet1:
sheet2:
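If you would rather build the example file from Python, the following is a minimal sketch of one way to do it. Only the column names (A, B, C and AA, BB, CC) come from the outputs shown later in this article; the cell values and row labels are invented placeholders. It also assumes a writer engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Placeholder data: the values and row labels are assumptions, not the
# article's actual sheet contents.
sheet1 = pd.DataFrame(
    {"A": [11, 21, 31], "B": [12, 22, 32], "C": [13, 23, 33]},
    index=["one", "two", "three"],
)
sheet2 = pd.DataFrame(
    {"AA": [11, 21, 31], "BB": [12, 22, 32], "CC": [13, 23, 33]},
    index=["one", "two", "three"],
)

# Written to a temporary directory here; use 'sample.xlsx' to follow along.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")

# One ExcelWriter can hold several sheets; each to_excel call adds one.
with pd.ExcelWriter(sample_path) as writer:
    sheet1.to_excel(writer, sheet_name="sheet1")
    sheet2.to_excel(writer, sheet_name="sheet2")

# Confirm which sheets ended up in the workbook.
with pd.ExcelFile(sample_path) as workbook:
    sheet_names = workbook.sheet_names
print(sheet_names)
```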
Install xlrd
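pandas.read_excel() uses a library called xlrd internally.
xlrd is a library for reading (input) Excel files in Python. Note that xlrd 2.0 and later read only .xls files; for .xlsx files, recent pandas versions use the openpyxl library instead.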
Related article: How to use xlrd, xlwt to read and write Excel files in Python
If you call pandas.read_excel() in an environment where xlrd is not installed, you will receive an error message similar to the following:
ImportError: Install xlrd >= 0.9.0 for Excel support
xlrd can be installed with pip (or pip3, depending on the environment).
    $ pip install xlrd
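To find out which Excel libraries are available before calling read_excel(), a small check like the one below can help. This is only a sketch: has_module is a hypothetical helper name, and openpyxl is included on the assumption that you are using a recent pandas version, which relies on it for .xlsx files.

```python
import importlib.util


def has_module(name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(name) is not None


# xlrd handles legacy .xls files; openpyxl handles .xlsx in recent pandas.
for engine in ("xlrd", "openpyxl"):
    status = "installed" if has_module(engine) else "missing"
    print(f"{engine}: {status}")
```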
Read excel
Pass the path or URL of the Excel file as the first argument.
If the file contains multiple sheets, pandas reads only the first sheet by default.
The result is returned as a DataFrame.
    import pandas as pd
    print(pd.read_excel('sample.xlsx'))
The code above prints the content of the Excel sheet:
    Unnamed: 0 A B C
Get sheet
You can specify the sheet to read with the sheet_name argument.
Specify by number (starting at 0):
    df_sheet_index = pd.read_excel('sample.xlsx', sheet_name=1)
    # AA BB CC
Specify by sheet name:
    df_sheet_name = pd.read_excel('sample.xlsx', sheet_name='sheet2')
    # AA BB CC
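The two forms select the same sheet, which can be checked with a short sketch like this one. It builds a small stand-in workbook in a temporary directory so that it runs on its own; the data values are placeholders, and it assumes openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-sheet workbook standing in for sample.xlsx.
path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"A": [1, 2]}).to_excel(writer, sheet_name="sheet1", index=False)
    pd.DataFrame({"AA": [3, 4]}).to_excel(writer, sheet_name="sheet2", index=False)

by_index = pd.read_excel(path, sheet_name=1)         # second sheet, counted from 0
by_name = pd.read_excel(path, sheet_name="sheet2")   # same sheet, selected by name

# Both calls should return identical DataFrames.
print(by_index.equals(by_name))
```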
Load multiple sheets
You can also pass a list to the sheet_name argument. The list may contain sheet numbers (starting at 0), sheet names, or a mix of both.
The result is a dictionary (an OrderedDict in older pandas versions) whose keys are the specified numbers or sheet names and whose values are the corresponding pandas.DataFrame objects.
    df_sheet_multi = pd.read_excel('sample.xlsx', sheet_name=[0, 'sheet2'])
Then you can use it like this:
    print(df_sheet_multi[0])
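Because the result is a dictionary, you can also loop over it. The sketch below assumes a small stand-in workbook with placeholder data (and openpyxl installed); note that each key has the same type you passed in, so the number 0 and the string 'sheet2' are both used as keys here.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-sheet workbook standing in for sample.xlsx.
path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"A": [1, 2]}).to_excel(writer, sheet_name="sheet1", index=False)
    pd.DataFrame({"AA": [3, 4]}).to_excel(writer, sheet_name="sheet2", index=False)

# Mix a sheet number and a sheet name in one call.
sheets = pd.read_excel(path, sheet_name=[0, "sheet2"])

# Each value is a DataFrame; each key is exactly what was requested.
for key, frame in sheets.items():
    print(repr(key), list(frame.columns), len(frame))
```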
Load all sheets
If the sheet_name argument is None, all sheets are read.
    df_sheet_all = pd.read_excel('sample.xlsx', sheet_name=None)
In this case, the sheet name becomes the key.
    print(df_sheet_all['sheet1'])
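Reading every sheet at once is handy when you do not know the sheet names in advance. As a rough sketch, again using a stand-in workbook with placeholder data and assuming openpyxl is installed, the dictionary keys give you the sheet names and each value can be inspected in turn:

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-sheet workbook standing in for sample.xlsx.
path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"A": [1, 2, 3]}).to_excel(writer, sheet_name="sheet1", index=False)
    pd.DataFrame({"AA": [4, 5]}).to_excel(writer, sheet_name="sheet2", index=False)

all_sheets = pd.read_excel(path, sheet_name=None)

# The keys are the sheet names found in the workbook.
names = list(all_sheets)

# Number of data rows per sheet, keyed by sheet name.
row_counts = {name: len(frame) for name, frame in all_sheets.items()}
print(names)
print(row_counts)
```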