Read Excel files (extensions:.xlsx, .xls) with Python Pandas. To read an excel file as a DataFrame, use the pandas read_excel() method.

You can read the first sheet, specific sheets, multiple sheets or all sheets. Pandas converts this to the DataFrame structure, which is a tabular like structure.

Related course: Data Analysis with Python Pandas

Excel

In this article we use an example Excel file. The programs we’ll make reads Excel into Python.

Creat an excel file with two sheets, sheet1 and sheet2. You can use any Excel supporting program like Microsoft Excel or Google Sheets.

The contents of each are as follows:

sheet1:

Excel sheet 1

sheet2:

Excel sheet 1

Install xlrd

Pandas. .read_excel a.) uses a library called xlrd internally.

xlrd is a library for reading (input) Excel files (.xlsx, .xls) in Python.

Related article: How to use xlrd, xlwt to read and write Excel files in Python

If you call pandas.read_excel s() in an environment where xlrd is not installed, you will receive an error message similar to the following:

ImportError: Install xlrd >= 0.9.0 for Excel support

xlrd can be installed with pip. (pip3 depending on the environment)

1
$ pip install xlrd

Read excel

Specify the path or URL of the Excel file in the first argument.
If there are multiple sheets, only the first sheet is used by pandas.
It reads as DataFrame.

1
2
3
4
5
import pandas as pd

df = pd.read_excel('sample.xlsx')

print(df)

The code above outputs the excel sheet content:

1
2
3
4
  Unnamed: 0   A   B   C
0 one 11 12 13
1 two 21 22 23
2 three 31 32 33

Get sheet

You can specify the sheet to read with the argument sheet_name.

Specify by number (starting at 0)

1
2
3
df_sheet_index = pd.read_excel('sample.xlsx', sheet_name=1)

print(df_sheet_index)
1
2
3
4
#        AA  BB  CC
# ONE 11 12 13
# TWO 21 22 23
# THREE 31 32 33

Specify by sheet name:

1
2
3
df_sheet_name = pd.read_excel('sample.xlsx', sheet_name='sheet2')

print(df_sheet_name)
1
2
3
4
#        AA  BB  CC
# ONE 11 12 13
# TWO 21 22 23
# THREE 31 32 33

Load multiple sheets

It is also possible to specify a list in the argumentsheet_name. It is OK even if it is a number of 0 starting or the sheet name.

The specified number or sheet name is the key key, and the data pandas. The DataFrame is read as the ordered dictionary OrderedDict with the value value.

1
2
3
df_sheet_multi = pd.read_excel('sample.xlsx', sheet_name=[0, 'sheet2'])

print(df_sheet_multi)

Then you can use it like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
print(df_sheet_multi[0])
# A B C
# one 11 12 13
# two 21 22 23
# three 31 32 33

print(type(df_sheet_multi[0]))
# <class 'pandas.core.frame.DataFrame'>

print(df_sheet_multi['sheet2'])
# AA BB CC
# ONE 11 12 13
# TWO 21 22 23
# THREE 31 32 33

print(type(df_sheet_multi['sheet2']))
# <class 'pandas.core.frame.DataFrame'>

Load all sheets

If sheet_name argument is none, all sheets are read.

1
2
df_sheet_all = pd.read_excel('sample.xlsx', sheet_name=None)
print(df_sheet_all)

In this case, the sheet name becomes the key.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
print(df_sheet_all['sheet1'])
# A B C
# one 11 12 13
# two 21 22 23
# three 31 32 33

print(type(df_sheet_all['sheet1']))
# <class 'pandas.core.frame.DataFrame'>

print(df_sheet_all['sheet2'])
# AA BB CC
# ONE 11 12 13
# TWO 21 22 23
# THREE 31 32 33

print(type(df_sheet_all['sheet2']))
# <class 'pandas.core.frame.DataFrame'>

Related course: Data Analysis with Python Pandas