Read CSV with Pandas - Python Tutorial

To read a csv file as a pandas.DataFrame, use the pandas function read_csv() or read_table().

There is almost no difference between read_csv() and read_table(). In the pandas source, both call the same underlying function; only the default delimiter differs:

read_csv() uses a comma as the delimiter.
read_table() uses a tab (\t) as the delimiter.
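
As a minimal sketch of that difference, the snippet below parses the same small table with both functions. The in-memory strings wrapped in io.StringIO stand in for real files and are assumptions made for illustration only.

```python
import io

import pandas as pd

# Hypothetical in-memory data standing in for files on disk
csv_text = "name,age\nAlice,24\nBob,42\n"
tsv_text = "name\tage\nAlice\t24\nBob\t42\n"

# read_csv() splits on commas by default
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table() splits on tabs by default
df_tsv = pd.read_table(io.StringIO(tsv_text))

# Passing sep explicitly lets read_table() parse comma-separated data too
df_csv_via_table = pd.read_table(io.StringIO(csv_text), sep=",")

# All three DataFrames hold the same values
print(df_csv.equals(df_tsv))
print(df_csv.equals(df_csv_via_table))
```

In practice, pick whichever function matches the file's delimiter, or pass sep when the file uses something else.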

Related course: Data Analysis with Python Pandas

Read CSV

Read csv with Python

The pandas function read_csv() reads in values, where the delimiter is a comma character.
You can export a file into a csv file in any modern office suite including Google Sheets.

Use the following csv data as an example.

name,age,state,point
Alice,24,NY,64
Bob,42,CA,92
Charlie,18,CA,70
Dave,68,TX,70
Ellen,24,CA,88
Frank,30,NY,57

You can load the csv like this:

# Load pandas
import pandas as pd

# Read CSV file into DataFrame df
df = pd.read_csv('sample.csv', index_col=0)

# Show dataframe
print(df)

It then outputs the data frame:

#          age state  point
# name                     
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

If you want to export data from a DataFrame or pandas.Series as a csv file or append it to an existing csv file, use the to_csv() method.
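
A rough sketch of both cases, writing a new file and then appending to it, might look like the following. The DataFrame contents and the temporary output path are made up for this example.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [24, 42]})

# Hypothetical output location inside a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "out.csv")

# Write a new file; index=False leaves out the row labels
df.to_csv(out_path, index=False)

# Append more rows: mode="a" adds to the existing file,
# header=False avoids writing a second header line
more = pd.DataFrame({"name": ["Charlie"], "age": [18]})
more.to_csv(out_path, mode="a", header=False, index=False)

# Read the file back to check the combined contents
result = pd.read_csv(out_path)
print(result)
```

When appending, the columns of the new rows need to be in the same order as in the existing file, since no header is written to line them up.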

Read csv without header

Read a csv file that does not have a header (header line):

11,12,13,14
21,22,23,24
31,32,33,34

Specify the file path either as an absolute path or as a path relative to the current directory (the working directory).
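
If it is unclear which directory a relative path is resolved against, a small check with the standard pathlib module can help. This is only a sketch; the path below is the one used in the examples here and does not need to exist for the check to run.

```python
from pathlib import Path

# Current working directory: relative paths are resolved against it
cwd = Path.cwd()

# The relative path used in the examples in this section
rel_path = Path("data/src/sample.csv")

# resolve() turns the relative path into an absolute one
abs_path = rel_path.resolve()

print(cwd)
print(abs_path)
```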

If no arguments are set, the first line is treated as the header and its values are used as the column names (columns).

import pandas as pd

df = pd.read_csv('data/src/sample.csv')
print(df)
#    11  12  13  14
# 0  21  22  23  24
# 1  31  32  33  34

print(df.columns)
# Index(['11', '12', '13', '14'], dtype='object')

If header=None is set, sequential numbers are assigned as the column names (columns).

df_none = pd.read_csv('data/src/sample.csv', header=None)
print(df_none)
#     0   1   2   3
# 0  11  12  13  14
# 1  21  22  23  24
# 2  31  32  33  34

With names=('A', 'B', 'C', 'D'), arbitrary values can be set as the column names. Pass them as a list or a tuple.

df_names = pd.read_csv('data/src/sample.csv', names=('A', 'B', 'C', 'D'))
print(df_names)
#     A   B   C   D
# 0  11  12  13  14
# 1  21  22  23  24
# 2  31  32  33  34

Read csv with header

Read the following csv file with header:

a,b,c,d
11,12,13,14
21,22,23,24
31,32,33,34

Specify the line number of the header, counting from 0, such as header=0. The default is header=0, so if the first line is the header, the result is the same whether or not you pass it.

df_header = pd.read_csv('data/src/sample_header.csv')
print(df_header)
#     a   b   c   d
# 0  11  12  13  14
# 1  21  22  23  24
# 2  31  32  33  34

df_header_0 = pd.read_csv('data/src/sample_header.csv', header=0)
print(df_header_0)
#     a   b   c   d
# 0  11  12  13  14
# 1  21  22  23  24
# 2  31  32  33  34

The line specified by header is used as the header, the data is read from the next line onward, and the lines above it are ignored.

df_header_2 = pd.read_csv('data/src/sample_header.csv', header=2)
print(df_header_2)
#    21  22  23  24
# 0  31  32  33  34
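
The header argument can also be combined with names when a file already has a header line that you want to replace. The snippet below is a minimal sketch of that behaviour; the in-memory string stands in for a file like the one above and is an assumption for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of a csv file that has a header line
text = "a,b,c,d\n11,12,13,14\n21,22,23,24\n31,32,33,34\n"

# header=0 marks the first line as the header; names then replaces it
df_renamed = pd.read_csv(io.StringIO(text), header=0, names=["A", "B", "C", "D"])
print(df_renamed)

# Without header=0, the original header line is kept as a data row
df_kept = pd.read_csv(io.StringIO(text), names=["A", "B", "C", "D"])
print(df_kept)
```

In the second case the columns contain the strings 'a', 'b', 'c', 'd' in the first row, so the numbers are no longer read as integers.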

Read csv with index

Read a csv file with a header and an index (a column of row labels), such as:

,a,b,c,d
ONE,11,12,13,14
TWO,21,22,23,24
THREE,31,32,33,34

If nothing is specified, the index column is not recognized as an index and is read as an ordinary column.
To use it as the index, add index_col=0.

index_col takes the number of the column that you want to use as the index, counting from 0.

df_header_index_col = pd.read_csv('data/src/sample_header_index.csv', index_col=0)
print(df_header_index_col)
#         a   b   c   d
# ONE    11  12  13  14
# TWO    21  22  23  24
# THREE  31  32  33  34

print(df_header_index_col.index)
# Index(['ONE', 'TWO', 'THREE'], dtype='object')
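
To see what happens when index_col is left out, the two calls can be compared side by side. This is a small sketch in which an in-memory string stands in for the csv file shown above.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of a csv file with a header and a label column
text = ",a,b,c,d\nONE,11,12,13,14\nTWO,21,22,23,24\nTHREE,31,32,33,34\n"

# Default: the label column becomes an ordinary column named 'Unnamed: 0'
df_default = pd.read_csv(io.StringIO(text))
print(df_default.columns.tolist())

# index_col=0: the first column becomes the row index instead
df_indexed = pd.read_csv(io.StringIO(text), index_col=0)

# Rows can then be selected by label
print(df_indexed.loc["TWO", "b"])
```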
