To read a csv file into a pandas.DataFrame, use the pandas function read_csv() or read_table().

read_csv() and read_table() are almost identical. In fact, both call the same function internally; they differ only in the default delimiter:

  • read_csv() uses a comma (,) as the delimiter.
  • read_table() uses a tab (\t) as the delimiter.
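As a minimal sketch of the point above, the two functions can be compared on the same tab-separated text. The sample text and variable names here are made up for illustration, and io.StringIO stands in for a file on disk.

```python
from io import StringIO

import pandas as pd

# Hypothetical tab-separated sample; StringIO stands in for a real file.
tsv_text = "name\tage\nAlice\t24\nBob\t42\n"

# read_table() assumes a tab delimiter by default.
df_table = pd.read_table(StringIO(tsv_text))

# read_csv() reads the same text once sep is set to a tab.
df_csv = pd.read_csv(StringIO(tsv_text), sep="\t")

# Both calls should produce identical DataFrames.
print(df_table.equals(df_csv))
```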

Related course: Data Analysis with Python Pandas

Read CSV

Read csv with Python

The pandas function read_csv() reads values that are separated by a comma character.
You can export data to a csv file from any modern office suite, including Google Sheets.

Use the following csv data as an example.

name,age,state,point
Alice,24,NY,64
Bob,42,CA,92
Charlie,18,CA,70
Dave,68,TX,70
Ellen,24,CA,88
Frank,30,NY,57

You can load the csv like this:

# Load pandas
import pandas as pd

# Read CSV file into DataFrame df
df = pd.read_csv('sample.csv', index_col=0)

# Show dataframe
print(df)

It then outputs the data frame:

#          age state  point
# name
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57

If you want to export data from a DataFrame or pandas.Series as a csv file or append it to an existing csv file, use the to_csv() method.
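A short sketch of both cases, writing a new file and then appending to it, might look like the following. The data, the file name out.csv, and the temporary directory are assumptions chosen so the example can run anywhere.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [24, 42]})
more = pd.DataFrame({"name": ["Charlie"], "age": [18]})

# Hypothetical output location inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "out.csv")

# Write a new csv file; index=False leaves out the row labels.
df.to_csv(out_path, index=False)

# Append rows: mode="a" adds to the existing file,
# header=False avoids writing a second header line.
more.to_csv(out_path, mode="a", index=False, header=False)

# Read the file back to check the combined contents.
combined = pd.read_csv(out_path)
print(combined)
```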

Read csv without header

Read a csv file that does not have a header (header line):

11,12,13,14
21,22,23,24
31,32,33,34

Specify the file path as either an absolute path or a path relative to the current directory (the working directory). In Python, os.getcwd() returns the current directory and os.chdir() changes it.
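To see how a relative path maps to an absolute one, a small sketch with pathlib can help. The path data/src/sample.csv is the one used in this article's examples; the file does not need to exist for this snippet to run.

```python
from pathlib import Path

# Relative path, interpreted from the current working directory.
relative_path = Path("data/src/sample.csv")

# The working directory that relative paths are resolved against.
current_dir = Path.cwd()
print(current_dir)

# Joining the two gives the equivalent absolute path.
absolute_path = current_dir / relative_path
print(absolute_path.is_absolute())
```

Either form of the path can be passed to read_csv().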

If no arguments are set, the first line is treated as the header and its values become the column names (columns). For a file without a header, this means the first data row is used up as column names.

import pandas as pd

df = pd.read_csv('data/src/sample.csv')
print(df)
#    11  12  13  14
# 0  21  22  23  24
# 1  31  32  33  34

print(df.columns)
# Index(['11', '12', '13', '14'], dtype='object')

With header=None, sequential numbers starting from 0 are assigned as the column names.

df_none = pd.read_csv('data/src/sample.csv', header=None)
print(df_none)
#     0   1   2   3
# 0  11  12  13  14
# 1  21  22  23  24
# 2  31  32  33  34

With names=('A', 'B', 'C', 'D'), you can set arbitrary column names. Pass them as a list or a tuple.

df_names = pd.read_csv('data/src/sample.csv', names=('A', 'B', 'C', 'D'))
print(df_names)
#     A   B   C   D
# 0  11  12  13  14
# 1  21  22  23  24
# 2  31  32  33  34
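The three options for a header-less file can be tried side by side without creating a file. This is only a sketch: the text is the sample data from this section held in a string, and io.StringIO stands in for data/src/sample.csv.

```python
from io import StringIO

import pandas as pd

# Same contents as the header-less sample file, held in memory.
csv_text = "11,12,13,14\n21,22,23,24\n31,32,33,34\n"

# Default: the first line is consumed as column names.
df_default = pd.read_csv(StringIO(csv_text))

# header=None: all lines are data, columns are numbered from 0.
df_none = pd.read_csv(StringIO(csv_text), header=None)

# names=...: all lines are data, columns get the given labels.
df_names = pd.read_csv(StringIO(csv_text), names=["A", "B", "C", "D"])

# Compare the number of rows and the column labels of each result.
for df in (df_default, df_none, df_names):
    print(df.shape, list(df.columns))
```

Only the default call ends up with two data rows, because it uses the first line as the header.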


Read csv with header

Read the following csv file with header:

a,b,c,d
11,12,13,14
21,22,23,24
31,32,33,34

Specify the line number of the header, counting from 0, such as header=0. The default is header=0, so if the first line is the header, the result is the same with or without the argument.

df_header = pd.read_csv('data/src/sample_header.csv')
print(df_header)
#     a   b   c   d
# 0  11  12  13  14
# 1  21  22  23  24
# 2  31  32  33  34

df_header_0 = pd.read_csv('data/src/sample_header.csv', header=0)
print(df_header_0)
#     a   b   c   d
# 0  11  12  13  14
# 1  21  22  23  24
# 2  31  32  33  34

The line specified by header is used as the column names. Data is read from the line after it, and the lines above it are ignored.

df_header_2 = pd.read_csv('data/src/sample_header.csv', header=2)
print(df_header_2)
#    21  22  23  24
# 0  31  32  33  34
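The effect of the header argument can also be checked in a self-contained way. The sketch below assumes the same contents as sample_header.csv, held in a string and wrapped in io.StringIO instead of being read from disk.

```python
from io import StringIO

import pandas as pd

# Same contents as the sample file with a header line.
csv_text = "a,b,c,d\n11,12,13,14\n21,22,23,24\n31,32,33,34\n"

# header=0 (the default): line 0 supplies the column names.
df_h0 = pd.read_csv(StringIO(csv_text), header=0)

# header=2: line 2 supplies the column names, lines 0 and 1 are dropped.
df_h2 = pd.read_csv(StringIO(csv_text), header=2)

print(list(df_h0.columns), len(df_h0))
print(list(df_h2.columns), len(df_h2))
```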

Read csv with index

Read a csv file with a header and an index (a first column holding the row labels), such as:

,a,b,c,d
ONE,11,12,13,14
TWO,21,22,23,24
THREE,31,32,33,34

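If nothing is specified, the first column is not recognized as the index. It is read as an ordinary data column, which pandas names Unnamed: 0 because its header cell is empty.
To use it as the index, add index_col=0.

index_col specifies the number of the column to use as the index, starting with 0.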

df_header_index_col = pd.read_csv('data/src/sample_header_index.csv', index_col=0)
print(df_header_index_col)
#         a   b   c   d
# ONE    11  12  13  14
# TWO    21  22  23  24
# THREE  31  32  33  34

print(df_header_index_col.index)
# Index(['ONE', 'TWO', 'THREE'], dtype='object')
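To compare reading the file with and without index_col, here is a small sketch. It assumes the same contents as sample_header_index.csv, held in a string and wrapped in io.StringIO rather than read from disk.

```python
from io import StringIO

import pandas as pd

# Same contents as the sample file with a header and an index column.
csv_text = ",a,b,c,d\nONE,11,12,13,14\nTWO,21,22,23,24\nTHREE,31,32,33,34\n"

# Without index_col, the labels end up in a regular column.
df_plain = pd.read_csv(StringIO(csv_text))
print(list(df_plain.columns))  # ['Unnamed: 0', 'a', 'b', 'c', 'd']

# With index_col=0, the first column becomes the row index.
df_indexed = pd.read_csv(StringIO(csv_text), index_col=0)
print(list(df_indexed.index))  # ['ONE', 'TWO', 'THREE']

# Rows can then be selected by label.
print(df_indexed.loc["TWO", "b"])  # 22
```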
