This article describes pandas.DataFrame, a simple data structure: how to create one, how its index works, and how to add and delete columns and rows.

In short: it’s a two-dimensional data structure (like a table) with rows and columns.

Related course: Data Analysis with Python Pandas

Create DataFrame

What is a Pandas DataFrame

Pandas is a data manipulation module for Python. Its DataFrame lets you easily store and manipulate tabular data organized in rows and columns.

A dataframe can be created from a list (see below), or from a dictionary or NumPy array (see the end of this article).

Create DataFrame from list

You can turn a single list into a pandas dataframe:

import pandas as pd
data = [1,2,3]
df = pd.DataFrame(data)

The contents of the dataframe are then:

>>> df
   0
0  1
1  2
2  3
>>>
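A single flat list becomes one column, and since no name was given pandas labels that column 0. The sketch below, assuming you want a readable column name instead ("Value" is just an illustrative choice), passes the same columns parameter used in the next example:

```python
import pandas as pd

data = [1, 2, 3]

# Without columns=, the single column is labeled 0
df = pd.DataFrame(data)
print(list(df.columns))  # [0]

# With columns=, the column gets a name ("Value" is an illustrative name)
named = pd.DataFrame(data, columns=["Value"])
print(named)
```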

Each value is shown next to its index label (0, 1, 2), and the single column is labeled 0 by default.
This also works for tables, given as a list of rows (a two-dimensional list):

import pandas as pd

data = [['Axel',32], ['Alice', 26], ['Alex', 45]]
df = pd.DataFrame(data,columns=['Name','Age'])

This outputs:

>>> df
    Name  Age
0   Axel   32
1  Alice   26
2   Alex   45
>>>
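To confirm that each inner list became one row, you can check the frame's shape, which pandas reports as (rows, columns). A minimal sketch using the same data:

```python
import pandas as pd

data = [['Axel', 32], ['Alice', 26], ['Alex', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Three inner lists -> three rows; two values each -> two columns
print(df.shape)          # (3, 2)
print(list(df.columns))  # ['Name', 'Age']
```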


Columns

Select column

To select a column, you can use the column name.

Step 1: Create the frame:

>>> df = pd.DataFrame(data,columns=['Name','Age'])
>>> df
    Name  Age
0   Axel   32
1  Alice   26
2   Alex   45

Step 2: Select by column name:

>>> df['Name']
0     Axel
1    Alice
2     Alex
Name: Name, dtype: object
>>> df['Age']
0    32
1    26
2    45
Name: Age, dtype: int64
>>>
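Selecting one name, as above, returns a single column (a pandas Series). Passing a list of names instead, with double brackets, returns a DataFrame whose columns follow the order of the list. A short sketch:

```python
import pandas as pd

data = [['Axel', 32], ['Alice', 26], ['Alex', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Single brackets with one name -> Series
ages = df['Age']
print(type(ages).__name__)     # Series

# Double brackets (a list of names) -> DataFrame, columns in list order
subset = df[['Age', 'Name']]
print(type(subset).__name__)   # DataFrame
print(list(subset.columns))    # ['Age', 'Name']
```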

Column Addition

You can add a column to a dataframe. So this:

>>> df
    Name  Age
0   Axel   32
1  Alice   26
2   Alex   45

Becomes this:

>>> df
    Name  Age  Example
0   Axel   32        1
1  Alice   26        2
2   Alex   45        3
>>>

Here’s how to do that:

Step 1: Create the dataframe

>>> data = [['Axel',32], ['Alice', 26], ['Alex', 45]]
>>> df = pd.DataFrame(data,columns=['Name','Age'])
>>>
>>> df
    Name  Age
0   Axel   32
1  Alice   26
2   Alex   45

Step 2: Create a new dataframe that holds the new column:

>>> c = pd.DataFrame([1,2,3], columns=['Example'])

Step 3: Assign that column to a new column of your dataframe. pandas matches rows by index label, and both frames use the labels 0, 1, 2:

>>> df['Example'] = c['Example']
>>> df
    Name  Age  Example
0   Axel   32        1
1  Alice   26        2
2   Alex   45        3
>>>
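The intermediate dataframe is not strictly required. A minimal sketch of two shortcuts: a plain list of the right length is assigned row by row, while a Series is aligned on its index labels rather than its position. The column name "Other" is purely illustrative:

```python
import pandas as pd

data = [['Axel', 32], ['Alice', 26], ['Alex', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# A list must have one value per row; values are placed in order
df['Example'] = [1, 2, 3]

# A Series is matched by index label, not by position:
# label 0 gets 10, label 1 gets 20, label 2 gets 30
df['Other'] = pd.Series([30, 10, 20], index=[2, 0, 1])
print(df)
```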

Column deletion

To delete a column, you can use the del keyword.
Start with the original dataframe:

>>> df
    Name  Age  Example
0   Axel   32        1
1  Alice   26        2
2   Alex   45        3

Then delete it:

>>> del df['Example']

The column is removed from the dataframe:

>>> df
    Name  Age
0   Axel   32
1  Alice   26
2   Alex   45
>>>
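del changes the dataframe in place. If you would rather keep the original and get a trimmed copy, DataFrame.drop with the columns parameter returns a new frame. A short sketch on the same data:

```python
import pandas as pd

data = [['Axel', 32, 1], ['Alice', 26, 2], ['Alex', 45, 3]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Example'])

# drop() returns a new dataframe; df itself is left unchanged
trimmed = df.drop(columns=['Example'])
print(list(trimmed.columns))  # ['Name', 'Age']
print(list(df.columns))       # ['Name', 'Age', 'Example']
```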


Rows

Select row

You can select a row by its index label using .loc[label].

>>> df
    Name  Age
0   Axel   32
1  Alice   26
2   Alex   45
>>>
>>> df.loc[0]
Name    Axel
Age       32
Name: 0, dtype: object
>>>
>>> df.loc[2]
Name    Alex
Age       45
Name: 2, dtype: object
>>>

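You can also select a row by its integer position with .iloc[position]. With the default index, labels and positions coincide, so df.iloc[0] returns the same row as df.loc[0]: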

>>> df.iloc[0]
Name    Axel
Age       32
Name: 0, dtype: object
>>>
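Labels and positions stop matching once rows are removed or reordered. In the sketch below, dropping the first row leaves the labels 1 and 2 at positions 0 and 1, so .loc[1] and .iloc[1] return different rows:

```python
import pandas as pd

data = [['Axel', 32], ['Alice', 26], ['Alex', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age']).drop(0)

# Remaining labels are 1 and 2; their positions are 0 and 1
print(df.loc[1, 'Name'])    # Alice (the row labeled 1)
print(df.iloc[1]['Name'])   # Alex  (the second row by position)
```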

Append row

You can append a row by concatenating a one-row dataframe to the existing one with pd.concat(). (Older tutorials call df.append(), which was deprecated in pandas 1.4 and removed in pandas 2.0.)
First create a new dataframe:

>>> user = pd.DataFrame([['Vivian',33]], columns=['Name','Age'])

Then add it to the existing dataframe:

>>> df = pd.concat([df, user])
>>> df
     Name  Age
0    Axel   32
1   Alice   26
2    Alex   45
0  Vivian   33
>>>
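The appended row keeps the label it had in its own dataframe, so label 0 now appears twice. If you want a clean running index instead, pd.concat accepts ignore_index=True, which renumbers the result from 0. A minimal sketch:

```python
import pandas as pd

data = [['Axel', 32], ['Alice', 26], ['Alex', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
user = pd.DataFrame([['Vivian', 33]], columns=['Name', 'Age'])

# ignore_index=True discards the old labels and numbers rows 0..n-1,
# so the new row gets label 3 instead of a second label 0
df = pd.concat([df, user], ignore_index=True)
print(df)
```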

Delete row

To delete a row, you can use the method .drop(label), passing the row's index label.

Start by creating a frame:

>>> data = [['Axel',32], ['Alice', 26], ['Alex', 45]]
>>> df = pd.DataFrame(data,columns=['Name','Age'])
>>> df
    Name  Age
0   Axel   32
1  Alice   26
2   Alex   45

Let's delete the first row (label 0):

>>> df = df.drop(0)
>>> df
    Name  Age
1  Alice   26
2   Alex   45
>>>
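drop also accepts a list of labels to remove several rows at once, and the remaining rows keep their old labels. A sketch that removes two rows and then renumbers the result with reset_index(drop=True), where drop=True discards the old labels instead of keeping them as a new column:

```python
import pandas as pd

data = [['Axel', 32], ['Alice', 26], ['Alex', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Remove the rows labeled 0 and 2 in one call
df = df.drop([0, 2])

# The remaining row is still labeled 1; renumber from 0
df = df.reset_index(drop=True)
print(df)
```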

DataFrame creation

Create DataFrame from dictionary

If you have a dictionary, you can turn it into a dataframe.

>>> import pandas as pd
>>> d = {'one':[1,2,3], 'two':[2,3,4], 'three':[3,4,5] }
>>> df = pd.DataFrame(d)
>>> df
   one  two  three
0    1    2      3
1    2    3      4
2    3    4      5
>>>

The dictionary keys become the column names. A dictionary holds no row labels, so by default pandas numbers the rows from zero; to use your own labels, pass them with the index parameter:

>>> df = pd.DataFrame(d, index=['first','second','third'])
>>> df
        one  two  three
first     1    2      3
second    2    3      4
third     3    4      5
>>>

Create DataFrame from array

A NumPy array can be converted into a dataframe too.

>>> import numpy as np
>>> ar = np.array([[1,2,3],[4,5,6],[6,7,8]])
>>> ar
array([[1, 2, 3],
       [4, 5, 6],
       [6, 7, 8]])

Then turn it into a dataframe with the line:

>>> df = pd.DataFrame(ar)
>>> df
   0  1  2
0  1  2  3
1  4  5  6
2  6  7  8
>>>

When creating a DataFrame from a multi-dimensional array, you can assign column names and row labels with the columns and index parameters; otherwise pandas uses the default integer labels, which are less descriptive:

>>> df = pd.DataFrame(ar, index=['A','B','C'], columns=['One','Two','Three'])
>>> df
   One  Two  Three
A    1    2      3
B    4    5      6
C    6    7      8
>>>

Create from DataFrame

You can copy parts of a dataframe into a new dataframe.
Using the dataframe above:

>>> df2 = df[['One','Two']].copy()
>>> df2
   One  Two
A    1    2
B    4    5
C    6    7
>>>
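The .copy() call gives df2 its own data, so changing a value in df2 leaves the original dataframe untouched. A short sketch that checks this with the same array:

```python
import numpy as np
import pandas as pd

ar = np.array([[1, 2, 3], [4, 5, 6], [6, 7, 8]])
df = pd.DataFrame(ar, index=['A', 'B', 'C'], columns=['One', 'Two', 'Three'])

# Independent copy of two columns
df2 = df[['One', 'Two']].copy()

# Editing the copy does not affect the original
df2.loc['A', 'One'] = 100
print(df2.loc['A', 'One'])  # 100
print(df.loc['A', 'One'])   # 1
```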

Create from CSV

If you have a CSV file (Google Sheets can export as CSV), you can load it like this:

# Import pandas as pd
import pandas as pd

# Import the cats.csv data: cats
cats = pd.read_csv('cats.csv')

# Print out cats
print(cats)
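The contents of cats.csv are not shown here, so the sketch below uses a small made-up CSV held in a string. pd.read_csv accepts a file path or any file-like object, and io.StringIO wraps the string so no file on disk is needed. By default the first line is used as the column names:

```python
import io

import pandas as pd

# Made-up CSV text standing in for a real file
csv_text = """Name,Age
Axel,32
Alice,26
Alex,45
"""

# The first line becomes the header; each following line becomes a row
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```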