Pandas Dataframe - Python Tutorial

This article describes the pandas.DataFrame data structure. It covers how to create a dataframe, how to select data by index, and how to add and delete rows and columns.

In short: it’s a two-dimensional data structure (like a table) with rows and columns.

Related course: Data Analysis with Python Pandas

Create DataFrame

What is a Pandas DataFrame

Pandas is a data manipulation module. Its DataFrame lets you easily store and manipulate tabular data, organized in rows and columns, in Python.

A dataframe can be created from a list (see below), or from a dictionary or numpy array (see the DataFrame creation section).

Create DataFrame from list

You can turn a single list into a pandas dataframe:

import pandas as pd
data = [1,2,3]
df = pd.DataFrame(data)

The contents of the dataframe are then:

>>> df
   0
0  1
1  2
2  3
>>>

To the left of the contents, you’ll see that every element has an index (0,1,2).
This works for tables (lists of lists) too:

import pandas as pd

data = [['Axel',32], ['Alice', 26], ['Alex', 45]]
df = pd.DataFrame(data,columns=['Name','Age'])

This outputs:

>>> df
    Name  Age
0   Axel   32
1  Alice   26
2   Alex   45
>>>
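
To check that the frame has the size and labels you expect, you can inspect a few of its attributes. This is a minimal sketch that reuses the list from the example above:

```python
import pandas as pd

data = [['Axel', 32], ['Alice', 26], ['Alex', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# shape is a (rows, columns) tuple
print(df.shape)

# columns holds the column labels, index holds the row labels
print(list(df.columns))
print(list(df.index))
```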

Columns

Select column

To select a column, you can use the column name.

Step 1: Create frame:

>>> df = pd.DataFrame(data,columns=['Name','Age'])
>>> df
    Name  Age
0   Axel   32
1  Alice   26
2   Alex   45

Step 2: Select by column name:

>>> df['Name']
0     Axel
1    Alice
2     Alex
Name: Name, dtype: object

>>> df['Age']
0    32
1    26
2    45
Name: Age, dtype: int64
>>>
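
Selecting with a single name returns a Series. Passing a list of names returns a DataFrame instead, even when the list has only one name. A short sketch using the same data:

```python
import pandas as pd

data = [['Axel', 32], ['Alice', 26], ['Alex', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# A single column name gives a one-dimensional Series
names = df['Name']

# A list of column names gives a two-dimensional DataFrame
names_frame = df[['Name']]
both = df[['Name', 'Age']]

print(type(names).__name__)
print(type(names_frame).__name__)
```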

Column Addition

You can add a column to a dataframe. So this:

>>> df
    Name  Age
0   Axel   32
1  Alice   26
2   Alex   45

Becomes this:

>>> df
    Name  Age  Example
0   Axel   32        1
1  Alice   26        2
2   Alex   45        3
>>>

Here’s how to do that:

Step 1: Create the dataframe

>>> data = [['Axel',32], ['Alice', 26], ['Alex', 45]]
>>> df = pd.DataFrame(data,columns=['Name','Age'])
>>> 
>>> df
    Name  Age
0   Axel   32
1  Alice   26
2   Alex   45

Step 2: Create a new dataframe that holds the column

>>> c = pd.DataFrame([1,2,3], columns=['Example'])

Step 3: Assign that column to a new column of the same name in your dataframe:

>>> df['Example'] = c['Example']
>>> df
    Name  Age  Example
0   Axel   32        1
1  Alice   26        2
2   Alex   45        3
>>>
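
You don't have to build a second dataframe first. A plain list of the right length can be assigned directly, and a new column can also be computed from an existing one. In this sketch the column name DoubleAge is made up for illustration:

```python
import pandas as pd

data = [['Axel', 32], ['Alice', 26], ['Alex', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Assign a list directly; its length must match the number of rows
df['Example'] = [1, 2, 3]

# Derive a column from an existing one (DoubleAge is a hypothetical name)
df['DoubleAge'] = df['Age'] * 2

print(df)
```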

Column deletion

To delete a column, you can use the keyword del.
The original dataframe:

>>> df
    Name  Age  Example
0   Axel   32        1
1  Alice   26        2
2   Alex   45        3

Then delete the Example column:

>>> del df['Example']

The column is now gone:

>>> df
    Name  Age
0   Axel   32
1  Alice   26
2   Alex   45
>>>
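
The del keyword changes the dataframe in place. If you'd rather keep the original and get a new dataframe without the column, the .drop() method with the columns argument is one alternative. A minimal sketch:

```python
import pandas as pd

data = [['Axel', 32], ['Alice', 26], ['Alex', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
df['Example'] = [1, 2, 3]

# drop() returns a new dataframe; df itself still has the column
trimmed = df.drop(columns=['Example'])

print(list(df.columns))
print(list(trimmed.columns))
```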

Rows

Select row

You can select a row using .loc[label].

>>> df
    Name  Age
0   Axel   32
1  Alice   26
2   Alex   45
>>> 
>>> df.loc[0]
Name    Axel
Age       32
Name: 0, dtype: object
>>> 
>>> df.loc[2]
Name    Alex
Age       45
Name: 2, dtype: object
>>>

You can select by integer position too, using .iloc[position].

>>> df.iloc[0]
Name    Axel
Age       32
Name: 0, dtype: object
>>>
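
With the default index, .loc and .iloc look the same, because the labels happen to be the positions. The difference shows up once the index holds other labels. A sketch with the letters a, b and c as assumed index labels:

```python
import pandas as pd

data = [['Axel', 32], ['Alice', 26], ['Alex', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'], index=['a', 'b', 'c'])

# .loc looks up the row by its label
by_label = df.loc['b']

# .iloc looks up the row by its integer position (counting from zero)
by_position = df.iloc[1]

print(by_label['Name'])
print(by_position['Name'])
```

Both lines select the same row here, the one for Alice.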

Append row

The .append() method was deprecated in pandas 1.4 and removed in pandas 2.0, so append a row with pd.concat() instead.
First create a new dataframe:

>>> user = pd.DataFrame([['Vivian',33]], columns= ['Name','Age'])

Then add it to the existing dataframe:

>>> df = pd.concat([df, user])

>>> df
     Name  Age
0    Axel   32
1   Alice   26
2    Alex   45
0  Vivian   33
>>>
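
Note that the new row keeps its own index label 0, so that label now appears twice. If you want a fresh index that counts from zero, one option is to pass ignore_index=True to pd.concat(). A minimal sketch:

```python
import pandas as pd

data = [['Axel', 32], ['Alice', 26], ['Alex', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
user = pd.DataFrame([['Vivian', 33]], columns=['Name', 'Age'])

# ignore_index=True discards the old labels and renumbers the rows
df = pd.concat([df, user], ignore_index=True)

print(df)
```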

Delete row

To delete a row, you can use the method .drop(label), where label is the row's index label.

Start by creating a frame:

>>> data = [['Axel',32], ['Alice', 26], ['Alex', 45]]
>>> df = pd.DataFrame(data,columns=['Name','Age'])
>>> df
    Name  Age
0   Axel   32
1  Alice   26
2   Alex   45

Let's delete the first row:

>>> df = df.drop(0)
>>> df
    Name  Age
1  Alice   26
2   Alex   45
>>>
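
After the drop, the remaining rows keep their old labels (1 and 2). If you'd like them renumbered from zero, .reset_index(drop=True) is one way to do it. A short sketch:

```python
import pandas as pd

data = [['Axel', 32], ['Alice', 26], ['Alex', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Drop the row labelled 0, then renumber the remaining rows;
# drop=True throws the old labels away instead of keeping them as a column
df = df.drop(0).reset_index(drop=True)

print(df)
```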

DataFrame creation

Create DataFrame from dictionary

If you have a dictionary, you can turn it into a dataframe.

>>> import pandas as pd
>>> d = {'one':[1,2,3], 'two':[2,3,4], 'three':[3,4,5] }
>>> df = pd.DataFrame(d)
>>> df
   one  two  three
0    1    2      3
1    2    3      4
2    3    4      5
>>>

The keys in the dictionary become the columns of the DataFrame. The dictionary holds no values for the index, so by default the rows are numbered from zero. You can also set the index yourself:

>>> df = pd.DataFrame(d, index=['first','second','third'])
>>> df
        one  two  three
first     1    2      3
second    2    3      4
third     3    4      5
>>>
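
If you want the dictionary keys to become rows rather than columns, pd.DataFrame.from_dict() with orient='index' is one way to get that. This sketch reuses the dictionary from above:

```python
import pandas as pd

d = {'one': [1, 2, 3], 'two': [2, 3, 4], 'three': [3, 4, 5]}

# orient='index' turns each key into a row label;
# the columns then get the default labels 0, 1, 2
rows = pd.DataFrame.from_dict(d, orient='index')

print(rows)
```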

Create DataFrame from array

A numpy array can be converted into a dataframe too.

>>> import numpy as np
>>> ar = np.array([[1,2,3],[4,5,6],[6,7,8]])
>>> ar
array([[1, 2, 3],
       [4, 5, 6],
       [6, 7, 8]])

Then turn it into a dataframe with the line:

>>> df = pd.DataFrame(ar)
>>> df
   0  1  2
0  1  2  3
1  4  5  6
2  6  7  8
>>>

When you create a DataFrame from a multi-dimensional array, you can assign the columns and index yourself. Otherwise you get the default integer labels, which are less readable.

>>> df = pd.DataFrame(ar, index=['A','B','C'], columns=['One','Two','Three'])
>>> df
   One  Two  Three
A    1    2      3
B    4    5      6
C    6    7      8
>>>
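
The conversion works in the other direction as well. As a minimal sketch, .to_numpy() gives you back the values as a numpy array, without the row and column labels:

```python
import numpy as np
import pandas as pd

ar = np.array([[1, 2, 3], [4, 5, 6], [6, 7, 8]])
df = pd.DataFrame(ar, index=['A', 'B', 'C'], columns=['One', 'Two', 'Three'])

# to_numpy() returns only the values; the labels are not included
back = df.to_numpy()

print(back)
```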

Create from DataFrame

You can copy parts of a dataframe into a new dataframe.
Using the dataframe above:

>>> df2 = df[['One','Two']].copy()
>>> df2
   One  Two
A    1    2
B    4    5
C    6    7
>>>
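
The call to .copy() matters: it makes the new dataframe independent, so changing it leaves the original alone. A sketch that shows this, with 100 as an arbitrary replacement value:

```python
import numpy as np
import pandas as pd

ar = np.array([[1, 2, 3], [4, 5, 6], [6, 7, 8]])
df = pd.DataFrame(ar, index=['A', 'B', 'C'], columns=['One', 'Two', 'Three'])

df2 = df[['One', 'Two']].copy()

# Change a value in the copy only
df2.loc['A', 'One'] = 100

print(df.loc['A', 'One'])   # the original still holds 1
print(df2.loc['A', 'One'])  # the copy holds 100
```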

Create from CSV

If you have a csv file (Google Sheets can save as csv), you can load it like this:

# Import pandas as pd
import pandas as pd

# Load the cats.csv data into a dataframe named cats
cats = pd.read_csv('cats.csv')

# Print out cats
print(cats)
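
If you want to try read_csv without having a file on disk, you can feed it text through io.StringIO. The column names and rows below are made-up sample data, not the contents of a real cats.csv:

```python
import io
import pandas as pd

# Hypothetical CSV content: a header line followed by two rows
csv_text = "name,age\nTom,3\nFelix,5\n"

# read_csv accepts a file path or any file-like object
cats = pd.read_csv(io.StringIO(csv_text))

print(cats)
```

The first line of the CSV becomes the column names, and the rows get the default index that counts from zero.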