Iterate pandas dataframe - Python Tutorial

You can loop over a pandas DataFrame with a for statement, either column by column or row by row.

Related course: Data Analysis with Python Pandas

The examples below use the following pandas DataFrame.

import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

print(df)

This outputs this dataframe:

       age state  point
Alice   20    NY     64
Bob     32    CA     92

Loop over columns

If you put the DataFrame directly into a for loop, the column names are retrieved in order:

for column_name in df:
    print(type(column_name))
    print(column_name)
    print('------\n')

This outputs:

<class 'str'>
age
------

<class 'str'>
state
------

<class 'str'>
point
------
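Because the loop yields column names, each name can be used to look up the matching column. The following is a minimal sketch that assumes the same example DataFrame; the names column_names and first_values are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# Iterating over the DataFrame yields column names, so list(df) collects them.
column_names = list(df)
print(column_names)
# ['age', 'state', 'point']

# Use each column name to select the column and read its first value.
first_values = {}
for column_name in df:
    first_values[column_name] = df[column_name].iloc[0]

print(first_values['state'])
# NY
```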

Iterate dataframe

.items()

You can use the items() method to get a tuple (column name, Series) containing each column name and its column data (pandas.Series), one column at a time. Older tutorials use iteritems() for this; it was deprecated in pandas 1.5 and removed in pandas 2.0, so use items() instead.

import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

for column_name, item in df.items():
    print(type(column_name))
    print(column_name)
    print('~~~~~~')

    print(type(item))
    print(item)
    print('------')

This outputs:

<class 'str'>
age
~~~~~~
<class 'pandas.core.series.Series'>
Alice    20
Bob      32
Name: age, dtype: int64
------
<class 'str'>
state
~~~~~~
<class 'pandas.core.series.Series'>
Alice    NY
Bob      CA
Name: state, dtype: object
------
<class 'str'>
point
~~~~~~
<class 'pandas.core.series.Series'>
Alice    64
Bob      92
Name: point, dtype: int64
------
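Column-wise iteration is handy when each column needs its own treatment. As a rough sketch, the loop below uses items() (the spelling that works in current pandas releases) to total only the numeric columns; the totals dictionary is an illustrative name, and pd.api.types.is_numeric_dtype is used here as one possible way to tell numeric columns apart.

```python
import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

totals = {}
for column_name, item in df.items():
    # Skip non-numeric columns such as 'state'.
    if pd.api.types.is_numeric_dtype(item):
        totals[column_name] = int(item.sum())

print(totals)
# {'age': 52, 'point': 156}
```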

.iterrows()

You can use the iterrows() method to get a tuple (index, Series) containing the index name (row name) and the data for that row (pandas.Series), one row at a time.

import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

for index, row in df.iterrows():
    print(type(index))
    print(index)
    print('~~~~~~')

    print(type(row))
    print(row)
    print('------')

This results in:

<class 'str'>
Alice
~~~~~~
<class 'pandas.core.series.Series'>
age      20
state    NY
point    64
Name: Alice, dtype: object
------
<class 'str'>
Bob
~~~~~~
<class 'pandas.core.series.Series'>
age      32
state    CA
point    92
Name: Bob, dtype: object
------
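Since each row arrives as a Series, individual values can be read by column label. The sketch below assumes the same example DataFrame and builds one summary string per row; the summaries list is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

summaries = []
for index, row in df.iterrows():
    # row is a Series, so values are looked up by column label.
    summaries.append(f"{index} ({row['state']}): age {row['age']}, point {row['point']}")

for line in summaries:
    print(line)
# Alice (NY): age 20, point 64
# Bob (CA): age 32, point 92
```

Note that the row Series in the output above has dtype object, because a single Series has to hold values from columns of different types.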

.itertuples()

You can use the itertuples() method to retrieve the index name (row name) and the data for that row together as a tuple, one row at a time. The first element of the tuple is the index name.

By default, it returns a namedtuple named Pandas. A namedtuple lets you access each element by attribute name (for example row.point) in addition to by position with [].

import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

for row in df.itertuples():
    print(type(row))
    print(row)
    print('------')

    print(row[3])
    print(row.point)
    print('------\n')

This outputs the following:

<class 'pandas.core.frame.Pandas'>
Pandas(Index='Alice', age=20, state='NY', point=64)
------
64
64
------

<class 'pandas.core.frame.Pandas'>
Pandas(Index='Bob', age=32, state='CA', point=92)
------
92
92
------
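itertuples() also accepts the index and name parameters. As a short sketch on the same example DataFrame: index=False drops the index from each tuple, name=None returns plain tuples instead of namedtuples, and a custom name (here the illustrative 'Person') renames the namedtuple class.

```python
import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# Plain tuples without the index.
plain_rows = list(df.itertuples(index=False, name=None))
print(plain_rows)
# [(20, 'NY', 64), (32, 'CA', 92)]

# Namedtuples with a custom class name; the index is available as row.Index.
names = []
for row in df.itertuples(name='Person'):
    names.append(row.Index)
    print(type(row).__name__, row.Index, row.age)
# Person Alice 20
# Person Bob 32
```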

Retrieve column values

It’s possible to get the values of a specific column in order.

The iterrows() and itertuples() methods described above retrieve the elements of every column in each row. If you only need the elements of a particular column, you can select that column directly:

print(df['age'])
# Alice    20
# Bob      32
# Name: age, dtype: int64

print(type(df['age']))
# <class 'pandas.core.series.Series'>

When you use a Series in a for loop, you get its values in order. So if you select a column of the DataFrame and loop over it, you get the values of that column in order.

for age in df['age']:
    print(age)

It is also possible to obtain the values of multiple columns together using the built-in function zip().

for age, point in zip(df['age'], df['point']):
    print(age, point)

If you want to get the index (row names), use the index attribute.

print(df.index)
# Index(['Alice', 'Bob'], dtype='object')

print(type(df.index))
# <class 'pandas.core.indexes.base.Index'>

for index in df.index:
    print(index)
# Alice
# Bob

for index, state in zip(df.index, df['state']):
    print(index, state)
# Alice NY
# Bob CA
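The zip() pattern above can also feed ordinary Python data structures. The following sketch, again assuming the example DataFrame, builds a name-to-state dictionary and a list of names whose point value meets a threshold; the threshold of 80 and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# Pair each index label with the value from the 'state' column.
state_by_name = dict(zip(df.index, df['state']))
print(state_by_name)
# {'Alice': 'NY', 'Bob': 'CA'}

# Keep the index labels whose 'point' value is at least 80 (illustrative threshold).
high_scorers = [name for name, point in zip(df.index, df['point']) if point >= 80]
print(high_scorers)
# ['Bob']
```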
