DataFrame looping (iteration) with a for statement. You can loop over a pandas DataFrame column by column or row by row.

Related course: Data Analysis with Python Pandas

The examples below use the following DataFrame.

import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

print(df)

This outputs the following DataFrame:

       age state  point
Alice   20    NY     64
Bob     32    CA     92

Loop over columns

If you put the DataFrame directly into a for loop, you get the column names in order:

for column_name in df:
    print(type(column_name))
    print(column_name)
    print('------\n')

This outputs:

<class 'str'>
age
------

<class 'str'>
state
------

<class 'str'>
point
------
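As a quick check, here is a small sketch that reuses the example DataFrame. It shows that iterating over a DataFrame gives the same labels as its columns attribute, and that each name can select its column:

```python
import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# Iterating a DataFrame yields its column labels, in the same order as df.columns
column_names = [column_name for column_name in df]
print(column_names)  # ['age', 'state', 'point']

# Each column name can be used to select that column as a Series
for column_name in df:
    print(column_name, df[column_name].tolist())
# age [20, 32]
# state ['NY', 'CA']
# point [64, 92]
```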

Iterate over a DataFrame

.items() (formerly .iteritems())

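The items() method yields a (column name, Series) tuple for each column. The first element is the column name and the second is the column data as a pandas.Series. Older pandas versions also offered this method as iteritems(), which was deprecated in pandas 1.5 and removed in pandas 2.0, so use items() in current code.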

import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# items() yields (column name, Series) pairs; older pandas also called it iteritems()
for column_name, item in df.items():
    print(type(column_name))
    print(column_name)
    print('~~~~~~')

    print(type(item))
    print(item)
    print('------')

This outputs:

<class 'str'>
age
~~~~~~
<class 'pandas.core.series.Series'>
Alice    20
Bob      32
Name: age, dtype: int64
------
<class 'str'>
state
~~~~~~
<class 'pandas.core.series.Series'>
Alice    NY
Bob      CA
Name: state, dtype: object
------
<class 'str'>
point
~~~~~~
<class 'pandas.core.series.Series'>
Alice    64
Bob      92
Name: point, dtype: int64
------
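A common use of items() is a per-column summary. The sketch below is illustrative and not from the original. It assumes you only want numeric columns, and it uses pandas.api.types.is_numeric_dtype to skip the text column state:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# Collect the maximum of each numeric column; text columns such as 'state' are skipped
column_max = {}
for column_name, item in df.items():
    if is_numeric_dtype(item):
        # int() converts the NumPy scalar returned by max() to a plain Python int
        column_max[column_name] = int(item.max())

print(column_max)  # {'age': 32, 'point': 92}
```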

.iterrows()

The iterrows() method yields an (index, Series) tuple for each row. The first element is the index label (row name) and the second is that row's data as a pandas.Series.

import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

for index, row in df.iterrows():
    print(type(index))
    print(index)
    print('~~~~~~')

    print(type(row))
    print(row)
    print('------')

This results in:

<class 'str'>
Alice
~~~~~~
<class 'pandas.core.series.Series'>
age      20
state    NY
point    64
Name: Alice, dtype: object
------
<class 'str'>
Bob
~~~~~~
<class 'pandas.core.series.Series'>
age      32
state    CA
point    92
Name: Bob, dtype: object
------
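One caveat: iterrows() returns each row as a Series, so the values in a row are cast to a common dtype and dtypes are not preserved. In the output above this shows up as dtype: object. The pandas documentation illustrates the effect with a frame holding one integer column and one float column (the data below is only for illustration). It recommends itertuples() when you need to preserve dtypes.

```python
import pandas as pd

# Illustrative frame: one integer column and one float column
df = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])

# Take the first (and only) row produced by iterrows()
index, row = next(df.iterrows())

print(df['int'].dtype)   # int64: the column itself is integer
print(row['int'].dtype)  # float64: the row Series upcast the integer to float
```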

.itertuples()

The itertuples() method yields one tuple per row, containing the index name (row name) followed by that row's data. The first element of the tuple is the index name.

By default, each tuple is a namedtuple named Pandas. Besides positional access with [], a namedtuple lets you read each value by its column name as an attribute, for example row.point.

import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

for row in df.itertuples():
    print(type(row))
    print(row)
    print('------')

    # Position 3 is 'point' (position 0 is the index)
    print(row[3])
    print(row.point)
    print('------\n')

This outputs the following:

<class 'pandas.core.frame.Pandas'>
Pandas(Index='Alice', age=20, state='NY', point=64)
------
64
64
------

<class 'pandas.core.frame.Pandas'>
Pandas(Index='Bob', age=32, state='CA', point=92)
------
92
92
------
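itertuples() also accepts index and name parameters. This sketch sets index=False to leave the index out of each tuple and name=None to get plain tuples instead of namedtuples. It then shows that, with the defaults, the row label is available as the Index field:

```python
import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# index=False drops the index; name=None yields plain tuples, not namedtuples
rows = list(df.itertuples(index=False, name=None))
for row in rows:
    print(row)

# With the default arguments, the index label is stored in the Index field
for row in df.itertuples():
    print(row.Index, row.age)
# Alice 20
# Bob 32
```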

Retrieve column values

It’s possible to get the values of a specific column in order.

The iterrows() and itertuples() methods described above retrieve the values of every column in each row. If you only need one particular column, you can select it directly, which gives a Series:

print(df['age'])
# Alice    20
# Bob      32
# Name: age, dtype: int64

print(type(df['age']))
# <class 'pandas.core.series.Series'>

Looping over a Series with for yields its values in order. So if you select a column of the DataFrame and loop over it, you get that column's values in order.

for age in df['age']:
    print(age)
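Ordinary loop logic works on a single column. In this illustrative sketch, the threshold of 30 is arbitrary and the loop collects the ages that are 30 or more:

```python
import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# Loop over one column (a Series) and keep ages of 30 or more
older = []
for age in df['age']:
    if age >= 30:
        older.append(age)

print(older)  # [32]
```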

It is also possible to obtain the values of multiple columns together using the built-in function zip().

for age, point in zip(df['age'], df['point']):
    print(age, point)
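To see what zip() produces, this sketch compares its pairs with the rows that itertuples() returns when only those two columns are selected with df[['age', 'point']]:

```python
import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# zip() pairs the two columns' values position by position
pairs = list(zip(df['age'], df['point']))

# itertuples() on just the same two columns gives the same pairs as plain tuples
same_pairs = list(df[['age', 'point']].itertuples(index=False, name=None))

print(pairs == same_pairs)  # True
```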

To get the index (row names), use the index attribute. You can loop over it on its own or combine it with a column through zip().

print(df.index)
# Index(['Alice', 'Bob'], dtype='object')

print(type(df.index))
# <class 'pandas.core.indexes.base.Index'>

for index in df.index:
    print(index)
# Alice
# Bob

for index, state in zip(df.index, df['state']):
    print(index, state)
# Alice NY
# Bob CA
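Combining zip() with the index is a convenient way to build a lookup table. This minimal sketch maps each row name to its state. Series.to_dict() uses the index as keys, so it produces the same mapping directly:

```python
import pandas as pd

df = pd.DataFrame({'age': [20, 32], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# Map each row label (index) to the value in the 'state' column
state_by_name = {index: state for index, state in zip(df.index, df['state'])}
print(state_by_name)  # {'Alice': 'NY', 'Bob': 'CA'}

# Series.to_dict() keys the values by index, giving the same mapping
print(state_by_name == df['state'].to_dict())  # True
```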
