Two-dimensional DataFrame indexing in pandas

pandas is one of the most important libraries in data science and data engineering. Tabular data in pandas is represented by the DataFrame class.

In this post, I want to talk about indexing in a DataFrame.

Suppose you define a DataFrame as follows:

import numpy as np

import pandas as pd

K = pd.DataFrame(np.random.rand(5,6))

0 0.457355 0.695109 0.960173 0.895233 0.913107 0.997462
1 0.159627 0.006112 0.751829 0.641470 0.430603 0.005721
2 0.167967 0.232892 0.000698 0.646807 0.359331 0.859992
3 0.114184 0.332704 0.224112 0.058897 0.547509 0.734783
4 0.623049 0.403003 0.384613 0.663572 0.866130 0.084359

You cannot index the values of K with a pair of positions in plain brackets, the way you would index a NumPy array:

K[0,0]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File “”, line 1, in
K[0,0]

File “/home/kazem/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py”, line 2800, in __getitem__
indexer = self.columns.get_loc(key)

File “/home/kazem/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/range.py”, line 353, in get_loc
return super().get_loc(key, method=method, tolerance=tolerance)

File “/home/kazem/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py”, line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))

File “pandas/_libs/index.pyx”, line 111, in pandas._libs.index.IndexEngine.get_loc

File “pandas/_libs/index.pyx”, line 135, in pandas._libs.index.IndexEngine.get_loc

File “pandas/_libs/index_class_helper.pxi”, line 109, in pandas._libs.index.Int64Engine._check_type

KeyError: (0, 0)
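
The KeyError arises because plain brackets on a DataFrame look the key up among the column labels, and (0, 0) is not a column label. Here is a minimal sketch of position-based alternatives. It assumes a small DataFrame with fixed, illustrative values (not the random data above) so that the results are reproducible:

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption for this sketch): values 0..29 in a 5x6 frame
K = pd.DataFrame(np.arange(30).reshape(5, 6))

# Position-based scalar access: row 0, column 0
a = K.iloc[0, 0]

# .iat is a scalar-only, position-based accessor
b = K.iat[0, 0]

# Plain brackets select the column labeled 0; the second [0] picks row label 0
c = K[0][0]

# K[0, 0] fails because the tuple (0, 0) is looked up as a column label
try:
    K[0, 0]
    raised = False
except KeyError:
    raised = True

print(a, b, c, raised)
```

For a single value, iloc and iat both take the row position first and the column position second.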

Row selection :

Suppose you want to extract the first row. For position-based selection, use the iloc indexer:

K.iloc[0]

or

K.iloc[0,:]

Out[ ]:
0 0.457355
1 0.695109
2 0.960173
3 0.895233
4 0.913107
5 0.997462
Name: 0, dtype: float64

To select the first row, it is enough to pass 0 as the row index, because positions start at zero.
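
One detail worth knowing: a single integer returns the row as a one-dimensional Series, while a one-element list keeps the two-dimensional DataFrame shape. A short sketch, assuming a small DataFrame with illustrative fixed values:

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption for this sketch)
K = pd.DataFrame(np.arange(30).reshape(5, 6))

row_as_series = K.iloc[0]    # single integer: 1-D Series indexed by column labels
row_as_frame = K.iloc[[0]]   # one-element list: 2-D DataFrame with a single row

print(type(row_as_series).__name__, row_as_series.shape)
print(type(row_as_frame).__name__, row_as_frame.shape)
```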

Multiple row selection :

If you want to extract the first and third rows, pass a list of positions:

K.iloc[[0,2],:]


Out[14]:
0 1 2 3 4 5
0 0.457355 0.695109 0.960173 0.895233 0.913107 0.997462
2 0.167967 0.232892 0.000698 0.646807 0.359331 0.859992

As you can see, the first and third rows are extracted.

Now, if you want to extract the first through third rows:

In [19]: K.iloc[range(0,3),:]


Out[19]:
0 1 2 3 4 5
0 0.457355 0.695109 0.960173 0.895233 0.913107 0.997462
1 0.159627 0.006112 0.751829 0.641470 0.430603 0.005721
2 0.167967 0.232892 0.000698 0.646807 0.359331 0.859992

The same approach works for columns too.

K.iloc[:,range(0,3)]


Out[20]:
0 1 2
0 0.457355 0.695109 0.960173
1 0.159627 0.006112 0.751829
2 0.167967 0.232892 0.000698
3 0.114184 0.332704 0.224112
4 0.623049 0.403003 0.384613

In the code above, we extract the first through third columns.
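
Slice notation is a common alternative to range for contiguous rows or columns. As with range, the end position is excluded. A minimal sketch, assuming a small DataFrame with illustrative fixed values:

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption for this sketch)
K = pd.DataFrame(np.arange(30).reshape(5, 6))

rows_by_range = K.iloc[range(0, 3), :]   # rows at positions 0, 1, 2
rows_by_slice = K.iloc[0:3, :]           # same rows, written as a slice
cols_by_slice = K.iloc[:, 0:3]           # columns at positions 0, 1, 2
block = K.iloc[0:3, 0:3]                 # rows and columns sliced together

print(rows_by_slice.shape, cols_by_slice.shape, block.shape)
```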

Binary Indexing:

In some applications, it is necessary to index a DataFrame with a binary (boolean) vector. The vector used for binary indexing must satisfy the following conditions:

  • The length of the input vector must equal the number of rows (for row selection) or columns (for column selection) of the DataFrame.
  • The elements of the input vector must be of type bool.
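
The two conditions above can be illustrated with a short sketch. It assumes a small DataFrame with illustrative fixed values, and the names row_mask and col_mask are chosen here for the example. In recent pandas versions a mask of the wrong length raises an IndexError; the sketch also catches ValueError in case other versions behave differently:

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption for this sketch)
K = pd.DataFrame(np.arange(30).reshape(5, 6))

row_mask = [True, False, True, True, False]          # one bool per row (5)
col_mask = [True, False, False, False, False, True]  # one bool per column (6)

rows = K.iloc[row_mask, :]   # keeps rows 0, 2, 3
cols = K.iloc[:, col_mask]   # keeps columns 0 and 5

# A mask whose length does not match the axis is rejected
try:
    K.iloc[[True, False, True], :]
    rejected = False
except (IndexError, ValueError):
    rejected = True

print(rows.shape, cols.shape, rejected)
```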

Example :

import numpy as np

import pandas as pd

K = pd.DataFrame(np.random.rand(5,6))

idx = [1,0,1,1,0]

We want to extract the first, third, and fourth rows of the input DataFrame (K). Each position in idx that holds a one marks a selected row. K has five rows, so idx has five entries too.

x = [bool(d) for d in idx]

idx is a list of integers, so we must convert its entries to bool; otherwise iloc would treat them as row positions.

Now we can simply apply the indexing:

K.iloc[x,:]


Out[26]:
0 1 2 3 4 5
0 0.457355 0.695109 0.960173 0.895233 0.913107 0.997462
2 0.167967 0.232892 0.000698 0.646807 0.359331 0.859992
3 0.114184 0.332704 0.224112 0.058897 0.547509 0.734783
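
To see why the conversion to bool matters, compare what iloc does with the raw integer list and with a boolean mask. This is a minimal sketch with illustrative fixed values; building the mask with np.array(idx, dtype=bool) is an alternative to the list comprehension used above:

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption for this sketch)
K = pd.DataFrame(np.arange(30).reshape(5, 6))

idx = [1, 0, 1, 1, 0]

# Integers are read as positions: this returns rows 1, 0, 1, 1, 0
by_position = K.iloc[idx, :]

# Booleans are read as a mask: this returns rows 0, 2, 3
mask = np.array(idx, dtype=bool)
by_mask = K.iloc[mask, :]

print(list(by_position.index))
print(list(by_mask.index))
```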

Binary Indexing in a NumPy array :

import numpy as np

G = np.round(10*np.random.rand(6,3))

out :

array([[8., 7., 1.],
[9., 9., 6.],
[4., 9., 9.],
[3., 5., 5.],
[4., 2., 6.],
[2., 0., 9.]])

rb = G <= 5

out :

array([[False, False, True],
[False, False, False],
[ True, False, False],
[ True, True, True],
[ True, True, False],
[ True, True, False]])

G[rb] = 0

out :

array([[8., 7., 0.],
[9., 9., 6.],
[0., 9., 9.],
[0., 0., 0.],
[0., 0., 6.],
[0., 0., 9.]])
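
The steps above can be reproduced with the fixed values of G shown in the first output. The masked assignment that produces the last output is G[rb] = 0. The sketch below also shows np.where as a non-mutating alternative; the names selected and H are chosen here for illustration:

```python
import numpy as np

# Same values as the G printed above, written out so the result is reproducible
G = np.array([[8., 7., 1.],
              [9., 9., 6.],
              [4., 9., 9.],
              [3., 5., 5.],
              [4., 2., 6.],
              [2., 0., 9.]])

rb = G <= 5                # boolean mask with the same shape as G

selected = G[rb]           # 1-D array of the masked elements, in row-major order

H = np.where(rb, 0.0, G)   # new array with masked entries zeroed; G is unchanged

G[rb] = 0                  # in-place: every element <= 5 becomes 0

print(selected)
print(G)
```

Use np.where when the original array must be preserved, and the masked assignment when modifying G in place is acceptable.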
