Missing index values results in KeyError #6

Open
Tojens opened this issue Nov 19, 2019 · 3 comments
Comments

Tojens (Contributor) commented Nov 19, 2019

Hi,

Hoping for some help (once again) :)

I've almost managed to get this repo working with my own data, but it seems that updates to pandas have made some of the code uncooperative.

The script runs a few small batches, but fails as soon as it encounters a missing index value, because the behaviour of .loc[] with list-like keys has changed. The traceback below shows the details:

FutureWarning: 
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike

Current batch num: 10
Current batch num: 20
Current batch num: 30

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)
C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in <module>
    387 
    388 if __name__ == "__main__":
--> 389     train()
    390     LOG_FOUT.close()
    391     del df

C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in train()
    304             sys.stdout.flush()
    305 
--> 306             train_one_epoch(sess, ops, train_writer)
    307             eval_one_epoch(sess, ops, test_writer)
    308 

C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in train_one_epoch(sess, ops, train_writer)
    322 
    323     num_batches = 0
--> 324     for batch_data, batch_label in generator(df, train_hashes, BATCH_SIZE, NUM_POINT, N_AUGMENTATIONS):
    325         num_batches += 1 * BATCH_SIZE
    326         if num_batches % 10 == 0:

C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in generator(df, hashes, BATCH_SIZE, NUM_POINT, N_AUGMENTATIONS, shuffled)
    193 
    194     for batch in batches:
--> 195         df_batch = [random_sample_block(df.loc[h], seed) for s,h in batch]
    196         data = np.stack([b[data_channels].values for b in df_batch])
    197         label = np.stack([l.label.values for l in df_batch])

C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in <listcomp>(.0)
    193
    194     for batch in batches:
--> 195         df_batch = [random_sample_block(df.loc[h], seed) for s,h in batch]
    196         data = np.stack([b[data_channels].values for b in df_batch])
    197         label = np.stack([l.label.values for l in df_batch])

~\AppData\Local\Continuum\anaconda3\envs\tf1\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   1422 
   1423             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1424             return self._getitem_axis(maybe_callable, axis=axis)
   1425 
   1426     def _is_scalar_access(self, key: Tuple):

~\AppData\Local\Continuum\anaconda3\envs\tf1\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1837                     raise ValueError("Cannot index with multidimensional key")
   1838 
-> 1839                 return self._getitem_iterable(key, axis=axis)
   1840 
   1841             # nested tuple slicing

~\AppData\Local\Continuum\anaconda3\envs\tf1\lib\site-packages\pandas\core\indexing.py in _getitem_iterable(self, key, axis)
   1131         else:
   1132             # A collection of keys
-> 1133             keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
   1134             return self.obj._reindex_with_indexers(
   1135                 {axis: [keyarr, indexer]}, copy=True, allow_dups=True

~\AppData\Local\Continuum\anaconda3\envs\tf1\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1090 
   1091         self._validate_read_indexer(
-> 1092             keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
   1093         )
   1094         return keyarr, indexer

~\AppData\Local\Continuum\anaconda3\envs\tf1\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1175                 raise KeyError(
   1176                     "None of [{key}] are in the [{axis}]".format(
-> 1177                         key=key, axis=self.obj._get_axis_name(axis)
   1178                     )
   1179                 )

KeyError: "None of [Int64Index([142, 14277], dtype='int64', name='hash')] are in the [index]"

If you have any suggestions for how I might fix this problem with missing index values, I'd be very interested in hearing them.
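
For reference, here is a minimal sketch of the two workarounds the FutureWarning points at. This is not the repo's code: the toy frame and hash values below are made up for illustration, and in train_custom.py the fix would go where generator() builds df_batch with df.loc[h].

import pandas as pd

# Toy frame indexed by 'hash', standing in for the per-block index in train_custom.py
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]},
                  index=pd.Index([142, 7, 9], name="hash"))

h = [142, 14277]  # 14277 is missing from the index

# df.loc[h]  # raises KeyError on recent pandas: list-like .loc with a missing label

# Option A: keep only the labels that actually exist in the index
present = df.index.intersection(h)
block = df.loc[present]

# Option B: .reindex() returns NaN rows for missing labels, which can then be dropped
block = df.reindex(h).dropna()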

@davdmaccartney

Hi, I don't use df.loc anymore:
df_batch = [random_sample_block(df[df['hash'] == h], s) for s, h in batch]

Tojens (Contributor, Author) commented Nov 20, 2019

Hi David,

Thank you very much!

I tried replacing the existing line with your suggestion, but df['hash'] and h must be the same length to be compared, otherwise it raises a ValueError like the one below, and the lengths of the two variables are very different.

I've also printed the lengths of both 'h' and df['hash'] at the start of the code block.

All suggestions are appreciated :)


[30793 rows x 8 columns]
Tensor("fc2/Relu:0", shape=(2, 128), dtype=float32, device=/device:GPU:0)
**** EPOCH 000 ****
----


length of h: 2
length of df[hash]: 30793

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in <module>
    388 
    389 if __name__ == "__main__":
--> 390     train()
    391     LOG_FOUT.close()
    392     del df

C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in train()
    304             sys.stdout.flush()
    305 
--> 306             train_one_epoch(sess, ops, train_writer)
    307             eval_one_epoch(sess, ops, test_writer)
    308 

C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in train_one_epoch(sess, ops, train_writer)
    322 
    323     num_batches = 0
--> 324     for batch_data, batch_label in generator(df, train_hashes, BATCH_SIZE, NUM_POINT, N_AUGMENTATIONS):
    325         num_batches += 1 * BATCH_SIZE
    326         if num_batches % 10 == 0:

C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in generator(df, hashes, BATCH_SIZE, NUM_POINT, N_AUGMENTATIONS, shuffled)
    193         print('length of h:', len(h))
    194         print('length of df[hash]:', len(df['hash']))
--> 195         df_batch = [random_sample_block(df[df['hash']==h], s) for s, h in batch]
    196         data = np.stack([b[data_channels].values for b in df_batch])
    197         label = np.stack([l.label.values for l in df_batch])

C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in <listcomp>(.0)
    193         print('length of h:', len(h))
    194         print('length of df[hash]:', len(df['hash']))
--> 195         df_batch = [random_sample_block(df[df['hash']==h], s) for s, h in batch]
    196         data = np.stack([b[data_channels].values for b in df_batch])
    197         label = np.stack([l.label.values for l in df_batch])

~\AppData\Local\Continuum\anaconda3\envs\tf1\lib\site-packages\pandas\core\ops\__init__.py in wrapper(self, other, axis)
   1205             # as it will broadcast
   1206             if other.ndim != 0 and len(self) != len(other):
-> 1207                 raise ValueError("Lengths must match to compare")
   1208 
   1209             res_values = na_op(self.values, np.asarray(other))

ValueError: Lengths must match to compare
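
The "Lengths must match to compare" error suggests that h is itself list-like (two hash values here), so an element-wise == against the full hash column cannot broadcast. A minimal sketch of how .isin() side-steps that (the frame and values are made up; random_sample_block and batch are the repo's own names and are only referenced in a comment):

import pandas as pd

df = pd.DataFrame({"hash": [142, 142, 7, 9],
                   "x":    [1.0, 2.0, 3.0, 4.0]})

h = [142, 14277]           # list-like, as the earlier KeyError message indicates

# df["hash"] == h          # ValueError: Lengths must match to compare
mask = df["hash"].isin(h)  # boolean mask; works for any length of h
block = df[mask]

# In generator() this would become something like:
# df_batch = [random_sample_block(df[df["hash"].isin(h)], s) for s, h in batch]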

@davdmaccartney

Well, as I changed the original code quite a lot, that might be expected. Contact me by email:
[email protected]
