Missing index values results in KeyError #6

Open
Tojens opened this issue Nov 19, 2019 · 3 comments
Comments

Tojens (Contributor) commented Nov 19, 2019

Hi,

Hoping for some help (once again) :)

I've almost managed to get this repo working with my own data, but it seems that updates to pandas have made some of the code uncooperative.

The script runs a few small batches, but fails as soon as it encounters a missing index value, because the behaviour of .loc[] with list-like keys has changed. The traceback below shows the details:

FutureWarning: 
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike

Current batch num: 10
Current batch num: 20
Current batch num: 30

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)
C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in <module>
    387 
    388 if __name__ == "__main__":
--> 389     train()
    390     LOG_FOUT.close()
    391     del df

C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in train()
    304             sys.stdout.flush()
    305 
--> 306             train_one_epoch(sess, ops, train_writer)
    307             eval_one_epoch(sess, ops, test_writer)
    308 

C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in train_one_epoch(sess, ops, train_writer)
    322 
    323     num_batches = 0
--> 324     for batch_data, batch_label in generator(df, train_hashes, BATCH_SIZE, NUM_POINT, N_AUGMENTATIONS):
    325         num_batches += 1 * BATCH_SIZE
    326         if num_batches % 10 == 0:

C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in generator(df, hashes, BATCH_SIZE, NUM_POINT, N_AUGMENTATIONS, shuffled)
    193 
    194     for batch in batches:
--> 195         df_batch = [random_sample_block(df.loc[h], seed) for s,h in batch]
    196         data = np.stack([b[data_channels].values for b in df_batch])
    197         label = np.stack([l.label.values for l in df_batch])

C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in <listcomp>(.0)
    193
    194     for batch in batches:
--> 195         df_batch = [random_sample_block(df.loc[h], seed) for s,h in batch]
    196         data = np.stack([b[data_channels].values for b in df_batch])
    197         label = np.stack([l.label.values for l in df_batch])

~\AppData\Local\Continuum\anaconda3\envs\tf1\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   1422 
   1423             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1424             return self._getitem_axis(maybe_callable, axis=axis)
   1425 
   1426     def _is_scalar_access(self, key: Tuple):

~\AppData\Local\Continuum\anaconda3\envs\tf1\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1837                     raise ValueError("Cannot index with multidimensional key")
   1838 
-> 1839                 return self._getitem_iterable(key, axis=axis)
   1840 
   1841             # nested tuple slicing

~\AppData\Local\Continuum\anaconda3\envs\tf1\lib\site-packages\pandas\core\indexing.py in _getitem_iterable(self, key, axis)
   1131         else:
   1132             # A collection of keys
-> 1133             keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
   1134             return self.obj._reindex_with_indexers(
   1135                 {axis: [keyarr, indexer]}, copy=True, allow_dups=True

~\AppData\Local\Continuum\anaconda3\envs\tf1\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1090 
   1091         self._validate_read_indexer(
-> 1092             keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
   1093         )
   1094         return keyarr, indexer

~\AppData\Local\Continuum\anaconda3\envs\tf1\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1175                 raise KeyError(
   1176                     "None of [{key}] are in the [{axis}]".format(
-> 1177                         key=key, axis=self.obj._get_axis_name(axis)
   1178                     )
   1179                 )

KeyError: "None of [Int64Index([142, 14277], dtype='int64', name='hash')] are in the [index]"

If you have any suggestions for how I might fix this problem with missing index values, I'd be very interested in hearing them.
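
For reference, here is a minimal sketch of the two workarounds the FutureWarning points at. This is not the repo's code: the toy frame and hash values below are made up for illustration, and in train_custom.py the fix would go where generator() builds df_batch with df.loc[h].

import pandas as pd

# Toy frame indexed by 'hash', standing in for the per-block index in train_custom.py
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]},
                  index=pd.Index([142, 7, 9], name="hash"))

h = [142, 14277]  # 14277 is missing from the index

# df.loc[h]  # raises KeyError on recent pandas: list-like .loc with a missing label

# Option A: keep only the labels that actually exist in the index
present = df.index.intersection(h)
block = df.loc[present]

# Option B: .reindex() returns NaN rows for missing labels, which can then be dropped
block = df.reindex(h).dropna()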

@davdmaccartney

Hi, I don't use df.loc anymore:
df_batch = [random_sample_block(df[df['hash'] == h], s) for s, h in batch]

Tojens (Contributor, Author) commented Nov 20, 2019

Hi David,

Thank you very much!

I tried replacing the existing line with your suggestion, but df['hash'] and h must be the same length to be compared, otherwise it raises a ValueError like the one below, and the lengths of the two variables are very different.

I've also printed the lengths of both 'h' and df['hash'] at the start of the code block.

All suggestions are appreciated :)


[30793 rows x 8 columns]
Tensor("fc2/Relu:0", shape=(2, 128), dtype=float32, device=/device:GPU:0)
**** EPOCH 000 ****
----


length of h: 2
length of df[hash]: 30793

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in <module>
    388 
    389 if __name__ == "__main__":
--> 390     train()
    391     LOG_FOUT.close()
    392     del df

C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in train()
    304             sys.stdout.flush()
    305 
--> 306             train_one_epoch(sess, ops, train_writer)
    307             eval_one_epoch(sess, ops, test_writer)
    308 

C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in train_one_epoch(sess, ops, train_writer)
    322 
    323     num_batches = 0
--> 324     for batch_data, batch_label in generator(df, train_hashes, BATCH_SIZE, NUM_POINT, N_AUGMENTATIONS):
    325         num_batches += 1 * BATCH_SIZE
    326         if num_batches % 10 == 0:

C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in generator(df, hashes, BATCH_SIZE, NUM_POINT, N_AUGMENTATIONS, shuffled)
    193         print('length of h:', len(h))
    194         print('length of df[hash]:', len(df['hash']))
--> 195         df_batch = [random_sample_block(df[df['hash']==h], s) for s, h in batch]
    196         data = np.stack([b[data_channels].values for b in df_batch])
    197         label = np.stack([l.label.values for l in df_batch])

C:\thjen\Lidar\pointnet-master\sem_seg\train_custom.py in <listcomp>(.0)
    193         print('length of h:', len(h))
    194         print('length of df[hash]:', len(df['hash']))
--> 195         df_batch = [random_sample_block(df[df['hash']==h], s) for s, h in batch]
    196         data = np.stack([b[data_channels].values for b in df_batch])
    197         label = np.stack([l.label.values for l in df_batch])

~\AppData\Local\Continuum\anaconda3\envs\tf1\lib\site-packages\pandas\core\ops\__init__.py in wrapper(self, other, axis)
   1205             # as it will broadcast
   1206             if other.ndim != 0 and len(self) != len(other):
-> 1207                 raise ValueError("Lengths must match to compare")
   1208 
   1209             res_values = na_op(self.values, np.asarray(other))

ValueError: Lengths must match to compare
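
The "Lengths must match to compare" error suggests that h is itself list-like (two hash values here), so an element-wise == against the full hash column cannot broadcast. A minimal sketch of how .isin() side-steps that (the frame and values are made up; random_sample_block and batch are the repo's own names and are only referenced in a comment):

import pandas as pd

df = pd.DataFrame({"hash": [142, 142, 7, 9],
                   "x":    [1.0, 2.0, 3.0, 4.0]})

h = [142, 14277]           # list-like, as the earlier KeyError message indicates

# df["hash"] == h          # ValueError: Lengths must match to compare
mask = df["hash"].isin(h)  # boolean mask; works for any length of h
block = df[mask]

# In generator() this would become something like:
# df_batch = [random_sample_block(df[df["hash"].isin(h)], s) for s, h in batch]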

@davdmaccartney

Well, as I changed the original code quite a lot, that might be expected. Contact me by email:
[email protected]
