IndexError: tuple index out of range (when using dask_apply) #181

CoteDave · 2022-05-13T20:08:12Z

Hi,

swifter version: 1.1.3
dask version: 2022.05.0
pandas version: 1.4.2
python version: 3.9.12

When I use dataframe[col].apply(func), it works.

When I use dataframe[col].swifter.allow_dask_on_strings(enable=True).apply(func) on a SMALL sample (10), it uses pandas apply under the hood and it works.

When I use dataframe[col].swifter.allow_dask_on_strings(enable=True).apply(func) on a BIGGER sample (1000), it uses dask apply under the hood and it does NOT work. There seems to be a problem when switching to dask apply. Here is the complete error:

Traceback (most recent call last):
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/swifter/swifter.py", line 241, in apply
    self._validate_apply(
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/swifter/base.py", line 50, in _validate_apply
    raise ValueError(error_message)
ValueError: Vectorized function sample doesn't match pandas apply sample.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "", line 47, in transform
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/swifter/swifter.py", line 255, in apply
    return self._dask_apply(func, convert_dtype, *args, **kwds)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/swifter/swifter.py", line 173, in _dask_apply
    dd.from_pandas(sample, npartitions=self._npartitions)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/base.py", line 292, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/base.py", line 576, in compute
    return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/base.py", line 576, in
    return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/dataframe/core.py", line 129, in finalize
    return _concat(results)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/dataframe/core.py", line 110, in _concat
    return da.core.concatenate3(args)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/array/core.py", line 5124, in concatenate3
    chunks = chunks_from_arrays(arrays)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/array/core.py", line 4911, in chunks_from_arrays
    result.append(tuple(shape(deepfirst(a))[dim] for a in arrays))
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/array/core.py", line 4911, in
    result.append(tuple(shape(deepfirst(a))[dim] for a in arrays))
IndexError: tuple index out of range
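
For readers of the traceback: the last frame indexes a shape tuple with `dim`, and the message means one of those tuples had fewer entries than the dimension being read. The pure-Python sketch below only illustrates that mechanism. `sizes_along` is a hypothetical helper, not dask's code, and the empty shape is an assumed trigger rather than a confirmed diagnosis of this issue.

```python
from typing import Sequence, Tuple


def sizes_along(shapes: Sequence[Tuple[int, ...]], dim: int) -> Tuple[int, ...]:
    """Collect the size of every chunk along dimension `dim` (hypothetical helper)."""
    return tuple(shape[dim] for shape in shapes)


# One-dimensional chunks: every shape tuple has an entry at index 0.
ok = sizes_along([(3,), (4,)], dim=0)

# Assumed failure case: one chunk is scalar-like, so its shape tuple is empty
# and reading index 0 raises IndexError.
try:
    sizes_along([(3,), ()], dim=0)
    message = None
except IndexError as exc:
    message = str(exc)

print(ok, message)
```

If this mechanism applies here, it would be worth checking whether func returns something other than one scalar per row on some partitions, but nothing in the report confirms that.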

jmcarpenter2 · 2022-07-22T22:31:35Z

Hi @CoteDave, any chance you could provide the version of swifter you are using and a snippet of the function you are trying to apply? That will help me debug.
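
A reply to this request could be put together along the following lines. This is a sketch under stated assumptions: `collect_versions`, `clean_text`, and `reproduce` are hypothetical names, `clean_text` is only a placeholder for the real function (which the thread never shows), and the synthetic column may or may not trigger the error. The only swifter call used is the one quoted in the report.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages=("swifter", "dask", "pandas")):
    """Return installed versions for the report; packages not installed map to None."""
    found = {"python": platform.python_version()}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def clean_text(value):
    """Hypothetical placeholder for the applied function; swap in the real one."""
    return str(value).strip().lower()


def reproduce(n_rows, func=clean_text):
    """Run the call from the report on a synthetic string column with n_rows rows."""
    # Imported here so the helpers above work without the third-party packages.
    import pandas as pd
    import swifter  # noqa: F401  (importing registers the .swifter accessor)

    frame = pd.DataFrame({"text": [f" Row {i} " for i in range(n_rows)]})
    return frame["text"].swifter.allow_dask_on_strings(enable=True).apply(func)


versions = collect_versions()
print(versions)
```

With swifter, dask, and pandas installed, calling `reproduce(10)` and then `reproduce(1000)` mirrors the two sample sizes from the report; posting the printed versions, the real function, and the resulting output gives the maintainer what was asked for.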

jmcarpenter2 closed this as completed Mar 24, 2023
