IndexError: tuple index out of range (when using dask_apply) #181

Closed
CoteDave opened this issue May 13, 2022 · 1 comment

Comments

CoteDave commented May 13, 2022

Hi,

swifter version: 1.1.3
dask version: 2022.05.0
pandas version: 1.4.2
python version: 3.9.12

When I use dataframe[col].apply(func), it works.

When I use dataframe[col].swifter.allow_dask_on_strings(enable=True).apply(func) on a SMALL sample (10), it uses pandas apply under the hood and it works.

When I use dataframe[col].swifter.allow_dask_on_strings(enable=True).apply(func) on a BIGGER sample (1000), it uses dask apply under the hood and it does NOT work. The problem seems to occur when swifter switches to dask apply. Here is the complete error:

Traceback (most recent call last):
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/swifter/swifter.py", line 241, in apply
    self._validate_apply(
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/swifter/base.py", line 50, in _validate_apply
    raise ValueError(error_message)
ValueError: Vectorized function sample doesn't match pandas apply sample.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in <module>
  File "", line 47, in transform
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/swifter/swifter.py", line 255, in apply
    return self._dask_apply(func, convert_dtype, *args, **kwds)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/swifter/swifter.py", line 173, in _dask_apply
    dd.from_pandas(sample, npartitions=self._npartitions)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/base.py", line 292, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/base.py", line 576, in compute
    return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/base.py", line 576, in <listcomp>
    return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/dataframe/core.py", line 129, in finalize
    return _concat(results)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/dataframe/core.py", line 110, in _concat
    return da.core.concatenate3(args)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/array/core.py", line 5124, in concatenate3
    chunks = chunks_from_arrays(arrays)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/array/core.py", line 4911, in chunks_from_arrays
    result.append(tuple(shape(deepfirst(a))[dim] for a in arrays))
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/array/core.py", line 4911, in <genexpr>
    result.append(tuple(shape(deepfirst(a))[dim] for a in arrays))
IndexError: tuple index out of range
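
For anyone reading the last frame: the expression shape(deepfirst(a))[dim] indexes a shape tuple by the current nesting depth, so it can fail when the results are nested more deeply than the leaf objects have dimensions. The sketch below is a simplified, pure-Python stand-in for that idea, not dask's actual code. FakeArray, leaf_shape and chunk_sizes are hypothetical names, and whether this is what happened here cannot be confirmed without the applied function.

```python
class FakeArray:
    """Hypothetical stand-in for any object exposing a .shape tuple."""

    def __init__(self, shape):
        self.shape = shape


def deepfirst(seq):
    """Follow the first element of nested lists/tuples down to a leaf."""
    while isinstance(seq, (list, tuple)):
        seq = seq[0]
    return seq


def leaf_shape(x):
    """Return the leaf's shape, or (1,) if it has none (assumed fallback)."""
    shape = getattr(x, "shape", None)
    return shape if shape else (1,)


def chunk_sizes(arrays):
    """Collect one tuple of sizes per nesting level of `arrays`."""
    result = []
    dim = 0
    while isinstance(arrays, (list, tuple)):
        # Indexing the leaf shape with the nesting depth is the risky step.
        result.append(tuple(leaf_shape(deepfirst(a))[dim] for a in arrays))
        arrays = arrays[0]
        dim += 1
    return tuple(result)


# One nesting level over 1-D leaves: depth matches dimensionality.
flat = [FakeArray((3,)), FakeArray((2,))]
print(chunk_sizes(flat))  # ((3, 2),)

# Two nesting levels over 1-D leaves: depth exceeds dimensionality.
nested = [[FakeArray((3,))], [FakeArray((2,))]]
try:
    chunk_sizes(nested)
except IndexError as exc:
    print(exc)  # tuple index out of range
```

If the applied function returns an array or list per row, checking the type and shape of func's output on a single value may help narrow this down.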

@jmcarpenter2
Owner

Hi @CoteDave, any chance you could provide the version of swifter you are using and a snippet of the function you are trying to apply? That will help me debug.
