When I use dataframe[col].apply(func), it works.
When I use dataframe[col].swifter.allow_dask_on_strings(enable=True).apply(func) on a SMALL sample (10 rows), it uses pandas apply under the hood and it works.
When I use dataframe[col].swifter.allow_dask_on_strings(enable=True).apply(func) on a BIGGER sample (1000 rows), it uses dask apply under the hood and it does NOT work. There seems to be a problem when switching to the dask apply. Here is the complete error:
Traceback (most recent call last):
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/swifter/swifter.py", line 241, in apply
    self._validate_apply(
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/swifter/base.py", line 50, in _validate_apply
    raise ValueError(error_message)
ValueError: Vectorized function sample doesn't match pandas apply sample.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in <module>
  File "", line 47, in transform
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/swifter/swifter.py", line 255, in apply
    return self._dask_apply(func, convert_dtype, *args, **kwds)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/swifter/swifter.py", line 173, in _dask_apply
    dd.from_pandas(sample, npartitions=self._npartitions)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/base.py", line 292, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/base.py", line 576, in compute
    return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/base.py", line 576, in <listcomp>
    return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/dataframe/core.py", line 129, in finalize
    return _concat(results)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/dataframe/core.py", line 110, in _concat
    return da.core.concatenate3(args)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/array/core.py", line 5124, in concatenate3
    chunks = chunks_from_arrays(arrays)
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/array/core.py", line 4911, in chunks_from_arrays
    result.append(tuple(shape(deepfirst(a))[dim] for a in arrays))
  File "/opt/continuum/.conda/envs/nlpbeneva/lib/python3.9/site-packages/dask/array/core.py", line 4911, in <genexpr>
    result.append(tuple(shape(deepfirst(a))[dim] for a in arrays))
IndexError: tuple index out of range
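For reference, a minimal sketch of the pattern described above. The actual func was not shared in this issue, so a hypothetical string-cleaning function stands in for it; the failing swifter call is shown in comments since it requires the swifter package:

```python
import pandas as pd

# Hypothetical stand-in for the unshared `func`: a simple string transform.
def func(text):
    return text.strip().lower()

# ~1200 rows, large enough that swifter would switch to the dask path.
df = pd.DataFrame({"col": ["  Hello  ", "WORLD", "  Swifter  "] * 400})

# Plain pandas apply -- works in all cases.
plain = df["col"].apply(func)
print(plain.head(3).tolist())  # -> ['hello', 'world', 'swifter']

# The failing call (requires the swifter package):
#   df["col"].swifter.allow_dask_on_strings(enable=True).apply(func)
# On ~10 rows swifter falls back to pandas apply and works; on ~1000 rows
# it switches to a dask apply, which is where the traceback above occurs.
```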
Hi @CoteDave, any chance you could provide the version of swifter you are using and a snippet of the function you are trying to apply? That will help me debug.
Hi,
swifter version: 1.1.3
dask version: 2022.05.0
pandas version: 1.4.2
python version: 3.9.12
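Environment details like those above can be gathered with a short stdlib snippet (a sketch; the package names are just the ones listed in this issue):

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Collect the interpreter version plus the packages relevant to this issue.
versions = {"python": sys.version.split()[0]}
for pkg in ("swifter", "dask", "pandas"):
    try:
        versions[pkg] = version(pkg)
    except PackageNotFoundError:
        versions[pkg] = "not installed"

print(versions)
```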