Commit
Fix overflow in ParallelFor(Long n, ...) (AMReX-Codes#3489)
## Summary

When using the `void ParallelFor (T n, L&& f)` overload with `T` equal to `long long` or `unsigned long long`, an overflow could occur if `n` was larger than the maximum value of `unsigned int`.

## Additional background

The overflow occurs in the GPU part of the `ParallelFor` function itself, when calculating `stride = blockDim.x*gridDim.x`. This PR changes the execution config to limit `gridDim.x` so that this product cannot overflow. `ParallelFor` still iterates over all elements, because the for loop inside the kernel makes a single thread iterate over multiple elements whenever `gridDim.x` had to be reduced.

Note: Some SYCL functions still appear to be prone to overflow: https://github.com/AMReX-Codes/amrex/blob/cc7a5c1171bb2634a9a71ce3c18cfd0be72f581c/Src/Base/AMReX_GpuLaunchFunctsG.H#L196 (should be at least `unsigned int`).
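
To illustrate the idea, here is a minimal host-side sketch of capping the block count so that `blockDim.x*gridDim.x` stays within `unsigned int`, together with an emulation of the grid-stride loop showing that every element is still visited. The names `LaunchConfig`, `make_capped_config`, and `count_visited`, as well as the default of 256 threads per block, are hypothetical and chosen for illustration; this is not the actual AMReX code from the PR.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a GPU execution config (not an AMReX type).
struct LaunchConfig {
    std::uint32_t num_blocks;         // plays the role of gridDim.x
    std::uint32_t threads_per_block;  // plays the role of blockDim.x
};

// Choose a block count for n elements, capped so that
// threads_per_block * num_blocks never exceeds the 32-bit unsigned maximum.
inline LaunchConfig make_capped_config(std::uint64_t n,
                                       std::uint32_t threads_per_block = 256)
{
    const std::uint64_t uint_max = std::numeric_limits<std::uint32_t>::max();
    // Blocks needed to give each element its own thread (ceiling division,
    // written so that it cannot overflow for very large n).
    const std::uint64_t wanted =
        n / threads_per_block + (n % threads_per_block != 0 ? 1 : 0);
    // Largest block count whose total thread count still fits in 32 bits.
    const std::uint64_t cap = uint_max / threads_per_block;
    const std::uint64_t blocks =
        std::max<std::uint64_t>(1, std::min(wanted, cap));
    return {static_cast<std::uint32_t>(blocks), threads_per_block};
}

// Host-side emulation of a grid-stride loop: every simulated thread starts at
// its global index and advances by the total thread count until it passes n.
// Returns how many elements were visited in total.
inline std::uint64_t count_visited(std::uint64_t n, LaunchConfig cfg)
{
    const std::uint64_t stride =
        std::uint64_t(cfg.threads_per_block) * cfg.num_blocks;
    std::uint64_t visited = 0;
    for (std::uint64_t tid = 0; tid < stride; ++tid) {
        for (std::uint64_t i = tid; i < n; i += stride) {
            ++visited;
        }
    }
    return visited;
}
```

With these assumptions, `make_capped_config(5000000000ULL)` yields fewer blocks than the one-thread-per-element count, and the stride loop makes up the difference: `count_visited(1000, LaunchConfig{2, 4})` visits all 1000 elements with only 8 simulated threads.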