Kernel may launch GPU kernel with no work #1733

Open
MrBurmark opened this issue Sep 16, 2024 · 0 comments
Labels
bug, performance, reviewed

Describe the bug

`kernel` sometimes launches GPU kernels even though they have no work to do.
The GPU implementation of `kernel` tracks, for each dimension, the number of blocks/threads requested and the minimum number of blocks/threads required, and uses these to choose the launch dimensions. It uses 0 as the default number of blocks/threads and skips the launch only if every dimension is still 0 after the dimensions are calculated. As a result, a launch still happens when one dimension has no work but another dimension is nonzero.
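To make the failure mode concrete, here is a minimal host-side sketch of the check as described above. It is a simplified model, not the library's code: `LaunchDims`, `request_dims`, and `launches_as_described` are hypothetical names, and only thread counts are modeled.

```cpp
#include <array>
#include <cassert>

// Hypothetical, simplified model of the bookkeeping described in this issue.
// None of these names come from the library.
struct LaunchDims {
  std::array<int, 3> threads{0, 0, 0};  // 0 is the default: "nothing requested"
};

// Record the extent of the loop mapped to each GPU thread dimension.
// An unused dimension and an empty loop both end up as 0.
inline LaunchDims request_dims(const std::array<int, 3>& extents) {
  LaunchDims dims;
  for (int d = 0; d < 3; ++d) {
    assert(extents[d] >= 0);  // extents are never negative in this model
    dims.threads[d] = extents[d];
  }
  return dims;
}

// The check as described: skip the launch only if *every* dimension is 0.
inline bool launches_as_described(const LaunchDims& dims) {
  for (int t : dims.threads) {
    if (t != 0) return true;  // one nonzero dimension is enough to launch
  }
  return false;
}
```

In this model a 2D loop over ranges of size 1 and 0 gives extents `{1, 0, 0}`, and `launches_as_described` returns true even though the iteration space is empty. Because 0 doubles as the default, the empty second dimension looks the same as the unused third one.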

To Reproduce

Use `kernel` with a 2D loop over ranges of size 1 and 0, respectively.

Expected behavior

`kernel` should not launch a GPU kernel if any dimension in a GPU policy has no work.
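One way to express the expected check, as a sketch under the assumption that unused dimensions can be told apart from empty ones. The sentinel `kUnmapped` and the function `should_launch` are hypothetical and only illustrate the rule; they are not part of the library.

```cpp
#include <array>
#include <cassert>

// Hypothetical sentinel: marks a GPU dimension that no loop is mapped to.
// A value of 0 then unambiguously means "mapped loop with an empty range".
constexpr int kUnmapped = -1;

// Expected rule: launch only if at least one dimension is mapped and
// no mapped dimension is empty.
inline bool should_launch(const std::array<int, 3>& extents) {
  bool any_mapped = false;
  for (int e : extents) {
    assert(e >= kUnmapped);        // -1 or a non-negative extent
    if (e == kUnmapped) continue;  // dimension not used by the policy
    if (e == 0) return false;      // any empty mapped dimension: no work
    any_mapped = true;
  }
  return any_mapped;
}
```

With this rule the reproducer's extents `{1, 0, kUnmapped}` do not launch, while `{1, 4, kUnmapped}` do. Using a separate "unmapped" marker is a design choice for this sketch; the same effect could be had by tracking per dimension whether any loop requested it.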

Compilers & Libraries (please complete the following information):

  • CUDA/HIP version: any

Additional context

Discovered this when I saw segfaults while trying out the new "unchecked" policies with 0-length list segments in the test-kernel-nested-loop-segments-HIP test.

rhornung67 added the bug, performance, and reviewed labels on Sep 17, 2024
rhornung67 added this to the FY25 Development milestone on Nov 20, 2024