
Add cluster to LaunchConfig to support thread block clusters on Hopper #261

Open

wants to merge 10 commits into base: main
Conversation

@leofang leofang (Member) commented Dec 3, 2024

Close #204.
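For context, a rough usage sketch of the new cluster option (assumptions: names follow cuda.core.experimental as used elsewhere in this PR; the grid/block/cluster values are hypothetical, and the exact semantics of grid when cluster is set are defined by the PR, not by this sketch):

# Hedged sketch of launching with a thread block cluster via LaunchConfig.
# Requires compute capability >= 9.0 and cuda.bindings with driver 11.8+.
from cuda.core.experimental import Device, LaunchConfig

dev = Device()
dev.set_current()

if dev.compute_capability >= (9, 0):
    # Hypothetical values: 128 threads per block, clusters of 2 blocks.
    config = LaunchConfig(grid=4, block=128, cluster=2)
else:
    config = LaunchConfig(grid=4, block=128)  # no cluster on pre-Hopper GPUs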

@leofang leofang added the enhancement (Any code-related improvements), P1 (Medium priority - Should do), and cuda.core (Everything related to the cuda.core module) labels Dec 3, 2024
@leofang leofang added this to the cuda.core beta 2 milestone Dec 3, 2024
@leofang leofang requested a review from vzhurba01 December 3, 2024 05:18
@leofang leofang self-assigned this Dec 3, 2024

copy-pr-bot bot commented Dec 3, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@leofang leofang (Member, Author) commented Dec 3, 2024

(Merge after #247; only commits starting from 5d253f1 are new.)

@leofang leofang marked this pull request as ready for review December 4, 2024 03:23
@leofang leofang (Member, Author) commented Dec 8, 2024

/ok to test

@leofang leofang (Member, Author) commented Dec 8, 2024

@vzhurba01 this is ready for review

if not _use_ex:
    raise CUDAError("thread block clusters require cuda.bindings & driver 11.8+")
if Device().compute_capability < (9, 0):
    raise CUDAError("thread block clusters are not supported below Hopper")
@ksimpson-work ksimpson-work (Contributor) Dec 12, 2024


I personally think this message should refer to compute capability rather than architecture-name chronology, i.e. "thread block clusters require compute capability >= 9.0". That is subjective, though; if we leave it as Hopper, I think we should add "(compute capability >= 9.0)".
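For illustration only, the check with the suggested compute-capability wording might read as follows (a sketch against the quoted snippet above; CUDAError, _use_ex, and Device come from the surrounding module):

if not _use_ex:
    raise CUDAError("thread block clusters require cuda.bindings & driver 11.8+")
if Device().compute_capability < (9, 0):
    raise CUDAError("thread block clusters require compute capability >= 9.0")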

dev = Device()
arch = dev.compute_capability
if arch < (9, 0):
    print("this demo requires a Hopper GPU (since thread block cluster is a hardware feature)", file=sys.stderr)
Contributor


The CUDA Programming Guide indicates that thread block clusters were introduced in CC 9.0, which seems to suggest they are exclusive to Hopper. If they are not exclusive to Hopper, I think we should refer to compute capability, or at least include it and say "Hopper or newer". My personal preference would be ">= 9.0".
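A sketch of how the example's guard could name both the architecture and the compute capability (hypothetical wording; exiting early is an assumption about how the demo should behave):

import sys
from cuda.core.experimental import Device

dev = Device()
arch = dev.compute_capability
if arch < (9, 0):
    # Assumed wording combining "Hopper or newer" with the compute capability.
    print("this demo requires compute capability >= 9.0 (Hopper or newer), "
          "since thread block cluster is a hardware feature", file=sys.stderr)
    sys.exit(0)  # assumption: skip the demo gracefully on older GPUs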

@ksimpson-work ksimpson-work (Contributor) left a comment


One nit / comment about the documentation, but otherwise lgtm

Labels
cuda.core (Everything related to the cuda.core module), enhancement (Any code-related improvements), P1 (Medium priority - Should do)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support launching thread block clusters
3 participants