
Added -use for NONTEMPORAL #317

Merged
merged 1 commit into from
Dec 27, 2024


gwoltman
Collaborator

I added a command-line option to toggle non-temporal memory access. The default is off, even though enabling it gives about a 0.5% gain on a Radeon VII. I suspect an A100 or recent AMD consumer cards with large caches will see bigger gains without non-temporal access. We could make the default depend on the cache size reported by clinfo.
Either way, we should make -tune auto-configure this option.
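
A minimal sketch of the cache-size-dependent default suggested above. It is written in plain C and is not taken from this PR: the function name `nontemporal_default`, the 32 MiB threshold, and the choice to treat an unknown size as "off" are all assumptions made for illustration. The cache size is presumably the figure OpenCL exposes through `clGetDeviceInfo` with `CL_DEVICE_GLOBAL_MEM_CACHE_SIZE`, which is passed in here as a plain number so the sketch does not need an OpenCL runtime.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cutoff: caches at or above this size count as "large".
 * The real value would have to come from measurements, e.g. via -tune. */
#define LARGE_CACHE_BYTES ((uint64_t)32 * 1024 * 1024)

/* Decide whether non-temporal access should default to on.
 * cache_bytes: global memory cache size in bytes, 0 if unknown.
 * Assumption: small caches benefit from non-temporal access,
 * large caches do better without it. */
static bool nontemporal_default(uint64_t cache_bytes) {
    if (cache_bytes == 0) {
        return false; /* unknown size: keep the current default (off) */
    }
    return cache_bytes < LARGE_CACHE_BYTES;
}
```

With these assumed numbers, a device reporting a 4 MiB cache would default to non-temporal access and one reporting 40 MiB would not. An explicit command-line setting or a -tune result would still be expected to override this guess.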

@preda preda merged commit 1582454 into preda:master Dec 27, 2024
7 checks passed
@preda
Owner

preda commented Dec 27, 2024

Thanks! We should make it ON by default, but guard it with LLVM builtin checks (`__has_builtin()`).
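
One way such a guard could look, sketched in host-side C rather than OpenCL C so that it compiles with an ordinary compiler. This is not the code merged in this PR. `__has_builtin`, `__builtin_nontemporal_load` and `__builtin_nontemporal_store` are Clang/LLVM features; the macro names `NT_ACTIVE`, `NT_LOAD`, `NT_STORE` and the helper `copy_doubles` are hypothetical, and `NONTEMPORAL` is assumed to be a 0/1 build-time define.

```c
#include <assert.h>
#include <stddef.h>

/* Compilers that lack __has_builtin answer "no" to every query. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Assumed 0/1 switch; ON by default as proposed in the comment above. */
#ifndef NONTEMPORAL
#define NONTEMPORAL 1
#endif

#if NONTEMPORAL && __has_builtin(__builtin_nontemporal_load) \
                && __has_builtin(__builtin_nontemporal_store)
/* Both builtins exist: route accesses through them. */
#define NT_ACTIVE 1
#define NT_LOAD(p) __builtin_nontemporal_load(p)
#define NT_STORE(v, p) __builtin_nontemporal_store((v), (p))
#else
/* Fallback: ordinary loads and stores, same semantics. */
#define NT_ACTIVE 0
#define NT_LOAD(p) (*(p))
#define NT_STORE(v, p) (*(p) = (v))
#endif

/* Copy n doubles, using non-temporal accesses when they are available. */
static void copy_doubles(double *dst, const double *src, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; ++i) {
        NT_STORE(NT_LOAD(&src[i]), &dst[i]);
    }
}
```

Defining a fallback `__has_builtin` first keeps the `#if` line valid on compilers that do not provide it. Either branch yields the same results; only the cache hint differs, so turning the option on by default stays safe where the builtins are missing.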

@gwoltman
Collaborator Author

gwoltman commented Dec 27, 2024 via email
