Use hardware performance counters instead of cachegrind #11

Open
asomers opened this issue Feb 3, 2021 · 3 comments

Comments

@asomers
Contributor

asomers commented Feb 3, 2021

Iai is very exciting! I love the idea of benchmarks that are fast and deterministic. But relying on Cachegrind has some drawbacks:

  • Limited OS support
  • Requires the user to install valgrind
  • Executing binaries is slow
  • Valgrind alters the program's normal execution. This reduces its accuracy and leads to bugs like "failed to allocate a guard page on FreeBSD" (#8).

Modern CPUs contain hardware performance counters that can be used for nearly zero-cost profiling. Using those instead of Cachegrind would have several benefits:

  • No dependency on Valgrind
  • Much faster to execute
  • The counters can be paused and restarted mid-process. This would allow Iai to skip setup and teardown sections, as requested in "Exclude setup/teardown code from measurements" (#7).
  • Wider OS support
  • More accurate and detailed reports.

On FreeBSD, pmc(3) provides access to the counters, and there is already a nascent Rust crate for them: pmc-rs. On Linux, I think the perfcnt and perf crates provide the same functionality.
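To make the pause/restart point concrete, here is a minimal sketch of how a pausable counter could exclude setup and teardown from a measurement. Everything in it is hypothetical: `EventSource` is a deterministic stand-in for a hardware counter, and `PausableCounter` and `measure_body_only` are illustrative names, not the API of pmc-rs, perfcnt, perf, or Iai.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a hardware event source (e.g. retired
/// instructions). A real backend would read a CPU counter instead.
struct EventSource {
    events: Cell<u64>,
}

impl EventSource {
    /// Simulate `n` events happening on the CPU.
    fn retire(&self, n: u64) {
        self.events.set(self.events.get() + n);
    }

    /// Read the current raw event count.
    fn read(&self) -> u64 {
        self.events.get()
    }
}

/// Hypothetical counter that only accumulates events while resumed.
struct PausableCounter<'a> {
    source: &'a EventSource,
    resumed_at: Option<u64>,
    accumulated: u64,
}

impl<'a> PausableCounter<'a> {
    /// Counters start paused, so setup code is ignored by default.
    fn new(source: &'a EventSource) -> Self {
        Self { source, resumed_at: None, accumulated: 0 }
    }

    /// Start (or continue) counting from the current raw value.
    fn resume(&mut self) {
        if self.resumed_at.is_none() {
            self.resumed_at = Some(self.source.read());
        }
    }

    /// Stop counting and fold the events seen since `resume` into the total.
    fn pause(&mut self) {
        if let Some(start) = self.resumed_at.take() {
            self.accumulated += self.source.read() - start;
        }
    }

    /// Events observed while the counter was running.
    fn total(&self) -> u64 {
        let running = self.resumed_at.map_or(0, |start| self.source.read() - start);
        self.accumulated + running
    }
}

/// Counts only the benchmark body; setup and teardown are skipped.
fn measure_body_only(source: &EventSource) -> u64 {
    let mut counter = PausableCounter::new(source);
    source.retire(1_000); // setup: counter is paused, so not counted
    counter.resume();
    source.retire(250); // benchmark body: counted
    counter.pause();
    source.retire(400); // teardown: not counted
    counter.total()
}

fn main() {
    let source = EventSource { events: Cell::new(0) };
    println!("counted events: {}", measure_body_only(&source));
}
```

With a real counter the `retire` calls would disappear and `read` would query the hardware, but the resume/pause bookkeeping would look much the same.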

@shepmaster

I think that https://github.com/jbreitbart/criterion-perf-events is an attempt to do that.

@asomers
Contributor Author

asomers commented Feb 14, 2021

Cool! Thanks for the tip.

@bheisler
Owner

Yes, if that's what you want I would recommend using the criterion-perf-events plugin.

Cachegrind is used specifically for its emulation of the memory hierarchy. Because we can control the parameters of that emulation, Iai can take measurements under cachegrind that should be far more repeatable and consistent between machines than are possible even with performance counters. Hardware performance counters will naturally be different between different hardware.
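As a rough illustration of "controlling the parameters of that emulation", the sketch below builds a Cachegrind command line with the simulated cache geometry pinned through Cachegrind's `--I1`, `--D1` and `--LL` options (each takes size, associativity, line size). The specific sizes, the `cachegrind_command` helper, and the file names are assumptions chosen for the example; they are not taken from Iai's source.

```rust
use std::process::Command;

/// Builds (but does not run) a Cachegrind invocation with a fixed,
/// machine-independent cache model. The geometry values are illustrative.
fn cachegrind_command(benchmark_binary: &str, out_file: &str) -> Command {
    let mut cmd = Command::new("valgrind");
    cmd.arg("--tool=cachegrind")
        .arg("--cache-sim=yes") // enable the cache simulation
        .arg("--I1=32768,8,64") // L1 instruction cache: 32 KiB, 8-way, 64-byte lines
        .arg("--D1=32768,8,64") // L1 data cache: same geometry
        .arg("--LL=8388608,16,64") // last-level cache: 8 MiB, 16-way, 64-byte lines
        .arg(format!("--cachegrind-out-file={out_file}"))
        .arg(benchmark_binary);
    cmd
}

fn main() {
    // Hypothetical binary and output paths, for illustration only.
    let cmd = cachegrind_command("target/release/my_bench", "cachegrind.out");
    let args: Vec<String> = cmd
        .get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect();
    println!("valgrind {}", args.join(" "));
}
```

Because the cache model is given explicitly instead of being detected from the host CPU, two different machines running the same binary should see the same simulated hit and miss counts.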

In addition, under virtualization it's common for access to the performance counters of the underlying hardware to be disabled, so it's not as if that approach is without drawback either. I know this is the case, because the VM I do my work in at my day job has its performance counters disabled for mysterious IT-department reasons.
