Avoid reduce_once in scalar_centered_binomial_distribution_eta_2_with_prf

The comment that a barrier-less reduce_once was safe turned out to not *quite* be true. In Clang configurations without auto-vectorization (notably -O1), Clang would emit a branch instead of a CMOV.

Unfortunately, adding a barrier to reduce_once has significant performance costs. The problem seems to be that auto-vectorization breaks. I suspect it is primarily because the value barrier forces the value into a general-purpose register, while vectorized code puts it straight into a SIMD register. Though knowing the comparison is a comparison seems to also help a bit.

Based on what we've understood of Clang's select transforms thus far, it would make sense that ML-KEM might not need the barrier. The main culprit is turning multiple selects with the same condition into a branch, and that does not happen in ML-KEM. Yet we observe a problem.

Based on valgrind instrumentation, the problem seems to be limited to scalar_centered_binomial_distribution_eta_2_with_prf, likely because the value has such a limited range of values. For some reason, this causes many recent versions of Clang to emit a branch. I think this may actually be a misoptimization. Indeed, the very latest trunk build of Clang on godbolt does not have this problem. Somewhere between 8cb44859cc31929521c09fc6a8add66d53db44de and 8daf4f16fa08b5d876e98108721dd1743a360326, LLVM seems to have fixed this issue.

We can avoid this by computing it differently. We currently write reduce_once(kPrime + a + b - (c + d)), where a through d are 0 or 1. Instead, we can write a + b - (c + d), let the underflow happen, and then conditionally add kPrime based on the sign bit of the result.

This seems to avoid mishaps, for now. If this breaks down again, we may need to get better value barriers, or to stop relying on auto-vectorization and vectorize ourselves.
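
The two formulations can be sketched in C roughly as follows. This is an illustration rather than the code in the change itself: K_PRIME stands in for kPrime (3329, the ML-KEM modulus), the reduce_once body is an assumed barrier-less mask-and-select, and cbd2_old and cbd2_new are hypothetical names for the per-coefficient computation. Whether a particular compiler emits branch-free code for either version still has to be checked on the generated assembly.

```c
#include <stdint.h>

/* Stand-in for kPrime: the ML-KEM modulus q = 3329. */
#define K_PRIME 3329

/* Assumed barrier-less reduction of x in [0, 2q) to [0, q).
 * Subtract q, then select the original value if that underflowed. */
static uint16_t reduce_once(uint16_t x) {
  const uint16_t subtracted = (uint16_t)(x - K_PRIME);
  /* All ones if the subtraction underflowed (top bit set), else zero. */
  const uint16_t mask = (uint16_t)(0u - (uint16_t)(subtracted >> 15));
  return (uint16_t)((mask & x) | (~mask & subtracted));
}

/* Previous formulation: bias by q first, then reduce. a..d are 0 or 1. */
static uint16_t cbd2_old(uint16_t a, uint16_t b, uint16_t c, uint16_t d) {
  return reduce_once((uint16_t)(K_PRIME + a + b - (c + d)));
}

/* New formulation: let a + b - (c + d) wrap modulo 2^16, then add q
 * only when the sign bit (bit 15) of the wrapped result is set. */
static uint16_t cbd2_new(uint16_t a, uint16_t b, uint16_t c, uint16_t d) {
  uint16_t value = (uint16_t)(a + b - (c + d));
  /* All ones if the difference was negative, else zero. */
  const uint16_t mask = (uint16_t)(0u - (uint16_t)(value >> 15));
  value = (uint16_t)(value + (K_PRIME & mask));
  return value;
}
```

Both functions map a difference of -2, -1, 0, 1, 2 to K_PRIME - 2, K_PRIME - 1, 0, 1, 2 respectively. The new form has no select between two candidate values, only a masked add.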

Change-Id: I917456348d63628880467d21138a57297532bc9a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/74447
Auto-Submit: David Benjamin <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>