ggml : support AVX512VNNI (ggerganov#6280)
This change causes some quants (e.g. Q4_0, Q8_0) to go faster on some
architectures (e.g. AMD Zen 4).
jart authored Mar 25, 2024
1 parent a32b77c commit 7733f0c
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion ggml-quants.c
@@ -132,7 +132,7 @@ static inline __m256 sum_i16_pairs_float(const __m256i x) {
 }
 
 static inline __m256 mul_sum_us8_pairs_float(const __m256i ax, const __m256i sy) {
-#if __AVXVNNI__
+#if defined(__AVXVNNI__) || defined(__AVX512VNNI__)
     const __m256i zero = _mm256_setzero_si256();
     const __m256i summed_pairs = _mm256_dpbusd_epi32(zero, ax, sy);
     return _mm256_cvtepi32_ps(summed_pairs);
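
For readers unfamiliar with the instruction behind the guarded branch, here is a minimal scalar sketch of what a dpbusd-style operation computes: each 32-bit lane accumulates four unsigned-byte by signed-byte products. The function names `dpbusd_lane_ref` and `mul_sum_us8_float_ref` are hypothetical and are not part of ggml; this is an illustration of the arithmetic, not the project's fallback path. Note that the 256-bit intrinsic form under AVX-512 VNNI also relies on AVX512VL being available.

```c
#include <stdint.h>

/* Hypothetical scalar reference for ONE 32-bit lane of a dpbusd-style
 * operation: four unsigned bytes times four signed bytes, summed and
 * added to the accumulator. The hardware instruction wraps on overflow;
 * with acc == 0, as in the diff above, the sum cannot overflow
 * (its magnitude is at most 4 * 255 * 128). */
static int32_t dpbusd_lane_ref(int32_t acc, const uint8_t a[4], const int8_t b[4]) {
    for (int i = 0; i < 4; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Hypothetical reference for the whole 256-bit operation: 32 byte pairs
 * in, 8 float lanes out, mirroring the shape of the VNNI branch
 * (zero accumulator, dot product per lane, convert to float). */
static void mul_sum_us8_float_ref(const uint8_t ax[32], const int8_t sy[32], float out[8]) {
    for (int lane = 0; lane < 8; lane++) {
        out[lane] = (float)dpbusd_lane_ref(0, ax + 4 * lane, sy + 4 * lane);
    }
}
```

The preprocessor change matters because GCC and Clang define `__AVXVNNI__` and `__AVX512VNNI__` independently, so a build targeting a CPU with only the AVX-512 variant previously skipped this branch.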
