forked from nmslib/hnswlib
Commit
[L2Space] Perf improvement for dimensions that are not multiples of 4 or 16
Currently SIMD (SSE or AVX) is used only when the dimension is a multiple of 4 or 16. When the dimension is not an exact multiple of 4 or 16, a slower non-vectorized method is used.

To improve performance for these cases, new methods are added: `L2SqrSIMD(4|16)ExtResidual`. Each one relies on the existing `L2SqrSIMD(4|16)Ext` to process the largest prefix whose length is a multiple of 4 or 16, then finishes the remaining (residual) dimensions with the scalar method `L2Sqr`.

In the benchmark below, the new methods (Tst*) are roughly 3.4x to 4.5x faster than the baseline (Ref*), depending on the dimension.

Benchmark results:

Run on (4 X 3300 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 256 KiB (x2)
  L3 Unified 4096 KiB (x1)
Load Average: 2.18, 2.35, 3.88
-----------------------------------------------------------
Benchmark          Time          CPU       Iterations
-----------------------------------------------------------
TstDim65        14.7 ns      14.7 ns    20 * 47128209
RefDim65        50.2 ns      50.1 ns    20 * 10373751
TstDim101       24.7 ns      24.7 ns    20 * 28064436
RefDim101       90.4 ns      90.2 ns    20 * 7592191
TstDim129       31.4 ns      31.3 ns    20 * 22397921
RefDim129        125 ns       124 ns    20 * 5548862
TstDim257       59.3 ns      59.2 ns    20 * 10856753
RefDim257        266 ns       266 ns    20 * 2630926
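The head-plus-residual split described in the commit message can be sketched as follows. This is a minimal illustration and not the code from the commit: the names `L2SqrScalar`, `L2SqrBlocks16`, and `L2SqrBlocks16WithResidual` are hypothetical, the signatures are simplified to typed pointers and a length, and the "vectorized" kernel is a portable stand-in that only mimics the multiple-of-16 restriction of a real SSE/AVX routine.

```cpp
#include <cassert>
#include <cstddef>

// Scalar reference: sum of squared differences over n floats.
static float L2SqrScalar(const float *a, const float *b, std::size_t n) {
    float res = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float d = a[i] - b[i];
        res += d * d;
    }
    return res;
}

// Stand-in for a SIMD kernel (hypothetical): like a 16-wide vector routine,
// it only accepts lengths that are multiples of 16. A real kernel would use
// SSE/AVX intrinsics instead of the inner loop over 16 accumulator lanes.
static float L2SqrBlocks16(const float *a, const float *b, std::size_t n) {
    assert(n % 16 == 0);
    float lanes[16] = {0.0f};
    for (std::size_t i = 0; i < n; i += 16) {
        for (std::size_t j = 0; j < 16; ++j) {
            const float d = a[i + j] - b[i + j];
            lanes[j] += d * d;
        }
    }
    float res = 0.0f;
    for (float lane : lanes) res += lane;
    return res;
}

// Residual wrapper (hypothetical): run the blocked kernel on the largest
// prefix that is a multiple of 16, then finish the tail with the scalar loop.
static float L2SqrBlocks16WithResidual(const float *a, const float *b,
                                       std::size_t n) {
    const std::size_t n16 = (n >> 4) << 4;  // round n down to a multiple of 16
    const float head = L2SqrBlocks16(a, b, n16);
    const float tail = L2SqrScalar(a + n16, b + n16, n - n16);
    return head + tail;
}
```

For a dimension of 65, the wrapper hands 64 elements to the blocked kernel and a single element to the scalar loop, so almost all of the work stays on the fast path. The results of the two paths may differ from a purely scalar computation in the last bits, because the summation order changes.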
Showing 1 changed file with 50 additions and 29 deletions.