Adding vectorized support for matrix multiply
Now only 16x slower than numpy.matmul() for a 1000x1000 matrix! With
6 threads it gets to about 5x slower, which isn't bad.

For reference, the non-vectorized version is about 32x slower in the
single-threaded case.
iamsrp-deshaw committed Oct 14, 2024
1 parent f59d191 commit 6696b5a
Showing 4 changed files with 2,991 additions and 596 deletions.