Adding vectorized support for matrix multiply · deshaw/pjrmi@6696b5a · GitHub

Adding vectorized support for matrix multiply

Now only 16x slower than numpy.matmul() for a 1000x1000 matrix when
single-threaded. With 6 threads it gets to about 5x slower, which isn't bad.

For reference, the non-vectorized version is about 32x slower for the
single thread case.
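
The commit message does not say how these ratios were measured. As a rough sketch only, a slowdown figure of this kind could be obtained by timing a candidate against numpy.matmul() and dividing. The helper names (`best_time`, `slowdown`, `demo`) are made up for illustration, and the `numpy.einsum` call is merely a stand-in for whatever implementation is being measured; it is not the code from this commit.

```python
import time


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time, in seconds, over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def slowdown(candidate, baseline, repeats=5):
    """Return how many times slower `candidate` is than `baseline`."""
    return best_time(candidate, repeats) / best_time(baseline, repeats)


def demo(n=1000):
    """Compare a stand-in candidate against numpy.matmul() on n x n inputs."""
    import numpy  # imported here so the helpers above work without NumPy

    rng = numpy.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))

    # Assumption: einsum is only a placeholder for the call under test.
    ratio = slowdown(
        candidate=lambda: numpy.einsum("ik,kj->ij", a, b),
        baseline=lambda: numpy.matmul(a, b),
    )
    print(f"candidate is about {ratio:.1f}x slower than numpy.matmul()")
```

Calling `demo()` prints the measured ratio. Taking the best of several runs rather than the mean reduces noise from other processes, but the result still depends heavily on the machine and on the BLAS library NumPy is linked against.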

iamsrp-deshaw committed Oct 14, 2024

1 parent f59d191 commit 6696b5a
