Commit
Adding vectorized support for matrix multiply
Now only about 16x slower than numpy.matmul() for a 1000x1000 matrix on a single thread. With 6 threads it gets to about 5x slower, which isn't bad. For reference, the non-vectorized version is about 32x slower in the single-threaded case, so vectorizing roughly halves the single-threaded gap.
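For anyone wanting to reproduce a "Nx slower than numpy.matmul()" figure, here is a minimal sketch of how such a ratio could be measured. It is not the code from this commit: `matmul_naive`, `best_time`, and `slowdown` are hypothetical helper names, and the pure-Python triple loop only stands in for the implementation under test. The matrix size is kept small so the sketch runs quickly, and the NumPy comparison is skipped if NumPy is not installed.

```python
import time

try:
    import numpy as np  # optional: only needed for the baseline comparison
except ImportError:
    np = None


def matmul_naive(a, b):
    """Stand-in for the implementation under test: a plain triple loop
    over matrices given as lists of rows (hypothetical, not the commit's code)."""
    n, m, p = len(a), len(b), len(b[0])
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                out[i][j] += aik * b[k][j]
    return out


def best_time(fn, repeats=3):
    """Smallest wall-clock time, in seconds, over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def slowdown(candidate, baseline, repeats=3):
    """How many times slower candidate() is than baseline()."""
    base = max(best_time(baseline, repeats), 1e-12)  # guard against a zero reading
    return best_time(candidate, repeats) / base


if np is not None:
    n = 64  # assumption: far smaller than the 1000x1000 case, to keep this quick
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    a_rows, b_rows = a.tolist(), b.tolist()
    ratio = slowdown(lambda: matmul_naive(a_rows, b_rows),
                     lambda: np.matmul(a, b))
    print(f"stand-in is about {ratio:.0f}x slower than numpy.matmul() at n={n}")
```

Taking the best of a few runs rather than the mean reduces noise from other processes. The ratio depends heavily on matrix size, thread count, and the BLAS library NumPy is linked against, so numbers from this sketch are not comparable to the ones quoted above.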