Commit

Update README.md
dumerrill authored Dec 6, 2017
1 parent 57747e3 commit e2bf51c
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion README.md
@@ -20,6 +20,9 @@ point (FP64) types. Furthermore, CUTLASS demonstrates CUDA's WMMA API for targeting
the programmable, high-throughput _Tensor Cores_ provided by NVIDIA's Volta architecture
and beyond.

+For more exposition, see our Parallel Forall blog post ["CUTLASS: Fast Linear Algebra
+in CUDA C++"](https://devblogs.nvidia.com/parallelforall/cutlass-linear-algebra-cuda).
+
# Project Structure

CUTLASS is arranged as a header-only library with several example test programs
@@ -56,7 +59,7 @@ transposititions. Be sure to specify your target architecture.

<s|d|h|i|w>gemm_<nn|nt|tn|tt>
[--help]
-[--schmoo || --m=<height> --n=<width> --k=<depth>]
+[--schmoo=<#schmoo-samples> || --m=<height> --n=<width> --k=<depth>]
[--i=<timing iterations>]
[--device=<device-id>]
[--alpha=<alpha> --beta=<beta>]
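
For readers unfamiliar with the flags in this usage text: `--m`, `--n`, and `--k` appear to set the dimensions of a general matrix multiply, and `--alpha` and `--beta` its scaling factors, in the conventional form C = alpha * A * B + beta * C. The following is a minimal CPU reference sketch of that operation, not CUTLASS code. The function name `reference_gemm` and the row-major storage are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference routine (not part of CUTLASS).
// Computes C = alpha * A * B + beta * C, where
//   A is m x k, B is k x n, C is m x n.
// Row-major storage is assumed here purely for simplicity.
void reference_gemm(std::size_t m, std::size_t n, std::size_t k,
                    float alpha,
                    const std::vector<float>& A,
                    const std::vector<float>& B,
                    float beta,
                    std::vector<float>& C) {
    for (std::size_t row = 0; row < m; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            // Dot product of one row of A with one column of B.
            float acc = 0.0f;
            for (std::size_t i = 0; i < k; ++i) {
                acc += A[row * k + i] * B[i * n + col];
            }
            // Scale the product and blend in the previous value of C.
            C[row * n + col] = alpha * acc + beta * C[row * n + col];
        }
    }
}
```

A routine like this is useful mainly as a correctness baseline: its output can be compared element by element against the result of a GPU kernel for the same `m`, `n`, `k`, `alpha`, and `beta`.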