TorchRec uses the fbgemm_gpu embedding and embedding bag implementations for its fused, batched, and quantized versions of Embedding and EmbeddingBag (in addition to other kernels).
The benchmarks cover FusedEmbeddingBagCollection, which is implemented with fbgemm_gpu's SplitTableBatchedEmbeddingBagsCodegen. They are run with both UVM and UVM caching.
The results show a speedup of between 13x and 23x at DLRM embedding sizes.
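To give a rough sense of what "table batched" means, the sketch below stores several embedding tables in one flat row list and pools lookups for all tables in a single pass. It is a conceptual illustration in plain Python only, not fbgemm_gpu's kernel or API. The names `build_batched_tables` and `batched_sum_pool` are hypothetical, and sum pooling is an assumed choice for the example.

```python
from typing import List, Tuple

Row = List[float]


def build_batched_tables(tables: List[List[Row]]) -> Tuple[List[Row], List[int], List[int]]:
    """Concatenate all tables into one flat row list.

    Returns the flat rows, the starting row of each table, and each
    table's embedding dimension (tables may have different dimensions).
    """
    flat_rows: List[Row] = []
    row_offsets: List[int] = []
    dims: List[int] = []
    for table in tables:
        row_offsets.append(len(flat_rows))
        dims.append(len(table[0]) if table else 0)
        flat_rows.extend(table)
    return flat_rows, row_offsets, dims


def batched_sum_pool(
    flat_rows: List[Row],
    row_offsets: List[int],
    dims: List[int],
    bags: List[List[int]],
) -> List[Row]:
    """Sum-pool one bag of row indices per table in a single pass.

    bags[t] holds table-local row indices for table t; the table's row
    offset turns them into positions in the shared flat storage.
    """
    pooled: List[Row] = []
    for table_id, indices in enumerate(bags):
        base = row_offsets[table_id]
        acc = [0.0] * dims[table_id]
        for idx in indices:
            row = flat_rows[base + idx]
            for d, value in enumerate(row):
                acc[d] += value
        pooled.append(acc)
    return pooled


# Two illustrative tables with different embedding dimensions (assumed data).
table_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
table_b = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]

flat_rows, row_offsets, dims = build_batched_tables([table_a, table_b])
pooled = batched_sum_pool(flat_rows, row_offsets, dims, bags=[[0, 2], [1, 1]])
print(row_offsets)  # [0, 3]
print(pooled)       # [[6.0, 8.0], [80.0, 100.0, 120.0]]
```

A real fused implementation does this work in one GPU kernel launch rather than in Python loops. The sketch only shows the shared-storage-plus-offsets layout that makes a single batched lookup possible.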