Stars
Measures the latency between CPU cores
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
NVIDIA Linux open GPU kernel module source
Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large scales
PyTorch process group third-party plugin for UCC
PArametrized Recommendation and Ai Model benchmark is a repository for developing numerous microbenchmarks as well as end-to-end networks for evaluating training and inference platforms.
RDMA and SHARP plugins for the NCCL library
DASH, the C++ Template Library for Distributed Data Structures with Support for Hierarchical Locality for HPC and Data-Driven Science
A General-purpose Task-parallel Programming System using Modern C++
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)