Starred repositories
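⭐ [Open-source book] An in-depth guide to cloud-native technologies such as kernel networking, Kubernetes, ServiceMesh, and containers. A practice-tested guide to DevOps and SRE. If you find an error, please open an issue.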
Example models using DeepSpeed
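This project aims to share the technical principles behind large models and hands-on experience with them (large-model engineering and deploying large-model applications).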
A Communications Tool for Persistent Collectives
Unit test for measuring system bandwidth.
A Micro-benchmarking Tool for HPC Networks
A hierarchical collective communications library with portable optimizations
uds-proxy provides a UNIX domain socket that acts as HTTP(S) connection-pooling forward proxy
Multi-Path Transport for RDMA in Datacenters (Course assignment)
Simulation of Multi-Path-RDMA algorithm based on ns-3
DCPerf benchmark suite for hyperscale cloud applications
Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by Qwen team, Alibaba Cloud.
PyTorch process group third-party plugin for UCC
Collective communications library with various primitives for multi-machine training.
The Cloud-Native API Gateway and AI Gateway
A tool for bandwidth measurements on NVIDIA GPUs.
Automated machine learning as an AI-HPC benchmark
FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, le…
The production-scale datacenter profiler (C/C++, Go, Rust, Python, Java, NodeJS, .NET, PHP, Ruby, Perl, ...)