
CUDA checkpoint and restore utility

Cuda, 247 stars, 13 forks, updated Apr 17, 2024

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

C++, 292 stars, 119 forks, updated Nov 24, 2024

Lumina is a user-friendly tool to test the correctness and performance of hardware network stacks.

Python, 19 stars, 6 forks, updated Jan 8, 2024

Benchmark Test Suite for RDMA Networks

C++, 50 stars, 4 forks, updated Apr 15, 2023

Checkpoint/Restore tool

C, 3,019 stars, 605 forks, updated Dec 15, 2024

Initializer for KServe Cluster

Shell, 1 star, 1 fork, updated Jul 29, 2024

P4 codes for research projects

P4, 207 stars, 56 forks, updated Nov 3, 2024

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Python, 136,594 stars, 27,349 forks, updated Dec 21, 2024

Large Language Model (LLM) Systems Paper List

687 stars, 25 forks, updated Dec 13, 2024

PArametrized Recommendation and Ai Model benchmark is a repository for developing numerous microbenchmarks as well as end-to-end nets for evaluating training and inference platforms.

Python, 126 stars, 63 forks, updated Dec 18, 2024

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python, 12,495 stars, 2,573 forks, updated Dec 22, 2024

Zeta is a distributed platform for developing and deploying complex, elastic, and highly available multi-tenant network services.

C, 18 stars, 10 forks, updated Mar 31, 2023

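nsfc - LaTeX template for National Natural Science Foundation of China (NSFC) grant proposals (General Program, Young Scientists Fund, and Regional Fund)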

TeX, 232 stars, 65 forks, updated Dec 13, 2024

NCCL Profiling Kit

Python, 120 stars, 12 forks, updated Jul 1, 2024

Microsoft Collective Communication Library

57 stars, 6 forks, updated Nov 23, 2024

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM, 29,763 stars, 12,287 forks, updated Dec 22, 2024

NVIDIA Linux open GPU kernel module source

C, 15,326 stars, 1,304 forks, updated Dec 17, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++, 8,941 stars, 1,032 forks, updated Dec 17, 2024

Transformer-related optimization, including BERT and GPT

C++, 5,938 stars, 896 forks, updated Mar 27, 2024

eBPF implementation that runs on top of Windows

C, 2,975 stars, 241 forks, updated Dec 21, 2024

A series of large language models developed by Baichuan Intelligent Technology

Python, 4,114 stars, 297 forks, updated Nov 8, 2024

A platform for building proxies to bypass network restrictions.

Go, 29,939 stars, 4,691 forks, updated Dec 18, 2024

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python, 85,187 stars, 22,946 forks, updated Dec 22, 2024

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python, 35,951 stars, 4,169 forks, updated Dec 20, 2024

NCCL Tests

Cuda, 939 stars, 249 forks, updated Dec 19, 2024

Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system, such as the Linux kernel, CPU, disks, Intel PT, GPUs, etc. Dynolog also …

C++, 280 stars, 43 forks, updated Dec 19, 2024

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C, 6,623 stars, 1,870 forks, updated Jul 26, 2024

Inference code for Llama models

Python, 56,908 stars, 9,621 forks, updated Aug 18, 2024

Machine Learning Systems: Design and Implementation (Chinese version)

TeX, 4,139 stars, 440 forks, updated Apr 13, 2024