
Collection of tools and examples for managing Accelerated workloads in Kubernetes Engine

Go 218 154 Updated Dec 12, 2024

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 573 112 Updated Oct 30, 2024

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 2,015 334 Updated Dec 14, 2024
Python 2 Updated Nov 4, 2024

A PyTorch Native LLM Training Framework

Python 678 34 Updated Aug 25, 2024

NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the effective training time by minimizing the downtime due to fa…

Python 53 4 Updated Nov 27, 2024

Small scale distributed training of sequential deep learning models, built on Numpy and MPI.

Python 109 4 Updated Oct 19, 2023

A library to analyze PyTorch traces.

Python 312 45 Updated Dec 3, 2024

Ongoing research training transformer models at scale

Python 10,807 2,416 Updated Dec 14, 2024

Best practice for training LLaMA models in Megatron-LM

Python 634 53 Updated Jan 2, 2024

Training NVIDIA NeMo Megatron Large Language Model (LLM) using NeMo Framework on Google Kubernetes Engine

HCL 12 5 Updated Nov 19, 2024

PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference

Python 42 15 Updated Dec 14, 2024

Microsoft Collective Communication Library

C++ 325 30 Updated Sep 20, 2023

PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538

Python 1,007 105 Updated Apr 19, 2024

How to optimize some algorithms in CUDA.

Cuda 1,726 142 Updated Dec 12, 2024

Material for gpu-mode lectures

Jupyter Notebook 3,170 325 Updated Dec 3, 2024

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Python 6,993 1,022 Updated Dec 10, 2024

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Jupyter Notebook 39,976 4,240 Updated Jul 28, 2024
Python 52 7 Updated Apr 23, 2024

A multi-platform experimentation framework written in python.

Python 42 28 Updated Dec 13, 2024

Testing framework for Deep Learning models (Tensorflow and PyTorch) on Google Cloud hardware accelerators (TPU and GPU)

Jsonnet 64 59 Updated Nov 22, 2024

JAX-Toolbox

Jupyter Notebook 268 50 Updated Dec 14, 2024
Python 1,225 175 Updated Nov 20, 2024

Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax

Python 526 85 Updated Dec 12, 2024

Neighborhood Attention Extension. Bringing attention to a neighborhood near you!

Cuda 382 31 Updated Dec 2, 2024

Technical content from the Spacelift blog articles.

HCL 51 50 Updated Oct 18, 2023

My personal blog

CSS 5 3 Updated Nov 24, 2024

Implementation of Flash Attention in Jax

Python 201 23 Updated Mar 1, 2024

A toolkit to run Ray applications on Kubernetes

Go 1,330 421 Updated Dec 14, 2024