Collection of tools and examples for managing accelerated workloads in Google Kubernetes Engine
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference (a minimal FP8 sketch follows this list)
NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the effective training time by minimizing the downtime due to failures and interruptions
Small-scale distributed training of sequential deep learning models, built on NumPy and MPI (a minimal Allreduce sketch follows this list)
A library to analyze PyTorch traces.
Ongoing research training transformer models at scale
alibaba / Megatron-LLaMA
Forked from NVIDIA/Megatron-LM
Best practices for training LLaMA models in Megatron-LM
Training NVIDIA NeMo Megatron Large Language Model (LLM) using NeMo Framework on Google Kubernetes Engine
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
PyTorch re-implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. (https://arxiv.org/abs/1701.06538); a minimal gating sketch follows this list
How to optimize some algorithms in CUDA
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
A multi-platform experimentation framework written in Python
Testing framework for deep learning models (TensorFlow and PyTorch) on Google Cloud hardware accelerators (TPU and GPU)
Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
Technical content from Spacelift blog articles
Implementation of Flash Attention in Jax (an online-softmax sketch follows this list)
A toolkit to run Ray applications on Kubernetes
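
Since several entries above center on concrete techniques, a few illustrative sketches follow. First, FP8 training with Transformer Engine: te.Linear, the DelayedScaling recipe, and the fp8_autocast context manager are real Transformer Engine APIs, but the hyperparameters here are illustrative and a Hopper- or Ada-class GPU is assumed.

```python
# A minimal sketch of FP8 training with NVIDIA Transformer Engine's PyTorch API.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# FP8 recipe: HYBRID uses E4M3 for forward tensors, E5M2 for gradients.
fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16)

model = te.Linear(1024, 1024, bias=True).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(32, 1024, device="cuda")

# Run the forward pass under the FP8 autocast context; the backward pass
# picks up the same recipe automatically.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(x)
loss = out.float().pow(2).mean()
loss.backward()
optimizer.step()
```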
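Next, the NumPy-and-MPI entry: a minimal data-parallel sketch in the same spirit, not the repo's actual API. Each rank computes a gradient on its own shard of a toy linear-regression problem, then gradients are summed with mpi4py's Allreduce and averaged before the SGD step.

```python
# Data-parallel SGD with NumPy + mpi4py: local gradients, global Allreduce.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(seed=rank)          # each rank gets its own data shard
X = rng.normal(size=(256, 10))
true_w = np.arange(10, dtype=np.float64)
y = X @ true_w + 0.01 * rng.normal(size=256)

w = np.zeros(10)
lr = 0.1
for step in range(100):
    grad = 2.0 * X.T @ (X @ w - y) / len(y)     # local gradient on this shard
    avg_grad = np.empty_like(grad)
    comm.Allreduce(grad, avg_grad, op=MPI.SUM)  # sum gradients across ranks
    w -= lr * (avg_grad / size)                 # average, then SGD update

if rank == 0:
    print("final weights:", np.round(w, 2))
```

Run with, e.g., mpirun -n 4 python train.py; each rank converges to the same weights because every update uses the globally averaged gradient.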
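For the sparsely-gated mixture-of-experts entry, a minimal top-k gating sketch: the gate keeps the k largest logits per token, softmaxes them, and combines the selected experts' outputs. Dense dispatch via boolean masks is used for clarity; the paper (and the repo) add noisy gating, load-balancing losses, and sparse dispatch for efficiency.

```python
# Top-k gated mixture-of-experts in the spirit of Shazeer et al. (2017).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (tokens, d_model)
        logits = self.gate(x)                              # (tokens, n_experts)
        topv, topi = logits.topk(self.k, dim=-1)           # keep top-k gate logits
        weights = F.softmax(topv, dim=-1)                  # renormalize over top-k
        out = torch.zeros_like(x)
        for slot in range(self.k):                         # combine expert outputs
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    w = weights[mask, slot].unsqueeze(1)
                    out[mask] += w * expert(x[mask])
        return out

moe = TopKMoE(d_model=64)
print(moe(torch.randn(16, 64)).shape)  # torch.Size([16, 64])
```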
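Finally, the core idea behind Flash Attention, sketched in plain JAX: scan over key/value blocks while maintaining a running maximum and softmax denominator, so the full (q_len, k_len) attention matrix is never materialized. This is a numerical illustration only, not the repo's optimized kernel; the shapes and block size are assumptions.

```python
# Blocked attention with an online softmax, checked against naive attention.
import jax
import jax.numpy as jnp

def flash_attention(q, k, v, block_size: int = 128):
    # q: (q_len, d); k, v: (k_len, d); assumes k_len % block_size == 0.
    q = q * q.shape[-1] ** -0.5
    n_blocks = k.shape[0] // block_size
    k_blocks = k.reshape(n_blocks, block_size, -1)
    v_blocks = v.reshape(n_blocks, block_size, -1)

    def step(carry, kv):
        acc, m, denom = carry                 # running output, max, denominator
        kb, vb = kv
        s = q @ kb.T                          # scores for this key block
        m_new = jnp.maximum(m, s.max(axis=-1))
        correction = jnp.exp(m - m_new)       # rescale old accumulators
        p = jnp.exp(s - m_new[:, None])
        acc = acc * correction[:, None] + p @ vb
        denom = denom * correction + p.sum(axis=-1)
        return (acc, m_new, denom), None

    init = (jnp.zeros_like(q),
            jnp.full(q.shape[0], -jnp.inf),
            jnp.zeros(q.shape[0]))
    (acc, _, denom), _ = jax.lax.scan(step, init, (k_blocks, v_blocks))
    return acc / denom[:, None]

q = k = v = jax.random.normal(jax.random.PRNGKey(0), (256, 64))
ref = jax.nn.softmax(q @ k.T * 64 ** -0.5) @ v
print(jnp.allclose(flash_attention(q, k, v), ref, atol=1e-4))  # True
```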