Stars
📰 Must-read papers and blogs on Speculative Decoding ⚡️
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
A curated list for Efficient Large Language Models
Ongoing research training transformer models at scale
microsoft / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LMOngoing research training transformer language models at scale, including: BERT & GPT-2
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Proper implementation of ResNet-s for CIFAR10/100 in pytorch that matches description of the original paper.
Accuracy 77%. Large batch deep learning optimizer LARS for ImageNet with PyTorch and ResNet, using Horovod for distribution. Optional accumulated gradient and NVIDIA DALI dataloader.
cvpr2024/cvpr2023/cvpr2022/cvpr2021/cvpr2020/cvpr2019/cvpr2018/cvpr2017 论文/代码/解读/直播合集,极市团队整理