Nanyang Technological University - Singapore
cuhkszzxy.github.io
Lists (10)
Stars
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
Disaggregated serving system for Large Language Models (LLMs).
A generative world for general-purpose robotics & embodied AI learning.
The benchmark of SOTA text-to-image diffusion models with a new benchmarking strategy based on MiniGPT-4, namely X-IQE.
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
Resource-adaptive cluster scheduler for deep learning training.
This project shares technical principles and practical experience with large language models (LLM engineering and deploying LLM applications in production).
A LaTeX resume template designed for optimal information density and aesthetic appeal.
📄 A collection of resume templates suited for Chinese (LaTeX, HTML/JS, and so on), maintained by @hoochanlon.
KV cache compression for high-throughput LLM inference
[ICML 2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Triton-based implementation of Sparse Mixture of Experts.
Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
All Algorithms implemented in Python
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.