Zhejiang University
Hangzhou, Zhejiang, China (UTC +08:00)
https://jingyangxiang.github.io/
Stars
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Implementation of "Attention Is Off By One" by Evan Miller
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…
Dataset for the paper "HVAQ: A High-Resolution Vision-Based Air Quality Dataset"
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Optimized softmax in Triton for many cases
Triton documentation in Simplified Chinese
DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation
Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization
[NeurIPS 2024] Dual-Perspective Activation: Efficient Channel Denoising via Joint Forward-Backward Criterion for Artificial Neural Networks
[CVPR2024] The code for "MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction"
Refine high-quality datasets and visual AI models
Browser extension and cross-platform desktop application for select-to-translate, based on the ChatGPT API.
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
SGLang is a fast serving framework for large language models and vision language models.
A series of large language models trained from scratch by developers @01-ai
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
A high-throughput and memory-efficient inference and serving engine for LLMs
Code accompanying the paper "Massive Activations in Large Language Models"
Chinese-localized version of Clash for Windows. Provides the localized build, the localization patch, and the localized installer.
Command-line tool to inspect the difference between (the text in) two PDF files