HKUST(Guangzhou) - https://lzzmm.github.io
Stars
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
The project has grown far beyond its original idea: a curated collection of premium software across various categories.
Low latency JSON generation using LLMs ⚡️
A throughput-oriented high-performance serving framework for LLMs
Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.
Experimental projects related to TensorRT
User-friendly Desktop Client App for AI Models/LLMs (GPT, Claude, Gemini, Ollama...)
Flash Attention in ~100 lines of CUDA (forward pass only)
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
A lightweight library for portable low-level GPU computation using WebGPU.
[NeurIPS'24 Spotlight] Speeds up long-context LLM inference with approximate and dynamic sparse attention computation, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
A low-latency & high-throughput serving engine for LLMs
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
This is an online course where you can learn and master the skill of low-level performance analysis and tuning.
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
C++ Insights - See your source code with the eyes of a compiler
Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures
llama3 implementation one matrix multiplication at a time
A large-scale simulation framework for LLM inference
Bayesian optimisation & Reinforcement Learning library developed by Huawei Noah's Ark Lab
A high-throughput and memory-efficient inference and serving engine for LLMs