CHEN Yuhan lzzmm

🤯

ML System, GPU Computing @HPMLL

37 followers · 49 following

HKUST(Guangzhou)
08:42 (UTC +08:00)
https://lzzmm.github.io

Achievements

x2 x2

Lists (3)

🔮 Future ideas

✨ Inspiration

🚀 My stack

Stars

ai4nucleome / Polaris

A Versatile Tool for Chromatin Loop Annotation in Bulk and Single-cell Hi-C Data

Python 3 Updated Dec 27, 2024

gpu-mode / lectures

Material for gpu-mode lectures

Jupyter Notebook 3,387 343 Updated Dec 3, 2024

microsoft / chunk-attention

Python 56 7 Updated Dec 13, 2024

XiaoMi / ha_xiaomi_home

Xiaomi Home Integration for Home Assistant

Python 16,646 776 Updated Jan 3, 2025

mit-han-lab / duo-attention

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 412 24 Updated Oct 31, 2024

openai / swarm

Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.

Python 17,423 1,787 Updated Oct 15, 2024

jaywcjlove / awesome-mac

Now we have become very big, different from the original idea. Collect premium software in various categories.

JavaScript 78,563 6,323 Updated Jan 4, 2025

varunshenoy / super-json-mode

Low latency JSON generation using LLMs ⚡️

Jupyter Notebook 388 14 Updated Mar 10, 2024

NVIDIA / nvbench

CUDA Kernel Benchmarking Library

Cuda 541 69 Updated Nov 20, 2024

efeslab / Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Cuda 683 29 Updated Sep 21, 2024

Dicklesworthstone / llm_aided_ocr

Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.

Python 2,275 158 Updated Aug 21, 2024

NVIDIA / TensorRT-Incubator

Experimental projects related to TensorRT

MLIR 84 13 Updated Jan 4, 2025

Bin-Huang / chatbox

User-friendly Desktop Client App for AI Models/LLMs (GPT, Claude, Gemini, Ollama...)

TypeScript 24,457 2,430 Updated Dec 30, 2024

tspeterkim / flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 672 58 Updated Dec 30, 2024

NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,073 1,045 Updated Jan 3, 2025

OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python 12,999 910 Updated Oct 22, 2024

AnswerDotAI / gpu.cpp

A lightweight library for portable low-level GPU computation using WebGPU.

C++ 3,784 176 Updated Dec 29, 2024

daadaada / turingas

Assembler for NVIDIA Volta and Turing GPUs

Python 203 40 Updated Jan 13, 2022

microsoft / MInference

[NeurIPS'24 Spotlight] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention, which reduces inference latency by up to 10x for pre-filling on an A100 whil…

Python 861 39 Updated Dec 28, 2024