Organizations

@sysu @HPMLL


A Versatile Tool for Chromatin Loop Annotation in Bulk and Single-cell Hi-C Data

Python 3 Updated Dec 27, 2024

Material for gpu-mode lectures

Jupyter Notebook 3,387 343 Updated Dec 3, 2024

Python 56 7 Updated Dec 13, 2024

Xiaomi Home Integration for Home Assistant

Python 16,646 776 Updated Jan 3, 2025

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 412 24 Updated Oct 31, 2024

Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.

Python 17,423 1,787 Updated Oct 15, 2024

Now much bigger than originally planned: a collection of premium software in various categories.

JavaScript 78,563 6,323 Updated Jan 4, 2025

Low latency JSON generation using LLMs ⚡️

Jupyter Notebook 388 14 Updated Mar 10, 2024

CUDA Kernel Benchmarking Library

Cuda 541 69 Updated Nov 20, 2024

A throughput-oriented high-performance serving framework for LLMs

Cuda 683 29 Updated Sep 21, 2024

Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.

Python 2,275 158 Updated Aug 21, 2024

Experimental projects related to TensorRT

MLIR 84 13 Updated Jan 4, 2025

User-friendly Desktop Client App for AI Models/LLMs (GPT, Claude, Gemini, Ollama...)

TypeScript 24,457 2,430 Updated Dec 30, 2024

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 672 58 Updated Dec 30, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,073 1,045 Updated Jan 3, 2025

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python 12,999 910 Updated Oct 22, 2024

A lightweight library for portable low-level GPU computation using WebGPU.

C++ 3,784 176 Updated Dec 29, 2024

Assembler for NVIDIA Volta and Turing GPUs

Python 203 40 Updated Jan 13, 2022

[NeurIPS'24 Spotlight] To speed up long-context LLM inference, computes attention with approximate and dynamic sparsity, which reduces inference latency by up to 10x for pre-filling on an A100 whil…

Python 861 39 Updated Dec 28, 2024

A low-latency & high-throughput serving engine for LLMs

Python 288 35 Updated Sep 12, 2024

Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5).

Cuda 229 16 Updated Oct 28, 2024

This is an online course where you can learn and master the skill of low-level performance analysis and tuning.

C++ 2,716 238 Updated Jan 5, 2025

A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems

Python 141 8 Updated Oct 15, 2024

An awesome repository of local AI tools

1,326 105 Updated Nov 13, 2024

Local AI API Platform

C++ 2,257 133 Updated Jan 5, 2025

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Python 37,296 4,598 Updated Jan 4, 2025

C++ Insights - See your source code with the eyes of a compiler

C++ 4,155 245 Updated Oct 21, 2024

Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs

Python 76 6 Updated Nov 25, 2024

The official Meta Llama 3 GitHub site

Python 27,803 3,184 Updated Aug 12, 2024

A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures

422 36 Updated Nov 28, 2024