TKH666

Starred repositories

Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.

184 9 Updated Dec 7, 2024

Microsoft Azure Traces

Jupyter Notebook 867 147 Updated Dec 12, 2024

Stack trace visualizer

Perl 17,660 1,990 Updated Oct 20, 2024

A debian-based shell environment designed for Android and adb

Shell 325 102 Updated Feb 4, 2023

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more

C 20,848 3,913 Updated Jan 12, 2025

llama and other large language models on iOS and MacOS offline using GGML library.

Swift 1,492 99 Updated Dec 17, 2024

A GPU accelerated error-bounded lossy compression for scientific data.

C++ 69 28 Updated Jan 12, 2025

Error-bounded Lossy Data Compressor (for floating-point/integer datasets)

C 158 55 Updated Apr 6, 2024
Python 7 Updated Oct 2, 2024

[NeurIPS'24 Oral] HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning

Python 146 8 Updated Dec 3, 2024

eBPF Android Debug Bridge

Rust 479 61 Updated Mar 30, 2024

MLX: An array framework for Apple silicon

C++ 18,340 1,058 Updated Jan 15, 2025

Cross-platform, customizable ML solutions for live and streaming media.

C++ 28,268 5,215 Updated Jan 15, 2025

Automated upstream mirror for libbpf stand-alone build.

C 2,260 420 Updated Jan 16, 2025

Minimal and opinionated eBPF tooling for the Rust ecosystem

Rust 798 141 Updated Jan 16, 2025

Tracks frame generation intervals of Android applications using eBPF

Rust 16 7 Updated Dec 3, 2024

Frame aware scheduling for android.

Rust 671 33 Updated Jan 15, 2025

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Python 2,310 149 Updated Jan 15, 2025

📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉

3,228 215 Updated Jan 16, 2025

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 1,980 207 Updated Jan 16, 2025

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Python 137,662 27,580 Updated Jan 16, 2025

Automated management tool for NAS media libraries

Python 7,468 903 Updated Jan 16, 2025

llm-export can export llm model to onnx.

Python 255 31 Updated Jan 9, 2025

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 15,378 1,243 Updated Dec 12, 2024

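This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and deploying large-model applications)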

HTML 12,987 1,450 Updated Jan 15, 2025

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python 16,985 1,684 Updated Jan 16, 2025

On-device AI across mobile, embedded and edge for PyTorch

C++ 2,405 417 Updated Jan 16, 2025

Code repo for the paper "LLM-QAT Data-Free Quantization Aware Training for Large Language Models"

Python 265 25 Updated Sep 3, 2024

MobileLLM Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.

Python 1,220 67 Updated Nov 27, 2024