Stars
Run inference for RWKV v5, v6, and (WIP) v7 with the Qualcomm AI Engine Direct SDK
💥 Blazing fast terminal file manager written in Rust, based on async I/O.
Fast OS-level support for GPU checkpoint and restore
Open-source framework for the HPCA 2024 paper "Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators"
How to optimize some algorithms in CUDA.
Puzzles for learning Triton; play them with minimal environment configuration!
A Chinese (Simplified) Translation Project for the Create: Astral modpack.
General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for…
A tool for bandwidth measurements on NVIDIA GPUs.
A high-quality tool for converting PDF to Markdown and JSON: a one-stop, open-source, high-quality data extraction tool.
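A pure C++ cross-platform LLM acceleration library with Python bindings. ChatGLM-6B-class models reach 10000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models; runs smoothly on mobile phones.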
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Code for Diversity-Enhanced Learning for Instruction Adaptation in Large Language Models
Linux-capable out-of-order superscalar multicore LoongArch32 (LA32 / LA32R) processor.
Dynamic Memory Management for Serving LLMs without PagedAttention
Llama 3 implemented one matrix multiplication at a time
Open-source training data and evaluation tools used in Token-Efficient Leverage Learning
My personal vim/neovim configuration files, dotfiles, docs and other scripts.
A tool to decode RISC-V and LoongArch and MIPS instructions in gtkwave
A tool to decode RISC-V and LoongArch instructions in gtkwave