rand-fly

Highlights

  • Pro

Organizations

@Infinideastudio

Inference for RWKV v5, v6, and (WIP) v7 with the Qualcomm AI Engine Direct SDK

C++ 46 4 Updated Dec 17, 2024

💥 Blazing fast terminal file manager written in Rust, based on async I/O.

Rust 20,454 455 Updated Jan 10, 2025

Fast OS-level support for GPU checkpoint and restore

C 131 11 Updated Jan 7, 2025

Open-source Framework for HPCA2024 paper: Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators

C++ 63 11 Updated Aug 31, 2024

How to optimize some algorithms in CUDA.

Cuda 1,811 150 Updated Jan 8, 2025

Fast Multimodal LLM on Mobile Devices

C++ 653 70 Updated Jan 10, 2025

Puzzles for learning Triton, play it with minimal environment configuration!

Python 190 9 Updated Dec 3, 2024

Puzzles for learning Triton

Jupyter Notebook 1,289 96 Updated Nov 18, 2024

A Chinese (Simplified) Translation Project for the Create: Astral modpack.

JavaScript 31 11 Updated Dec 21, 2024

General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for…

C++ 2,057 159 Updated Dec 10, 2024

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 342 30 Updated Oct 18, 2024

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.

Python 23,987 1,775 Updated Jan 10, 2025

JavaScript 1 Updated Nov 8, 2024

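A pure C++ cross-platform LLM acceleration library, callable from Python. ChatGLM-6B-class models can reach 10,000+ tokens/s on a single card. Supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile phones.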

C++ 3,366 348 Updated Dec 23, 2024

Low-bit LLM inference on CPU with lookup table

C++ 637 48 Updated Jan 9, 2025

INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

C++ 1,449 100 Updated Aug 7, 2024

LLM inference in C/C++

C++ 70,484 10,172 Updated Jan 10, 2025

A Pascal to C/RISC-V compiler based on YACC

C++ 4 Updated Aug 30, 2024

Code for Diversity-Enhanced Learning for Instruction Adaptation in Large Language Models

Python 8 Updated Aug 31, 2024

Linux-capable out-of-order superscalar multicore LoongArch32 (LA32 / LA32R) processor.

SystemVerilog 17 1 Updated Aug 9, 2024

Dynamic Memory Management for Serving LLMs without PagedAttention

C 268 20 Updated Dec 6, 2024

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 14,004 1,140 Updated May 23, 2024

A static web page listing summer camp deadlines (DDLs)

Vue 134 2 Updated Nov 29, 2024

Open-source training data and evaluation tools used in Token-Efficient Leverage Learning

Python 9 Updated Apr 12, 2024

My personal vim/neovim configuration files, dotfiles, docs and other scripts.

Vim Script 12 Updated Jan 3, 2025

A tool to decode RISC-V, LoongArch, and MIPS instructions in gtkwave

C++ 27 6 Updated Apr 8, 2024

A tool to decode RISC-V and LoongArch instructions in gtkwave

C++ 5 Updated Mar 23, 2024

JavaScript 73 77 Updated Jan 5, 2025

A stopgap cache module for beginner contestants in the Loongson Cup team competition

Verilog 21 Updated Mar 13, 2024