
FlashInfer: Kernel Library for LLM Serving

Cuda 1,757 175 Updated Jan 9, 2025

High-performance In-browser LLM Inference Engine

TypeScript 14,205 919 Updated Dec 23, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 7,231 688 Updated Jan 11, 2025

A code sample demonstrating how to share and rebuild a PyTorch GPU tensor via its pointer/reference between different processes.

Python 7 1 Updated Aug 27, 2024
C++ 3 Updated Jun 25, 2024

Advanced Matrix Extensions (AMX) Guide

C++ 77 7 Updated Jan 11, 2022

A JIT assembler for x86/x64 architectures supporting MMX, SSE (1-4), AVX (1-2, 512), FPU, APX, and AVX10.2

C++ 2,073 276 Updated Nov 11, 2024

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Python 2,298 259 Updated Jan 10, 2025

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Python 2,150 210 Updated Oct 8, 2024

An innovative library for efficient LLM inference via low-bit quantization

C++ 352 38 Updated Aug 30, 2024

Running linear algebra as fast as possible on Apple silicon

Swift 18 2 Updated Aug 18, 2023

Apple AMX Instruction Set

C 1,024 50 Updated Dec 26, 2024

Performance analysis tools based on Linux perf_events (aka perf) and ftrace

Shell 9,963 1,646 Updated Nov 22, 2023

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Cuda 229 16 Updated Oct 28, 2024

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 828 46 Updated Nov 14, 2024

SpotServe: Serving Generative Large Language Models on Preemptible Instances

109 9 Updated Feb 22, 2024

Reimplementation of RA3.exe (Red Alert 3 game launcher)

C++ 116 17 Updated Feb 7, 2023

The world's simplest facial recognition API for Python and the command line

Python 53,890 13,526 Updated Aug 21, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,343 134 Updated Jan 10, 2025

Self-hosted AI coding assistant

Rust 22,530 1,061 Updated Jan 11, 2025

Build-once run-anywhere C library

C 18,692 650 Updated Jan 6, 2025
Python 33 6 Updated Aug 14, 2023

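qstock is developed by the "Python金融量化" WeChat official account and aims to become a personal quantitative investment research and analysis package. It currently includes modules for data acquisition (data), visualization (plot), stock selection (stock), and quantitative backtesting (strategy backtest). qstock provides users with a concise data interface and normalized financial market data. The visualization module offers a simple interface to web-based interactive charts; the stock selection module provides Tonghuashun stock-screening data and custom screening, including RPS, MM trend, financial indicators, capital flow models…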

Python 1,037 260 Updated May 22, 2023

An A-share data acquisition tool inspired by pytdx

Rust 139 43 Updated Sep 24, 2024

Stock API | 韭菜小猪 | A-shares | US stocks | Hong Kong stocks | Stocks | Funds | JavaScript

TypeScript 749 105 Updated Apr 9, 2024

Peregrine: A Pattern-Aware Graph Mining System

C++ 203 35 Updated Sep 14, 2023
C++ 35 15 Updated Feb 2, 2024

Scalable graph analytics database powered by a multithreaded, vectorized temporal engine, written in Rust

HTML 359 55 Updated Jan 10, 2025
Scala 28 9 Updated Sep 27, 2024