yizhang2077's starred repositories

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,235 552 Updated Oct 28, 2024

My learning notes and code for ML systems.

Python 178 6 Updated Dec 12, 2024

How to optimize some algorithms in CUDA.

Cuda 1,718 141 Updated Dec 12, 2024

ASCII generator (image to text, image to image, video to video)

Python 7,422 570 Updated Nov 22, 2024

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]

Python 15,046 1,763 Updated Dec 12, 2024

Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.

135 7 Updated Dec 7, 2024

Inference Llama 2 in one file of pure C

C 17,561 2,104 Updated Aug 6, 2024

LLM training in simple, raw C/CUDA

Cuda 24,696 2,799 Updated Oct 2, 2024

📚150+ Tensor/CUDA Cores Kernels, ⚡️flash-attention-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).

Cuda 1,623 173 Updated Dec 12, 2024

📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉

2,983 202 Updated Dec 9, 2024

Making large AI models cheaper, faster and more accessible

Python 38,903 4,351 Updated Dec 10, 2024

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 9,683 916 Updated Dec 8, 2024

Learning how CUDA works

Cuda 171 23 Updated Aug 16, 2024

LLM inference in C/C++

C++ 69,154 9,936 Updated Dec 12, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 6,496 575 Updated Dec 12, 2024

Trading module

Python 4,191 916 Updated May 13, 2024

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 37,835 6,049 Updated Dec 9, 2024

[🔥updating ...] AI automated quantitative trading bot (fully local deployment). AI-powered Quantitative Investment Research Platform. 📃 online docs: https://ufund-me.github.io/Qbot ✨ qbot-mini: https://github.com/Charmve/iQuant

Jupyter Notebook 8,253 1,154 Updated Nov 9, 2024

A Python-based open-source framework for developing quantitative trading platforms

Python 26,188 8,898 Updated Nov 29, 2024

Inference code for Llama models

Python 56,771 9,602 Updated Aug 18, 2024

The official Meta Llama 3 GitHub site

Python 27,479 3,126 Updated Aug 12, 2024

NoSQL data store using the Seastar framework, compatible with Redis

C++ 1,316 170 Updated Oct 2, 2019

NoSQL data store using the Seastar framework, compatible with Apache Cassandra

C++ 13,717 1,306 Updated Dec 12, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 31,775 4,832 Updated Dec 12, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 20,673 2,281 Updated Aug 12, 2024

A collection of modern/faster/saner alternatives to common unix commands.

31,217 787 Updated Sep 10, 2024

k8s tutorials

Go 4,698 536 Updated Oct 12, 2024

High Performance Embedded Key-Value Store

C 689 57 Updated Dec 12, 2024

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.

Go 2,241 187 Updated Dec 12, 2024

[SIGMOD 2023] High-Dimensional Approximate Nearest Neighbor Search: with Reliable and Efficient Distance Comparison Operations

C++ 46 3 Updated Mar 17, 2023