🎯
Focusing
  • Rice University
  • Houston, United States

Repositories (language, stars, forks, last updated):

Rotary Transformer

Python 890 52 Updated Mar 21, 2022
MoonBit 3 2 Updated Feb 15, 2025

Line-by-line profiling for Python

Python 2,856 125 Updated Jan 30, 2025

Push-Button End-to-End Testing of Kubernetes Operators and Controllers

Python 125 43 Updated Feb 14, 2025
Python 314 40 Updated Apr 2, 2024

LLM inference in C/C++

C++ 74,435 10,756 Updated Feb 16, 2025

This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API.

TypeScript 4,963 512 Updated Feb 13, 2025

Local model support for Microsoft's graphrag using ollama (llama3, mistral, gemma2, phi3): LLM & Embedding extraction

Python 889 141 Updated Sep 30, 2024

An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

Python 21,844 1,916 Updated Jan 23, 2025

High-speed Large Language Model Serving for Local Deployment

C++ 8,100 422 Updated Jan 28, 2025

A paper list of recent mamba efforts for low-level vision.

240 9 Updated Feb 13, 2025

Deep Learning Energy Measurement and Optimization

Python 237 30 Updated Feb 5, 2025

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

C++ 5,277 631 Updated Feb 14, 2025

Tile primitives for speedy kernels

Cuda 2,032 113 Updated Feb 16, 2025

Efficient, Flexible and Portable Structured Generation

C++ 694 42 Updated Feb 15, 2025

Fast Multimodal LLM on Mobile Devices

C++ 692 79 Updated Feb 9, 2025

Paper list about multimodal and large language models, used only to record papers I read from the daily arXiv for personal needs.

587 40 Updated Feb 14, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 3,207 275 Updated Feb 16, 2025

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 1,247 102 Updated Feb 10, 2025

PyTorch native quantization and sparsity for training and inference

Python 1,836 216 Updated Feb 14, 2025

My learning notes/codes for ML SYS.

Python 689 31 Updated Feb 16, 2025

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,320 247 Updated Feb 7, 2025

📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉

3,419 235 Updated Feb 13, 2025

A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.

3,984 225 Updated Feb 14, 2025
Python 87 11 Updated Oct 9, 2024

A curated list of modern Generative Artificial Intelligence projects and services

7,502 806 Updated Feb 11, 2025

Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.

Go 126,580 10,274 Updated Feb 15, 2025

Implements harmful/harmless refusal removal using pure HF Transformers

Python 530 77 Updated Jun 12, 2024

Google TPU optimizations for transformers models

Python 98 24 Updated Jan 21, 2025

An awesome repository & A comprehensive survey on interpretability of LLM attention heads.

TeX 306 9 Updated Feb 12, 2025