- Rice University
- Houston, United States
Stars
Run Llama and other large language models offline on iOS and macOS using the GGML library.
Universal LLM Deployment Engine with ML Compilation
FedML - The Research and Production Integrated Federated Learning Library: https://fedml.ai
MoBA: Mixture of Block Attention for Long-Context LLMs
📚FFPA: Yet another Faster Flash Prefill Attention with O(1)⚡️SRAM complexity for headdim > 256, 1.8x~3x↑🎉faster than SDPA EA.
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
On-device AI across mobile, embedded and edge for PyTorch
CoreNet: A library for training deep neural networks
Push-Button End-to-End Testing of Kubernetes Operators and Controllers
This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API.
Local model support for Microsoft's graphrag using ollama (llama3, mistral, gemma2, phi3) - LLM & Embedding extraction
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
High-speed Large Language Model Serving for Local Deployment
A paper list of recent Mamba efforts for low-level vision.
Deep Learning Energy Measurement and Optimization
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Tile primitives for speedy kernels
A paper list on multimodal and large language models, used only to record papers I read from the daily arXiv listings for personal reference.
verl: Volcano Engine Reinforcement Learning for LLMs
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism