- Rice University
- Houston, United States
Stars
Push-Button End-to-End Testing of Kubernetes Operators and Controllers
This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API.
Local model support for Microsoft's GraphRAG using Ollama (llama3, mistral, gemma2, phi3): LLM and embedding extraction
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
High-speed Large Language Model Serving for Local Deployment
A paper list of recent Mamba efforts for low-level vision.
Deep Learning Energy Measurement and Optimization
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Tile primitives for speedy kernels
Efficient, Flexible and Portable Structured Generation
A paper list on multimodal and large language models, kept for personal use to record papers I read from the daily arXiv listings.
verl: Volcano Engine Reinforcement Learning for LLMs
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
PyTorch native quantization and sparsity for training and inference
My learning notes and code for ML systems.
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
📖A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
A curated list of modern Generative Artificial Intelligence projects and services
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
Implements harmful/harmless refusal removal using pure HF Transformers
Google TPU optimizations for transformers models
An awesome-list repository and a comprehensive survey on the interpretability of LLM attention heads.