Starred repositories
This is the official code release for our work, Denoising Vision Transformers.
🐫 CAMEL: Finding the Scaling Law of Agents. The first and the best multi-agent framework. https://www.camel-ai.org
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
[ICLR 2023] ReAct: Synergizing Reasoning and Acting in Language Models
[WIP] The all-in-one inference optimization solution for ComfyUI, universal, flexible, and fast.
📚 Collection of awesome generation acceleration resources.
Accelerating Diffusion Transformers with Token-wise Feature Caching
Easy and fast file sharing from the command-line.
Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.
This benchmark tests how well LLMs incorporate a set of 10 mandatory story elements (characters, objects, core concepts, attributes, motivations, etc.) in a short creative story.
Democratizing Reinforcement Learning for LLMs
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! 🦥
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Video Generation Foundation Models: https://saiyan-world.github.io/goku/
A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
Towards Large Multimodal Models as Visual Foundation Agents
[AAAI 25] Official Implementation for "E-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment"
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"