Stars
[NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents
Official repo for "VisionZip: Longer is Better but Not Necessary in Vision Language Models"
Amodal Depth Anything: Amodal Depth Estimation in the Wild
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models (ACL 2024)
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Building Open LLM Web Agents with Self-Evolving Online Curriculum RL
The official PyTorch implementation of Exploring the Interactive Guidance for Unified and Effective Image Matting
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
O1 Replication Journey: A Strategic Progress Report – Part I
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
The official implementation of the paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents"
Official repo for the paper "Learning From Mistakes Makes LLM Better Reasoner"
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.
Official repo for paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning.
The First Multimodal Search Engine Pipeline and Benchmark for LMMs
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Paper.
SuperPrompt is an attempt to engineer prompts that might help us understand AI agents.
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Agent driven automation starting with the web. Try it: https://www.emergence.ai/web-automation-api
Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"