XC0R/Agent-S (forked from simular-ai/Agent-S)

Agent S: An Open Agentic Framework that Uses Computers Like a Human

🌐 [Website] 📄 [Paper]

Saaket Agashe, Jiuzhou Han, Shuyu Gan, Jiachen Yang, Ang Li, Xin Eric Wang

💡 Introduction

Agent S is a new agentic framework designed to let an agent operate a computer as intuitively as a human would. We introduce an Experience-Augmented Hierarchical Planning method. It draws on Online Web Knowledge for up-to-date information about frequently changing software and websites, and on Narrative Memory for high-level experience from past interactions. By breaking complex tasks into manageable subtasks and using Episodic Memory for step-by-step guidance, Agent S continuously refines its actions and learns from experience, which keeps its task planning adaptable and effective.
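The control flow described above can be pictured with a toy sketch. Everything in it is hypothetical and for illustration only: the class name ToyAgent, the dictionary-based memories, and the string-splitting "planner" are stand-ins, not the project's implementation, which relies on a language model and real retrieval.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyAgent:
    # Hypothetical stores (assumptions, not the Agent S data structures):
    # narrative memory maps a task to a summary of a past run,
    # episodic memory maps a subtask to the steps that were used for it.
    narrative_memory: Dict[str, str] = field(default_factory=dict)
    episodic_memory: Dict[str, List[str]] = field(default_factory=dict)

    def plan(self, task: str, web_knowledge: str) -> Tuple[List[str], List[str]]:
        """Collect planning context and break the task into subtasks."""
        context = [c for c in (web_knowledge, self.narrative_memory.get(task)) if c]
        # Stand-in for an LLM planner: split the task text on " then ".
        subtasks = [s.strip() for s in task.split(" then ") if s.strip()]
        return context, subtasks

    def run(self, task: str, web_knowledge: str = "") -> List[str]:
        """Execute each subtask, reusing episodic hints when they exist."""
        _context, subtasks = self.plan(task, web_knowledge)
        trace: List[str] = []
        for subtask in subtasks:
            steps = self.episodic_memory.get(subtask) or [f"explore: {subtask}"]
            trace.extend(steps)
            # Remember the steps so a later run can reuse them.
            self.episodic_memory[subtask] = steps
        # Store a high-level summary of the whole task.
        self.narrative_memory[task] = f"completed {len(subtasks)} subtasks"
        return trace


if __name__ == "__main__":
    agent = ToyAgent()
    print(agent.run("open editor then save file"))
```

The point of the sketch is the split between the two memories: one is consulted once per task during planning, the other once per subtask during execution.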

🎯 Results


Success rate (%) on the full OSWorld test set (all 369 test examples) using Image + Accessibility Tree input.

🛠️ Installation

Clone the Agent S Repository

git clone https://github.com/simular-ai/Agent-S.git

We recommend using Anaconda or Miniconda to create a virtual environment and install the required dependencies. We used Python 3.9 for development and experiments.

conda create -n agent_s python=3.9
conda activate agent_s

Install the agent_s package and dependencies

pip install -e .
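If you want to confirm that the active environment matches the Python version used for development, a small check like the following may help. The function name matches_dev_python is hypothetical, and treating 3.9 as the reference version is taken from the note above; other versions may well work and are simply untested here.

```python
import sys
from typing import Optional, Tuple


def matches_dev_python(version: Optional[Tuple[int, int]] = None,
                       expected: Tuple[int, int] = (3, 9)) -> bool:
    """Return True if the (major, minor) version equals the expected one."""
    if version is None:
        # Default to the interpreter that is running this script.
        version = (sys.version_info.major, sys.version_info.minor)
    return tuple(version) == tuple(expected)


if __name__ == "__main__":
    if matches_dev_python():
        print("Python version matches the development version (3.9).")
    else:
        print(f"Running Python {sys.version_info.major}.{sys.version_info.minor}; "
              "development used 3.9.")
```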

Set up Retrieval from Web using Perplexica

  1. Ensure Docker is installed and running on your system.

  2. Clone the Perplexica repository:

    git clone https://github.com/ItzCrazyKns/Perplexica.git
  3. After cloning, navigate to the directory containing the project files.

  4. Rename the sample.config.toml file to config.toml. For Docker setups, you need only fill in the following fields:

    • OPENAI: Your OpenAI API key. You only need to fill this if you wish to use OpenAI's models.

    • OLLAMA: Your Ollama API URL. You should enter it as http://host.docker.internal:PORT_NUMBER. If you installed Ollama on port 11434, use http://host.docker.internal:11434. For other ports, adjust accordingly. You need to fill this if you wish to use Ollama's models instead of OpenAI's.

    • GROQ: Your Groq API key. You only need to fill this if you wish to use Groq's hosted models.

    • ANTHROPIC: Your Anthropic API key. You only need to fill this if you wish to use Anthropic models.

      Note: You can change these after starting Perplexica from the settings dialog.

    • SIMILARITY_MEASURE: The similarity measure to use (This is filled by default; you can leave it as is if you are unsure about it.)

  5. Ensure you are in the directory containing the docker-compose.yaml file and execute:

    docker compose up -d

For a more detailed setup and usage guide, refer to the Perplexica repository.
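As a small illustration of the OLLAMA field described above, the following sketch builds the URL in the http://host.docker.internal:PORT_NUMBER form. The helper name ollama_docker_url is hypothetical; only the URL pattern and the example port 11434 come from the steps above.

```python
def ollama_docker_url(port: int = 11434) -> str:
    """Build the Ollama API URL as seen from inside a Docker container."""
    # Reject values that cannot be a TCP port.
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return f"http://host.docker.internal:{port}"


if __name__ == "__main__":
    # Value to paste into the OLLAMA field of config.toml.
    print(ollama_docker_url())
```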

Set up Paddle-OCR Server

Run ocr_server.py to enable OCR-based bounding boxes.

cd agent_s
python ocr_server.py

Switch to a new terminal where you will run Agent S. Set the OCR_SERVER_ADDRESS environment variable as shown below. For a better experience, add the following line directly to your .bashrc (Linux) or .zshrc (macOS) file.

export OCR_SERVER_ADDRESS=http://localhost:8000/ocr/

You can change the server address by editing it in the agent_s/ocr_server.py file.
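To check from Python that the variable is set to a usable value, a sketch along these lines may help. The function name get_ocr_server_address and the fallback to http://localhost:8000/ocr/ when the variable is unset are assumptions made for illustration; how Agent S itself reads the variable is not shown here.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

# Assumed fallback, copied from the export line in this section.
DEFAULT_OCR_SERVER_ADDRESS = "http://localhost:8000/ocr/"


def get_ocr_server_address(env: Optional[Mapping[str, str]] = None) -> str:
    """Read OCR_SERVER_ADDRESS and verify that it looks like an HTTP(S) URL."""
    env = os.environ if env is None else env
    address = env.get("OCR_SERVER_ADDRESS", DEFAULT_OCR_SERVER_ADDRESS)
    parsed = urlparse(address)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"OCR_SERVER_ADDRESS is not a valid URL: {address!r}")
    return address


if __name__ == "__main__":
    print(get_ocr_server_address())
```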

🚀 Usage

OSWorld

To deploy Agent S in OSWorld, follow the OSWorld Deployment instructions.

WindowsAgentArena

To deploy Agent S in WindowsAgentArena, follow the WindowsAgentArena Deployment instructions.

Run Locally on your Own Computer

We support running Agent S directly on your own system through OpenACI. To do so, run:

python examples/cli_app.py --model <MODEL>

This will show a user query prompt where you can enter your query and interact with Agent S.
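If you prefer to launch the CLI from another Python script, the command above can be assembled as an argument list. The helper build_cli_command is hypothetical; only the script path and the --model flag come from the command shown above, and the model name remains a placeholder that you supply.

```python
import sys
from typing import List


def build_cli_command(model: str, script: str = "examples/cli_app.py") -> List[str]:
    """Return the argument list for launching the Agent S CLI with a given model."""
    if not model.strip():
        raise ValueError("model must be a non-empty string")
    # Use the current interpreter so the active conda environment is respected.
    return [sys.executable, script, "--model", model]


if __name__ == "__main__":
    # Prints the command for a model name passed as the first argument.
    if len(sys.argv) > 1:
        print(build_cli_command(sys.argv[1]))
```

The resulting list can be passed to subprocess.run(..., check=True) from the repository root, which avoids shell quoting issues with the model name.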

💬 Citation

@misc{agashe2024agentsopenagentic,
      title={Agent S: An Open Agentic Framework that Uses Computers Like a Human}, 
      author={Saaket Agashe and Jiuzhou Han and Shuyu Gan and Jiachen Yang and Ang Li and Xin Eric Wang},
      year={2024},
      eprint={2410.08164},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2410.08164}, 
}
