Saaket Agashe, Jiuzhou Han, Shuyu Gan, Jiachen Yang, Ang Li, Xin Eric Wang
Agent S is a new agentic framework designed to use computers as intuitively as a human would. We introduce an Experience-Augmented Hierarchical Planning method, which uses Online Web Knowledge for up-to-date information on frequently changing software and websites, along with Narrative Memory to leverage high-level experiences from past interactions. By breaking complex tasks into manageable subtasks and using Episodic Memory for step-by-step guidance, Agent S continuously refines its actions and learns from experience, achieving adaptable and effective task planning.
Success rate (%) on the full OSWorld test set (all 369 test examples) using Image + Accessibility Tree input.
Clone the Agent S Repository
git clone https://github.com/simular-ai/GUI-agent.git
We recommend using Anaconda or Miniconda to create a virtual environment and install the required dependencies. We used Python 3.9 for development and experiments.
conda create -n agent_s python=3.9
conda activate agent_s
Install the agent_s package and dependencies
pip install -e .
- Ensure Docker is installed and running on your system.
- Clone the Perplexica repository:
git clone https://github.com/ItzCrazyKns/Perplexica.git
- After cloning, navigate to the directory containing the project files.
- Rename the sample.config.toml file to config.toml. For Docker setups, you need only fill in the following fields:
  - OPENAI: Your OpenAI API key. You only need to fill this if you wish to use OpenAI's models.
  - OLLAMA: Your Ollama API URL. You should enter it as http://host.docker.internal:PORT_NUMBER. If you installed Ollama on port 11434, use http://host.docker.internal:11434. For other ports, adjust accordingly. You need to fill this if you wish to use Ollama's models instead of OpenAI's.
  - GROQ: Your Groq API key. You only need to fill this if you wish to use Groq's hosted models.
  - ANTHROPIC: Your Anthropic API key. You only need to fill this if you wish to use Anthropic models.
  Note: You can change these after starting Perplexica from the settings dialog.
  - SIMILARITY_MEASURE: The similarity measure to use. (This is filled by default; you can leave it as is if you are unsure about it.)
- Ensure you are in the directory containing the docker-compose.yaml file and execute:
docker compose up -d
For a more detailed setup and usage guide, refer to the Perplexica repository.
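The OLLAMA field follows a fixed pattern: the host is always host.docker.internal and only the port varies. As a minimal sketch of that rule, the helper below builds the value from a port number. The function name `ollama_docker_url` is hypothetical and is not part of Perplexica or Agent S; only the URL pattern and the port 11434 example come from the instructions above.

```python
def ollama_docker_url(port: int = 11434) -> str:
    """Return the Ollama API URL as reached from inside a Docker container.

    Hypothetical helper for illustration only; it just formats the URL
    pattern described in the setup steps.
    """
    # Reject values that cannot be valid TCP ports.
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"http://host.docker.internal:{port}"


if __name__ == "__main__":
    # Port used in the setup instructions.
    print(ollama_docker_url())
    # Any other port is substituted the same way.
    print(ollama_docker_url(8080))
```

The printed value is what you would paste into the OLLAMA field of config.toml.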
Run ocr_server.py to use OCR-based bounding boxes.
cd agent_s
python ocr_server.py
Switch to a new terminal where you will run Agent S, and set the OCR_SERVER_ADDRESS environment variable as shown below. To avoid setting it in every session, add the following line to your .bashrc (Linux) or .zshrc (macOS) file.
export OCR_SERVER_ADDRESS=http://localhost:8000/ocr/
You can change the server address by editing it in the agent_s/ocr_server.py file.
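Before launching Agent S, it can help to confirm that the variable is visible in the current shell and looks like a URL. The snippet below is a minimal sketch of such a check; `resolve_ocr_server_address` is a hypothetical helper, not a function provided by Agent S, and how Agent S itself reads the variable is not shown here.

```python
import os
from urllib.parse import urlparse


def resolve_ocr_server_address(env=None) -> str:
    """Return OCR_SERVER_ADDRESS from the environment after basic checks.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and can be replaced by a plain dict for testing.
    """
    env = os.environ if env is None else env
    address = env.get("OCR_SERVER_ADDRESS")
    if not address:
        raise RuntimeError("OCR_SERVER_ADDRESS is not set in this shell")
    parsed = urlparse(address)
    # Require an explicit http(s) scheme and a host part.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid HTTP URL: {address!r}")
    return address


if __name__ == "__main__":
    try:
        print("OCR server:", resolve_ocr_server_address())
    except (RuntimeError, ValueError) as err:
        print("Configuration problem:", err)
```

If you changed the address in agent_s/ocr_server.py, the exported value should match the new one.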
To deploy Agent S in OSWorld, follow the OSWorld Deployment instructions.
To deploy Agent S in WindowsAgentArena, follow the WindowsAgentArena Deployment instructions.
We support running Agent S directly on your own system through OpenACI. To do so, run:
python examples/cli_app.py --model <MODEL>
This will show a user query prompt where you can enter your query and interact with Agent S.
@misc{agashe2024agentsopenagentic,
title={Agent S: An Open Agentic Framework that Uses Computers Like a Human},
author={Saaket Agashe and Jiuzhou Han and Shuyu Gan and Jiachen Yang and Ang Li and Xin Eric Wang},
year={2024},
eprint={2410.08164},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2410.08164},
}