diff --git a/documents/docs/agents/app_agent.md b/documents/docs/agents/app_agent.md (+15 -1)
diff --git a/documents/docs/img/action_step2.png b/documents/docs/img/action_step2.png (437 KB)
diff --git a/documents/docs/img/action_step2_annotated.png b/documents/docs/img/action_step2_annotated.png (463 KB)
diff --git a/documents/docs/img/action_step2_concat.png b/documents/docs/img/action_step2_concat.png (615 KB)
diff --git a/documents/docs/img/action_step2_selected_controls.png b/documents/docs/img/action_step2_selected_controls.png (437 KB)
diff --git a/documents/docs/img/screenshots.png b/documents/docs/img/screenshots.png (532 KB)
diff --git a/documents/docs/logs/evaluation_logs.md b/documents/docs/logs/evaluation_logs.md (+14)
diff --git a/documents/docs/logs/overview.md b/documents/docs/logs/overview.md (+12)
diff --git a/documents/docs/logs/request_logs.md b/documents/docs/logs/request_logs.md (+20)
diff --git a/documents/docs/logs/screenshots_logs.md b/documents/docs/logs/screenshots_logs.md (+48)
diff --git a/documents/docs/logs/step_logs.md b/documents/docs/logs/step_logs.md (+97)
diff --git a/documents/mkdocs.yml b/documents/mkdocs.yml (+1 -1)
@@ -22,15 +22,29 @@ To interact with the application, the `AppAgent` receives the following inputs:
 | Sub-Task | The sub-task description to be executed by the `AppAgent`, assigned by the `HostAgent`. | String |
 | Current Application | The name of the application to be interacted with. | String |
 | Control Information | Index, name and control type of available controls in the application. | List of Dictionaries |
-| Application Screenshots | Screenshots of the application to provide context to the `AppAgent`. | Image |
+| Application Screenshots | Screenshots of the application, including a clean screenshot, an annotated screenshot with labeled controls, and a screenshot with a rectangle around the selected control at the previous step (optional). | List of Strings |
 | Previous Sub-Tasks | The previous sub-tasks and their completion status. | List of Strings |
 | Previous Plan | The previous plan for the following steps. | List of Strings |
 | HostAgent Message | The message from the `HostAgent` for the completion of the sub-task. | String |
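-| Retrived Information | The retrieved information from external knowledge bases or demonstration libraries. | String |
+| Retrieved Information | The retrieved information from external knowledge bases or demonstration libraries. | String |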
 | Blackboard | The shared memory space for storing and sharing information among the agents. | Dictionary |
 
+
+Below is an example of an annotated application screenshot with labeled controls, following the [Set-of-Mark](https://arxiv.org/pdf/2310.11441) paradigm.
+<h1 align="center">
+    <img src="../../img/screenshots.png" alt="AppAgent Image" width="90%">
+</h1>
+
+
 By processing these inputs, the `AppAgent` determines the necessary actions to fulfill the user's request within the application.
 
+!!! tip
+    Whether to concatenate the clean screenshot and annotated screenshot can be configured in the `CONCAT_SCREENSHOT` field in the `config_dev.yaml` file.
+
+!!! tip
+    Whether to include the screenshot with a rectangle around the selected control at the previous step can be configured in the `INCLUDE_LAST_SCREENSHOT` field in the `config_dev.yaml` file.
+
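To illustrate how these two flags shape the screenshot input, here is a minimal sketch that lists the image files used at a given step. `build_screenshot_inputs` is a hypothetical helper, not part of UFO. It assumes the file-naming convention from the screenshot logs documentation (`action_step{step_number}.png` and its `_annotated`, `_concat`, and `_selected_controls` variants), that steps are numbered from 0, and that the previous step's selected-control screenshot comes first.

```python
from typing import List


def build_screenshot_inputs(
    log_dir: str,
    step: int,
    concat_screenshot: bool,
    include_last_screenshot: bool,
) -> List[str]:
    """Hypothetical helper: list the screenshot files used as input at one step.

    Mirrors the CONCAT_SCREENSHOT and INCLUDE_LAST_SCREENSHOT options; the
    file names and their ordering are assumptions, not UFO's prompt builder.
    """
    images: List[str] = []

    # Previous step's screenshot with the selected control highlighted
    # (assumes steps are numbered from 0, so step 0 has no previous step).
    if include_last_screenshot and step > 0:
        images.append(f"{log_dir}/action_step{step - 1}_selected_controls.png")

    if concat_screenshot:
        # A single side-by-side image of the clean and annotated screenshots.
        images.append(f"{log_dir}/action_step{step}_concat.png")
    else:
        # The clean and annotated screenshots as two separate images.
        images.append(f"{log_dir}/action_step{step}.png")
        images.append(f"{log_dir}/action_step{step}_annotated.png")

    return images
```

For example, `build_screenshot_inputs("logs/my_task", 2, concat_screenshot=False, include_last_screenshot=True)` returns three paths: the step 1 selected-control screenshot followed by the clean and annotated step 2 screenshots.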
+
 ## AppAgent Output
 
 With the inputs provided, the `AppAgent` generates the following outputs:
 
@@ -0,0 +1,14 @@
+# Evaluation Logs
+
+The evaluation logs store the evaluation results from the `EvaluationAgent`. The evaluation log contains the following information:
+
+| Field | Description | Type |
+| --- | --- | --- |
+| Reason | The detailed reason for the judgment, based on the differences between the screenshots and the execution trajectory. | String |
+| Sub-score | The sub-scores obtained by decomposing the evaluation into multiple sub-goals. | List of Dictionaries |
+| Complete | The completion status determined by the evaluation: `yes`, `no`, or `unsure`. | String |
+| level | The level of the evaluation. | String |
+| request | The request sent to the `EvaluationAgent`. | Dictionary |
+| id | The ID of the evaluation. | Integer |
+
+
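The table does not specify the on-disk format of `evaluation.log`. Assuming it uses one JSON object per line, like the request log, and that the completion status is stored under a key named `complete` in any capitalization, a minimal sketch for tallying the `yes`/`no`/`unsure` verdicts could look like this. `summarize_completion` is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def summarize_completion(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: count evaluation records by completion status.

    Assumes each non-empty line is one JSON object and that the status is
    stored under a key named "complete", matched case-insensitively.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Find the completion field regardless of its capitalization.
        status = next(
            (value for key, value in record.items() if key.lower() == "complete"),
            "missing",
        )
        counts[str(status).lower()] += 1
    return dict(counts)
```

Pass it an open file object for `logs/{task_name}/evaluation.log`. Records without a completion field are counted under `missing`.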
@@ -0,0 +1,12 @@
+# UFO Logs
+
+Logs are essential for debugging and understanding the behavior of the UFO framework. UFO generates four types of logs:
+
+| Log Type | Description | Location | Level |
+| --- | --- | --- | --- |
+| [Request Log](./request_logs.md) | Contains the prompt requests sent to the LLMs. | `logs/{task_name}/request.log` | Debug |
+| [Step Log](./step_logs.md) | Contains the agent's response to the user's request and additional information at every step. | `logs/{task_name}/response.log` | Info |
+| [Evaluation Log](./evaluation_logs.md) | Contains the evaluation results from the `EvaluationAgent`. | `logs/{task_name}/evaluation.log` | Info |
+| [Screenshots](./screenshots_logs.md) | Contains the screenshots of the application UI. | `logs/{task_name}/` | - |
+
+All logs are stored in the `logs/{task_name}` directory.
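To locate these files programmatically, the following minimal sketch maps each log type in the table to its expected path. `log_paths` is a hypothetical helper that only joins paths; it assumes logs are written under a `logs` directory relative to the working directory and does not itself check that the files exist.

```python
import os
from typing import Dict


def log_paths(task_name: str, root: str = "logs") -> Dict[str, str]:
    """Hypothetical helper: expected location of each UFO log for one task."""
    task_dir = os.path.join(root, task_name)
    return {
        "request": os.path.join(task_dir, "request.log"),
        "step": os.path.join(task_dir, "response.log"),
        "evaluation": os.path.join(task_dir, "evaluation.log"),
        # Screenshots are saved as image files directly in the task directory.
        "screenshots": task_dir,
    }


if __name__ == "__main__":
    # Report which of the expected log locations exist for a sample task.
    for log_type, path in log_paths("my_task").items():
        status = "found" if os.path.exists(path) else "missing"
        print(f"{log_type:<11} {path} ({status})")
```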
@@ -0,0 +1,20 @@
+# Request Logs
+
+The request log records the prompts sent to the LLMs. It is stored in the `request.log` file and contains the following information for each step:
+
+| Field | Description |
+| --- | --- |
+| `step` | The step number of the session. |
+| `prompt` | The prompt message sent to the LLMs. |
+
+The request log is stored at the `debug` level. You can configure the logging level in the `LOG_LEVEL` field in the `config_dev.yaml` file.
+
+!!! tip
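+    You can use the following Python code to read the request log, where each line is a JSON record:
+
+        import json
+
+        task_name = "your_task_name"  # replace with the name of your task
+
+        with open(f"logs/{task_name}/request.log", "r", encoding="utf-8") as f:
+            for line in f:
+                log = json.loads(line)
+                print(log["step"], log["prompt"])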
+    
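To look up the prompt of a particular step instead of scanning the file each time, the records can be grouped by step. The sketch below uses a hypothetical helper, `prompts_by_step`, which assumes every non-empty line of `request.log` is a JSON object with the `step` and `prompt` fields from the table. It keeps a list per step in case more than one request is logged for the same step.

```python
import json
from typing import Any, Dict, Iterable, List


def prompts_by_step(lines: Iterable[str]) -> Dict[int, List[Any]]:
    """Hypothetical helper: group request-log prompts by step number.

    Assumes each non-empty line is one JSON object with `step` and `prompt`
    fields. A list is kept per step in case several requests share a step.
    """
    grouped: Dict[int, List[Any]] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        grouped.setdefault(int(record["step"]), []).append(record["prompt"])
    return grouped
```

Usage: open `logs/{task_name}/request.log`, pass the file object to `prompts_by_step`, and index the result by step number.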
@@ -0,0 +1,48 @@
+# Screenshot Logs
+
+UFO also saves desktop or application screenshots for debugging and evaluation purposes. The screenshot logs are stored in the `logs/{task_name}/` directory.
+
+There are four types of screenshot logs generated by UFO, as detailed below.
+
+
+## Clean Screenshots
+At each step, UFO saves a clean screenshot of the desktop or application in the `action_step{step_number}.png` file. In addition, clean screenshots are saved when a sub-task, a round, or the session is completed, in the `action_round_{round_id}_sub_round_{sub_task_id}_final.png`, `action_round_{round_id}_final.png`, and `action_step_final.png` files, respectively. Below is an example of a clean screenshot.
+
+<h1 align="center">
+    <img src="../../img/action_step2.png" alt="AppAgent Image" width="100%">
+</h1>
+
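The naming patterns above can be captured in a small function, for example to find the final screenshot of a round. This is a minimal sketch using a hypothetical helper, `clean_screenshot_name`; it only reproduces the file-name patterns listed in this section and does not reflect how UFO generates them internally.

```python
from typing import Optional


def clean_screenshot_name(
    step: Optional[int] = None,
    round_id: Optional[int] = None,
    sub_task_id: Optional[int] = None,
) -> str:
    """Hypothetical helper: build a clean-screenshot file name.

    - step given                     -> screenshot taken at that step
    - round_id and sub_task_id given -> screenshot after a sub-task completes
    - round_id only                  -> screenshot after a round completes
    - nothing given                  -> screenshot after the session completes
    """
    if step is not None:
        return f"action_step{step}.png"
    if round_id is not None and sub_task_id is not None:
        return f"action_round_{round_id}_sub_round_{sub_task_id}_final.png"
    if round_id is not None:
        return f"action_round_{round_id}_final.png"
    return "action_step_final.png"
```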
+
+## Annotation Screenshots
+UFO also saves annotated screenshots of the application, in which each control item is annotated with a number, following the [Set-of-Mark](https://arxiv.org/pdf/2310.11441) paradigm. The annotated screenshots are saved in the `action_step{step_number}_annotated.png` file. Below is an example of an annotated screenshot.
+
+<h1 align="center">
+    <img src="../../img/action_step2_annotated.png" alt="AppAgent Image" width="100%">
+</h1>
+
+!!! info
+    Only selected types of controls are annotated in the screenshots. They are configured in the `config_dev.yaml` file under the `CONTROL_LIST` field.
+
+!!! tip
+    Different types of controls are annotated with different colors. You can configure the colors in the `config_dev.yaml` file under the `ANNOTATION_COLORS` field.
+
+
+## Concatenated Screenshots
+UFO also saves concatenated screenshots of the application, with clean and annotated screenshots concatenated side by side. The concatenated screenshots are saved in the `action_step{step_number}_concat.png` file. Below is an example of a concatenated screenshot.
+
+<h1 align="center">
+    <img src="../../img/action_step2_concat.png" alt="AppAgent Image" width="100%">
+</h1>
+
+!!! info
+    You can configure whether to feed the LLMs the concatenated screenshot or separate clean and annotated screenshots via the `CONCAT_SCREENSHOT` field in the `config_dev.yaml` file.
+
+## Selected Control Screenshots
+UFO saves screenshots with a rectangle drawn around the control item selected for the operation. The selected control screenshots are saved in the `action_step{step_number}_selected_controls.png` file. Below is an example of a selected control screenshot.
+
+<h1 align="center">
+    <img src="../../img/action_step2_selected_controls.png" alt="AppAgent Image" width="100%">
+</h1>
+
+!!! info
+    You can configure whether to feed the LLM the selected control screenshot from the previous step as additional context via the `INCLUDE_LAST_SCREENSHOT` field in the `config_dev.yaml` file.
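Because the per-step screenshot types share the `action_step{step_number}` prefix, the files in a task directory can be grouped by step. The sketch below is a hypothetical helper built only on that naming convention; files that do not match it, such as the `*_final.png` screenshots, are skipped.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# Matches per-step names such as action_step2.png or action_step2_annotated.png.
STEP_SCREENSHOT = re.compile(
    r"^action_step(?P<step>\d+)(?:_(?P<kind>annotated|concat|selected_controls))?\.png$"
)


def index_step_screenshots(file_names: Iterable[str]) -> Dict[int, Dict[str, str]]:
    """Hypothetical helper: group per-step screenshot files by step and type."""
    index: Dict[int, Dict[str, str]] = defaultdict(dict)
    for name in file_names:
        match = STEP_SCREENSHOT.match(name)
        if match is None:
            continue  # not a per-step screenshot (e.g. a *_final.png file or a log)
        kind = match.group("kind") or "clean"  # no suffix means a clean screenshot
        index[int(match.group("step"))][kind] = name
    return dict(index)
```

For a real task, pass the directory listing, e.g. `index_step_screenshots(os.listdir("logs/my_task"))`.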
@@ -0,0 +1,97 @@
+# Step Logs
+
+The step log contains the agent's response to the user's request and additional information at every step. It is stored in the `response.log` file at the `info` level. The log fields differ between `HostAgent` and `AppAgent`.
+
+## HostAgent Logs
+
+The `HostAgent` logs contain the following fields:
+
+
+### LLM Output
+
+| Field | Description | Type |
+| --- | --- | --- |
+| Observation | The observation of current desktop screenshots. | String |
+| Thought | The logical reasoning process of the `HostAgent`. | String |
+| Current Sub-Task | The current sub-task to be executed by the `AppAgent`. | String |
+| Message | The message to be sent to the `AppAgent` for the completion of the sub-task. | String |
+| ControlLabel | The index of the selected application to execute the sub-task. | String |
+| ControlText | The name of the selected application to execute the sub-task. | String |
+| Plan | The plan for the following sub-tasks after the current sub-task. | List of Strings |
+| Status | The status of the agent, mapped to the `AgentState`. | String |
+| Comment | Additional comments or information provided to the user. | String |
+| Questions | The questions to be asked to the user for additional information. | List of Strings |
+| AppsToOpen | The application to be opened to execute the sub-task if it is not already open. | Dictionary |
+
+
+### Additional Information
+
+| Field | Description | Type |
+| --- | --- | --- |
+| Step | The step number of the session. | Integer |
+| RoundStep | The step number of the current round. | Integer |
+| AgentStep | The step number of the `HostAgent`. | Integer |
+| Round | The round number of the session. | Integer |
+| ControlLabel | The index of the selected application to execute the sub-task. | Integer |
+| ControlText | The name of the selected application to execute the sub-task. | String |
+| Request | The user request. | String |
+| Agent | The agent that executed the step, set to `HostAgent`. | String |
+| AgentName | The name of the agent. | String |
+| Application | The application process name. | String |
+| Cost | The cost of the step. | Float |
+| Results | The results of the step, set to an empty string. | String |
+| CleanScreenshot | The image path of the desktop screenshot. | String |
+
+
+
+## AppAgent Logs
+
+The `AppAgent` logs contain the following fields:
+
+### LLM Output
+
+| Field | Description | Type |
+| --- | --- | --- |
+| Observation | The observation of the current application screenshots. | String |
+| Thought | The logical reasoning process of the `AppAgent`. | String |
+| ControlLabel | The index of the selected control to interact with. | String |
+| ControlText | The name of the selected control to interact with. | String |
+| Function | The function to be executed on the selected control. | String |
+| Args | The arguments required for the function execution. | List of Strings |
+| Status | The status of the agent, mapped to the `AgentState`. | String |
+| Plan | The plan for the following steps after the current action. | List of Strings |
+| Comment | Additional comments or information provided to the user. | String |
+| SaveScreenshot | The flag to save the screenshot of the application to the `blackboard` for future reference. | Boolean |
+
+### Additional Information
+
+| Field | Description | Type |
+| --- | --- | --- |
+| Step | The step number of the session. | Integer |
+| RoundStep | The step number of the current round. | Integer |
+| AgentStep | The step number of the `AppAgent`. | Integer |
+| Round | The round number of the session. | Integer |
+| Subtask | The sub-task to be executed by the `AppAgent`. | String |
+| SubtaskIndex | The index of the sub-task in the current round. | Integer |
+| Action | The action to be executed by the `AppAgent`. | String |
+| ActionType | The type of the action to be executed. | String |
+| Request | The user request. | String |
+| Agent | The agent that executed the step, set to `AppAgent`. | String |
+| AgentName | The name of the agent. | String |
+| Application | The application process name. | String |
+| Cost | The cost of the step. | Float |
+| Results | The results of the step. | String |
+| CleanScreenshot | The image path of the desktop screenshot. | String |
+| AnnotatedScreenshot | The image path of the annotated application screenshot. | String |
+| ConcatScreenshot | The image path of the concatenated application screenshot. | String |
+
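Once the records are parsed, the `AppAgent` fields above are enough to reconstruct a readable action trajectory. The following is a minimal sketch with a hypothetical helper, `app_agent_trajectory`. It assumes the records are already-parsed dictionaries keyed by the field names in these tables (`Agent`, `Step`, `Function`, `Args`, `ControlText`) and that `Args` is a list, as the table states; records from any other agent are skipped.

```python
from typing import Any, Dict, Iterable, List


def app_agent_trajectory(records: Iterable[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: summarize AppAgent step-log records, one line per action."""
    summary: List[str] = []
    for record in records:
        if record.get("Agent") != "AppAgent":
            continue  # skip HostAgent (and any other agent) records
        step = record.get("Step")
        function = record.get("Function")
        control = record.get("ControlText")
        args = record.get("Args") or []
        arg_text = ", ".join(str(arg) for arg in args)
        summary.append(f'step {step}: {function}({arg_text}) on "{control}"')
    return summary
```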
+!!! tip
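+    You can use the following Python code to read the step log, where each line is a JSON record:
+
+        import json
+
+        task_name = "your_task_name"  # replace with the name of your task
+
+        with open(f"logs/{task_name}/response.log", "r", encoding="utf-8") as f:
+            for line in f:
+                log = json.loads(line)
+                print(log["Step"], log["Agent"])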
+
+!!! info
+    The `FollowerAgent` logs share the same fields as the `AppAgent` logs.
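The `Cost` and `Agent` fields also make it easy to see how much each agent contributed to the cost of a session. This minimal sketch, `cost_by_agent`, is a hypothetical helper; it assumes one JSON record per non-empty line of `response.log` and treats a missing or null `Cost` as 0.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def cost_by_agent(lines: Iterable[str]) -> Dict[str, float]:
    """Hypothetical helper: sum the `Cost` field of step-log records per `Agent`."""
    totals: Dict[str, float] = defaultdict(float)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # A missing or null cost contributes nothing to the total.
        totals[record.get("Agent", "unknown")] += float(record.get("Cost") or 0.0)
    return dict(totals)
```

Pass it an open file object for `logs/{task_name}/response.log` to get one total per agent name.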
@@ -49,7 +49,7 @@ nav:
       - Request Logs: logs/request_logs.md
       - Step Logs: logs/step_logs.md
       - Evaluation Logs: logs/evaluation_logs.md
-      - Screenshots: logs/screenshots.md
+      - Screenshots: logs/screenshots_logs.md
   - Advanced Usage: 
       - Reinforcing AppAgent:
         - Learning from Demonstration: advanced_usage/learning_from_demonstration.md