
Commit f78d888

committed
logs doc
1 parent d32e55f commit f78d888

12 files changed

+207
-2
lines changed

documents/docs/agents/app_agent.md

+15-1
@@ -22,15 +22,29 @@ To interact with the application, the `AppAgent` receives the following inputs:
2222
| Sub-Task | The sub-task description to be executed by the `AppAgent`, assigned by the `HostAgent`. | String |
2323
| Current Application | The name of the application to be interacted with. | String |
2424
| Control Information | Index, name and control type of available controls in the application. | List of Dictionaries |
25-
| Application Screenshots | Screenshots of the application to provide context to the `AppAgent`. | Image |
25+
| Application Screenshots | Screenshots of the application, including a clean screenshot, an annotated screenshot with labeled controls, and a screenshot with a rectangle around the selected control at the previous step (optional). | List of Strings |
2626
| Previous Sub-Tasks | The previous sub-tasks and their completion status. | List of Strings |
2727
| Previous Plan | The previous plan for the following steps. | List of Strings |
2828
| HostAgent Message | The message from the `HostAgent` for the completion of the sub-task. | String |
2929
| Retrieved Information | The retrieved information from external knowledge bases or demonstration libraries. | String |
3030
| Blackboard | The shared memory space for storing and sharing information among the agents. | Dictionary |
3131

32+
33+
Below is an example of the annotated application screenshot with labeled controls. This follows the [Set-of-Mark](https://arxiv.org/pdf/2310.11441) paradigm.
34+
<h1 align="center">
35+
<img src="../../img/screenshots.png" alt="AppAgent Image" width="90%">
36+
</h1>
37+
38+
3239
By processing these inputs, the `AppAgent` determines the necessary actions to fulfill the user's request within the application.
3340

41+
!!! tip
42+
Whether to concatenate the clean screenshot and annotated screenshot can be configured in the `CONCAT_SCREENSHOT` field in the `config_dev.yaml` file.
43+
44+
!!! tip
45+
Whether to include the screenshot with a rectangle around the selected control at the previous step can be configured in the `INCLUDE_LAST_SCREENSHOT` field in the `config_dev.yaml` file.
46+
47+
3448
## AppAgent Output
3549

3650
With the inputs provided, the `AppAgent` generates the following outputs:

documents/docs/img/action_step2.png

437 KB
463 KB
615 KB

documents/docs/img/screenshots.png

532 KB
+14
@@ -0,0 +1,14 @@
1+
# Evaluation Logs
2+
3+
The evaluation logs store the evaluation results from the `EvaluationAgent`. The evaluation log contains the following information:
4+
5+
| Field | Description | Type |
6+
| --- | --- | --- |
7+
| Reason | The detailed reason for the judgment, based on the screenshot differences and the execution trajectory. | String |
8+
| Sub-score | The sub-scores obtained by decomposing the evaluation into multiple sub-goals. | List of Dictionaries |
9+
| Complete | The completion status of the evaluation: `yes`, `no`, or `unsure`. | String |
10+
| level | The level of the evaluation. | String |
11+
| request | The request sent to the `EvaluationAgent`. | Dictionary |
12+
| id | The ID of the evaluation. | Integer |
13+
14+
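As a minimal sketch of how the `Complete` field might be consumed, the helper below maps its three documented values to a tri-state flag. The function name `completion_flag` is hypothetical, and the sketch assumes an evaluation entry has already been loaded as a dictionary whose key is spelled exactly as in the table above; the key spelling in the actual log file may differ.

```python
from typing import Optional


def completion_flag(entry: dict) -> Optional[bool]:
    """Map the `Complete` field to True ("yes"), False ("no"), or None ("unsure").

    Assumption: the key is named "Complete", as in the table above.
    Missing or unrecognized values are treated like "unsure".
    """
    value = str(entry.get("Complete", "unsure")).strip().lower()
    return {"yes": True, "no": False}.get(value)
```

Returning `None` for `unsure` keeps an undecided evaluation distinct from a failed one.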

documents/docs/logs/overview.md

+12
@@ -0,0 +1,12 @@
1+
# UFO Logs
2+
3+
Logs are essential for debugging and understanding the behavior of the UFO framework. There are four types of logs generated by UFO:
4+
5+
| Log Type | Description | Location | Level |
6+
| --- | --- | --- | --- |
7+
| [Request Log](./request_logs.md) | Contains the prompt requests to LLMs. | `logs/{task_name}/request.log` | Info |
8+
| [Step Log](./step_logs.md) | Contains the agent's response to the user's request and additional information at every step. | `logs/{task_name}/response.log` | Info |
9+
| [Evaluation Log](./evaluation_logs.md) | Contains the evaluation results from the `EvaluationAgent`. | `logs/{task_name}/evaluation.log` | Info |
10+
| [Screenshots](./screenshots_logs.md) | Contains the screenshots of the application UI. | `logs/{task_name}/` | - |
11+
12+
All logs are stored in the `logs/{task_name}` directory.
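
The locations in the table can be collected in one place. The following is a small sketch, not part of UFO itself: the helper name `log_paths` and the `root` parameter are hypothetical, and only the file names come from the table above.

```python
from pathlib import Path


def log_paths(task_name: str, root: str = "logs") -> dict:
    """Return the expected log locations for one task.

    `root` defaults to the `logs` directory named in the table above;
    the helper itself is illustrative and not a UFO API.
    """
    base = Path(root) / task_name
    return {
        "request": base / "request.log",        # prompt requests to the LLMs
        "step": base / "response.log",          # per-step agent responses
        "evaluation": base / "evaluation.log",  # EvaluationAgent results
        "screenshots": base,                    # screenshots share the task directory
    }
```

For example, `log_paths("my_task")["step"]` points at `logs/my_task/response.log`.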

documents/docs/logs/request_logs.md

+20
@@ -0,0 +1,20 @@
1+
# Request Logs
2+
3+
The request log records the prompt requests sent to the LLMs and is stored in the `request.log` file. It contains the following information for each step:
4+
5+
| Field | Description |
6+
| --- | --- |
7+
| `step` | The step number of the session. |
8+
| `prompt` | The prompt message sent to the LLMs. |
9+
10+
The request log is stored at the `debug` level. You can configure the logging level in the `LOG_LEVEL` field in the `config_dev.yaml` file.
11+
12+
!!! tip
13+
You can use the following Python code to read the request log:
14+
15+
import json
16+
17+
with open('logs/{task_name}/request.log', 'r') as f:
18+
    for line in f:
19+
        log = json.loads(line)  # parse one JSON-encoded entry per line
20+
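Building on the loop above, the following sketch collects the prompts by step number. It assumes each non-empty line is a JSON object containing the `step` and `prompt` fields listed in the table; the helper name `parse_request_log` is hypothetical.

```python
import json
from typing import Iterable


def parse_request_log(lines: Iterable[str]) -> dict:
    """Map each step number to the prompt logged for that step.

    Assumption: one JSON object per line, with `step` and `prompt` keys.
    """
    prompts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        prompts[entry["step"]] = entry["prompt"]
    return prompts
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example `parse_request_log(open(path))`.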
+48
@@ -0,0 +1,48 @@
1+
# Screenshot Logs
2+
3+
UFO also saves desktop or application screenshots for debugging and evaluation purposes. The screenshot logs are stored in the `logs/{task_name}/` directory.
4+
5+
There are 4 types of screenshot logs generated by UFO, as detailed below.
6+
7+
8+
## Clean Screenshots
9+
At each step, UFO saves a clean screenshot of the desktop or application in the `action_step{step_number}.png` file. Clean screenshots are also saved when a sub-task, round, or session is completed, in the `action_round_{round_id}_sub_round_{sub_task_id}_final.png`, `action_round_{round_id}_final.png`, and `action_step_final.png` files, respectively. Below is an example of a clean screenshot.
10+
11+
<h1 align="center">
12+
<img src="../../img/action_step2.png" alt="AppAgent Image" width="100%">
13+
</h1>
14+
15+
16+
## Annotation Screenshots
17+
UFO also saves annotated screenshots of the application, with each control item annotated with a number, following the [Set-of-Mark](https://arxiv.org/pdf/2310.11441) paradigm. The annotated screenshots are saved in the `action_step{step_number}_annotated.png` file. Below is an example of an annotated screenshot.
18+
19+
<h1 align="center">
20+
<img src="../../img/action_step2_annotated.png" alt="AppAgent Image" width="100%">
21+
</h1>
22+
23+
!!!info
24+
Only selected types of controls are annotated in the screenshots. They are configured in the `config_dev.yaml` file under the `CONTROL_LIST` field.
25+
26+
!!!tip
27+
Different types of controls are annotated with different colors. You can configure the colors in the `config_dev.yaml` file under the `ANNOTATION_COLORS` field.
28+
29+
30+
## Concatenated Screenshots
31+
UFO also saves concatenated screenshots of the application, with clean and annotated screenshots concatenated side by side. The concatenated screenshots are saved in the `action_step{step_number}_concat.png` file. Below is an example of a concatenated screenshot.
32+
33+
<h1 align="center">
34+
<img src="../../img/action_step2_concat.png" alt="AppAgent Image" width="100%">
35+
</h1>
36+
37+
!!!info
38+
You can configure whether to feed the concatenated screenshots to the LLMs, or separate clean and annotated screenshots, in the `config_dev.yaml` file under the `CONCAT_SCREENSHOT` field.
39+
40+
## Selected Control Screenshots
41+
UFO saves screenshots of the selected control item for operation. The selected control screenshots are saved in the `action_step{step_number}_selected_controls.png` file. Below is an example of a selected control screenshot.
42+
43+
<h1 align="center">
44+
<img src="../../img/action_step2_selected_controls.png" alt="AppAgent Image" width="100%">
45+
</h1>
46+
47+
!!!info
48+
You can configure whether to feed the LLM the selected control screenshot from the previous step to enhance the context, in the `config_dev.yaml` file under the `INCLUDE_LAST_SCREENSHOT` field.
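
The per-step naming scheme described in the four sections above can be summarized in code. This is an illustrative sketch only: the helper name `step_screenshot_names` is hypothetical, and the file-name patterns are taken from the text above.

```python
def step_screenshot_names(step: int) -> dict:
    """Return the four screenshot file names saved for a given step.

    The patterns mirror those documented above; the helper itself is
    illustrative and not part of UFO.
    """
    prefix = f"action_step{step}"
    return {
        "clean": f"{prefix}.png",
        "annotated": f"{prefix}_annotated.png",
        "concat": f"{prefix}_concat.png",
        "selected_controls": f"{prefix}_selected_controls.png",
    }
```

For step 2, this yields the same file names as the example images shown above, such as `action_step2_annotated.png`.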

documents/docs/logs/step_logs.md

+97
@@ -0,0 +1,97 @@
1+
# Step Logs
2+
3+
The step log contains the agent's response to the user's request and additional information at every step. The step log is stored in the `response.log` file. The log fields are different for `HostAgent` and `AppAgent`. The step log is at the `info` level.
4+
## HostAgent Logs
5+
6+
The `HostAgent` logs contain the following fields:
7+
8+
9+
### LLM Output
10+
11+
| Field | Description | Type |
12+
| --- | --- | --- |
13+
| Observation | The observation of current desktop screenshots. | String |
14+
| Thought | The logical reasoning process of the `HostAgent`. | String |
15+
| Current Sub-Task | The current sub-task to be executed by the `AppAgent`. | String |
16+
| Message | The message to be sent to the `AppAgent` for the completion of the sub-task. | String |
17+
| ControlLabel | The index of the selected application to execute the sub-task. | String |
18+
| ControlText | The name of the selected application to execute the sub-task. | String |
19+
| Plan | The plan for the following sub-tasks after the current sub-task. | List of Strings |
20+
| Status | The status of the agent, mapped to the `AgentState`. | String |
21+
| Comment | Additional comments or information provided to the user. | String |
22+
| Questions | The questions to be asked to the user for additional information. | List of Strings |
23+
| AppsToOpen | The application to be opened to execute the sub-task if it is not already open. | Dictionary |
24+
25+
26+
### Additional Information
27+
28+
| Field | Description | Type |
29+
| --- | --- | --- |
30+
| Step | The step number of the session. | Integer |
31+
| RoundStep | The step number of the current round. | Integer |
32+
| AgentStep | The step number of the `HostAgent`. | Integer |
33+
| Round | The round number of the session. | Integer |
34+
| ControlLabel | The index of the selected application to execute the sub-task. | Integer |
35+
| ControlText | The name of the selected application to execute the sub-task. | String |
36+
| Request | The user request. | String |
37+
| Agent | The agent that executed the step, set to `HostAgent`. | String |
38+
| AgentName | The name of the agent. | String |
39+
| Application | The application process name. | String |
40+
| Cost | The cost of the step. | Float |
41+
| Results | The results of the step, set to an empty string. | String |
42+
| CleanScreenshot | The image path of the desktop screenshot. | String |
43+
44+
45+
46+
## AppAgent Logs
47+
48+
The `AppAgent` logs contain the following fields:
49+
50+
### LLM Output
51+
52+
| Field | Description | Type |
53+
| --- | --- | --- |
54+
| Observation | The observation of the current application screenshots. | String |
55+
| Thought | The logical reasoning process of the `AppAgent`. | String |
56+
| ControlLabel | The index of the selected control to interact with. | String |
57+
| ControlText | The name of the selected control to interact with. | String |
58+
| Function | The function to be executed on the selected control. | String |
59+
| Args | The arguments required for the function execution. | List of Strings |
60+
| Status | The status of the agent, mapped to the `AgentState`. | String |
61+
| Plan | The plan for the following steps after the current action. | List of Strings |
62+
| Comment | Additional comments or information provided to the user. | String |
63+
| SaveScreenshot | The flag to save the screenshot of the application to the `blackboard` for future reference. | Boolean |
64+
65+
### Additional Information
66+
67+
| Field | Description | Type |
68+
| --- | --- | --- |
69+
| Step | The step number of the session. | Integer |
70+
| RoundStep | The step number of the current round. | Integer |
71+
| AgentStep | The step number of the `AppAgent`. | Integer |
72+
| Round | The round number of the session. | Integer |
73+
| Subtask | The sub-task to be executed by the `AppAgent`. | String |
74+
| SubtaskIndex | The index of the sub-task in the current round. | Integer |
75+
| Action | The action to be executed by the `AppAgent`. | String |
76+
| ActionType | The type of the action to be executed. | String |
77+
| Request | The user request. | String |
78+
| Agent | The agent that executed the step, set to `AppAgent`. | String |
79+
| AgentName | The name of the agent. | String |
80+
| Application | The application process name. | String |
81+
| Cost | The cost of the step. | Float |
82+
| Results | The results of the step. | String |
83+
| CleanScreenshot | The image path of the desktop screenshot. | String |
84+
| AnnotatedScreenshot | The image path of the annotated application screenshot. | String |
85+
| ConcatScreenshot | The image path of the concatenated application screenshot. | String |
86+
87+
!!! tip
88+
You can use the following Python code to read the step log:
89+
90+
import json
91+
92+
with open('logs/{task_name}/response.log', 'r') as f:
93+
    for line in f:
94+
        log = json.loads(line)  # parse one JSON-encoded entry per line
95+
96+
!!! info
97+
The `FollowerAgent` logs share the same fields as the `AppAgent` logs.
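
As one possible use of the `Agent` and `Cost` fields, the sketch below totals the cost per agent across the step log (`response.log`). It assumes each non-empty line is a JSON object with the field names listed in the tables above; the helper name `cost_by_agent` is hypothetical.

```python
import json
from collections import defaultdict
from typing import Iterable


def cost_by_agent(lines: Iterable[str]) -> dict:
    """Sum the `Cost` field per `Agent` over step-log lines.

    Assumption: one JSON object per line, with `Agent` and `Cost` keys.
    Entries without a cost contribute 0.
    """
    totals = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        totals[entry.get("Agent", "unknown")] += float(entry.get("Cost") or 0.0)
    return dict(totals)
```

An open file handle for the step log can be passed directly as `lines`.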

documents/mkdocs.yml

+1-1
@@ -49,7 +49,7 @@ nav:
4949
- Request Logs: logs/request_logs.md
5050
- Step Logs: logs/step_logs.md
5151
- Evaluation Logs: logs/evaluation_logs.md
52-
- Screenshots: logs/screenshots.md
52+
- Screenshots: logs/screenshots_logs.md
5353
- Advanced Usage:
5454
- Reinforcing AppAgent:
5555
- Learning from Demonstration: advanced_usage/learning_from_demonstration.md
