docs(app): AGE-517 update test set docs
mmabrouk committed Sep 6, 2024
1 parent a2b9ee8 commit 5e3117d
Showing 6 changed files with 64 additions and 57 deletions.
121 changes: 64 additions & 57 deletions docs/docs/evaluation/03-create-test-sets.mdx
@@ -2,40 +2,52 @@
title: "Create Test Sets"
---

This guide will help you create, edit, and use test sets effectively.
This guide outlines the various methods for creating test sets in Agenta and provides specifications for the test set schema.

Test sets in Agenta can be loaded in the playground, used in evaluations, or for conducting human evaluations/annotations.
Test sets are used for running automatic or human evaluations. They can also be loaded into the playground, allowing you to experiment with different prompts.

You can create a test set in Agenta using various methods:
Test sets contain input data for the LLM application. They may also include a reference output (i.e., expected output or ground truth), though this is optional.

You can create a test set in Agenta using the following methods:

- [By uploading a CSV or JSON file](#creating-a-test-set-from-a-csv-or-json)
- [Using the API](#creating-a-test-set-using-the-api)
- [Using the UI](#creatingediting-a-test-set-from-the-ui)
- [From the playground](#creating-a-test-set-from-the-playground)
- [From traces in observability](#creating-a-test-set-from-traces-in-observability)
- [From traces in observability](#adding-data-from-traces)

## Creating a Test Set from a CSV or JSON

To create a test set from a CSV or JSON file:

Go to "Test sets.", Click "Upload test sets.", Select either CSV or JSON.
1. Go to `Test sets`
2. Click `Upload test sets`
3. Select either `CSV` or `JSON`

<img src="/images/test-sets/upload_test_set.png" />

### CSV Format

We use CSV with "," as a separator and '"' as a quote character. The first row should contain the header with the column names. Each input name should have its column, and the correct answer should be under the "correct_answer" column. Here's an example of a valid CSV:
We use CSV with commas (,) as separators and double quotes (") as quote characters. The first row should contain the header with column names. Each input should have its own column. The column containing the reference answer can have any name, but we use "correct_answer" by default.

:::info
If you choose a different column name for the reference answer, you'll need to configure the evaluator later with that specific name.
:::

Here's an example of a valid CSV:

```csv
text,instruction,correct_answer
Hello,How are you?,I'm good.
"Tell me a joke.",,"Sure, here's one:..."
```
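
A minimal sketch of producing a file in this format with Python's standard `csv` module. The row values are illustrative. The module's default dialect already uses a comma separator and double-quote quoting, and it quotes any field that contains a comma, so such a field is not split into two columns.

```python
import csv
import io

# Illustrative rows; the column names follow the example above.
rows = [
    {"text": "Hello", "instruction": "How are you?", "correct_answer": "I'm good."},
    {"text": "Bonjour", "instruction": "Translate to English, please.", "correct_answer": "Hello"},
]

# Write to an in-memory buffer; use open(path, "w", newline="") for a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "instruction", "correct_answer"])
writer.writeheader()    # first row: column names
writer.writerows(rows)  # fields containing commas are quoted automatically
csv_text = buffer.getvalue()

# Read the text back to confirm every row still has three columns.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

Reading the text back with `csv.DictReader` is a quick way to catch an unquoted comma before uploading.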

## JSON Format
### JSON Format

The test set should be in JSON format with specific requirements:
The test set should be in JSON format with the following structure:

1. A JSON file with an array of rows.
2. Each row in the array should be an object with column header names as keys and row data as values. Here's an example of a valid JSON file:
1. A JSON file containing an array of objects.
2. Each object in the array represents a row, with keys as column headers and values as row data. Here's an example of a valid JSON file:

```json
[
@@ -44,9 +56,29 @@ The test set should be in JSON format with specific requirements:
]
```
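
As a rough sketch, the structure described above (an array of objects, one object per row) can be produced and checked with Python's standard `json` module. The helper name `is_array_of_objects` and the sample row are hypothetical and only serve the illustration.

```python
import json


def is_array_of_objects(data):
    """Return True if data is a list whose items are all JSON objects (dicts)."""
    return isinstance(data, list) and all(isinstance(row, dict) for row in data)


# Illustrative row: keys are column headers, values are row data.
rows = [
    {"text": "Hello", "instruction": "How are you?", "correct_answer": "I'm good."},
]

json_text = json.dumps(rows, indent=2)  # content of the file to upload
loaded = json.loads(json_text)          # parse it back to check the shape
```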

### Schema for Chat Applications

For chat applications created using the chat template in Agenta, the input should be saved in a column called `chat`, which contains the input list of messages:

```json
[
{ "content": "message.", "role": "user" },
{ "content": "message.", "role": "assistant" }
// Add more messages if necessary
]
```

The reference answer column (by default `correct_answer`) should follow the same format:

```json
{ "content": "message.", "role": "assistant" }
```
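
The following sketch checks a row against the chat schema described above. The helper names (`is_message`, `is_chat_row`) are hypothetical, and the sketch assumes the nested values are kept as JSON objects; whether they must instead be stored as strings in a given file format is not specified here.

```python
def is_message(value):
    """A message is an object with exactly the string fields 'content' and 'role'."""
    return (
        isinstance(value, dict)
        and set(value) == {"content", "role"}
        and all(isinstance(field, str) for field in value.values())
    )


def is_chat_row(row, reference_column="correct_answer"):
    """Check the `chat` column and, if present, the reference answer column."""
    chat = row.get("chat")
    if not isinstance(chat, list) or not all(is_message(m) for m in chat):
        return False
    reference = row.get(reference_column)
    # The reference answer is optional; when present it must be a single message.
    return reference is None or is_message(reference)


# Illustrative row following the schema above.
row = {
    "chat": [{"content": "Hi", "role": "user"}],
    "correct_answer": {"content": "Hi, how can I help you?", "role": "assistant"},
}
```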

## Creating a Test Set Using the API

You can upload a test set using the Agenta API. Here's a high-level overview of how to do it:
You can upload a test set using our API. Find the [API endpoint reference here](/reference/api/upload-file).

Here's an example of such a call:

**HTTP Request:**

@@ -67,73 +99,48 @@ POST /testsets/{app_id}/
}
```

If you are using the API for the cloud, you should add Bearer: `your Agenta API key` in the request.

## Creating/Editing a Test Set from the UI

To create or edit a test set from the UI:

1. Go to "Test sets."
2. Choose "Create a test set with UI."
1. Go to `Test sets`
2. Choose `Create a test set with UI` or select the test set
3. Name your test set and specify the columns for input types.
4. Add the dataset.

Remember to click "Save test set."
Remember to click `Save test set`.

Additional UI Features:

- **Add Rows**: For new data entries.
- **Rename Columns**: By clicking the pen icon above a column.
- **Add Columns**: Using the '+' sign in the table header.
<img src="/images/test-sets/add_test_set_ui.png" />

## Creating a Test Set from the Playground

Creating a test set is simple while experimenting with your application directly from the playground:
The playground offers a convenient way to create and add data to a test set. This workflow is useful if you want to build your test set ad hoc: each time you find an interesting input for the LLM app, you can immediately add it to the test set and optionally set a reference answer.

To add a data point to a test set from the playground, simply click the `Add to test set` button located near the `Run` button.

A drawer will display the inputs and outputs from the playground. Here, you can modify inputs and correct answers if needed. Select an existing test set to add to, or choose `+Add new` to create a new one. Once you're satisfied, click `Add` to finalize.

1. Navigate to the Playground.
2. Enter an input and click "Run."
3. Click on 'Add to test set’.
:::warning
Currently, when adding a test point from the playground, the correct answer is always added to a column called `correct_answer`.
:::

:::warning
When adding a new data point, ensure that the column names in the test set match those of the LLM application. All columns from the playground (input columns and `correct_answer`) must exist in the test set. They will be created automatically if you're making a new test set. Any additional columns in the test set not available in the playground will be left empty.
:::
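
The column-matching rule in the warning above can be pictured with a small sketch. This is not Agenta's implementation; `align_row` is a hypothetical helper that mirrors the described behavior: every playground column must exist in the test set, and test set columns missing from the playground are left empty.

```python
def align_row(playground_row, test_set_columns):
    """Map a playground data point onto the columns of an existing test set."""
    missing = [name for name in playground_row if name not in test_set_columns]
    if missing:
        # All playground columns (inputs and correct_answer) must exist in the test set.
        raise ValueError(f"Test set is missing columns: {missing}")
    # Columns that the playground does not provide are left empty.
    return {name: playground_row.get(name, "") for name in test_set_columns}


# Illustrative call: the test set has an extra "notes" column.
aligned = align_row(
    {"text": "Hi", "correct_answer": "Hello"},
    ["text", "notes", "correct_answer"],
)
```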

The inputs and outputs from the playground will be displayed in the drawer. You can modify inputs and correct answers if necessary. Select an existing test set to add to, or choose "+Add new" if needed, then click "Add."
<img src="/images/test-sets/add_test_set_playground.png" />

<img
className="dark:hidden"
src="/images/basic_guides/14_playground_drawer_light.png"
/>
<img
className="hidden dark:block"
src="/images/basic_guides/14_playground_drawer_dark.png"
/>
### Adding Chat History from the Playground

:::note
When adding chat history, you have the option to include all turns from the history. For example:
When adding chat history, you can choose to include all turns from the conversation. For example:

- User: Hi
- Assistant: Hi, how can I help you?
- User: I would like to book a table
- Assistant: Sure, for how many people?

If you select "Turn by Turn," two rows will be added to the test set: one for "Hi/Hi, how can I help you?" and another for "Hi/Hi, how can I help you?/I would like to book a table/Sure, for how many people?"
:::
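
One way to read the "Turn by Turn" behavior is sketched below: each assistant reply becomes one row, with the preceding messages as the `chat` input and the reply as the reference answer. This is an interpretation of the example above, not Agenta's code, and `turn_by_turn_rows` is a hypothetical name.

```python
def turn_by_turn_rows(messages):
    """Build one test set row per assistant turn in a conversation."""
    rows = []
    for index, message in enumerate(messages):
        if message["role"] == "assistant":
            rows.append(
                {
                    "chat": messages[:index],   # history up to this reply
                    "correct_answer": message,  # the reply itself
                }
            )
    return rows


conversation = [
    {"content": "Hi", "role": "user"},
    {"content": "Hi, how can I help you?", "role": "assistant"},
    {"content": "I would like to book a table", "role": "user"},
    {"content": "Sure, for how many people?", "role": "assistant"},
]

rows = turn_by_turn_rows(conversation)
```

For the four-message conversation above, this yields two rows, matching the two rows described in the example.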

## The Test Set Schema

A test set in Agenta should have specific columns based on the input names of your application. For example, if your application takes a text and instruction as input, the test set should have two columns: "text" and "instruction." Optionally, you can include the correct answer under the column name "correct_answer."

### Test Set Schema for Chat Applications

For chat applications, format the chat column in the inputs as a list of messages:

```json
[
{ "content": "message.", "role": "user" },
{ "content": "message.", "role": "assistant" }
// Add more messages if necessary
]
```

The "correct_answer" should follow a specific format as well:
## Adding Data From Traces

```json
{ "content": "message.", "role": "assistant" }
```
You can add any data logged to Agenta to a test set. Navigate to observability, select the trace (or any span), then click `Add to testset` or the `+` button.
Binary file not shown.
Binary file not shown.
Binary file added docs/static/images/test-sets/add_test_set_ui.png
Binary file added docs/static/images/test-sets/upload_test_set.png