Merge pull request #57 from dezoito/dev
Adds TOC to README
dezoito authored Nov 27, 2024
2 parents 0ab9a29 + f118a39 commit 18beb4c
Showing 2 changed files with 161 additions and 17 deletions.
43 changes: 26 additions & 17 deletions README.md
@@ -1,31 +1,40 @@
-# Ollama Grid Search and A/B Testing Desktop App.
+# Ollama Grid Search: Instantly Evaluate Multiple LLMs and Prompts.

A Rust-based tool to evaluate LLM models, prompts, and model parameters.

(Issues with Llama3? Please read [this](https://github.com/dezoito/ollama-grid-search/issues/8)).

## Purpose

This project automates the process of selecting the best models, prompts, or inference parameters for a given use-case, allowing you to iterate over their combinations and to visually inspect the results.

It assumes [Ollama](https://www.ollama.ai) is installed and serving endpoints, either on `localhost` or on a remote server.

## Quick Example

-Here's a test for a simple prompt, tested on 2 models, using `0.7` and `1.0` as values for `temperature`:
+Here's what an experiment for a simple prompt, tested on 3 different models, looks like:

[<img src="./screenshots/main.png?raw=true" alt="Main Screenshot" width="720">](./screenshots/main.png?raw=true)

(For a more in-depth look at an evaluation process assisted by this tool, please check https://dezoito.github.io/2023/12/27/rust-ollama-grid-search.html).


## Table of Contents

- [Installation](#installation)
- [Features](#features)
- [Grid Search Concept](#grid-search-or-something-similar)
- [A/B Testing](#ab-testing)
- [Prompt Archive](#prompt-archive)
- [Experiment Logs](#experiment-logs)
- [Future Features](#future-features)
- [Contributing](#contributing)
- [Development](#development)
- [Citations](#citations)
- [Acknowledgements](#thank-you)


## Installation

Check the project's [releases page](https://github.com/dezoito/ollama-grid-search/releases), or the sidebar, for downloads.

## Features

- Automatically fetches models from local or remote Ollama servers;
-- Iterates over different models, prompts and parameters to generate inferences;
+- Iterates over multiple models, prompts and parameters to generate inferences;
- A/B test different prompts on several models simultaneously;
- Allows multiple iterations for each combination of parameters;
- Allows [limited concurrency](https://dezoito.github.io/2024/03/21/react-limited-concurrency.html) **or** synchronous inference calls (to prevent spamming servers);
@@ -36,9 +45,11 @@
- Experiments can be inspected in readable views;
- Re-run past experiments, cloning or modifying the parameters used in the past;
- Configurable inference timeout;
-- Custom default parameters and system prompts can be defined in settings:
+- Custom default parameters and system prompts can be defined in settings;
+- Fully functional prompt database with examples;
+- Prompts can be selected and "autocompleted" by typing "/" in the inputs;


[<img src="./screenshots/settings.png?raw=true" alt="Settings" width="720">](./screenshots/settings.png?raw=true)
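The "limited concurrency" feature listed above can be illustrated with a minimal sketch. This is not the app's actual implementation (the linked post describes a React-side approach); `fake_inference` is a stand-in for a real Ollama API call, and processing the queue in fixed-size chunks is one simple way to cap how many requests are in flight at once:

```rust
use std::thread;

// Stand-in for an Ollama inference call (illustrative only).
fn fake_inference(job: &str) -> String {
    format!("response for {job}")
}

// Run at most `limit` jobs at a time by processing the queue in chunks,
// so a remote server is never hit with every request simultaneously.
fn run_with_limit(jobs: &[&str], limit: usize) -> Vec<String> {
    let mut results = Vec::new();
    for chunk in jobs.chunks(limit) {
        // Spawn up to `limit` scoped threads, then join them all
        // before starting the next chunk.
        let chunk_results: Vec<String> = thread::scope(|s| {
            let handles: Vec<_> = chunk
                .iter()
                .map(|job| s.spawn(move || fake_inference(job)))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        });
        results.extend(chunk_results);
    }
    results
}

fn main() {
    let jobs = ["job-1", "job-2", "job-3", "job-4", "job-5"];
    let results = run_with_limit(&jobs, 2); // never more than 2 in flight
    assert_eq!(results.len(), 5);
    println!("{results:?}");
}
```

Setting `limit` to `1` degenerates into fully synchronous calls, which matches the "**or** synchronous inference calls" option above.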

## Grid Search (or something similar...)

@@ -52,7 +63,6 @@ Let's define a selection of models, a prompt and some parameter combinations:

The prompt will be submitted once for each parameter **value**, for each one of the selected models, generating a set of responses.
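The expansion described above can be sketched in a few lines. This is an illustrative example, not the app's actual code; it assumes `temperature` is the varied parameter, and `expand_grid` is a hypothetical name:

```rust
// Expand the selected models and parameter values into one
// inference job per (model, value) pair, as described above.
fn expand_grid(models: &[&str], temperatures: &[f64]) -> Vec<(String, f64)> {
    let mut jobs = Vec::new();
    for model in models {
        for &t in temperatures {
            jobs.push((model.to_string(), t));
        }
    }
    jobs
}

fn main() {
    // 2 models x 3 temperature values => 6 prompt submissions
    let jobs = expand_grid(&["llama3", "gemma"], &[0.2, 0.7, 1.0]);
    assert_eq!(jobs.len(), 6);
    println!("{jobs:?}");
}
```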


## A/B Testing

Similarly, you can perform A/B tests by selecting different models and comparing results for the same prompt/parameter combination, or by testing different prompts under similar configurations:
@@ -62,6 +72,7 @@
<small>Comparing the results of different prompts for the same model</small>

## Prompt Archive

You can save and manage your prompts (we want to make prompts compatible with [Open WebUI](https://github.com/open-webui/open-webui)).

[<img src="./screenshots/prompt-archive.png?raw=true" alt="Settings" width="720">](./screenshots/prompt-archive.png?raw=true)
@@ -70,8 +81,6 @@ You can **autocomplete** prompts by typing "/" (inspired by Open WebUI, as well)

[<img src="./screenshots/autocomplete.gif?raw=true" alt="A/B testing" width="720">](./screenshots/autocomplete.gif?raw=true)
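The "/" autocomplete behaviour can be sketched as a simple filter over saved prompt names. Everything here is a hypothetical illustration; the prompt names and the case-insensitive substring matching rule are assumptions, not the app's actual logic:

```rust
// Once the input starts with "/", filter saved prompt names by the
// fragment typed after the slash (case-insensitive substring match).
fn autocomplete<'a>(input: &str, saved: &[&'a str]) -> Vec<&'a str> {
    match input.strip_prefix('/') {
        Some(fragment) => {
            let needle = fragment.to_lowercase();
            saved
                .iter()
                .copied()
                .filter(|name| name.to_lowercase().contains(&needle))
                .collect()
        }
        // Autocomplete only triggers after "/".
        None => Vec::new(),
    }
}

fn main() {
    let saved = ["summarize-article", "translate-pt", "code-review"];
    assert_eq!(autocomplete("/trans", &saved), vec!["translate-pt"]);
    assert!(autocomplete("hello", &saved).is_empty());
    // An empty fragment ("/" alone) matches every saved prompt.
    println!("{:?}", autocomplete("/", &saved));
}
```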



## Experiment Logs

You can list, inspect, or download your experiments:
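As a rough illustration of what a downloaded experiment record might carry, here is a hypothetical sketch. The field names and schema are guesses for illustration only; the actual JSON written by the app may differ:

```rust
// Hypothetical shape of one logged inference result.
struct ExperimentEntry {
    model: String,
    temperature: f64,
    prompt: String,
    response: String,
}

// Hand-rolled serialization to keep the sketch dependency-free;
// a real implementation would typically use serde_json instead.
fn to_json(e: &ExperimentEntry) -> String {
    format!(
        "{{\"model\":\"{}\",\"temperature\":{},\"prompt\":\"{}\",\"response\":\"{}\"}}",
        e.model, e.temperature, e.prompt, e.response
    )
}

fn main() {
    let entry = ExperimentEntry {
        model: "llama3".into(),
        temperature: 0.7,
        prompt: "Why is the sky blue?".into(),
        response: "Rayleigh scattering.".into(),
    };
    let json = to_json(&entry);
    assert!(json.contains("\"model\":\"llama3\""));
    println!("{json}");
}
```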
@@ -81,7 +90,7 @@ You can list, inspect, or download your experiments:
## Future Features

- Grading results and filtering by grade
-- Importing, exporting and sharing prompt lists and experiment parameters.
+- Importing, exporting and sharing prompt lists and experiment files.

## Contributing

@@ -113,7 +122,7 @@ cd ollama-grid-search

If you are running VS Code, add this to your `settings.json` file:

-```
+```json
{
...
"rust-analyzer.check.command": "clippy",
135 changes: 135 additions & 0 deletions README.md.old
@@ -0,0 +1,135 @@
# Ollama Grid Search: Instantly Evaluate Multiple LLMs and Prompts.


This project automates the process of selecting the best models, prompts, or inference parameters for a given use-case, allowing you to iterate over their combinations and to visually inspect the results.

It assumes [Ollama](https://www.ollama.ai) is installed and serving endpoints, either on `localhost` or on a remote server.

Here's what an experiment for a simple prompt, tested on 3 different models, looks like:

[<img src="./screenshots/main.png?raw=true" alt="Main Screenshot" width="720">](./screenshots/main.png?raw=true)

(For a more in-depth look at an evaluation process assisted by this tool, please check https://dezoito.github.io/2023/12/27/rust-ollama-grid-search.html).

## Installation

Check the project's [releases page](https://github.com/dezoito/ollama-grid-search/releases), or the sidebar, for downloads.

## Features

- Automatically fetches models from local or remote Ollama servers;
- Iterates over different models, prompts and parameters to generate inferences;
- A/B test different prompts on several models simultaneously;
- Allows multiple iterations for each combination of parameters;
- Allows [limited concurrency](https://dezoito.github.io/2024/03/21/react-limited-concurrency.html) **or** synchronous inference calls (to prevent spamming servers);
- Optionally outputs inference parameters and response metadata (inference time, tokens and tokens/s);
- Refetching of individual inference calls;
- Model selection can be filtered by name;
- List experiments which can be downloaded in JSON format;
- Experiments can be inspected in readable views;
- Re-run past experiments, cloning or modifying the parameters used in the past;
- Configurable inference timeout;
- Custom default parameters and system prompts can be defined in settings:

[<img src="./screenshots/settings.png?raw=true" alt="Settings" width="720">](./screenshots/settings.png?raw=true)

## Grid Search (or something similar...)

Technically, the term "grid search" refers to iterating over a series of different model hyperparameters to optimize model performance, but that usually means parameters like `batch_size`, `learning_rate`, or `number_of_epochs`, which are more commonly used in training.

But the concept here is similar:

Let's define a selection of models, a prompt and some parameter combinations:

[<img src="./screenshots/gridparams-animation.gif?raw=true" alt="gridparams" width="400">](./screenshots/gridparams-animation.gif?raw=true)

The prompt will be submitted once for each parameter **value**, for each one of the selected models, generating a set of responses.


## A/B Testing

Similarly, you can perform A/B tests by selecting different models and comparing results for the same prompt/parameter combination, or by testing different prompts under similar configurations:

[<img src="./screenshots/ab-animation.gif?raw=true" alt="A/B testing" width="720">](./screenshots/ab-animation.gif?raw=true)

<small>Comparing the results of different prompts for the same model</small>

## Prompt Archive

You can save and manage your prompts (we want to make prompts compatible with [Open WebUI](https://github.com/open-webui/open-webui)).

[<img src="./screenshots/prompt-archive.png?raw=true" alt="Settings" width="720">](./screenshots/prompt-archive.png?raw=true)

You can **autocomplete** prompts by typing "/" (inspired by Open WebUI, as well):

[<img src="./screenshots/autocomplete.gif?raw=true" alt="A/B testing" width="720">](./screenshots/autocomplete.gif?raw=true)



## Experiment Logs

You can list, inspect, or download your experiments:

[<img src="./screenshots/experiments.png?raw=true" alt="Settings" width="720">](./screenshots/experiments.png?raw=true)

## Future Features

- Grading results and filtering by grade
- Importing, exporting and sharing prompt lists and experiment parameters.

## Contributing

- For obvious bugs and spelling mistakes, please go ahead and submit a PR.

- If you want to propose a new feature, change existing functionality, or propose anything more complex, please open an issue for discussion, **before** getting work done on a PR.

## Development

1. Make sure you have Rust installed.

2. Clone the repository (or a fork)

```sh
git clone https://github.com/dezoito/ollama-grid-search.git
cd ollama-grid-search
```

3. Install the frontend dependencies.

```sh
cd <project root>
# I'm using bun to manage dependencies,
# but feel free to use yarn or npm
bun install
```

4. Make sure `rust-analyzer` is configured to run `Clippy` when checking code.

If you are running VS Code, add this to your `settings.json` file:

```
{
...
"rust-analyzer.check.command": "clippy",
}
```

(or, better yet, just use the settings file provided with the code)

5. Run the app in development mode:

```sh
cd <project root>/
bun tauri dev
```
6. Go grab a cup of coffee because this may take a while.

## Citations

The following works and theses have cited this repository:

Inouye, D., Lindo, L., Lee, R., & Allen, E. **Applied Auto-tuning on LoRA Hyperparameters**. Computer Science and Engineering Senior Theses, Santa Clara University, 2024.
<https://scholarcommons.scu.edu/cgi/viewcontent.cgi?article=1271&context=cseng_senior>

## Thank you!

Huge thanks to [@FabianLars](https://github.com/FabianLars), [@pepperoni21](https://github.com/pepperoni21) and [@TomReidNZ](https://github.com/TomReidNZ).
