Update docs with clarifications and notes (EricLBuehler#806)
* Add some notes

* Update interactive mode

* Update docs and remove from interactive mode
EricLBuehler authored Sep 30, 2024
1 parent 296acbb commit e449543
Showing 8 changed files with 57 additions and 4 deletions.
10 changes: 10 additions & 0 deletions docs/IDEFICS2.md
@@ -11,6 +11,12 @@ The Python and HTTP APIs support sending images as:

The Rust API takes an image from the [image](https://docs.rs/image/latest/image/index.html) crate.

## Interactive mode

> [!NOTE]
> In interactive mode, the Idefics 2 vision model does not automatically add the image token!
> It must be added to messages manually and has the format `<image>`.
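
For example, a prompt entered in interactive mode would include the token explicitly; a sketch of the token placement only:

```
> <image>What is shown in this image?
```
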
## HTTP server
You can find this example [here](../examples/server/idefics2.py).

@@ -36,6 +42,10 @@ The image depicts a group of orange ants climbing over a black pole. The ants ar
---

1) Start the server

> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.
```
cargo run --release --features ... -- --port 1234 --isq Q4K vision-plain -m HuggingFaceM4/idefics2-8b-chatty -a idefics2
```
11 changes: 11 additions & 0 deletions docs/LLaVA.md
@@ -17,6 +17,12 @@ The Python and HTTP APIs support sending images as:

The Rust API takes an image from the [image](https://docs.rs/image/latest/image/index.html) crate.

## Interactive mode

> [!NOTE]
> In interactive mode, the LLaVA vision models do not automatically add the image token!
> It must be added to messages manually and has the format `<image>`.
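
As with Idefics 2 above, a sketch of a manually tagged interactive-mode prompt:

```
> <image>Describe this image in detail.
```
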
## HTTP server
You can find this example [here](../examples/server/llava_next.py).

@@ -43,6 +49,11 @@ Text: The image shows a steep, snow-covered hillside with a pine tree on the rig
---

1) Start the server

> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.
```
cargo run --release --features ... -- --port 1234 --isq Q4K vision-plain -m llava-hf/llava-v1.6-mistral-7b-hf -a llava_next
```
8 changes: 8 additions & 0 deletions docs/PHI3V.md
@@ -14,6 +14,10 @@ The Rust API takes an image from the [image](https://docs.rs/image/latest/image/index.html) crate.
> Note: when sending multiple images, they will be resized to the minimum dimension by which all will fit without cropping.
> Aspect ratio is not preserved in that case.

> [!NOTE]
> The Phi 3 vision model does not automatically add the image tokens!
> They must be added to messages manually and have the format `<|image_{N}|>`, where N starts from 1.
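
For example, message text referring to two images might look like this (a sketch of token placement only):

```
<|image_1|> <|image_2|> What is shown in these two images?
```
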
## HTTP server
You can find this example [here](../examples/server/phi3v.py).

@@ -42,6 +46,10 @@ The perspective from which this photo is taken offers an expansive view of the m
---

1) Start the server

> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.
```
cargo run --release --features ... -- --port 1234 vision-plain -m microsoft/Phi-3.5-vision-instruct -a phi3v
```
8 changes: 8 additions & 0 deletions docs/TOPOLOGY.md
@@ -43,11 +43,19 @@ Note that:
Model topologies may be applied to all model types.

## CLI example
> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.

```
cargo run --features ... -- -i plain -m microsoft/Phi-3-mini-128k-instruct -a phi3 --topology topologies/isq.yml
```

## HTTP server example
> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.
```
cargo run --features ... -- --port 1234 plain -m microsoft/Phi-3-mini-128k-instruct -a phi3 --topology topologies/isq.yml
```
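
As context for both commands above, a minimal sketch of what a topology file like `topologies/isq.yml` could contain (assuming the layer-range to ISQ mapping described earlier in TOPOLOGY.md; the ranges and quantization levels are illustrative):

```yml
# Illustrative only: apply different ISQ levels to different layer ranges.
0-8:
  isq: Q3K
8-16:
  isq: Q4K
# Layers not listed here are left at the model's default precision.
```
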
17 changes: 15 additions & 2 deletions docs/VLLAMA.md
@@ -26,6 +26,10 @@ Mistral.rs supports interactive mode for vision models! It is an easy way to int
https://github.com/user-attachments/assets/4d11c35c-9ea2-42b8-8cab-5f7e8e2ee9ff

1) Start up interactive mode with the Llama 3.2 model

> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.
```
cargo run --features ... --release -- -i --isq Q4K vision-plain -m lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k -a vllama
```
@@ -37,6 +41,11 @@

```
How can I assist you today?
```

3) Pass the model an image and ask a question.

> [!NOTE]
> In interactive mode, the Llama 3.2 vision models do not automatically add the image token!
> It must be added to messages manually and has the format `<|image|>`.
```
> Hello!
How can I assist you today?
```

@@ -92,6 +101,10 @@ Overall, the image showcases the diverse geological and ecological features of M
---

1) Start the server

> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.
```
cargo run --release --features ... -- --port 1234 --isq Q4K vision-plain -m lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k -a vllama
```
Expand All @@ -118,7 +131,7 @@ completion = client.chat.completions.create(
},
{
"type": "text",
"text": "<|image|>What is shown in this image? Write a detailed response analyzing the scene.",
"text": "What is shown in this image? Write a detailed response analyzing the scene.",
},
],
},
@@ -218,7 +231,7 @@ res = runner.send_chat_completion_request(

```diff
                 },
                 {
                     "type": "text",
-                    "text": "<|image|>What is shown in this image? Write a detailed response analyzing the scene.",
+                    "text": "What is shown in this image? Write a detailed response analyzing the scene.",
                 },
             ],
         }
```
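
Both hunks above drop the hand-written `<|image|>` token from the request text: unlike interactive mode, the HTTP and Python APIs insert the image token automatically. A minimal sketch of the updated Python usage (the model ID comes from this page; the request shape follows the mistralrs Python API, and the image URL is a placeholder):

```python
from mistralrs import ChatCompletionRequest, Runner, VisionArchitecture, Which

# Load the Llama 3.2 vision model; the runner handles chat templating.
runner = Runner(
    which=Which.VisionPlain(
        model_id="lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k",
        arch=VisionArchitecture.VLlama,
    ),
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="llama-vision",
        messages=[
            {
                "role": "user",
                "content": [
                    # The API inserts <|image|> for you; do not add it to the text.
                    {"type": "image_url", "image_url": {"url": "https://example.com/scene.jpg"}},
                    {"type": "text", "text": "What is shown in this image? Write a detailed response analyzing the scene."},
                ],
            }
        ],
        max_tokens=256,
    )
)
print(res.choices[0].message.content)
```
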
2 changes: 1 addition & 1 deletion examples/python/llama_vision.py
@@ -25,7 +25,7 @@

```diff
                 },
                 {
                     "type": "text",
-                    "text": "<|image|>What is shown in this image? Write a detailed response analyzing the scene.",
+                    "text": "What is shown in this image? Write a detailed response analyzing the scene.",
                 },
             ],
         }
```
2 changes: 1 addition & 1 deletion examples/server/llama_vision.py
@@ -49,7 +49,7 @@ def log_response(response: httpx.Response):

```diff
                 },
                 {
                     "type": "text",
-                    "text": "<|image|>What is shown in this image? Write a detailed response analyzing the scene.",
+                    "text": "What is shown in this image? Write a detailed response analyzing the scene.",
                 },
             ],
         },
```
3 changes: 3 additions & 0 deletions mistralrs-bench/README.md
@@ -2,6 +2,9 @@

This is our official benchmarking application, which allows you to collect structured information about the speed of `mistral.rs`.

> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.

To run: `cargo run --release --features ... --package mistralrs-bench`

