Update docs with clarifications and notes (EricLBuehler#806)
* Add some notes

* Update interactive mode

* Update docs and remove from interactive mode
EricLBuehler authored Sep 30, 2024
1 parent 296acbb commit e449543
Showing 8 changed files with 57 additions and 4 deletions.
10 changes: 10 additions & 0 deletions docs/IDEFICS2.md
@@ -11,6 +11,12 @@ The Python and HTTP APIs support sending images as:

The Rust API takes an image from the [image](https://docs.rs/image/latest/image/index.html) crate.

## Interactive mode

> [!NOTE]
> In interactive mode, the Idefics 2 vision model does not automatically add the image token!
> It must be added to messages manually and has the format `<image>`.
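
For example, a prompt entered in interactive mode would include the token explicitly; a sketch of the token placement only:

```
> <image>What is shown in this image?
```
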
## HTTP server
You can find this example [here](../examples/server/idefics2.py).

@@ -36,6 +42,10 @@ The image depicts a group of orange ants climbing over a black pole. The ants ar
---

1) Start the server

> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.
```
cargo run --release --features ... -- --port 1234 --isq Q4K vision-plain -m HuggingFaceM4/idefics2-8b-chatty -a idefics2
```
11 changes: 11 additions & 0 deletions docs/LLaVA.md
@@ -17,6 +17,12 @@ The Python and HTTP APIs support sending images as:

The Rust API takes an image from the [image](https://docs.rs/image/latest/image/index.html) crate.

## Interactive mode

> [!NOTE]
> In interactive mode, the LLaVA vision models do not automatically add the image token!
> It must be added to messages manually and has the format `<image>`.
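
As with Idefics 2 above, a sketch of a manually tagged interactive-mode prompt:

```
> <image>Describe this image in detail.
```
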
## HTTP server
You can find this example [here](../examples/server/llava_next.py).

@@ -43,6 +49,11 @@ Text: The image shows a steep, snow-covered hillside with a pine tree on the rig
---

1) Start the server

> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.
```
cargo run --release --features ... -- --port 1234 --isq Q4K vision-plain -m llava-hf/llava-v1.6-mistral-7b-hf -a llava_next
```
8 changes: 8 additions & 0 deletions docs/PHI3V.md
@@ -14,6 +14,10 @@ The Rust API takes an image from the [image](https://docs.rs/image/latest/image/index.html) crate.
> Note: when sending multiple images, they will be resized to the minimum dimension by which all will fit without cropping.
> Aspect ratio is not preserved in that case.

> [!NOTE]
> The Phi 3 vision model does not automatically add the image tokens!
> They must be added to messages manually and have the format `<|image_{N}|>`, where N starts from 1.
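
For example, message text referring to two images might look like this (a sketch of token placement only):

```
<|image_1|> <|image_2|> What is shown in these two images?
```
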
## HTTP server
You can find this example [here](../examples/server/phi3v.py).

@@ -42,6 +46,10 @@ The perspective from which this photo is taken offers an expansive view of the m
---

1) Start the server

> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.
```
cargo run --release --features ... -- --port 1234 vision-plain -m microsoft/Phi-3.5-vision-instruct -a phi3v
```
8 changes: 8 additions & 0 deletions docs/TOPOLOGY.md
@@ -43,11 +43,19 @@ Note that:
Model topologies may be applied to all model types.

## CLI example
> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.

```
cargo run --features ... -- -i plain -m microsoft/Phi-3-mini-128k-instruct -a phi3 --topology topologies/isq.yml
```

## HTTP server example
> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.
```
cargo run --features ... -- --port 1234 plain -m microsoft/Phi-3-mini-128k-instruct -a phi3 --topology topologies/isq.yml
```
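
As context for both commands above, a minimal sketch of what a topology file like `topologies/isq.yml` could contain (assuming the layer-range to ISQ mapping described earlier in TOPOLOGY.md; the ranges and quantization levels are illustrative):

```yml
# Illustrative only: apply different ISQ levels to different layer ranges.
0-8:
  isq: Q3K
8-16:
  isq: Q4K
# Layers not listed here are left at the model's default precision.
```
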
17 changes: 15 additions & 2 deletions docs/VLLAMA.md
@@ -26,6 +26,10 @@ Mistral.rs supports interactive mode for vision models! It is an easy way to int
https://github.com/user-attachments/assets/4d11c35c-9ea2-42b8-8cab-5f7e8e2ee9ff

1) Start up interactive mode with the Llama 3.2 model

> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.
```
cargo run --features ... --release -- -i --isq Q4K vision-plain -m lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k -a vllama
```
@@ -37,6 +41,11 @@

```
How can I assist you today?
```

3) Pass the model an image and ask a question.

> [!NOTE]
> In interactive mode, the Llama 3.2 vision models do not automatically add the image token!
> It must be added to messages manually and has the format `<|image|>`.
```
> Hello!
How can I assist you today?
```

@@ -92,6 +101,10 @@ Overall, the image showcases the diverse geological and ecological features of M
---

1) Start the server

> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.
```
cargo run --release --features ... -- --port 1234 --isq Q4K vision-plain -m lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k -a vllama
```
Expand All @@ -118,7 +131,7 @@ completion = client.chat.completions.create(
},
{
"type": "text",
"text": "<|image|>What is shown in this image? Write a detailed response analyzing the scene.",
"text": "What is shown in this image? Write a detailed response analyzing the scene.",
},
],
},
@@ -218,7 +231,7 @@ res = runner.send_chat_completion_request(

```diff
                 },
                 {
                     "type": "text",
-                    "text": "<|image|>What is shown in this image? Write a detailed response analyzing the scene.",
+                    "text": "What is shown in this image? Write a detailed response analyzing the scene.",
                 },
             ],
         }
```
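
Both hunks above drop the hand-written `<|image|>` token from the request text: unlike interactive mode, the HTTP and Python APIs insert the image token automatically. A minimal sketch of the updated Python usage (the model ID comes from this page; the request shape follows the mistralrs Python API, and the image URL is a placeholder):

```python
from mistralrs import ChatCompletionRequest, Runner, VisionArchitecture, Which

# Load the Llama 3.2 vision model; the runner handles chat templating.
runner = Runner(
    which=Which.VisionPlain(
        model_id="lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k",
        arch=VisionArchitecture.VLlama,
    ),
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="llama-vision",
        messages=[
            {
                "role": "user",
                "content": [
                    # The API inserts <|image|> for you; do not add it to the text.
                    {"type": "image_url", "image_url": {"url": "https://example.com/scene.jpg"}},
                    {"type": "text", "text": "What is shown in this image? Write a detailed response analyzing the scene."},
                ],
            }
        ],
        max_tokens=256,
    )
)
print(res.choices[0].message.content)
```
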
2 changes: 1 addition & 1 deletion examples/python/llama_vision.py
@@ -25,7 +25,7 @@

```diff
                 },
                 {
                     "type": "text",
-                    "text": "<|image|>What is shown in this image? Write a detailed response analyzing the scene.",
+                    "text": "What is shown in this image? Write a detailed response analyzing the scene.",
                 },
             ],
         }
```
2 changes: 1 addition & 1 deletion examples/server/llama_vision.py
@@ -49,7 +49,7 @@ def log_response(response: httpx.Response):

```diff
                 },
                 {
                     "type": "text",
-                    "text": "<|image|>What is shown in this image? Write a detailed response analyzing the scene.",
+                    "text": "What is shown in this image? Write a detailed response analyzing the scene.",
                 },
             ],
         },
```
3 changes: 3 additions & 0 deletions mistralrs-bench/README.md
@@ -2,6 +2,9 @@

This is our official benchmarking application, which allows you to collect structured information about the speed of `mistral.rs`.

> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.

To run: `cargo run --release --features ... --package mistralrs-bench`

