mistralrs
is a Python package which provides an API for mistral.rs
. We build mistralrs
with the maturin
build manager.
-
Install Rust: https://rustup.rs/
Example on Ubuntu:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh source $HOME/.cargo/env
-
mistralrs
depends on theopenssl
library.
Example on Ubuntu:
sudo apt install libssl-dev
sudo apt install pkg-config
- Install it!
-
CUDA
pip install mistralrs-cuda -v
-
Metal
pip install mistralrs-metal -v
-
Apple Accelerate
pip install mistralrs-accelerate -v
-
Intel MKL
pip install mistralrs-mkl -v
-
Without accelerators
pip install mistralrs -v
All installations will install the mistralrs
package. The suffix on the package installed by pip
only controls the feature activation.
-
Install required packages
openssl
(Example on Ubuntu:sudo apt install libssl-dev
)pkg-config
(Example on Ubuntu:sudo apt install pkg-config
)
-
Install Rust: https://rustup.rs/
Example on Ubuntu:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh source $HOME/.cargo/env
-
Set HF token correctly (skip if already set or your model is not gated, or if you want to use the
token_source
parameters in Python or the command line.)Example on Ubuntu:
mkdir ~/.cache/huggingface touch ~/.cache/huggingface/token echo <HF_TOKEN_HERE> > ~/.cache/huggingface/token
-
Download the code
git clone https://github.com/EricLBuehler/mistral.rs.git cd mistral.rs
-
cd
into the correct directory for buildingmistralrs
:cd mistralrs-pyo3
-
Install
maturin
, our Rust + Python build system: Maturin requires a Python virtual environment such asvenv
orconda
to be active. Themistralrs
package will be installed into that environment.pip install maturin[patchelf]
-
Install
mistralrs
Installmistralrs
by executing the following in this directory where features such ascuda
orflash-attn
may be specified with the--features
argument just like they would be forcargo run
.The base build command is:
maturin develop -r
- To build for CUDA:
maturin develop -r --features cuda
- To build for CUDA with flash attention:
maturin develop -r --features "cuda flash-attn"
- To build for Metal:
maturin develop -r --features metal
- To build for Accelerate:
maturin develop -r --features accelerate
- To build for MKL:
maturin develop -r --features mkl
Please find API docs here and the type stubs here, which are another great form of documentation.
We also provide a cookbook here!
from mistralrs import Runner, Which, ChatCompletionRequest
runner = Runner(
which=Which.GGUF(
tok_model_id="mistralai/Mistral-7B-Instruct-v0.1",
quantized_model_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
quantized_filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
)
)
res = runner.send_chat_completion_request(
ChatCompletionRequest(
model="mistral",
messages=[
{"role": "user", "content": "Tell me a story about the Rust type system."}
],
max_tokens=256,
presence_penalty=1.0,
top_p=0.1,
temperature=0.1,
)
)
print(res.choices[0].message.content)
print(res.usage)