Skip to content

Latest commit

 

History

History
42 lines (35 loc) · 1.48 KB

README.md

File metadata and controls

42 lines (35 loc) · 1.48 KB

mistralrs-bench

This is our official benchmarking application, which allows you to collect structured information about the speed of mistral.rs.

Note

You should replace --features ... with one of the features specified here, or remove it for pure CPU inference.

To run: cargo run --release --features ... --package mistralrs-bench

Fast and easy LLM serving.

Usage: mistralrs-bench [OPTIONS] <COMMAND>

Commands:
  plain        Select a plain model
  x-lora       Select an X-LoRA architecture
  lora         Select a LoRA architecture
  gguf         Select a GGUF model
  x-lora-gguf  Select a GGUF model with X-LoRA
  lora-gguf    Select a GGUF model with LoRA
  ggml         Select a GGML model
  x-lora-ggml  Select a GGML model with X-LoRA
  lora-ggml    Select a GGML model with LoRA
  help         Print this message or the help of the given subcommand(s)

Options:
  -p, --n-prompt <N_PROMPT>
          Number of prompt tokens to run [default: 512]
  -g, --n-gen <N_GEN>
          Number of generations tokens to run [default: 128]
  -c, --concurrency <CONCURRENCY>
          Number of concurrent requests to run. Default is 1
  -r, --repetitions <REPETITIONS>
          Number of times to repeat each test [default: 5]
  -n, --num-device-layers <NUM_DEVICE_LAYERS>
          Number of device layers to load and run on the device. All others will be on the CPU
  -h, --help
          Print help
  -V, --version
          Print version