Commit 4b2c493: Update README.md
ilil96 authored May 5, 2024
1 parent a55d67a
Showing 1 changed file with 2 additions and 2 deletions.

README.md — 4 changes: 2 additions & 2 deletions
@@ -189,7 +189,7 @@ We have provided a demo script to showcase the dynamic inference latency of the
To run the demo, execute the following command:

```bash
-python demo.py -p 3 4 5 6 16
+python demo.py
```

Note that the demo script requires the quantized `Llama-2-7b-chat-hf` model to be present in the cache directory.
@@ -198,7 +198,7 @@ Other models can be used by changing the `model_path` and `original_model_path`
The demo script will load the quantized model, and perform inference on a custom prompt, using specified precisions.
Include 16 to measure the latency of the original model in fp16.
The latency at each precision will be measured and displayed.
-Please note that this demonstration serves as a proof-of-concept.
+Please note that this demo serves as a proof-of-concept.
Further optimizations in the inference pipeline are needed to achieve the best performance of our engine.

The demo will look like this when run properly:
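The diff context above describes the demo measuring latency at each requested precision, with 16 serving as the fp16 baseline. As a rough sketch of that measurement loop (the names `time_inference_ms` and `fake_infer` are illustrative stand-ins, not functions from the repository, and the real demo would call the quantized engine instead of sleeping):

```python
import time

def time_inference_ms(run_fn, precision, n_warmup=2, n_runs=5):
    """Average latency in milliseconds of run_fn at a given bit precision.

    run_fn is a hypothetical stand-in for the engine's generate call.
    """
    for _ in range(n_warmup):          # warm up caches / JIT before timing
        run_fn(precision)
    start = time.perf_counter()
    for _ in range(n_runs):
        run_fn(precision)
    return (time.perf_counter() - start) / n_runs * 1000.0

def fake_infer(precision):
    """Dummy workload standing in for quantized model inference."""
    time.sleep(0.001 * (precision / 16))

for p in [3, 4, 5, 6, 16]:             # 16 = original fp16 model baseline
    print(f"{p}-bit: {time_inference_ms(fake_infer, p):.2f} ms")
```

This is only a proof-of-concept timing harness under the assumptions stated; the actual script's measurement details may differ.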
