Commit
Merge remote-tracking branch 'origin/main'
SyphonArch committed May 5, 2024
2 parents 2cca8a1 + d499c08 commit a55d67a
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion README.md
@@ -198,10 +198,13 @@ Other models can be used by changing the `model_path` and `original_model_path`
The demo script will load the quantized model, and perform inference on a custom prompt, using specified precisions.
Include 16 to measure the latency of the original model in fp16.
The latency at each precision will be measured and displayed.
Please note that this demonstration serves as a proof-of-concept.
Further optimizations in the inference pipeline are needed to achieve the best performance of our engine.

The demo will look like this when run properly:

TODO: Add demo GIF
![AnyPrec Latency Demo](https://github.com/SNU-ARC/any-precision-llm/assets/48833786/75a42bea-979a-489f-aee8-89697c55411a)


## Evaluation
