A simple extension on top of vLLM that helps you speed up reasoning models without training.
Try our 🤖 Demo!
Use Dynasor:
# Install Dynasor
git clone https://github.com/hao-ai-lab/Dynasor.git
cd Dynasor && pip install . && cd -
# (Optional) Install vLLM and set up an endpoint
pip install vllm
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B -tp 1 --enable-prefix-caching
# Start Dynasor Chat with an endpoint
dynasor-chat --base-url http://localhost:8000/v1
Dynasor is a tool that speeds up LLM reasoning models without training or fine-tuning. It combines techniques to improve the prompt, execute it dynamically, and stop generation once the LLM has enough information to commit to an answer.
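The key mechanism is certainty-based early stopping: periodically probe the model for its current answer mid-generation and stop decoding once that answer stabilizes. Below is a minimal conceptual sketch of that loop against a vLLM OpenAI-compatible endpoint; the probe string, chunk size, and agreement threshold are illustrative assumptions, not Dynasor's actual implementation.

```python
# Conceptual sketch of certainty-based early stopping (not Dynasor's actual code).
# Assumptions: a vLLM OpenAI-compatible server on localhost:8000, and illustrative
# values for PROBE, CHUNK, and the agreement threshold AGREE.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
PROBE = "... Oh, I suddenly got the answer to the whole problem. Final answer:"
CHUNK = 256  # decode this many tokens between probes (illustrative)
AGREE = 3    # stop once this many consecutive probes agree (illustrative)

def solve(question: str, max_tokens: int = 4096) -> str:
    text, answers = question, []
    for _ in range(max_tokens // CHUNK):
        # Generate the next chunk of reasoning.
        out = client.completions.create(model=MODEL, prompt=text, max_tokens=CHUNK)
        text += out.choices[0].text
        # Probe: cheaply ask for the answer *now*. With prefix caching enabled,
        # the shared prefix is reused, so each probe costs only a few tokens.
        probe = client.completions.create(model=MODEL, prompt=text + PROBE, max_tokens=16)
        answers.append(probe.choices[0].text.strip())
        # If the model keeps giving the same answer, it is certain: stop early.
        if len(answers) >= AGREE and len(set(answers[-AGREE:])) == 1:
            return answers[-1]
    return answers[-1] if answers else ""
```

This is also why the warning below matters: probing repeatedly re-sends the same growing prefix, which is only cheap when the server can cache it.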
Install Dynasor from source:
git clone https://github.com/hao-ai-lab/Dynasor.git
cd Dynasor && pip install . && cd -
We provide three tools to launch Dynasor:
- dynasor-chat: CLI chat interface to interact with Dynasor
- dynasor-openai: OpenAI-compatible proxy server
- dynasor-vllm: vLLM-native server
Warning: We recommend enabling prefix caching; otherwise, probing will be very slow.
To chat with Dynasor from the command line:
- Set up a vLLM server:
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B -tp 1 --enable-prefix-caching
- Open Dynasor Chat in the command line:
dynasor-chat
To serve Dynasor behind an OpenAI-compatible endpoint:
- Set up a vLLM server:
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B -tp 1 --enable-prefix-caching
- Set up an OpenAI-compatible proxy server to serve Dynasor:
dynasor-openai
- Use our simple client script to query:
# Sample Dynasor client script to ask some questions
python examples/client.py --prompt "2+2=?"
python examples/client.py --prompt "Solve x^2 + 4x = 4"
python examples/client.py --prompt "How many nonzero points are there on x^3y + y^3z + z^3x = 0 over the finite field 𝔽_{{5}^{18}} up to scaling?"
We build Dynasor on top of vLLM as part of its OpenAI-compatible server endpoint.
- Set up a dynasor-vllm server:
dynasor-vllm --model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --enable-prefix-caching
- Use our simple client script to query:
python examples/client-vllm.py
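Because dynasor-vllm extends vLLM's OpenAI-compatible server, any OpenAI client can talk to it directly. A minimal sketch, assuming the server listens on vLLM's default port 8000:

```python
# Query the dynasor-vllm server directly with the standard OpenAI client.
# Port 8000 is vLLM's default and is an assumption here.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    messages=[{"role": "user", "content": "Solve x^2 + 4x = 4"}],
)
print(resp.choices[0].message.content)
```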
To run the token deprivation experiment on the MATH-500 dataset, first launch a vLLM server, then run the command below. Note that the current run.py script processes only 10 questions; to obtain complete results, adjust the --start and --end parameters to change the problem IDs and solve all problems in parallel (see the sketch after the command below).
bash benchmark/TokenDeprivation/run.sh
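For example, the following sketch shards the 500 problems across parallel run.py processes via --start/--end. Only those two flags come from the docs above; any additional flags run.py requires (model name, endpoint URL, etc.) should be copied from benchmark/TokenDeprivation/run.sh.

```python
# Sketch: shard MATH-500 across parallel run.py invocations via --start/--end.
# Shard count is illustrative; extra run.py flags must be added per run.sh.
import subprocess

SHARDS = 10          # illustrative shard count
TOTAL = 500          # MATH-500 has 500 problems
step = TOTAL // SHARDS
procs = [
    subprocess.Popen([
        "python", "benchmark/TokenDeprivation/run.py",
        "--start", str(i * step), "--end", str((i + 1) * step),
    ])
    for i in range(SHARDS)
]
for p in procs:
    p.wait()
```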
Run benchmark/TokenDeprivation/post_process.ipynb to visualize the results.
If you use Dynasor for your research, please cite our paper:
@article{fu2024efficiently,
title={Efficiently Serving LLM Reasoning Programs with Certaindex},
author={Fu, Yichao and Chen, Junda and Zhu, Siqi and Fu, Zheyu and Dai, Zhongdongming and Qiao, Aurick and Zhang, Hao},
journal={arXiv preprint arXiv:2412.20993},
year={2024}
}