forked from npuichigo/openai_trtllm
Templating for prompts #4
stintel added a commit that referenced this issue on Feb 23, 2024
Replace .to_string() with String::from while at it. Both are used, but let's aim for consistency.

Reported-by: Nick Bento <[email protected]>
Fixes: c196362 ("Implement chat completion (#4)")
stintel added a commit that referenced this issue on Mar 26, 2024
And use it in Triton chat completions and legacy completions.

For Mistral-7B-Instruct-v0.2, here is an example template for chat completions. Put it in /etc/ai-router/templates/chat/mistral.j2:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endfor %}
```

And configure the prompt_format in /etc/ai-router.toml:

```
[models.chat_completions."Mistral-7B-Instruct-v0.2"]
...
prompt_format = "mistral"
```

For legacy completions, a different template is needed, in /etc/ai-router/templates/completions/mistral.j2:

```
[INST] {% for message in messages -%}
{{ message -}}
{% endfor %} [/INST]
```

Closes: #4
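The chat template can be previewed outside of ai-router. Below is a minimal sketch using Python's jinja2 package (ai-router itself is written in Rust, so both the package choice and the condensed single-line template are illustrative assumptions; `raise_exception` is not a Jinja builtin and must be registered by hand):

```python
# Illustrative only: render a condensed Mistral-style chat template with
# Python's jinja2 to see the prompt string it produces.
from jinja2 import Environment


def raise_exception(message):
    # Jinja templates have no raise statement, so they call this helper.
    raise ValueError(message)


env = Environment()
env.globals["raise_exception"] = raise_exception

# Condensed version of the mistral.j2 chat template from the commit message.
template = env.from_string(
    "{{ '<s>' }}"
    "{% for m in messages %}"
    "{% if (m['role'] == 'user') != (loop.index0 % 2 == 0) %}"
    "{{ raise_exception('Conversation roles must alternate user/assistant/...') }}"
    "{% endif %}"
    "{% if m['role'] == 'user' %}"
    "{{ '[INST] ' + m['content'] + ' [/INST]' }}"
    "{% elif m['role'] == 'assistant' %}"
    "{{ ' ' + m['content'] + '</s>' }}"
    "{% else %}"
    "{{ raise_exception('Only user and assistant roles are supported!') }}"
    "{% endif %}"
    "{% endfor %}"
)

rendered = template.render(messages=[
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
    {"role": "user", "content": "How are you?"},
])
print(rendered)
# <s>[INST] Hello [/INST] Hi there!</s>[INST] How are you? [/INST]
```

Swapping the order of the user/assistant turns, or passing a system message, makes `raise_exception` fire, which is how the template enforces the role rules.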
stintel added a commit that referenced this issue on Mar 26, 2024 (same commit message as above)
stintel added a commit that referenced this issue on Mar 26, 2024 (same commit message as above)
stintel added a commit that referenced this issue on Mar 26, 2024 (same commit message as above)
stintel added a commit that referenced this issue on Mar 27, 2024 (same commit message as above)
stintel added a commit that referenced this issue on Mar 28, 2024
And use it in Triton chat completions and legacy completions.

To activate, configure a prompt_format for the chat_completions model:

```
[models.chat_completions."Mistral-7B-Instruct-v0.2"]
...
prompt_format = "mistral"
```

This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory.

Example templates for Mistral-7B-Instruct-v0.2 (exclude the ```):

Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endfor %}
```

Modified version of the above template that injects a system prompt before the first user prompt:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{% set mod = 0 -%}
{% set system = '' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'system' and loop.index0 == 0) -%}
{% set mod = 1 -%}
{% set system = message['content'] %}
{% else -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{% if system and system | length > 0 and loop.index0 == 1 -%}
{{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}}
{% else -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% endif %}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endif -%}
{% endfor -%}
```

Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2:

```
[INST] {% for message in messages -%}
{{ message -}}
{% endfor %} [/INST]
```

Closes: #4
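What the system-prompt variant aims to produce can be shown without a template engine. This is a plain-Python sketch of the intended output shape (`render_mistral_chat` is a hypothetical helper for illustration, not ai-router code):

```python
# Illustrative only: mimic the system-prompt-injecting template in plain
# Python. The system message's content is prepended, with a newline, to the
# first user turn; roles must otherwise alternate user/assistant.
def render_mistral_chat(messages):
    parts = ["<s>"]
    system = ""
    turns = messages
    if turns and turns[0]["role"] == "system":
        system = turns[0]["content"]
        turns = turns[1:]
    for i, m in enumerate(turns):
        if (m["role"] == "user") != (i % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/...")
        if m["role"] == "user":
            content = m["content"]
            if system and i == 0:
                # Inject the system prompt into the first user instruction.
                content = system + "\n" + content
            parts.append("[INST] " + content + " [/INST]")
        elif m["role"] == "assistant":
            parts.append(" " + m["content"] + "</s>")
        else:
            raise ValueError("Only user and assistant roles are supported!")
    return "".join(parts)


print(render_mistral_chat([
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": "Hello"},
]))
```

The system content ends up inside the first `[INST] ... [/INST]` block rather than as a separate turn, which matches how Mistral-7B-Instruct expects prompts to be formatted.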
stintel added a commit that referenced this issue on Mar 28, 2024
And use it in Triton chat completions and legacy completions.

To activate, configure a prompt_format for the chat_completions model:

```
[models.chat_completions."Mistral-7B-Instruct-v0.2"]
...
prompt_format = "mistral"
```

This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory.

Example templates for Mistral-7B-Instruct-v0.2 (exclude the ```):

Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endfor %}
```

Modified version of the above template that injects a system prompt before the first user prompt:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{% set mod = 0 -%}
{% set system = '' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'system' and loop.index0 == 0) -%}
{% set mod = 1 -%}
{% set system = message['content'] %}
{% else -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{% if system and system | length > 0 and loop.index0 == 1 -%}
{{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}}
{% else -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% endif %}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endif -%}
{% endfor -%}
```

Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2:

```
[INST] {% for message in messages -%}
{{ message -}}
{% endfor %} [/INST]
```

As we use chat_completion models in the config for both chat completions and legacy completions, configuring a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why:

```
Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing
```

If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template:

```
{{ raise_exception('Legacy completions are disabled for this model') }}
```

Closes: #4
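The disable-by-exception trick can likewise be sketched with Python's jinja2 (an illustrative stand-in; the commit does not say which template engine ai-router uses, and `raise_exception` is a hand-registered helper, not a Jinja builtin). Rendering such a template fails immediately, which is what turns the endpoint off:

```python
# Illustrative only: a template whose sole job is to fail, as suggested for
# disabling legacy completions on a model.
from jinja2 import Environment


def raise_exception(message):
    raise ValueError(message)


env = Environment()
env.globals["raise_exception"] = raise_exception

disabled = env.from_string(
    "{{ raise_exception('Legacy completions are disabled for this model') }}"
)

try:
    disabled.render(messages=["hi"])
except ValueError as err:
    print(err)  # Legacy completions are disabled for this model
```

Any request that would use this template gets an error instead of a prompt, so the model effectively serves chat completions only.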
stintel
added a commit
that referenced
this issue
Mar 28, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Mar 29, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Mar 29, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Mar 29, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Mar 29, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Mar 29, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel added a commit that referenced this issue on Mar 29, 2024
stintel added a commit that referenced this issue on Mar 31, 2024
stintel added a commit that referenced this issue on Mar 31, 2024
stintel added a commit that referenced this issue on Apr 16, 2024
stintel added a commit that referenced this issue on Apr 16, 2024
stintel added a commit that referenced this issue on Apr 17, 2024
stintel added a commit that referenced this issue on Apr 19, 2024
stintel added a commit that referenced this issue on Apr 19, 2024
stintel added a commit that referenced this issue on Apr 23, 2024
stintel added a commit that referenced this issue on Apr 24, 2024
stintel
added a commit
that referenced
this issue
Apr 24, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 24, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 24, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 24, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 24, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 25, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 25, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 25, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 25, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/blob/main/tokenizer_config.json#L42), which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 25, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/blob/main/tokenizer_config.json#L42), which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
Related to #2 - potentially use a template engine to deal with prompt formats?
Some examples.
Important things (not necessarily in this order):
I'm guessing that for performance we can/should use something that supports compiling templates at startup. In the end, what OpenAI clients provide is often referred to as "ChatML"; see the HF chat templating example above for the input format.
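One way to read "compilation on startup" is to load every template once at process start and only call render per request, a sketch of which follows (assuming Jinja2 in Python for illustration; the template registry and template text here are hypothetical, not the project's actual files). Jinja2 compiles a template to Python bytecode when it is first loaded, so preloading moves that cost out of the request path. The per-request input is the ChatML-style messages list that OpenAI clients send:

```python
from jinja2 import Environment, DictLoader

# Hypothetical in-memory registry; in ai-router these would be the
# *.j2 files under the templates directory.
TEMPLATES = {
    "demo": "{% for m in messages %}[{{ m['role'] }}] {{ m['content'] }}\n{% endfor %}",
}

env = Environment(loader=DictLoader(TEMPLATES))

# "Compilation on startup": force every template to be loaded (and
# thus compiled to bytecode) before serving any request.
compiled = {name: env.get_template(name) for name in TEMPLATES}

# Per-request: ChatML-style input as sent by OpenAI clients.
chatml_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Ping"},
]
prompt = compiled["demo"].render(messages=chatml_messages)
print(prompt)
```

The same split (load-and-compile once, render many) is what a Rust template engine in the router would provide as well.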
I believe this may be the most powerful, flexible, and performant way to support what is described in #2 for prompt handling and definition.