forked from npuichigo/openai_trtllm
Templating for prompts #4
stintel added a commit that referenced this issue on Feb 23, 2024
Replace .to_string() with String::from while at it. Both are used, but let's aim for consistency.

Reported-by: Nick Bento <[email protected]>
Fixes: c196362 ("Implement chat completion (#4)")
stintel added a commit that referenced this issue on Mar 26, 2024
And use it in Triton chat completions and legacy completions.

For Mistral-7B-Instruct-v0.2, here is an example template for chat completions. Put it in /etc/ai-router/templates/chat/mistral.j2:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endfor %}
```

And configure the prompt_format in /etc/ai-router.toml:

```
[models.chat_completions."Mistral-7B-Instruct-v0.2"]
...
prompt_format = "mistral"
```

For legacy completions, a different template is needed, in /etc/ai-router/templates/completions/mistral.j2:

```
[INST] {% for message in messages -%}
{{ message -}}
{% endfor %} [/INST]
```

Closes: #4
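The chat template can be previewed outside of ai-router. Below is a minimal sketch using Python's jinja2 package (ai-router itself is written in Rust, so both the package choice and the condensed single-line template are illustrative assumptions; `raise_exception` is not a Jinja builtin and must be registered by hand):

```python
# Illustrative only: render a condensed Mistral-style chat template with
# Python's jinja2 to see the prompt string it produces.
from jinja2 import Environment


def raise_exception(message):
    # Jinja templates have no raise statement, so they call this helper.
    raise ValueError(message)


env = Environment()
env.globals["raise_exception"] = raise_exception

# Condensed version of the mistral.j2 chat template from the commit message.
template = env.from_string(
    "{{ '<s>' }}"
    "{% for m in messages %}"
    "{% if (m['role'] == 'user') != (loop.index0 % 2 == 0) %}"
    "{{ raise_exception('Conversation roles must alternate user/assistant/...') }}"
    "{% endif %}"
    "{% if m['role'] == 'user' %}"
    "{{ '[INST] ' + m['content'] + ' [/INST]' }}"
    "{% elif m['role'] == 'assistant' %}"
    "{{ ' ' + m['content'] + '</s>' }}"
    "{% else %}"
    "{{ raise_exception('Only user and assistant roles are supported!') }}"
    "{% endif %}"
    "{% endfor %}"
)

rendered = template.render(messages=[
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
    {"role": "user", "content": "How are you?"},
])
print(rendered)
# <s>[INST] Hello [/INST] Hi there!</s>[INST] How are you? [/INST]
```

Swapping the order of the user/assistant turns, or passing a system message, makes `raise_exception` fire, which is how the template enforces the role rules.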
stintel added a commit that referenced this issue on Mar 26, 2024 (same commit message as above)
stintel added a commit that referenced this issue on Mar 26, 2024 (same commit message as above)
stintel added a commit that referenced this issue on Mar 26, 2024 (same commit message as above)
stintel added a commit that referenced this issue on Mar 27, 2024 (same commit message as above)
stintel added a commit that referenced this issue on Mar 28, 2024
And use it in Triton chat completions and legacy completions.

To activate, configure a prompt_format for the chat_completions model:

```
[models.chat_completions."Mistral-7B-Instruct-v0.2"]
...
prompt_format = "mistral"
```

This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory.

Example templates for Mistral-7B-Instruct-v0.2 (exclude the ```):

Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endfor %}
```

Modified version of the above template that injects a system prompt before the first user prompt:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{% set mod = 0 -%}
{% set system = '' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'system' and loop.index0 == 0) -%}
{% set mod = 1 -%}
{% set system = message['content'] %}
{% else -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{% if system and system | length > 0 and loop.index0 == 1 -%}
{{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}}
{% else -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% endif %}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endif -%}
{% endfor -%}
```

Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2:

```
[INST] {% for message in messages -%}
{{ message -}}
{% endfor %} [/INST]
```

Closes: #4
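What the system-prompt variant aims to produce can be shown without a template engine. This is a plain-Python sketch of the intended output shape (`render_mistral_chat` is a hypothetical helper for illustration, not ai-router code):

```python
# Illustrative only: mimic the system-prompt-injecting template in plain
# Python. The system message's content is prepended, with a newline, to the
# first user turn; roles must otherwise alternate user/assistant.
def render_mistral_chat(messages):
    parts = ["<s>"]
    system = ""
    turns = messages
    if turns and turns[0]["role"] == "system":
        system = turns[0]["content"]
        turns = turns[1:]
    for i, m in enumerate(turns):
        if (m["role"] == "user") != (i % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/...")
        if m["role"] == "user":
            content = m["content"]
            if system and i == 0:
                # Inject the system prompt into the first user instruction.
                content = system + "\n" + content
            parts.append("[INST] " + content + " [/INST]")
        elif m["role"] == "assistant":
            parts.append(" " + m["content"] + "</s>")
        else:
            raise ValueError("Only user and assistant roles are supported!")
    return "".join(parts)


print(render_mistral_chat([
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": "Hello"},
]))
```

The system content ends up inside the first `[INST] ... [/INST]` block rather than as a separate turn, which matches how Mistral-7B-Instruct expects prompts to be formatted.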
stintel added a commit that referenced this issue on Mar 28, 2024
And use it in Triton chat completions and legacy completions.

To activate, configure a prompt_format for the chat_completions model:

```
[models.chat_completions."Mistral-7B-Instruct-v0.2"]
...
prompt_format = "mistral"
```

This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory.

Example templates for Mistral-7B-Instruct-v0.2 (exclude the ```):

Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endfor %}
```

Modified version of the above template that injects a system prompt before the first user prompt:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{% set mod = 0 -%}
{% set system = '' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'system' and loop.index0 == 0) -%}
{% set mod = 1 -%}
{% set system = message['content'] %}
{% else -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{% if system and system | length > 0 and loop.index0 == 1 -%}
{{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}}
{% else -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% endif %}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endif -%}
{% endfor -%}
```

Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2:

```
[INST] {% for message in messages -%}
{{ message -}}
{% endfor %} [/INST]
```

As we use chat_completion models in the config for both chat completions and legacy completions, configuring a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why:

```
Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing
```

If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template:

```
{{ raise_exception('Legacy completions are disabled for this model') }}
```

Closes: #4
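The disable-by-exception trick can likewise be sketched with Python's jinja2 (an illustrative stand-in; the commit does not say which template engine ai-router uses, and `raise_exception` is a hand-registered helper, not a Jinja builtin). Rendering such a template fails immediately, which is what turns the endpoint off:

```python
# Illustrative only: a template whose sole job is to fail, as suggested for
# disabling legacy completions on a model.
from jinja2 import Environment


def raise_exception(message):
    raise ValueError(message)


env = Environment()
env.globals["raise_exception"] = raise_exception

disabled = env.from_string(
    "{{ raise_exception('Legacy completions are disabled for this model') }}"
)

try:
    disabled.render(messages=["hi"])
except ValueError as err:
    print(err)  # Legacy completions are disabled for this model
```

Any request that would use this template gets an error instead of a prompt, so the model effectively serves chat completions only.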
stintel
added a commit
that referenced
this issue
Mar 28, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Mar 29, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Mar 29, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Mar 29, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Mar 29, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Mar 29, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel added a commit that referenced this issue on Mar 29, 2024
stintel added a commit that referenced this issue on Mar 31, 2024
stintel added a commit that referenced this issue on Mar 31, 2024
stintel added a commit that referenced this issue on Apr 16, 2024
stintel added a commit that referenced this issue on Apr 16, 2024
stintel added a commit that referenced this issue on Apr 17, 2024
stintel added a commit that referenced this issue on Apr 19, 2024
stintel added a commit that referenced this issue on Apr 19, 2024
stintel added a commit that referenced this issue on Apr 23, 2024
stintel added a commit that referenced this issue on Apr 24, 2024
stintel
added a commit
that referenced
this issue
Apr 24, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 24, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 24, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 24, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 24, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 25, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 25, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 25, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub, which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 25, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/blob/main/tokenizer_config.json#L42), which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
stintel
added a commit
that referenced
this issue
Apr 25, 2024
And use it in Triton chat completions and legacy completions. To activate, configure a prompt_format for the chat_completions model: ``` [models.chat_completions."Mistral-7B-Instruct-v0.2"] ... prompt_format = "mistral" ``` This will look for templates in /etc/ai-router/templates. The template for chat completions should go in the chat subdirectory, and for legacy completions the template should go in the completions subdirectory. Example templates Mistral-7B-Instruct-v0.2 (exclude the ```): Chat, based on the template from the Hugging Face Hub (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/blob/main/tokenizer_config.json#L42), which only supports the user and assistant roles, to be placed in /etc/ai-router/templates/chat/mistral.j2: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {% endif -%} {% if message['role'] == 'user' -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endfor %} ``` Modified version of the above template that injects a system prompt before the first user prompt: ``` {%- set bos_token = '<s>' -%} {% set eos_token = '</s>' -%} {% set mod = 0 -%} {% set system = '' -%} {{ bos_token -}} {%- for message in messages -%} {% if (message['role'] == 'system' and loop.index0 == 0) -%} {% set mod = 1 -%} {% set system = message['content'] %} {% else -%} {% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... 
' ) }} {% endif -%} {% if message['role'] == 'user' -%} {% if system and system | length > 0 and loop.index0 == 1 -%} {{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}} {% else -%} {{ '[INST] ' + message['content'] + ' [/INST]' -}} {% endif %} {% elif message ['role'] == 'assistant' -%} {{ ' ' + message['content'] + eos_token -}} {% else -%} {{ raise_exception('Only user and assistant roles are supported!') }} {% endif -%} {% endif -%} {% endfor -%} ``` Legacy completions do not support roles, so a much simpler template can be used, in /etc/ai-router/templates/completions/mistral.j2: ``` [INST] {% for message in messages -%} {{ message -}} {% endfor %} [/INST] ``` As we use chat_completion models in the config for both chat completions and legacy completions, configure a prompt_format for a model will require you to place a template file for both chat completions and legacy completions in the expected location. If one of them is missing, ai-router will not start. The error message should point out why: Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing If you wish to only enable chat completions for a model, and disable legacy completions, this can be done by simply raising an exception in the template: ``` {{ raise_exception('Legacy completions are disabled for this model') }} ``` Closes: #4
Related to #2 - potentially use a template engine to deal with prompt formats?
Some examples.
Important things (not necessarily in this order):
I'm guessing that for performance we can/should use something that supports compiling templates at startup. In the end, what OpenAI clients provide is often referred to as "ChatML"; see the HF chat templating example above for the input format.
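One way to read "compilation on startup" is to load every template once at process start and only call render per request, a sketch of which follows (assuming Jinja2 in Python for illustration; the template registry and template text here are hypothetical, not the project's actual files). Jinja2 compiles a template to Python bytecode when it is first loaded, so preloading moves that cost out of the request path. The per-request input is the ChatML-style messages list that OpenAI clients send:

```python
from jinja2 import Environment, DictLoader

# Hypothetical in-memory registry; in ai-router these would be the
# *.j2 files under the templates directory.
TEMPLATES = {
    "demo": "{% for m in messages %}[{{ m['role'] }}] {{ m['content'] }}\n{% endfor %}",
}

env = Environment(loader=DictLoader(TEMPLATES))

# "Compilation on startup": force every template to be loaded (and
# thus compiled to bytecode) before serving any request.
compiled = {name: env.get_template(name) for name in TEMPLATES}

# Per-request: ChatML-style input as sent by OpenAI clients.
chatml_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Ping"},
]
prompt = compiled["demo"].render(messages=chatml_messages)
print(prompt)
```

The same split (load-and-compile once, render many) is what a Rust template engine in the router would provide as well.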
I believe this may be the most powerful, flexible, and performant way to support what is described in #2 for prompt handling and definition.