Templating for prompts #4

Open
kristiankielhofner opened this issue Jan 26, 2024 · 0 comments
Related to #2 - potentially use a template engine to deal with prompt formats?

Some examples.

Important things (not necessarily in this order):

  1. Flexibility. Support built-in templates for the special tokens of various models, stored in the repo by name (e.g. "mistral"; see the Mistral example format). We can also extend/include support for transformers chat templating, but that requires fetching the tokenizer config for the user-configured model (see the "chat_template" key for Mistral-Instruct as an example). If a user specifies an HF model we can attempt to "autoload" this, but that raises questions of fetching, caching, grabbing specific revisions, etc. Users should also be able to extend this to support models they have fine-tuned with specific/proprietary/goofy prompt/instruction formats.
  2. Performance.
  3. Rust support.
  4. Use/familiarity/popularity.
  5. Validation. Not all models support all ChatML/OpenAI roles. Ideally we would validate requests on input against the roles the model actually supports and return an error if the request uses a role outside that set.
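Item 5 could look something like this minimal sketch (plain Python, with a hypothetical per-model role table; in practice the role sets would come from the router's model configuration, not a hard-coded dict):

```python
# Hypothetical per-model role support table; illustrative values only.
SUPPORTED_ROLES = {
    "mistral": {"user", "assistant"},            # no system role
    "chatml": {"system", "user", "assistant"},
}

def validate_roles(messages, prompt_format):
    """Return the roles in a request that the model's prompt format rejects."""
    allowed = SUPPORTED_ROLES[prompt_format]
    return [m["role"] for m in messages if m["role"] not in allowed]

messages = [
    {"role": "system", "content": "Be terse."},
    {"role": "user", "content": "Hi"},
]
print(validate_roles(messages, "mistral"))  # → ['system']
```

A request with unsupported roles would then be turned into an error response before any template is rendered.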

I'm guessing that for performance we can/should use an engine that supports compiling templates on startup? In the end, what OpenAI clients provide is often referred to as "ChatML"; see the HF chat templating example above for the input format.
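For reference, the "ChatML" input an OpenAI client sends is a list of role/content messages inside the request body, e.g. (illustrative values only):

```python
import json

# Illustrative OpenAI-style chat completion request body.
request = {
    "model": "Mistral-7B-Instruct-v0.2",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is templating?"},
    ],
}
print(json.dumps(request, indent=2))
```

It is this `messages` list that the template engine would turn into a model-specific prompt string.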

I believe this may be the most powerful, flexible, and performant way to support what is described in #2 for prompt handling and definition.

stintel added a commit that referenced this issue Feb 23, 2024
Replace .to_string() with String::from while at it. Both are used, but
let's aim for consistency.

Reported-by: Nick Bento <[email protected]>
Fixes: c196362 ("Implement chat completion (#4)")
stintel added a commit that referenced this issue Mar 26, 2024
And use it in Triton chat completions and legacy completions.

For Mistral-7B-Instruct-v0.2, here is an example template for chat
completions. Put it in /etc/ai-router/templates/chat/mistral.j2:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endfor %}
```
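For reference, the template's output can be reproduced in plain Python (a sketch of the same logic, not the code ai-router actually runs):

```python
BOS, EOS = "<s>", "</s>"

def render_mistral_chat(messages):
    """Mirror of the Jinja template above: alternating user/assistant turns."""
    out = [BOS]
    for i, msg in enumerate(messages):
        # User messages must sit at even indices, assistant at odd ones.
        if (msg["role"] == "user") != (i % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/...")
        if msg["role"] == "user":
            out.append("[INST] " + msg["content"] + " [/INST]")
        elif msg["role"] == "assistant":
            out.append(" " + msg["content"] + EOS)
        else:
            raise ValueError("Only user and assistant roles are supported!")
    return "".join(out)

print(render_mistral_chat([
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
    {"role": "user", "content": "Bye"},
]))
# → <s>[INST] Hello [/INST] Hi there!</s>[INST] Bye [/INST]
```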

And configure the prompt_format in /etc/ai-router.toml:

```
[models.chat_completions."Mistral-7B-Instruct-v0.2"]
...
prompt_format = "mistral"
```

For legacy completions, a different template is needed, in
/etc/ai-router/templates/completions/mistral.j2:

```
[INST] {% for message in messages -%}
{{ message -}}
{% endfor %} [/INST]
```

Closes: #4
stintel added a commit that referenced this issue Mar 28, 2024
And use it in Triton chat completions and legacy completions.

To activate, configure a prompt_format for the chat_completions model:

```
[models.chat_completions."Mistral-7B-Instruct-v0.2"]
...
prompt_format = "mistral"
```

This will look for templates in /etc/ai-router/templates. The template
for chat completions should go in the chat subdirectory, and for legacy
completions the template should go in the completions subdirectory.

Example templates for Mistral-7B-Instruct-v0.2 (exclude the ```):

Chat, based on the template from the Hugging Face Hub, which only
supports the user and assistant roles, to be placed in
/etc/ai-router/templates/chat/mistral.j2:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endfor %}
```

Modified version of the above template that injects a system prompt
before the first user prompt:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{% set mod = 0 -%}
{% set system = '' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'system' and loop.index0 == 0) -%}
{% set mod = 1 -%}
{% set system = message['content'] %}
{% else -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{% if system and system | length > 0 and loop.index0 == 1 -%}
{{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}}
{% else -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% endif %}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endif -%}
{% endfor -%}
```
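The system-prompt variant can likewise be mirrored in plain Python (a sketch of the template's logic, not the router's actual code):

```python
BOS, EOS = "<s>", "</s>"

def render_with_system(messages):
    """Mirror of the modified template: fold an optional leading system
    prompt into the first user [INST] block."""
    out, mod, system = [BOS], 0, ""
    for i, msg in enumerate(messages):
        if msg["role"] == "system" and i == 0:
            # Shift the expected alternation by one and remember the prompt.
            mod, system = 1, msg["content"]
            continue
        if (msg["role"] == "user") != (i % 2 == mod):
            raise ValueError("Conversation roles must alternate user/assistant/...")
        if msg["role"] == "user":
            if system and i == 1:
                out.append("[INST] " + system + "\n" + msg["content"] + " [/INST]")
            else:
                out.append("[INST] " + msg["content"] + " [/INST]")
        elif msg["role"] == "assistant":
            out.append(" " + msg["content"] + EOS)
        else:
            raise ValueError("Only user and assistant roles are supported!")
    return "".join(out)

print(render_with_system([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]))
# → <s>[INST] You are helpful.\nHi [/INST] Hello!</s>
```

(Note that in stock Jinja2, `set` inside a `for` loop is scoped to the iteration; the Python sketch sidesteps that by using plain variables.)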

Legacy completions do not support roles, so a much simpler template can
be used, in /etc/ai-router/templates/completions/mistral.j2:

```
[INST] {% for message in messages -%}
{{ message -}}
{% endfor %} [/INST]
```
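The legacy template simply concatenates the prompt strings into one [INST] block; in plain Python the equivalent is:

```python
def render_legacy(messages):
    """Mirror of the legacy completions template: one [INST] block,
    messages concatenated with no separator."""
    return "[INST] " + "".join(messages) + " [/INST]"

print(render_legacy(["Hello"]))
# → [INST] Hello [/INST]
```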

As we use chat_completions models in the config for both chat completions
and legacy completions, configuring a prompt_format for a model requires
you to place a template file for both chat completions and legacy
completions in the expected locations. If one of them is missing,
ai-router will not start, and the error message should point out why:

Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing

If you wish to enable only chat completions for a model and disable
legacy completions, you can do so by raising an exception in the legacy
completions template:

```
{{ raise_exception('Legacy completions are disabled for this model') }}
```

Closes: #4
stintel added a commit that referenced this issue Mar 28, 2024
And use it in Triton chat completions and legacy completions.

To activate, configure a prompt_format for the chat_completions model:

```
[models.chat_completions."Mistral-7B-Instruct-v0.2"]
...
prompt_format = "mistral"
```

This will look for templates in /etc/ai-router/templates. The template
for chat completions should go in the chat subdirectory, and for legacy
completions the template should go in the completions subdirectory.

Example templates Mistral-7B-Instruct-v0.2 (exclude the ```):

Chat, based on the template from the Hugging Face Hub, which only
supports the user and assistant roles, to be placed in
/etc/ai-router/templates/chat/mistral.j2:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% elif message ['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endfor %}
```

Modified version of the above template that injects a system prompt
before the first user prompt:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{% set mod = 0 -%}
{% set system = '' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'system' and loop.index0 == 0) -%}
{% set mod = 1 -%}
{% set system = message['content'] %}
{% else -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... ' ) }}
{% endif -%}
{% if message['role'] == 'user' -%}
{% if system and system | length > 0 and loop.index0 == 1 -%}
{{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}}
{% else -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% endif %}
{% elif message ['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endif -%}
{% endfor -%}
```

Legacy completions do not support roles, so a much simpler template can
be used, in /etc/ai-router/templates/completions/mistral.j2:

```
[INST] {% for message in messages -%}
{{ message -}}
{% endfor %} [/INST]
```

As we use chat_completion models in the config for both chat completions
and legacy completions, configure a prompt_format for a model will
require you to place a template file for both chat completions and
legacy completions in the expected location. If one of them is missing,
ai-router will not start. The error message should point out why:

Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing

If you wish to only enable chat completions for a model, and disable
legacy completions, this can be done by simply raising an exception in
the template:

```
{{ raise_exception('Legacy completions are disabled for this model') }}
```

Closes: #4
stintel added a commit that referenced this issue Mar 29, 2024
And use it in Triton chat completions and legacy completions.

To activate, configure a prompt_format for the chat_completions model:

```
[models.chat_completions."Mistral-7B-Instruct-v0.2"]
...
prompt_format = "mistral"
```

This will look for templates in /etc/ai-router/templates. The template
for chat completions should go in the chat subdirectory, and for legacy
completions the template should go in the completions subdirectory.

Example templates Mistral-7B-Instruct-v0.2 (exclude the ```):

Chat, based on the template from the Hugging Face Hub, which only
supports the user and assistant roles, to be placed in
/etc/ai-router/templates/chat/mistral.j2:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% elif message ['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endfor %}
```

Modified version of the above template that injects a system prompt
before the first user prompt:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{% set mod = 0 -%}
{% set system = '' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'system' and loop.index0 == 0) -%}
{% set mod = 1 -%}
{% set system = message['content'] %}
{% else -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... ' ) }}
{% endif -%}
{% if message['role'] == 'user' -%}
{% if system and system | length > 0 and loop.index0 == 1 -%}
{{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}}
{% else -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% endif %}
{% elif message ['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endif -%}
{% endfor -%}
```

Legacy completions do not support roles, so a much simpler template can
be used, in /etc/ai-router/templates/completions/mistral.j2:

```
[INST] {% for message in messages -%}
{{ message -}}
{% endfor %} [/INST]
```

As we use chat_completion models in the config for both chat completions
and legacy completions, configure a prompt_format for a model will
require you to place a template file for both chat completions and
legacy completions in the expected location. If one of them is missing,
ai-router will not start. The error message should point out why:

Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing

If you wish to only enable chat completions for a model, and disable
legacy completions, this can be done by simply raising an exception in
the template:

```
{{ raise_exception('Legacy completions are disabled for this model') }}
```

Closes: #4
stintel added a commit that referenced this issue Mar 29, 2024
And use it in Triton chat completions and legacy completions.

To activate, configure a prompt_format for the chat_completions model:

```
[models.chat_completions."Mistral-7B-Instruct-v0.2"]
...
prompt_format = "mistral"
```

This will look for templates in /etc/ai-router/templates. The template
for chat completions should go in the chat subdirectory, and for legacy
completions the template should go in the completions subdirectory.

Example templates Mistral-7B-Instruct-v0.2 (exclude the ```):

Chat, based on the template from the Hugging Face Hub, which only
supports the user and assistant roles, to be placed in
/etc/ai-router/templates/chat/mistral.j2:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% elif message ['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endfor %}
```

Modified version of the above template that injects a system prompt
before the first user prompt:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{% set mod = 0 -%}
{% set system = '' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'system' and loop.index0 == 0) -%}
{% set mod = 1 -%}
{% set system = message['content'] %}
{% else -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... ' ) }}
{% endif -%}
{% if message['role'] == 'user' -%}
{% if system and system | length > 0 and loop.index0 == 1 -%}
{{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}}
{% else -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% endif %}
{% elif message ['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endif -%}
{% endfor -%}
```

Legacy completions do not support roles, so a much simpler template can
be used, in /etc/ai-router/templates/completions/mistral.j2:

```
[INST] {% for message in messages -%}
{{ message -}}
{% endfor %} [/INST]
```

As we use chat_completion models in the config for both chat completions
and legacy completions, configure a prompt_format for a model will
require you to place a template file for both chat completions and
legacy completions in the expected location. If one of them is missing,
ai-router will not start. The error message should point out why:

Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing

If you wish to only enable chat completions for a model, and disable
legacy completions, this can be done by simply raising an exception in
the template:

```
{{ raise_exception('Legacy completions are disabled for this model') }}
```

Closes: #4
stintel added a commit that referenced this issue Mar 29, 2024
And use it in Triton chat completions and legacy completions.

To activate, configure a prompt_format for the chat_completions model:

```
[models.chat_completions."Mistral-7B-Instruct-v0.2"]
...
prompt_format = "mistral"
```

This will look for templates in /etc/ai-router/templates. The template
for chat completions should go in the chat subdirectory, and for legacy
completions the template should go in the completions subdirectory.

Example templates Mistral-7B-Instruct-v0.2 (exclude the ```):

Chat, based on the template from the Hugging Face Hub, which only
supports the user and assistant roles, to be placed in
/etc/ai-router/templates/chat/mistral.j2:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% elif message ['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endfor %}
```

Modified version of the above template that injects a system prompt
before the first user prompt:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{% set mod = 0 -%}
{% set system = '' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'system' and loop.index0 == 0) -%}
{% set mod = 1 -%}
{% set system = message['content'] %}
{% else -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... ' ) }}
{% endif -%}
{% if message['role'] == 'user' -%}
{% if system and system | length > 0 and loop.index0 == 1 -%}
{{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}}
{% else -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% endif %}
{% elif message ['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endif -%}
{% endfor -%}
```

Legacy completions do not support roles, so a much simpler template can
be used, in /etc/ai-router/templates/completions/mistral.j2:

```
[INST] {% for message in messages -%}
{{ message -}}
{% endfor %} [/INST]
```

As we use chat_completion models in the config for both chat completions
and legacy completions, configure a prompt_format for a model will
require you to place a template file for both chat completions and
legacy completions in the expected location. If one of them is missing,
ai-router will not start. The error message should point out why:

Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing

If you wish to only enable chat completions for a model, and disable
legacy completions, this can be done by simply raising an exception in
the template:

```
{{ raise_exception('Legacy completions are disabled for this model') }}
```

Closes: #4
stintel added a commit that referenced this issue Mar 29, 2024
And use it in Triton chat completions and legacy completions.

To activate, configure a prompt_format for the chat_completions model:

```
[models.chat_completions."Mistral-7B-Instruct-v0.2"]
...
prompt_format = "mistral"
```

This will look for templates in /etc/ai-router/templates. The template
for chat completions should go in the chat subdirectory, and for legacy
completions the template should go in the completions subdirectory.

Example templates Mistral-7B-Instruct-v0.2 (exclude the ```):

Chat, based on the template from the Hugging Face Hub, which only
supports the user and assistant roles, to be placed in
/etc/ai-router/templates/chat/mistral.j2:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% elif message ['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endfor %}
```

Modified version of the above template that injects a system prompt
before the first user prompt:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{% set mod = 0 -%}
{% set system = '' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'system' and loop.index0 == 0) -%}
{% set mod = 1 -%}
{% set system = message['content'] %}
{% else -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... ' ) }}
{% endif -%}
{% if message['role'] == 'user' -%}
{% if system and system | length > 0 and loop.index0 == 1 -%}
{{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}}
{% else -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% endif %}
{% elif message ['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endif -%}
{% endfor -%}
```

Legacy completions do not support roles, so a much simpler template can
be used, in /etc/ai-router/templates/completions/mistral.j2:

```
[INST] {% for message in messages -%}
{{ message -}}
{% endfor %} [/INST]
```

As we use chat_completion models in the config for both chat completions
and legacy completions, configure a prompt_format for a model will
require you to place a template file for both chat completions and
legacy completions in the expected location. If one of them is missing,
ai-router will not start. The error message should point out why:

Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing

If you wish to only enable chat completions for a model, and disable
legacy completions, this can be done by simply raising an exception in
the template:

```
{{ raise_exception('Legacy completions are disabled for this model') }}
```

Closes: #4
stintel added a commit that referenced this issue Mar 29, 2024
And use it in Triton chat completions and legacy completions.

To activate, configure a prompt_format for the chat_completions model:

```
[models.chat_completions."Mistral-7B-Instruct-v0.2"]
...
prompt_format = "mistral"
```

This will look for templates in /etc/ai-router/templates. The template
for chat completions should go in the chat subdirectory, and for legacy
completions the template should go in the completions subdirectory.

Example templates Mistral-7B-Instruct-v0.2 (exclude the ```):

Chat, based on the template from the Hugging Face Hub, which only
supports the user and assistant roles, to be placed in
/etc/ai-router/templates/chat/mistral.j2:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% elif message ['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endfor %}
```
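For readers less familiar with Jinja2, the template above can be mirrored in plain Python. This is a sketch for illustration only (the function name is made up; ai-router renders the actual .j2 template): it enforces user/assistant alternation and wraps user turns in [INST] markers.

```python
def format_mistral_chat(messages):
    """Plain-Python equivalent of the Mistral chat template above."""
    bos, eos = "<s>", "</s>"
    out = [bos]
    for i, msg in enumerate(messages):
        # User messages must sit at even indices, assistant at odd ones.
        if (msg["role"] == "user") != (i % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/...")
        if msg["role"] == "user":
            out.append(f"[INST] {msg['content']} [/INST]")
        elif msg["role"] == "assistant":
            out.append(f" {msg['content']}{eos}")
        else:
            raise ValueError("Only user and assistant roles are supported!")
    return "".join(out)
```

For example, a user/assistant pair renders as `<s>[INST] ... [/INST] ...</s>`.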

Modified version of the above template that injects a system prompt
before the first user prompt:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{% set mod = 0 -%}
{% set system = '' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'system' and loop.index0 == 0) -%}
{% set mod = 1 -%}
{% set system = message['content'] %}
{% else -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{% if system and system | length > 0 and loop.index0 == 1 -%}
{{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}}
{% else -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% endif %}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endif -%}
{% endfor -%}
```
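The system-prompt variant can also be sketched in plain Python (again an illustrative, hypothetical helper, not the actual implementation): an optional leading system message is captured and folded into the first user turn, matching the template's behavior.

```python
def format_mistral_chat_with_system(messages):
    """Plain-Python sketch of the modified template above: a leading
    system message is prepended to the first user message."""
    bos, eos = "<s>", "</s>"
    system = ""
    if messages and messages[0]["role"] == "system":
        system = messages[0]["content"]
        messages = messages[1:]  # remaining turns must alternate user/assistant
    out = [bos]
    for i, msg in enumerate(messages):
        if (msg["role"] == "user") != (i % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/...")
        if msg["role"] == "user":
            # Inject the system prompt before the first user prompt only.
            content = f"{system}\n{msg['content']}" if system and i == 0 else msg["content"]
            out.append(f"[INST] {content} [/INST]")
        elif msg["role"] == "assistant":
            out.append(f" {msg['content']}{eos}")
        else:
            raise ValueError("Only user and assistant roles are supported!")
    return "".join(out)
```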

Legacy completions do not support roles, so a much simpler template can
be used, in /etc/ai-router/templates/completions/mistral.j2:

```
[INST] {% for message in messages -%}
{{ message -}}
{% endfor %} [/INST]
```
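The legacy-completions template boils down to simple string concatenation. Roughly equivalent Python, for illustration (assuming `messages` is a list of plain strings, as in the template above):

```python
def format_mistral_completion(messages):
    # Concatenate all prompt fragments inside a single [INST] ... [/INST] pair.
    return "[INST] " + "".join(messages) + " [/INST]"
```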

As we use chat_completions models in the config for both chat completions
and legacy completions, configuring a prompt_format for a model requires
you to place a template file for both chat completions and legacy
completions in the expected locations. If one of them is missing,
ai-router will not start. The error message should point out why:

Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing

If you wish to only enable chat completions for a model, and disable
legacy completions, this can be done by simply raising an exception in
the template:

```
{{ raise_exception('Legacy completions are disabled for this model') }}
```

Closes: #4
stintel added a commit that referenced this issue Mar 29, 2024
stintel added a commit that referenced this issue Mar 31, 2024
stintel added a commit that referenced this issue Mar 31, 2024
stintel added a commit that referenced this issue Apr 16, 2024
stintel added a commit that referenced this issue Apr 16, 2024
stintel added a commit that referenced this issue Apr 17, 2024
stintel added a commit that referenced this issue Apr 19, 2024
stintel added a commit that referenced this issue Apr 19, 2024
stintel added a commit that referenced this issue Apr 23, 2024
stintel added a commit that referenced this issue Apr 24, 2024
stintel added a commit that referenced this issue Apr 24, 2024
stintel added a commit that referenced this issue Apr 24, 2024
stintel added a commit that referenced this issue Apr 24, 2024
stintel added a commit that referenced this issue Apr 24, 2024
stintel added a commit that referenced this issue Apr 24, 2024
stintel added a commit that referenced this issue Apr 25, 2024
stintel added a commit that referenced this issue Apr 25, 2024
stintel added a commit that referenced this issue Apr 25, 2024
stintel added a commit that referenced this issue Apr 25, 2024
And use it in Triton chat completions and legacy completions.

To activate, configure a prompt_format for the chat_completions model:

```
[models.chat_completions."Mistral-7B-Instruct-v0.2"]
...
prompt_format = "mistral"
```

This will look for templates in /etc/ai-router/templates. The template
for chat completions should go in the chat subdirectory, and for legacy
completions the template should go in the completions subdirectory.

Example templates Mistral-7B-Instruct-v0.2 (exclude the ```):

Chat, based on the template from the Hugging Face Hub
(https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/blob/main/tokenizer_config.json#L42),
which only supports the user and assistant roles, to be placed in
/etc/ai-router/templates/chat/mistral.j2:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% elif message ['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endfor %}
```

Modified version of the above template that injects a system prompt
before the first user prompt:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{% set mod = 0 -%}
{% set system = '' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'system' and loop.index0 == 0) -%}
{% set mod = 1 -%}
{% set system = message['content'] %}
{% else -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/... ' ) }}
{% endif -%}
{% if message['role'] == 'user' -%}
{% if system and system | length > 0 and loop.index0 == 1 -%}
{{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}}
{% else -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% endif %}
{% elif message ['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endif -%}
{% endfor -%}
```

Legacy completions do not support roles, so a much simpler template can
be used, in /etc/ai-router/templates/completions/mistral.j2:

```
[INST] {% for message in messages -%}
{{ message -}}
{% endfor %} [/INST]
```

As we use chat_completion models in the config for both chat completions
and legacy completions, configuring a prompt_format for a model requires
you to place a template file for both chat completions and legacy
completions in the expected locations. If either is missing, ai-router
will not start, and the error message will point out why:

Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing

If you wish to enable only chat completions for a model and disable
legacy completions, simply raise an exception in the legacy completions
template:

```
{{ raise_exception('Legacy completions are disabled for this model') }}
```

Closes: #4
stintel added a commit that referenced this issue Apr 25, 2024
stintel added a commit that referenced this issue Apr 26, 2024
stintel added a commit that referenced this issue May 16, 2024
stintel added a commit that referenced this issue May 16, 2024
stintel added a commit that referenced this issue May 16, 2024
stintel added a commit that referenced this issue May 16, 2024
stintel added a commit that referenced this issue May 22, 2024
stintel added a commit that referenced this issue May 22, 2024
stintel added a commit that referenced this issue May 31, 2024
stintel added a commit that referenced this issue May 31, 2024
And use it in Triton chat completions and legacy completions.

To activate, configure a prompt_format for the chat_completions model:

```
[models.chat_completions."Mistral-7B-Instruct-v0.2"]
...
prompt_format = "mistral"
```

This will look for templates in /etc/ai-router/templates. The template
for chat completions should go in the chat subdirectory, and for legacy
completions the template should go in the completions subdirectory.

Example templates for Mistral-7B-Instruct-v0.2 (the ``` fences are not part of the files):

Chat, based on the template from the Hugging Face Hub
(https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/blob/main/tokenizer_config.json#L42),
which only supports the user and assistant roles, to be placed in
/etc/ai-router/templates/chat/mistral.j2:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endfor %}
```
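For readers without a Jinja renderer at hand, here is a plain-Python sketch (a hypothetical helper, not part of ai-router) that applies the same rules as the template above: enforce strict user/assistant alternation and wrap each user turn in `[INST] ... [/INST]`.

```python
def render_mistral_chat(messages, bos="<s>", eos="</s>"):
    """Mimic the chat template above: user/assistant turns must alternate,
    user content is wrapped in [INST] ... [/INST], and each assistant
    reply is followed by the EOS token."""
    out = bos
    for i, msg in enumerate(messages):
        # Even positions must be user turns, odd positions assistant turns.
        if (msg["role"] == "user") != (i % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        if msg["role"] == "user":
            out += "[INST] " + msg["content"] + " [/INST]"
        elif msg["role"] == "assistant":
            out += " " + msg["content"] + eos
        else:
            raise ValueError("Only user and assistant roles are supported!")
    return out

print(render_mistral_chat([
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Bye"},
]))
# → <s>[INST] Hi [/INST] Hello!</s>[INST] Bye [/INST]
```

Running it on a short conversation shows the exact prompt string the model would receive.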

Modified version of the above template that injects a system prompt
before the first user prompt:

```
{%- set bos_token = '<s>' -%}
{% set eos_token = '</s>' -%}
{% set mod = 0 -%}
{% set system = '' -%}
{{ bos_token -}}
{%- for message in messages -%}
{% if (message['role'] == 'system' and loop.index0 == 0) -%}
{% set mod = 1 -%}
{% set system = message['content'] %}
{% else -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == mod) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif -%}
{% if message['role'] == 'user' -%}
{% if system and system | length > 0 and loop.index0 == 1 -%}
{{ '[INST] ' + system + '\n' + message['content'] + ' [/INST]' -}}
{% else -%}
{{ '[INST] ' + message['content'] + ' [/INST]' -}}
{% endif %}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + message['content'] + eos_token -}}
{% else -%}
{{ raise_exception('Only user and assistant roles are supported!') }}
{% endif -%}
{% endif -%}
{% endfor -%}
```
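The system-prompt variant shifts the alternation parity by one and folds the system message into the first user turn. In the same hypothetical plain-Python style as above:

```python
def render_mistral_chat_system(messages, bos="<s>", eos="</s>"):
    """Mimic the modified template: a leading system message is merged
    into the first user turn, and the parity of the alternation check
    shifts by one because of the extra message at position 0."""
    out = bos
    mod, system = 0, ""
    for i, msg in enumerate(messages):
        if msg["role"] == "system" and i == 0:
            mod, system = 1, msg["content"]
            continue
        if (msg["role"] == "user") != (i % 2 == mod):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        if msg["role"] == "user":
            if system and i == 1:
                # Inject the system prompt before the first user prompt.
                out += "[INST] " + system + "\n" + msg["content"] + " [/INST]"
            else:
                out += "[INST] " + msg["content"] + " [/INST]"
        elif msg["role"] == "assistant":
            out += " " + msg["content"] + eos
        else:
            raise ValueError("Only user and assistant roles are supported!")
    return out
```

With a leading system message, the first user turn becomes `[INST] <system>\n<user> [/INST]`; without one, the output is identical to the base template.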

Legacy completions do not support roles, so a much simpler template can
be used, in /etc/ai-router/templates/completions/mistral.j2:

```
[INST] {% for message in messages -%}
{{ message -}}
{% endfor %} [/INST]
```
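Because legacy completions carry no roles, the template above reduces to a single concatenation; a one-line hypothetical equivalent in plain Python:

```python
def render_mistral_completion(messages):
    """Equivalent of the legacy completions template: one [INST] block
    wrapped around the concatenated prompt strings (no roles involved)."""
    return "[INST] " + "".join(messages) + " [/INST]"

print(render_mistral_completion(["Tell me a joke."]))
# → [INST] Tell me a joke. [/INST]
```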

As we use the chat_completions models from the config for both chat
completions and legacy completions, configuring a prompt_format for a
model requires you to place a template file for both chat completions
and legacy completions in the expected locations. If either is missing,
ai-router will not start, and the error message should point out why:

Error: config file validation failed: model `meta-llama/Llama-2-70b-chat-hf` has prompt_format configured but template legacy completions (/etc/ai-router/templates/completions/llama.j2) is missing
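A minimal sketch of that startup check (hypothetical function name and signature; the real validation lives in ai-router's config loader):

```python
from pathlib import Path

def validate_prompt_format(model, prompt_format,
                           base="/etc/ai-router/templates"):
    """Fail fast if either template file for a configured prompt_format
    is missing, mirroring the error message shown above."""
    for kind, subdir in (("chat completions", "chat"),
                         ("legacy completions", "completions")):
        path = Path(base) / subdir / f"{prompt_format}.j2"
        if not path.is_file():
            raise SystemExit(
                f"Error: config file validation failed: model `{model}` has "
                f"prompt_format configured but template {kind} ({path}) is missing"
            )
```

Checking both files at startup, rather than at request time, is what turns a missing template into a config error instead of a runtime failure.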

If you wish to enable only chat completions for a model and disable
legacy completions, simply raise an exception in the legacy completions
template:

```
{{ raise_exception('Legacy completions are disabled for this model') }}
```

Closes: #4