---
title: Models
---

:::caution

Draft Specification: functionality has not been implemented yet.

Feedback: [HackMD: Models Spec](https://hackmd.io/ulO3uB1AQCqLa5SAAMFOQw)

:::

## Overview

In Jan, models are primary entities with the following capabilities:

- Users can import, configure, and run models locally.
- An [OpenAI Models API](https://platform.openai.com/docs/api-reference/models)-compatible endpoint is available at `localhost:3000/v1/models` (example below).
- Supported model formats: `ggufv3`, and more.
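
For example, listing the available models should be a single request (a sketch; this assumes Jan's API server is running on the default port 3000 mentioned above):

```bash
# List models via Jan's OpenAI-compatible Models endpoint
curl http://localhost:3000/v1/models
```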

### Objectives

- Users can download, import, and delete models.
- Users can use remote models (e.g. OpenAI, OpenRouter).
- Users can start/stop models and use them in a thread (or via the Chat Completions API).
- Users can configure default model parameters at the model level, to be overridden later at the `chat/completions` or `assistant`/`thread` level, as sketched below.
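
For instance, a per-request parameter override might look like this (a sketch; the endpoint follows OpenAI's `chat/completions` shape, and the payload details are assumptions):

```bash
# Override the model-level default temperature for a single request
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zephyr-7b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.2
  }'
```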

## Design Principles

- Don't optimize for simplicity yet: the underlying abstractions are still changing frequently (e.g. `ggufv3`).
- Provide a minimalist framework over those abstractions that takes care of coordination between tools.
- Show direct system state for now.

## KIVs for Model Spec v2

- OpenAI and Azure OpenAI
- Importing via URL
- Multiple partitions

## Folder Structure

- Models are stored in the `/models` folder.
- Models are organized into individual folders, each containing the binaries and configurations needed to run the model. This makes for easy packaging and sharing.
- Model folder names are unique and are used as default `model_id` values.

```bash
jan/                            # Jan root folder
  models/
    llama2-70b-q4_k_m/          # Example: standard GGUF model
      model.json
      model-binary-1.gguf
    mistral-7b-gguf-q3_k_l/     # Example: quantizations are separate folders
      model.json
      mistral-7b-q3-K-L.gguf
    mistral-7b-gguf-q8_k_m/     # Example: quantizations are separate folders
      model.json
      mistral-7b-q8_k_m.gguf
    llava-ggml-Q5/              # Example: model with multiple partitions
      model.json
      mmprj.bin
      model_q5.ggml
```

## `model.json`

- Each model folder contains a `model.json` file, which is the representation of that model.
- `model.json` contains the metadata and default parameters used to run a model. The only required field is `source_url`.
- Jan aims for rough equivalence with [OpenAI's Model Object](https://platform.openai.com/docs/api-reference/models/object), with additional properties to support local models.

### GGUF Example

Here's a standard example `model.json` for a GGUF model.

- `source_url`: https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/

```json
"type": "model", | ||
"version": "1", | ||
"id": "zephyr-7b" // used in chat-completions model_name, matches folder name | ||
"name": "Zephyr 7B" | ||
"owned_by": "" // OpenAI compatibility | ||
"created": 1231231 // unix timestamp | ||
"description": "..." | ||
"state": enum[null, "downloading", "available"] | ||
// KIV: remote: // Subsequent | ||
// KIV: type: "llm" // For future where there are different types | ||
"format": "ggufv3", // State format, rather than engine | ||
"source_url": "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/blob/main/zephyr-7b-beta.Q4_K_M.gguf", | ||
"settings" { | ||
"ctx_len": "2048", | ||
"ngl": "100", | ||
"embedding": "true", | ||
"n_parallel": "4", | ||
// KIV: "pre_prompt": "A chat between a curious user and an artificial intelligence", | ||
// KIV:"user_prompt": "USER: ", | ||
// KIV: "ai_prompt": "ASSISTANT: " | ||
"type": "model", // Defaults to "model" | ||
"version": "1", // Defaults to 1 | ||
"id": "zephyr-7b" // Defaults to foldername | ||
"name": "Zephyr 7B" // Defaults to foldername | ||
"owned_by": "you" // Defaults to you | ||
"created": 1231231 // Defaults to file creation time | ||
"description": "" | ||
"state": enum[null, "downloading", "ready", "starting", "stopping", ...] | ||
"format": "ggufv3", // Defaults to "ggufv3" | ||
"settings": { // Models are initialized with these settings | ||
"ctx_len": "2048", | ||
"ngl": "100", | ||
"embedding": "true", | ||
"n_parallel": "4", | ||
// KIV: "pre_prompt": "A chat between a curious user and an artificial intelligence", | ||
// KIV:"user_prompt": "USER: ", | ||
// KIV: "ai_prompt": "ASSISTANT: " | ||
} | ||
"parameters": { | ||
"temperature": "0.7", | ||
"token_limit": "2048", | ||
"top_k": "0", | ||
"top_p": "1", | ||
"stream": "true" | ||
}, | ||
"metadata": {} | ||
"assets": [ | ||
"file://.../zephyr-7b-q4_k_m.bin", | ||
"https://huggin" | ||
] | ||
``` | ||
|
||
### Deferred Download | ||
```sh | ||
models/ | ||
mistral-7b/ | ||
model.json | ||
hermes-7b/ | ||
model.json | ||
"parameters": { // Models are called with these parameters | ||
"temperature": "0.7", | ||
"token_limit": "2048", | ||
"top_k": "0", | ||
"top_p": "1", | ||
"stream": "true" | ||
}, | ||
"metadata": {} // Defaults to {} | ||
"assets": [ // Filepaths to model binaries; Defaults to current dir | ||
"file://.../zephyr-7b-q4_k_m.bin", | ||
] | ||
``` | ||
- Jan ships with a default model folders containing recommended models | ||
- Only the Model Object `json` files are included | ||
- Users must later explicitly download the model binaries | ||
|
||
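
Fetching the binaries for a deferred model would then go through the Download Model endpoint listed in the API reference below (a sketch; the request body shape is an assumption):

```bash
# Explicitly download the binaries for a recommended model
curl -X POST http://localhost:3000/v1/models/ \
  -H "Content-Type: application/json" \
  -d '{"id": "mistral-7b"}'
```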

## API Reference

Jan's Model API is compatible with [OpenAI's Models API](https://platform.openai.com/docs/api-reference/models), with additional methods for managing and running models locally.

| Method         | API Call                          | OpenAI-equivalent |
| -------------- | --------------------------------- | ----------------- |
| List Models    | `GET /v1/models`                  | Yes               |
| Get Model      | `GET /v1/models/{model_id}`       | Yes               |
| Delete Model   | `DELETE /v1/models/{model_id}`    | Yes               |
| Start Model    | `PUT /v1/models/{model_id}/start` | No                |
| Stop Model     | `PUT /v1/models/{model_id}/stop`  | No                |
| Download Model | `POST /v1/models/`                | No                |

See the [Jan Models API reference](https://jan.ai/api-reference#tag/Models) for full details.
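
For example, starting and stopping a model might look like this (a sketch based on the table above; request and response details are assumptions):

```bash
# Start a model so it can serve requests
curl -X PUT http://localhost:3000/v1/models/zephyr-7b/start

# Stop it when done
curl -X PUT http://localhost:3000/v1/models/zephyr-7b/stop
```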

## Importing Models

You can import a model by dragging the model binary or GGUF file into the `/models` folder.

- Jan automatically generates a corresponding `model.json` file based on the binary filename.
- Jan automatically organizes it into its own `/models/model-id` folder.
- Jan automatically populates the `model.json` properties, which you can subsequently modify (see the sketch below).
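
The expected flow is sketched below (folder and file names are illustrative):

```bash
# Drop a model binary into the models folder...
cp ~/Downloads/my-finetune-q4_k_m.gguf ~/jan/models/

# ...and Jan is expected to organize it as:
#   jan/models/my-finetune-q4_k_m/
#     model.json                  # autogenerated from the binary filename
#     my-finetune-q4_k_m.gguf
```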

### Jan Model Importers extension

:::caution

This is currently under development.

:::
Jan builds "importers" for users to seamlessly import models from a single URL. | ||
|
||
We currently only provide this for [TheBloke models on Huggingface](https://huggingface.co/TheBloke) (i.e. one of the patron saints of llama.cpp), but we plan to add more in the future. | ||
|
||
Currently, pasting a TheBloke Huggingface link in the Explore Models page will fire an importer, resulting in an: | ||
|
||
- Nicely-formatted model card | ||
- Fully-annotated `model.json` file | ||
|
||
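
For instance (a hypothetical illustration; the generated folder name and annotations are assumptions):

```bash
# Pasting a TheBloke URL such as:
#   https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF
# is expected to produce a populated model folder:
#   jan/models/zephyr-7b-beta-gguf/
#     model.json                  # annotated with name, description, settings, and parameters
#     zephyr-7b-beta.Q4_K_M.gguf
```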

### ADR

- Each model folder carries a `model.json`, i.e. the Model Object described above.
- Why multiple folders?
  - Model partitions (e.g. Llava in the future).
- Why a folder and config file for each quantization?
  - Differently quantized models are completely different models.
- Milestone (1 December): a catalogue of recommended models; anything else mutates the filesystem.
- [@linh] Should we have an API to help quantize models?
  - Could be a really cool feature (i.e. import from HF, quantize the model, run it on CPU).
  - We should have a helper function to handle hardware compatibility, e.g. `POST model/{model-id}/compatibility`.
- [louis] We are combining states and manifests; this needs more thought.