
REST API? #26

Open
Spiritdude opened this issue Jun 21, 2023 · 12 comments

@Spiritdude

What's the best practice for interfacing with the ctransformers API and exposing it as a REST API?

I looked at https://github.com/jquesnelle/transformers-openai-api and tried to change it to use ctransformers, but I stopped because the required changes kept growing, making it hard(er) to keep in sync with the original.

@matthoffner

I've been writing some generic FastAPI servers to try various ggml libraries using ctransformers:

https://huggingface.co/spaces/matthoffner/wizardcoder-ggml
https://huggingface.co/spaces/matthoffner/starchat-ggml

I've made some recent updates so the API matches the OpenAI response format more closely, so it can be used like llama.cpp + OpenAI.

@marella
Owner

marella commented Jun 22, 2023

Nice @matthoffner


@Spiritdude I have been thinking of adding an OpenAI-compatible API server but haven't found the time to do it. For now, if you want to create your own API server, you can use the examples provided by matthoffner as a reference. One thing to note is that the models are not thread-safe, so you will have to use a lock to prevent concurrent calls.
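For example, a minimal lock-guarded server might look like the sketch below. The FastAPI setup, route, and model repo are illustrative assumptions, not something prescribed by ctransformers:

import threading

from ctransformers import AutoModelForCausalLM
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# Model repo below is a placeholder; point it at your own GGML model.
llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")
lock = threading.Lock()  # the model is not thread-safe, so serialize access

class CompletionRequest(BaseModel):
    prompt: str

@app.post("/v1/completions")
def completions(req: CompletionRequest):
    with lock:  # prevent concurrent calls into the same model instance
        text = llm(req.prompt)
    return {"choices": [{"text": text}]}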

I'm also trying to make ctransformers compatible with 🤗 Transformers so that it can be used as a drop-in replacement in other projects, but it is WIP. See #13 (comment)

@Spiritdude
Author

@marella thanks for ctransformers and all the effort; the attention to detail (in building ctransformers and in the compatibility considerations) is much appreciated. I'm mainly using llama_python.server and keep all my apps on the REST API for compatibility.

@matthoffner great, thanks. I'll use your small code snippet in the meantime.

@ParisNeo

ParisNeo commented Jun 23, 2023

If you want, there is already a REST API that supports ctransformers:
https://github.com/ParisNeo/lollms

It allows you to generate text using a distributed or centralized architecture with multiple service nodes, and you can connect to it from one or many clients.

It is a many-to-many server.

It also has some examples of using it as a library. I have created a playground-like front end for it:
https://github.com/ParisNeo/lollms-playground

You can install it with:

pip install lollms

Then you set it up using:

lollms-settings

With this you can select a binding (ctransformers, for example), then select a model, and also select one of the preconditioned personalities (there are 260 of them).

Then you run:

lollms-server

This starts a service on localhost:9600.

You can run several of them, like this:

lollms-server --host 0.0.0.0 --port 9601

You can select any host and port you want, then run the playground or write your own client code. You can use the personality preconditioning or not.

I use socketio for generation, so from a client you can send:

socket.emit('generate_text', {
    prompt,
    personality: -1,
    n_predicts: 1024,
    parameters: {
        temperature: temperatureValue,
        top_k: topKValue,
        top_p: topPValue,
        repeat_penalty: repeatPenaltyValue,  // update with desired repeat penalty value
        repeat_last_n: repeatLastNValue,     // update with desired repeat_last_n value
        seed: parseInt(seedValue)
    }
});

personality is the id of a personality mounted on the server. You can mount many, allowing the user to choose.
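For a Python client, the equivalent might look like the sketch below, assuming the python-socketio client package; the response event name is an illustrative assumption, so check the lollms docs for the actual one:

import socketio

sio = socketio.Client()

# Event name below is illustrative; consult the lollms docs for the real one.
@sio.on('text_chunk')
def on_text_chunk(data):
    print(data, end='', flush=True)

sio.connect('http://localhost:9600')
sio.emit('generate_text', {
    'prompt': 'Hello',
    'personality': -1,  # as in the example above
    'n_predicts': 1024,
    'parameters': {
        'temperature': 0.7,
        'top_k': 40,
        'top_p': 0.95,
        'repeat_penalty': 1.1,
        'repeat_last_n': 64,
        'seed': 42,
    },
})
sio.wait()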

@matthoffner

Thanks @ParisNeo, very cool.

I'm hoping to help get ctransformers added to OpenLLM as well:

bentoml/OpenLLM#24

@lucasjinreal

@matthoffner Hello, where is your code snippet using ctransformers as an OpenAI-like server backend? I would like to use it.

@matthoffner

@lucasjinreal

@matthoffner thanks so much, I just found it in the files tab.

However, I ran into a weird issue.

The detokenized text returned to my client drops spaces:

[screenshot: streamed output with missing spaces between words]

I changed nothing except returning the new_text from each chat_chunk and streaming it to the client. Do you have any idea?

@matthoffner

Thanks @lucasjinreal, I haven't seen this issue.

Here are some recent UIs I built around WizardCoder, if you are looking for some client-side examples.

Live HTML Editor: https://github.com/matthoffner/wizardcoder-sandbox
Chatbot-ui: https://huggingface.co/spaces/matthoffner/starchat-ui

@lucasjinreal

lucasjinreal commented Aug 30, 2023

@matthoffner Can you tell me how you resolved the whitespace issue? I printed the stream out token by token, and there actually was no whitespace between tokens. Also, it is not a client issue; this client works with many of my OpenAI-like servers, just not yet with ctransformers.

@lucasjinreal

@matthoffner

from typing import Generator

async def stream_response(tokens, llm):
    try:
        iterator: Generator = llm.generate(tokens)
        for chat_chunk in iterator:
            # Detokenizing one token at a time; this is where the
            # missing spaces show up.
            new_text = llm.detokenize(chat_chunk)
            print(new_text, end='', flush=True)
            response = {
                'choices': [
                    {
                        'message': {
                            'role': 'system',
                            'content': new_text
                        },
                        'finish_reason': 'stop' if llm.is_eos_token(chat_chunk) else 'unknown'
                    }
                ]
            }
            yield response
    except Exception as e:
        print(f'stream_response failed: {e}')

Normally it should print the stream out with a typewriter effect, but the output doesn't include whitespace.
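A common workaround for per-token detokenization (an assumption on my part, not something confirmed in this thread) is to detokenize the accumulated token list each step and emit only the newly added suffix, so inter-token spaces are preserved. A sketch, assuming the same llm object as above:

# Workaround sketch: decode the whole sequence so far and emit only
# the new suffix, so spaces between tokens are not lost.
def stream_text(tokens, llm):
    generated = []
    emitted = ''
    for token in llm.generate(tokens):
        generated.append(token)
        text = llm.detokenize(generated)  # decode the full sequence so far
        new_text = text[len(emitted):]    # only the part not yet emitted
        emitted = text
        yield new_text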

@lucasjinreal

BTW, I'm using the ctransformers CLI-style API without any issue:

outputs = ''
for text in llm(prompt, stream=True):
    print(text, end="", flush=True)
    outputs += text
print()
