
REST API? #26

Open
Spiritdude opened this issue Jun 21, 2023 · 12 comments

@Spiritdude

What's the best practice for interfacing with the ctransformers API and exposing it as a REST API?

I looked at https://github.com/jquesnelle/transformers-openai-api and tried to change it to use ctransformers, but I stopped because the required changes kept growing, making it hard(er) to keep in sync with the original.

@matthoffner

I've been writing some generic FastAPI servers to try various ggml libraries using ctransformers:

https://huggingface.co/spaces/matthoffner/wizardcoder-ggml
https://huggingface.co/spaces/matthoffner/starchat-ggml

I've made some recent updates so the API matches the OpenAI response format more closely, so it can be used like llama.cpp + OpenAI.

@marella
Owner

marella commented Jun 22, 2023

Nice @matthoffner


@Spiritdude I have been thinking of adding an OpenAI-compatible API server but haven't found the time to do it. For now, if you want to create your own API server, you can use the examples provided by matthoffner as a reference. One thing to note is that the models are not thread-safe, so you will have to use a lock to prevent concurrent calls.
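For example, a minimal lock-guarded server might look like the sketch below. The FastAPI setup, route, and model repo are illustrative assumptions, not something prescribed by ctransformers:

import threading

from ctransformers import AutoModelForCausalLM
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# Model repo below is a placeholder; point it at your own GGML model.
llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")
lock = threading.Lock()  # the model is not thread-safe, so serialize access

class CompletionRequest(BaseModel):
    prompt: str

@app.post("/v1/completions")
def completions(req: CompletionRequest):
    with lock:  # prevent concurrent calls into the same model instance
        text = llm(req.prompt)
    return {"choices": [{"text": text}]}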

I'm also trying to make ctransformers compatible with 🤗 Transformers so that it can be used as a drop-in replacement in other projects, but it is WIP. See #13 (comment)

@Spiritdude
Author

@marella thanks for ctransformers and all the effort; the attention to detail (in building ctransformers and in the compatibility considerations) is much appreciated. I'm mainly using llama_python.server and keep all my apps on the REST API for compatibility.

@matthoffner great, thanks. I'll use your small code snippet in the meantime.

@ParisNeo

ParisNeo commented Jun 23, 2023

If you want, there is already a REST API that supports ctransformers:
https://github.com/ParisNeo/lollms

It allows you to generate text using a distributed or centralized architecture with multiple service nodes, and you can connect to it from one or many clients.

It is a many-to-many server.

It also has some examples of using it as a library. I have created a playground-like front end for it:
https://github.com/ParisNeo/lollms-playground

You can install it with:

pip install lollms

Then you set it up using:

lollms-settings

With this you can select a binding (ctransformers, for example), then select a model, and also select one of the preconditioned personalities (there are 260 of them).

Then you run:

lollms-server

This starts a service on localhost:9600.

You can run several of them, like this:

lollms-server --host 0.0.0.0 --port 9601

You can select any host and port you want, then run the playground or write your own client code. You can use the personality preconditioning or not.

I use socketio for generation, so from a client you can send:

socket.emit('generate_text', {
    prompt,
    personality: -1,
    n_predicts: 1024,
    parameters: {
        temperature: temperatureValue,
        top_k: topKValue,
        top_p: topPValue,
        repeat_penalty: repeatPenaltyValue,  // update with desired repeat penalty value
        repeat_last_n: repeatLastNValue,     // update with desired repeat_last_n value
        seed: parseInt(seedValue)
    }
});

personality is the id of a personality mounted on the server. You can mount many, allowing the user to choose.
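For a Python client, the equivalent might look like the sketch below, assuming the python-socketio client package; the response event name is an illustrative assumption, so check the lollms docs for the actual one:

import socketio

sio = socketio.Client()

# Event name below is illustrative; consult the lollms docs for the real one.
@sio.on('text_chunk')
def on_text_chunk(data):
    print(data, end='', flush=True)

sio.connect('http://localhost:9600')
sio.emit('generate_text', {
    'prompt': 'Hello',
    'personality': -1,  # as in the example above
    'n_predicts': 1024,
    'parameters': {
        'temperature': 0.7,
        'top_k': 40,
        'top_p': 0.95,
        'repeat_penalty': 1.1,
        'repeat_last_n': 64,
        'seed': 42,
    },
})
sio.wait()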

@matthoffner

Thanks @ParisNeo, very cool.

I'm hoping to help get ctransformers added to OpenLLM as well:

bentoml/OpenLLM#24

@lucasjinreal

@matthoffner Hello, where is your code snippet using ctransformers as an OpenAI-like server backend? I would like to use it.

@matthoffner

@lucasjinreal

@matthoffner thanks so much, I just found it in the files tab.

However, I ran into a weird issue.

The detokenized text returned to my client drops spaces:

[screenshot: streamed output with missing spaces between words]

I changed nothing except returning the new_text from each chat_chunk and streaming it to the client. Do you have any idea?

@matthoffner

Thanks @lucasjinreal, I haven't seen this issue.

Here are some recent UIs I built around WizardCoder, if you are looking for some client-side examples.

Live HTML Editor: https://github.com/matthoffner/wizardcoder-sandbox
Chatbot-ui: https://huggingface.co/spaces/matthoffner/starchat-ui

@lucasjinreal

lucasjinreal commented Aug 30, 2023

@matthoffner Can you tell me how you resolved the whitespace issue? I printed the stream out token by token, and there actually was no whitespace between tokens. Also, it is not a client issue; this client works with many of my OpenAI-like servers, just not yet with ctransformers.

@lucasjinreal

@matthoffner

from typing import Generator

async def stream_response(tokens, llm):
    try:
        iterator: Generator = llm.generate(tokens)
        for chat_chunk in iterator:
            # Detokenizing one token at a time; this is where the
            # missing spaces show up.
            new_text = llm.detokenize(chat_chunk)
            print(new_text, end='', flush=True)
            response = {
                'choices': [
                    {
                        'message': {
                            'role': 'system',
                            'content': new_text
                        },
                        'finish_reason': 'stop' if llm.is_eos_token(chat_chunk) else 'unknown'
                    }
                ]
            }
            yield response
    except Exception as e:
        print(f'stream_response failed: {e}')

Normally it should print the stream out with a typewriter effect, but the output doesn't include whitespace.
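A common workaround for per-token detokenization (an assumption on my part, not something confirmed in this thread) is to detokenize the accumulated token list each step and emit only the newly added suffix, so inter-token spaces are preserved. A sketch, assuming the same llm object as above:

# Workaround sketch: decode the whole sequence so far and emit only
# the new suffix, so spaces between tokens are not lost.
def stream_text(tokens, llm):
    generated = []
    emitted = ''
    for token in llm.generate(tokens):
        generated.append(token)
        text = llm.detokenize(generated)  # decode the full sequence so far
        new_text = text[len(emitted):]    # only the part not yet emitted
        emitted = text
        yield new_text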

@lucasjinreal

BTW, I'm using the ctransformers CLI-style API without any issue:

outputs = ''
for text in llm(prompt, stream=True):
    print(text, end="", flush=True)
    outputs += text
print()
