REST API? #26
Comments
I've been writing some generic FastAPI servers to try various GGML libraries using ctransformers: https://huggingface.co/spaces/matthoffner/wizardcoder-ggml I've made some recent updates so the API matches the OpenAI response format more closely, so it can be used like llama.cpp + OpenAI.
Nice @matthoffner! @Spiritdude I have been thinking of adding an OpenAI-compatible API server but haven't had the time to do it. For now, if you want to create your own API server, you can use the examples provided by @matthoffner as a reference. One thing to note is that the models are not thread-safe, so you will have to use a lock to prevent concurrent calls. I'm also trying to make ctransformers compatible with 🤗 Transformers so that it can be used as a drop-in replacement in other projects, but it is a WIP. See #13 (comment)
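The locking advice above can be sketched as a minimal server. This is an illustrative sketch, not code from the thread: the `/v1/completions` route shape, the `serve` helper, and the model path are assumptions, and `fastapi`, `uvicorn`, and `ctransformers` must be installed to actually serve. The core idea is simply that every call into the model goes through one `threading.Lock`.

```python
import threading

# Serialize access to a model that is not thread-safe (per the comment above).
_lock = threading.Lock()

def generate_locked(model, prompt: str) -> str:
    """Run a generation call under a lock so concurrent requests never overlap."""
    with _lock:
        return model(prompt)

def build_app(model):
    # Deferred import: FastAPI is only needed when actually serving.
    from fastapi import FastAPI
    app = FastAPI()

    @app.post("/v1/completions")
    def completions(body: dict):
        text = generate_locked(model, body.get("prompt", ""))
        # Response shape loosely follows the OpenAI completions format.
        return {"choices": [{"text": text}]}

    return app

def serve(model_path: str = "path/to/model.bin"):
    # Hypothetical model path; any GGML model ctransformers supports works.
    from ctransformers import AutoModelForCausalLM
    import uvicorn
    llm = AutoModelForCausalLM.from_pretrained(model_path, model_type="llama")
    uvicorn.run(build_app(llm), host="0.0.0.0", port=8000)
```

The lock means only one generation runs at a time; for higher throughput you would run several worker processes, each with its own model instance.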
@marella thanks for ctransformers and all the effort; the attention to detail (both in ctransformers itself and in the compatibility considerations) is much appreciated. I'm mainly using llama_python.server and keep all my apps on the REST API for compatibility. @matthoffner great, thanks; I'll use your small code snippet in the meanwhile.
If you want, there is already a REST API that supports ctransformers: it lets you generate text using a distributed or centralized architecture with multiple service nodes, and you can connect to it from one or many clients (a many-to-many server). It also has some examples for using it as a library, and I have created a playground-like front end for it.

To install it, just do `pip install lollms`, then set it up using `lollms-settings`, where you can select a binding (ctransformers, for example). Then run `lollms-server`: this starts a service on localhost:9600. You can run several instances, e.g. `lollms-server --host 0.0.0.0 --port 9601`, selecting any host you want, then run the playground or write your own code.

I use socketio for generation so I can send `socket.emit('generate_text', { prompt, personality: -1, n_predicts: 1024 })`. `personality` is the id of a personality mounted in the server; you can mount many, allowing the user to choose.
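A client for the setup described above might look like the following sketch. It is an assumption-laden illustration: the `generate_text` event name and payload fields are taken from the comment above, `python-socketio` is assumed to be installed, and a `lollms-server` instance is assumed to be listening on localhost:9600.

```python
def make_payload(prompt: str, personality: int = -1, n_predicts: int = 1024) -> dict:
    # personality=-1 selects no mounted personality (per the comment above).
    return {"prompt": prompt, "personality": personality, "n_predicts": n_predicts}

def main():
    # Deferred import so the payload helper above stays stdlib-only;
    # requires `pip install python-socketio[client]` and a running server.
    import socketio
    sio = socketio.Client()
    sio.connect("http://localhost:9600")
    sio.emit("generate_text", make_payload("Write a haiku about REST APIs"))
    sio.wait()  # keep the connection open to receive streamed results
```

The server can stream partial results back over the same socket, which is why socketio is used here instead of a plain HTTP request.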
Thanks @ParisNeo, very cool. I'm hoping to help get ctransformers added to OpenLLM as well:
@matthoffner Hello, where is your code snippet using ctransformers as an OpenAI-like server backend? I'd like to use it.
These should still work: HuggingFace: https://huggingface.co/spaces/matthoffner/wizardcoder-ggml |
@matthoffner thanks so much, I just found it in the Files tab. However, I ran into a weird issue: the detokenized text returned to my client drops spaces. I changed nothing except returning the new_text from chat_chunk and streaming it to the client. Do you have any idea?
Thanks @lucasjinreal, I haven't seen this issue. Here are some recent UIs I built around WizardCoder if you are looking for some client side examples. Live HTML Editor: https://github.com/matthoffner/wizardcoder-sandbox |
@matthoffner Can you tell me how you resolved the whitespace issue? I printed the stream out token by token, and there really is no whitespace between tokens. Meanwhile, it isn't a client issue: this client works with many of my OpenAI-like servers, just not yet with ctransformers.
Normally it should print the stream with a typewriter effect, but it doesn't include whitespace.
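One common cause of this symptom (an assumption; the thread never confirms the root cause) is decoding each token in isolation. SentencePiece-style tokenizers mark a leading space with the "▁" (U+2581) metacharacter, so if a streaming server decodes pieces one at a time and strips the marker instead of mapping it to a space, words run together on the client. This stdlib-only sketch simulates that, it does not show ctransformers internals:

```python
# Simulated SentencePiece-style pieces: "▁" marks a leading space.
PIECES = ["▁Hello", "▁world", ",", "▁stream", "ing"]

def decode_naive(piece: str) -> str:
    # Buggy per-piece decoding: throws the space marker away entirely.
    return piece.replace("▁", "")

def decode_piece(piece: str) -> str:
    # Correct: the marker *is* the space.
    return piece.replace("▁", " ")

naive = "".join(decode_naive(p) for p in PIECES)
good = "".join(decode_piece(p) for p in PIECES).lstrip()
# naive -> "Helloworld,streaming"
# good  -> "Hello world, streaming"
```

If the spaces are already missing in the server's streamed chunks (as the comment above suggests), the fix belongs on the server's detokenization side, not in the client.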
BTW, I'm using the ctransformers CLI without any issue:
What's the best practice to interface ctransformers API and expose it as REST API?
I looked at https://github.com/jquesnelle/transformers-openai-api and tried to change it to use ctransformers, but I stopped because the required changes kept growing, making it harder to keep in sync with the original.