Issues: OpenCSGs/llm-inference
#134: API server was blocked when the LLM deployment scaling config exceeded the cluster resources (by SeanHH86, closed May 8, 2024)
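
One plausible fix direction for this kind of blocking is to validate a scaling request against the cluster's total capacity and fail fast. A minimal sketch, assuming a Ray-based deployment (llm-inference builds on Ray); the `requested` config shape and `fits_cluster` helper are hypothetical:

```python
import ray

ray.init()  # connect to (or start) a Ray cluster

def fits_cluster(requested: dict) -> bool:
    """Check a scaling request against total cluster resources so the
    API server can reject it immediately instead of blocking forever."""
    available = ray.cluster_resources()  # e.g. {"CPU": 16.0, "GPU": 2.0, ...}
    return all(available.get(k, 0.0) >= v for k, v in requested.items())

# Hypothetical scaling config asking for more GPUs than the cluster owns.
if not fits_cluster({"CPU": 4, "GPU": 8}):
    raise ValueError("Scaling config exceeds cluster resources; rejecting deployment")
```
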
#128: Error happens when running inference for wukong with dtype=bfloat16 using the default transformers pipeline to load the model (by SeanHH86, closed Apr 30, 2024)
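
For context, this is roughly what loading a model in bfloat16 through the plain transformers pipeline looks like; the model id here is a placeholder, not the actual wukong checkpoint from the issue:

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="gpt2",                # placeholder; the issue concerns a "wukong" checkpoint
    torch_dtype=torch.bfloat16,  # bfloat16 needs hardware support (e.g. Ampere+ GPUs or recent CPUs)
)
print(pipe("Hello", max_new_tokens=16)[0]["generated_text"])
```
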
#117: Avoid pinging Hugging Face when starting serving, to speed up deployment (by depenglee1707, closed Apr 24, 2024)
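
The Hugging Face libraries expose real offline switches for exactly this: with the environment variables below set, startup serves already-cached models without any Hub round-trip. A minimal sketch:

```python
import os

# Force offline mode before importing transformers, so serving startup
# never tries to reach the Hugging Face Hub for already-cached models.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # loaded from local cache
model = AutoModelForCausalLM.from_pretrained("gpt2")  # no network call
```
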
#116: The vllm, gguf, and llamacpp integrations cannot handle a local model path (by depenglee1707, closed Apr 23, 2024)
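
Upstream, both vLLM and llama-cpp-python do accept filesystem paths directly, which is presumably the behavior the integrations needed to pass through. A sketch, with the paths as placeholders:

```python
# vLLM: `model` may be a local directory containing HF-format weights.
from vllm import LLM
llm = LLM(model="/data/models/llama-2-7b-hf")  # assumption: local directory layout

# llama-cpp-python: loads a GGUF file straight from disk.
from llama_cpp import Llama
llm_cpp = Llama(model_path="/data/models/llama-2-7b.Q4_K_M.gguf")
```
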
#104: Enable resetting the generation config on the fly [enhancement] (by depenglee1707, closed Apr 23, 2024)
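
In plain transformers terms, generation parameters can be overridden per call rather than baked in at load time, which is one way to reset the generate config without reloading the model. A minimal sketch:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("Hello", return_tensors="pt")

# Per-request generation config: overrides the model's default, no reload needed.
cfg = GenerationConfig(max_new_tokens=32, temperature=0.7, do_sample=True)
out = model.generate(**inputs, generation_config=cfg)
print(tok.decode(out[0], skip_special_tokens=True))
```
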
#100: The usage introduction of llm-serve is not correct in quick_start.md [good first issue] (by depenglee1707, closed Apr 16, 2024)
#53: Generated text has an incorrect format when using the defaulttransformers pipeline (by SeanHH86, closed Mar 27, 2024)
#52: Enhance inference API to support OpenAI style [enhancement] (by SeanHH86, closed May 7, 2024)
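
OpenAI-style support usually means exposing a /v1/chat/completions endpoint with the OpenAI JSON schema. A sketch of what a client call against such a server looks like; the host, port, and model name are assumptions, not the project's documented interface:

```python
import requests

# Assumption: the serving process exposes an OpenAI-compatible endpoint locally.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "gpt2",  # placeholder model name
        "messages": [{"role": "user", "content": "Say hello"}],
        "max_tokens": 32,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```
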
#16: [BUG] Error when trying the "translation" downstream model (by depenglee1707, closed Mar 11, 2024)
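
For reference, the transformers "translation" task is served by seq2seq models, and the task string selects the language pair. A minimal working example with a small public checkpoint (the specific model is illustrative, not the one from the issue):

```python
from transformers import pipeline

# "translation_en_to_fr" picks the English-to-French pair for t5-style models.
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("The weather is nice today.")[0]["translation_text"])
```
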