Commits

Commits on Aug 7, 2024

[ BugFix ] Move zmq frontend to IPC instead of TCP (vllm-project#7222)
robertgshaw2-redhat authored

Commits on Aug 6, 2024

[ BugFix ] Fix ZMQ when VLLM_PORT is set (vllm-project#7205)
robertgshaw2-redhat authored

Commits on Aug 3, 2024

[ Frontend ] Multiprocessing for OpenAI Server with zeromq (vllm-project#6883)

authored

Commits on Jul 25, 2024

[ Misc ] fp8-marlin channelwise via compressed-tensors (vllm-project#6524)
robertgshaw2-redhat and mgoin authored

Commits on Jul 18, 2024

Commits on Jul 15, 2024

[CI/Build] Cross python wheel (vllm-project#6394)
robertgshaw2-redhat authored

Commits on Jul 14, 2024

Commits on Jul 13, 2024

[ Misc ] More Cleanup of Marlin (vllm-project#6359)
robertgshaw2-redhat authored

Commits on Jul 12, 2024

Commits on Jul 11, 2024

Commits on Jul 7, 2024

[ Misc ] Support Fp8 via llm-compressor (vllm-project#6110)
robertgshaw2-redhat and Robert Shaw authored

Commits on Jul 3, 2024

[ Misc ] Clean Up CompressedTensorsW8A8 (vllm-project#6113)
robertgshaw2-redhat authored

Commits on Jul 2, 2024

Commits on Jul 1, 2024

[ CI ] Re-enable Large Model LM Eval (vllm-project#6031)
robertgshaw2-redhat authored

Commits on Jun 30, 2024

Commits on Jun 29, 2024

Commits on Jun 28, 2024

Commits on Jun 14, 2024

[ Misc ] Rs/compressed tensors cleanup (vllm-project#5432)

authored

Commits on Jun 1, 2024

[BugFix] Prevent LLM.encode for non-generation Models (vllm-project#5184)
robertgshaw2-redhat and mgoin authored

Commits on May 31, 2024

[Bugfix] Avoid Warnings in SparseML Activation Quantization (vllm-project#5120)
robertgshaw2-redhat authored

Commits on May 30, 2024

[Bugfix] Automatically Detect SparseML models (vllm-project#5119)
robertgshaw2-redhat authored