
Commit

bump version to v0.5.0 (InternLM#1852)
* bump version to v0.5.0

* update news

* update news

* update supported models

* update

* fix lint

* set LMDEPLOY_VERSION 0.5.0
lvhan028 authored Jul 1, 2024
1 parent: 5ceb464 · commit: 4cb3854
Showing 10 changed files with 75 additions and 64 deletions.
5 changes: 4 additions & 1 deletion README.md
@@ -26,6 +26,7 @@ ______________________________________________________________________
<details open>
<summary><b>2024</b></summary>

- \[2024/06\] PyTorch engine supports DeepSeek-V2 and several VLMs, such as CogVLM2, Mini-InternVL, LLaVA-Next
- \[2024/05\] Balance vision model when deploying VLMs with multiple GPUs
- \[2024/05\] Support 4-bit weight-only quantization and inference on VLMs, such as InternVL v1.5, LLaVA, InternLMXComposer2
- \[2024/04\] Support Llama3 and more VLMs, such as InternVL v1.1, v1.2, MiniGemini, InternLMXComposer2.
@@ -112,6 +113,7 @@ For detailed inference benchmarks in more devices and more settings, please refe
<li>QWen (1.8B - 72B)</li>
<li>QWen1.5 (0.5B - 110B)</li>
<li>QWen1.5 - MoE (0.5B - 72B)</li>
<li>QWen2 (0.5B - 72B)</li>
<li>Baichuan (7B)</li>
<li>Baichuan2 (7B-13B)</li>
<li>Code Llama (7B - 34B)</li>
@@ -121,6 +123,7 @@ For detailed inference benchmarks in more devices and more settings, please refe
<li>YI (6B-34B)</li>
<li>Mistral (7B)</li>
<li>DeepSeek-MoE (16B)</li>
<li>DeepSeek-V2 (16B, 236B)</li>
<li>Mixtral (8x7B, 8x22B)</li>
<li>Gemma (2B - 7B)</li>
<li>Dbrx (132B)</li>
@@ -162,7 +165,7 @@ pip install lmdeploy
Since v0.3.0, the default prebuilt package is compiled on **CUDA 12**. However, if CUDA 11+ is required, you can install lmdeploy by:

```shell
export LMDEPLOY_VERSION=0.3.0
export LMDEPLOY_VERSION=0.5.0
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
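
As a quick post-install sanity check (a minimal sketch, not part of this diff; it assumes the package re-exports `__version__` from `lmdeploy/version.py`):

```python
# Minimal post-install sanity check: confirm the installed LMDeploy matches
# the release targeted by this commit. Assumes `lmdeploy.__version__` is
# re-exported from lmdeploy/version.py.
import lmdeploy

expected = "0.5.0"
installed = lmdeploy.__version__
print(f"lmdeploy {installed}")
assert installed == expected, f"expected {expected}, found {installed}"
```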
5 changes: 4 additions & 1 deletion README_zh-CN.md
@@ -26,6 +26,7 @@ ______________________________________________________________________
<details open>
<summary><b>2024</b></summary>

- \[2024/06\] PyTorch engine supports DeepSeek-V2 and several VLMs, such as CogVLM2, Mini-InternVL, LLaVA-Next
- \[2024/05\] Balance the vision model across cards when deploying VLMs on multiple GPUs
- \[2024/05\] Support 4-bit weight-only quantization and inference on VLMs such as InternVL v1.5, LLaVA, InternLMXComposer2
- \[2024/04\] Support Llama3 and VLMs such as InternVL v1.1, v1.2, MiniGemini, InternLM-XComposer2
@@ -113,6 +114,7 @@ The LMDeploy TurboMind engine has outstanding inference capability; across models of various sizes
<li>QWen (1.8B - 72B)</li>
<li>QWen1.5 (0.5B - 110B)</li>
<li>QWen1.5 - MoE (0.5B - 72B)</li>
<li>QWen2 (0.5B - 72B)</li>
<li>Baichuan (7B)</li>
<li>Baichuan2 (7B-13B)</li>
<li>Code Llama (7B - 34B)</li>
@@ -122,6 +124,7 @@ The LMDeploy TurboMind engine has outstanding inference capability; across models of various sizes
<li>YI (6B-34B)</li>
<li>Mistral (7B)</li>
<li>DeepSeek-MoE (16B)</li>
<li>DeepSeek-V2 (16B, 236B)</li>
<li>Mixtral (8x7B, 8x22B)</li>
<li>Gemma (2B - 7B)</li>
<li>Dbrx (132B)</li>
@@ -163,7 +166,7 @@ pip install lmdeploy
Since v0.3.0, the LMDeploy prebuilt package is compiled on CUDA 12 by default. To install LMDeploy under CUDA 11+, run the following commands:

```shell
export LMDEPLOY_VERSION=0.3.0
export LMDEPLOY_VERSION=0.5.0
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
2 changes: 1 addition & 1 deletion docs/en/get_started.md
@@ -13,7 +13,7 @@ pip install lmdeploy
The default prebuilt package is compiled on **CUDA 12**. However, if CUDA 11+ is required, you can install lmdeploy by:

```shell
export LMDEPLOY_VERSION=0.4.2
export LMDEPLOY_VERSION=0.5.0
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
2 changes: 1 addition & 1 deletion docs/en/multi_modal/cogvlm.md
@@ -22,7 +22,7 @@ Install LMDeploy with pip (Python 3.8+). Refer to [Installation](https://lmdeplo
```shell
# cuda 11.8
# to get the latest version, run: pip index versions lmdeploy
export LMDEPLOY_VERSION=0.4.2
export LMDEPLOY_VERSION=0.5.0
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
# cuda 12.1
57 changes: 30 additions & 27 deletions docs/en/supported_models/supported_models.md
@@ -13,7 +13,8 @@
| InternLM-XComposer | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2 | 7B, 4khd-7B | Yes | Yes | Yes | Yes |
| QWen | 1.8B - 72B | Yes | Yes | Yes | Yes |
| QWen1.5 | 1.8B - 72B | Yes | Yes | Yes | Yes |
| QWen1.5 | 1.8B - 110B | Yes | Yes | Yes | Yes |
| QWen2 | 1.5B - 72B | Yes | Yes | Yes | Yes |
| Mistral | 7B | Yes | Yes | Yes | No |
| QWen-VL | 7B | Yes | Yes | Yes | Yes |
| DeepSeek-VL | 7B | Yes | Yes | Yes | Yes |
@@ -35,29 +36,31 @@ The TurboMind engine doesn't support window attention. Therefore, for models tha

## Models supported by PyTorch

| Model | Size | FP16/BF16 | KV INT8 | W8A8 |
| :-----------------: | :--------: | :-------: | :-----: | :--: |
| Llama | 7B - 65B | Yes | No | Yes |
| Llama2 | 7B - 70B | Yes | No | Yes |
| Llama3 | 8B, 70B | Yes | No | Yes |
| InternLM | 7B - 20B | Yes | No | Yes |
| InternLM2 | 7B - 20B | Yes | No | - |
| InternLM2.5 | 7B | Yes | No | - |
| Baichuan2 | 7B - 13B | Yes | No | Yes |
| ChatGLM2 | 6B | Yes | No | No |
| Falcon | 7B - 180B | Yes | No | No |
| YI | 6B - 34B | Yes | No | No |
| Mistral | 7B | Yes | No | No |
| Mixtral | 8x7B | Yes | No | No |
| QWen | 1.8B - 72B | Yes | No | No |
| QWen1.5 | 0.5B - 72B | Yes | No | No |
| QWen1.5-MoE | A2.7B | Yes | No | No |
| DeepSeek-MoE | 16B | Yes | No | No |
| Gemma | 2B-7B | Yes | No | No |
| Dbrx | 132B | Yes | No | No |
| StarCoder2 | 3B-15B | Yes | No | No |
| Phi-3-mini | 3.8B | Yes | No | No |
| CogVLM-Chat | 17B | Yes | No | No |
| CogVLM2-Chat | 19B | Yes | No | No |
| LLaVA(1.5,1.6) | 7B-34B | Yes | No | No |
| InternVL-Chat(v1.5) | 2B-26B | Yes | No | No |
| Model | Size | FP16/BF16 | KV INT8 | W8A8 |
| :-----------------: | :---------: | :-------: | :-----: | :--: |
| Llama | 7B - 65B | Yes | No | Yes |
| Llama2 | 7B - 70B | Yes | No | Yes |
| Llama3 | 8B, 70B | Yes | No | Yes |
| InternLM | 7B - 20B | Yes | No | Yes |
| InternLM2 | 7B - 20B | Yes | No | - |
| InternLM2.5 | 7B | Yes | No | - |
| Baichuan2 | 7B - 13B | Yes | No | Yes |
| ChatGLM2 | 6B | Yes | No | No |
| Falcon | 7B - 180B | Yes | No | No |
| YI | 6B - 34B | Yes | No | No |
| Mistral | 7B | Yes | No | No |
| Mixtral | 8x7B | Yes | No | No |
| QWen | 1.8B - 72B | Yes | No | No |
| QWen1.5 | 0.5B - 110B | Yes | No | No |
| QWen1.5-MoE | A2.7B | Yes | No | No |
| QWen2 | 0.5B - 72B | Yes | No | No |
| DeepSeek-MoE | 16B | Yes | No | No |
| DeepSeek-V2 | 16B, 236B | Yes | No | No |
| Gemma | 2B-7B | Yes | No | No |
| Dbrx | 132B | Yes | No | No |
| StarCoder2 | 3B-15B | Yes | No | No |
| Phi-3-mini | 3.8B | Yes | No | No |
| CogVLM-Chat | 17B | Yes | No | No |
| CogVLM2-Chat | 19B | Yes | No | No |
| LLaVA(1.5,1.6) | 7B-34B | Yes | No | No |
| InternVL-Chat(v1.5) | 2B-26B | Yes | No | No |
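
To exercise one of the newly listed models, the high-level `pipeline` API can be used. This is an illustrative sketch only: the Hugging Face model ID, prompt, and tensor-parallel degree are placeholders, not part of this commit.

```python
# Illustrative sketch: run one of the newly listed models with the PyTorch engine.
# Model ID, prompt, and tp value are placeholders; adjust to your hardware
# (DeepSeek-V2 at 236B would need multiple GPUs).
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline(
    "Qwen/Qwen2-7B-Instruct",                  # placeholder Hugging Face model ID
    backend_config=PytorchEngineConfig(tp=1),  # single-GPU tensor parallelism
)
print(pipe(["Give a one-sentence summary of tensor parallelism."]))
```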
2 changes: 1 addition & 1 deletion docs/zh_cn/get_started.md
@@ -13,7 +13,7 @@ pip install lmdeploy
The LMDeploy prebuilt package is compiled on CUDA 12 by default. To install LMDeploy under CUDA 11+, run the following commands:

```shell
export LMDEPLOY_VERSION=0.4.2
export LMDEPLOY_VERSION=0.5.0
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
2 changes: 1 addition & 1 deletion docs/zh_cn/multi_modal/cogvlm.md
@@ -21,7 +21,7 @@ pip install torch==2.2.2 torchvision==0.17.2 xformers==0.0.26 --index-url https:

```shell
# cuda 11.8
export LMDEPLOY_VERSION=0.4.2
export LMDEPLOY_VERSION=0.5.0
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
# cuda 12.1
57 changes: 30 additions & 27 deletions docs/zh_cn/supported_models/supported_models.md
@@ -13,7 +13,8 @@
| InternLM-XComposer | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2 | 7B, 4khd-7B | Yes | Yes | Yes | Yes |
| QWen | 1.8B - 72B | Yes | Yes | Yes | Yes |
| QWen1.5 | 1.8B - 72B | Yes | Yes | Yes | Yes |
| QWen1.5 | 1.8B - 110B | Yes | Yes | Yes | Yes |
| QWen2 | 1.5B - 72B | Yes | Yes | Yes | Yes |
| Mistral | 7B | Yes | Yes | Yes | No |
| QWen-VL | 7B | Yes | Yes | Yes | Yes |
| DeepSeek-VL | 7B | Yes | Yes | Yes | Yes |
@@ -35,29 +36,31 @@ The turbomind engine does not support window attention. Therefore, for models that use window att

### Models supported by PyTorch

| Model               | Size       | FP16/BF16 | KV INT8 | W8A8 |
| :-----------------: | :--------: | :-------: | :-----: | :--: |
| Llama | 7B - 65B | Yes | No | Yes |
| Llama2 | 7B - 70B | Yes | No | Yes |
| Llama3 | 8B, 70B | Yes | No | Yes |
| InternLM | 7B - 20B | Yes | No | Yes |
| InternLM2 | 7B - 20B | Yes | No | - |
| InternLM2.5 | 7B | Yes | No | - |
| Baichuan2 | 7B - 13B | Yes | No | Yes |
| ChatGLM2 | 6B | Yes | No | No |
| Falcon | 7B - 180B | Yes | No | No |
| YI | 6B - 34B | Yes | No | No |
| Mistral | 7B | Yes | No | No |
| Mixtral | 8x7B | Yes | No | No |
| QWen | 1.8B - 72B | Yes | No | No |
| QWen1.5 | 0.5B - 72B | Yes | No | No |
| QWen1.5-MoE | A2.7B | Yes | No | No |
| DeepSeek-MoE | 16B | Yes | No | No |
| Gemma | 2B-7B | Yes | No | No |
| Dbrx | 132B | Yes | No | No |
| StarCoder2 | 3B-15B | Yes | No | No |
| Phi-3-mini | 3.8B | Yes | No | No |
| CogVLM-Chat | 17B | Yes | No | No |
| CogVLM2-Chat | 19B | Yes | No | No |
| LLaVA(1.5,1.6) | 7B-34B | Yes | No | No |
| InternVL-Chat(v1.5) | 2B-26B | Yes | No | No |
| Model               | Size        | FP16/BF16 | KV INT8 | W8A8 |
| :-----------------: | :---------: | :-------: | :-----: | :--: |
| Llama | 7B - 65B | Yes | No | Yes |
| Llama2 | 7B - 70B | Yes | No | Yes |
| Llama3 | 8B, 70B | Yes | No | Yes |
| InternLM | 7B - 20B | Yes | No | Yes |
| InternLM2 | 7B - 20B | Yes | No | - |
| InternLM2.5 | 7B | Yes | No | - |
| Baichuan2 | 7B - 13B | Yes | No | Yes |
| ChatGLM2 | 6B | Yes | No | No |
| Falcon | 7B - 180B | Yes | No | No |
| YI | 6B - 34B | Yes | No | No |
| Mistral | 7B | Yes | No | No |
| Mixtral | 8x7B | Yes | No | No |
| QWen | 1.8B - 72B | Yes | No | No |
| QWen1.5 | 0.5B - 110B | Yes | No | No |
| QWen2 | 0.5B - 72B | Yes | No | No |
| QWen1.5-MoE | A2.7B | Yes | No | No |
| DeepSeek-MoE | 16B | Yes | No | No |
| DeepSeek-V2 | 16B, 236B | Yes | No | No |
| Gemma | 2B-7B | Yes | No | No |
| Dbrx | 132B | Yes | No | No |
| StarCoder2 | 3B-15B | Yes | No | No |
| Phi-3-mini | 3.8B | Yes | No | No |
| CogVLM-Chat | 17B | Yes | No | No |
| CogVLM2-Chat | 19B | Yes | No | No |
| LLaVA(1.5,1.6) | 7B-34B | Yes | No | No |
| InternVL-Chat(v1.5) | 2B-26B | Yes | No | No |
5 changes: 2 additions & 3 deletions lmdeploy/cli/utils.py
@@ -379,9 +379,8 @@ def cache_max_entry_count(parser):
'--cache-max-entry-count',
type=float,
default=0.8,
help=
'The percentage of free gpu memory occupied by the k/v cache, excluding weights'
)
help='The percentage of free gpu memory occupied by the k/v '
'cache, excluding weights ')

@staticmethod
def adapters(parser):
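
The reworded help string above belongs to `--cache-max-entry-count`, which caps the share of free GPU memory reserved for the k/v cache. A hedged sketch of tuning the same knob from the Python API (the config class name and model ID are assumptions, not taken from this diff):

```python
# Illustrative only: shrink the k/v cache budget from the default 0.8 to 0.5
# of free GPU memory. Config class and argument names are assumed to match
# the TurboMind engine config in this release; the model ID is a placeholder.
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline(
    "internlm/internlm2-chat-7b",
    backend_config=TurbomindEngineConfig(cache_max_entry_count=0.5),
)
```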
2 changes: 1 addition & 1 deletion lmdeploy/version.py
@@ -1,7 +1,7 @@
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Tuple

__version__ = '0.4.2'
__version__ = '0.5.0'
short_version = __version__


