[Serving]support uie model (PaddlePaddle#599)
* serving support uie model

* serving support uie model

* delete comment
heliqi authored Nov 17, 2022
1 parent bd53d48 commit 320e26d
Showing 5 changed files with 493 additions and 0 deletions.
1 change: 1 addition & 0 deletions examples/text/uie/README.md
@@ -37,3 +37,4 @@

- [Python Deployment](python)
- [C++ Deployment](cpp)
- [Serving Deployment](serving)
139 changes: 139 additions & 0 deletions examples/text/uie/serving/README.md
@@ -0,0 +1,139 @@
# UIE Serving Deployment Example

## Prepare the Model

Download the UIE-Base model (skip this step if you already have a trained model):
```bash
# Download the UIE model files and vocabulary, using the uie-base model as an example
wget https://bj.bcebos.com/fastdeploy/models/uie/uie-base.tgz
tar -xvzf uie-base.tgz

# Move the downloaded model into the model repository directory
mv uie-base/* models/uie/1/
```

After the model has been downloaded and moved into place, the directory structure is as follows:
```
models
└── uie
├── 1
│   ├── inference.pdiparams
│   ├── inference.pdmodel
│   ├── model.py
│   └── vocab.txt
└── config.pbtxt
```

## Pull and Run the Image
```bash
# CPU image: only supports serving Paddle/ONNX models on CPU; supported inference backends include OpenVINO, Paddle Inference, and ONNX Runtime
docker pull paddlepaddle/fastdeploy:0.6.0-cpu-only-21.10

# GPU image: supports serving Paddle/ONNX models on GPU/CPU; supported inference backends include OpenVINO, TensorRT, Paddle Inference, and ONNX Runtime
docker pull paddlepaddle/fastdeploy:0.6.0-gpu-cuda11.4-trt8.4-21.10

# Run the container. The container is named fastdeploy_server, and the current directory is mounted as /uie_serving inside it
docker run -it --net=host --name fastdeploy_server --shm-size="1g" -v `pwd`/:/uie_serving paddlepaddle/fastdeploy:0.6.0-gpu-cuda11.4-trt8.4-21.10 bash

# Start the service (if the CUDA_VISIBLE_DEVICES environment variable is not set, the server gets scheduling access to all GPUs)
CUDA_VISIBLE_DEVICES=0 fastdeployserver --model-repository=/uie_serving/models --backend-config=python,shm-default-byte-size=10485760
```
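
Once the container and server are up, you can optionally confirm readiness through the standard Triton HTTP endpoint (a quick sanity check, assuming the default HTTP port 8000 shown in the startup logs below):
```bash
# Returns HTTP 200 once the server and its models are ready to serve
curl -v localhost:8000/v2/health/ready
```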

> **Note**: If "Address already in use" appears, start the service with `--grpc-port` to specify a different port, and change the request port in grpc_client.py accordingly.
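
If you do remap the gRPC port, the client must target the same port. A minimal sketch (the port 8101 is an arbitrary example):
```bash
# Serve gRPC on a non-default port
CUDA_VISIBLE_DEVICES=0 fastdeployserver --model-repository=/uie_serving/models --grpc-port=8101 --backend-config=python,shm-default-byte-size=10485760
# ...then set url = "localhost:8101" in grpc_client.py
```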

After the service starts successfully, you will see output like the following:
```
......
I0928 04:51:15.784517 206 grpc_server.cc:4117] Started GRPCInferenceService at 0.0.0.0:8001
I0928 04:51:15.785177 206 http_server.cc:2815] Started HTTPService at 0.0.0.0:8000
I0928 04:51:15.826578 206 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002
```


## Client Requests
The client request script can be run either on the local machine or inside the container.

Running the script locally requires installing the dependencies first:
```bash
pip install grpcio
pip install tritonclient[all]
# If bash cannot parse the brackets, escape them when installing:
pip install tritonclient\[all\]
# Send the request
python3 grpc_client.py
```

After the request succeeds, the results are returned and printed:
```
1. Named Entity Recognition Task--------------
The extraction schema: ['时间', '选手', '赛事名称']
text= ['2月8日上午北京冬奥会自由式滑雪女子大跳台决赛中中国选手谷爱凌以188.25分获得金牌!']
results:
{'时间': {'end': 6,
'probability': 0.9857379794120789,
'start': 0,
'text': '2月8日上午'},
'赛事名称': {'end': 23,
'probability': 0.8503087162971497,
'start': 6,
'text': '北京冬奥会自由式滑雪女子大跳台决赛'},
'选手': {'end': 31,
'probability': 0.8981545567512512,
'start': 28,
'text': '谷爱凌'}}
================================================
text= ['2月7日北京冬奥会短道速滑男子1000米决赛中任子威获得冠军!']
results:
{'时间': {'end': 4,
'probability': 0.9921242594718933,
'start': 0,
'text': '2月7日'},
'赛事名称': {'end': 22,
'probability': 0.8171929121017456,
'start': 4,
'text': '北京冬奥会短道速滑男子1000米决赛'},
'选手': {'end': 26,
'probability': 0.9821093678474426,
'start': 23,
'text': '任子威'}}
2. Relation Extraction Task
The extraction schema: {'竞赛名称': ['主办方', '承办方', '已举办次数']}
text= ['2022语言与智能技术竞赛由中国中文信息学会和中国计算机学会联合主办,百度公司、中国中文信息学会评测工作委员会和中国计算机学会自然语言处理专委会承办,已连续举办4届,成为全球最热门的中文NLP赛事之一。']
results:
{'竞赛名称': {'end': 13,
'probability': 0.7825395464897156,
'relation': {'主办方': [{'end': 22,
'probability': 0.8421710729598999,
'start': 14,
'text': '中国中文信息学会'},
{'end': 30,
'probability': 0.7580801248550415,
'start': 23,
'text': '中国计算机学会'}],
'已举办次数': [{'end': 82,
'probability': 0.4671308398246765,
'start': 80,
'text': '4届'}],
'承办方': [{'end': 39,
'probability': 0.8292703628540039,
'start': 35,
'text': '百度公司'},
{'end': 55,
'probability': 0.7000497579574585,
'start': 40,
'text': '中国中文信息学会评测工作委员会'},
{'end': 72,
'probability': 0.6193480491638184,
'start': 56,
'text': '中国计算机学会自然语言处理专委会'}]},
'start': 0,
'text': '2022语言与智能技术竞赛'}}
```


## Modifying the Configuration

The current default configuration runs the Paddle engine on GPU. If you want to run on CPU/GPU or with another inference engine, you need to modify the configuration; for details, see the [configuration document](../../../../serving/docs/zh_CN/model_configuration.md).
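
For orientation, device placement in a Triton-style `models/uie/config.pbtxt` is typically controlled by the `instance_group` block. The fragment below is only an illustrative sketch; the actual device- and engine-selection fields for this model are described in the linked document:
```
# Illustrative config.pbtxt fragment -- not the full file
instance_group [
  {
    count: 1        # number of model instances
    kind: KIND_GPU  # switch to KIND_CPU to run on CPU
  }
]
```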
151 changes: 151 additions & 0 deletions examples/text/uie/serving/grpc_client.py
@@ -0,0 +1,151 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import logging
import numpy as np
from typing import Optional
import json
import ast

from pprint import pprint
from tritonclient.grpc import InferenceServerClient, InferInput, InferRequestedOutput

LOGGER = logging.getLogger("run_inference_on_triton")


class SyncGRPCTritonRunner:
DEFAULT_MAX_RESP_WAIT_S = 120

def __init__(
self,
server_url: str,
model_name: str,
model_version: str,
*,
verbose=False,
resp_wait_s: Optional[float]=None, ):
self._server_url = server_url
self._model_name = model_name
self._model_version = model_version
self._verbose = verbose
self._response_wait_t = self.DEFAULT_MAX_RESP_WAIT_S if resp_wait_s is None else resp_wait_s

self._client = InferenceServerClient(
self._server_url, verbose=self._verbose)
error = self._verify_triton_state(self._client)
if error:
raise RuntimeError(
f"Could not communicate to Triton Server: {error}")

LOGGER.debug(
f"Triton server {self._server_url} and model {self._model_name}:{self._model_version} "
f"are up and ready!")

model_config = self._client.get_model_config(self._model_name,
self._model_version)
model_metadata = self._client.get_model_metadata(self._model_name,
self._model_version)
LOGGER.info(f"Model config {model_config}")
LOGGER.info(f"Model metadata {model_metadata}")

self._inputs = {tm.name: tm for tm in model_metadata.inputs}
self._input_names = list(self._inputs)
self._outputs = {tm.name: tm for tm in model_metadata.outputs}
self._output_names = list(self._outputs)
self._outputs_req = [
InferRequestedOutput(name) for name in self._outputs
]

def Run(self, inputs):
"""
Args:
            inputs: list, each element corresponds, in order, to an input name in self._input_names
Returns:
results: dict, {name : numpy.array}
"""
infer_inputs = []
for idx, data in enumerate(inputs):
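            # JSON-encode each input and wrap it in a 1x1 BYTES tensor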
data = json.dumps(data)
data = np.array([[data], ], dtype=np.object_)
infer_input = InferInput(self._input_names[idx], data.shape,
"BYTES")
infer_input.set_data_from_numpy(data)
infer_inputs.append(infer_input)

results = self._client.infer(
model_name=self._model_name,
model_version=self._model_version,
inputs=infer_inputs,
outputs=self._outputs_req,
client_timeout=self._response_wait_t, )
# only one output
results = results.as_numpy(self._output_names[0])
return results

def _verify_triton_state(self, triton_client):
if not triton_client.is_server_live():
return f"Triton server {self._server_url} is not live"
elif not triton_client.is_server_ready():
return f"Triton server {self._server_url} is not ready"
elif not triton_client.is_model_ready(self._model_name,
self._model_version):
return f"Model {self._model_name}:{self._model_version} is not ready"
return None


if __name__ == "__main__":
model_name = "uie"
model_version = "1"
url = "localhost:8001"
runner = SyncGRPCTritonRunner(url, model_name, model_version)

print("1. Named Entity Recognition Task--------------")
schema = ["时间", "选手", "赛事名称"]
print(f"The extraction schema: {schema}")
text = ["2月8日上午北京冬奥会自由式滑雪女子大跳台决赛中中国选手谷爱凌以188.25分获得金牌!"]
print("text=", text)
print("results:")
results = runner.Run([text, schema])
for result in results:
result = result.decode('utf-8')
result = ast.literal_eval(result)
pprint(result)

print("================================================")
text = ["2月7日北京冬奥会短道速滑男子1000米决赛中任子威获得冠军!"]
print("text=", text)
    # If schema is empty, the server reuses the schema from the previous request.
schema = []
results = runner.Run([text, schema])
print("results:")
for result in results:
result = result.decode('utf-8')
result = ast.literal_eval(result)
pprint(result)

print("\n2. Relation Extraction Task")
schema = {"竞赛名称": ["主办方", "承办方", "已举办次数"]}
print(f"The extraction schema: {schema}")
text = [
"2022语言与智能技术竞赛由中国中文信息学会和中国计算机学会联合主办,百度公司、中国中文信息学会评测工作"
"委员会和中国计算机学会自然语言处理专委会承办,已连续举办4届,成为全球最热门的中文NLP赛事之一。"
]
print("text=", text)
print("results:")
results = runner.Run([text, schema])
for result in results:
result = result.decode('utf-8')
result = ast.literal_eval(result)
pprint(result)