Add request envelope to support Multiserving frameworks (pytorch#749)
* - updated image classifier default handler
- updated custom resnet and mnist handlers
- updated docs

* documentation update
- removed stale Transformer_readme.md

* updated docs

* Doc restructure and code fixes

* Updated example as per fix in default handlers

* Enhanced base and custom handler examples

* Added missing check for manifest

* Fixed some typos

* Removed commented code

* Refactor BaseHandler

* Adding in unit tests

* Fixed gitignore in this branch

* Fix a bug with Image Segmenter

* Updated Object Detector to reuse functionality; consistency

* Fix pylint errors

* Backwards compat for index_names.json

* Fixed Image Segmenter again

* Made the compat layer in text handlers actually compatible.

* Removed batching from text classifier

* Adding comments per review.

* Addressed doc feedback.

* Updating docs about batching.

* Initial commit of envelopes.

* Got the end-to-end working

* Undoing a change for local stuff.

* Fixing a few broken tests.

* Fixed error introduced due to conflict resolution via web-based merge tool

* Corrected code comment

* - Updated Object detection & text classification handlers
- updated docs

* fixed python linting errors

* updated index_to_name.json for text classifier

* fixed object detector handler for batch support

* Fixed the batch inference output

* update expected output as per new handler changes

* updated text classification mar name in sanity suite

* updated text classifier mar name and removed bert scripted models

* updated model zoo with new text classification url

* added model_name while registering model in sanity suite

* updated text classification model name

* added upgrade option for installing python dependencies in install utils

* added upgrade option for installing python dependencies and extra numpy package in regression suite

* refactored pytests in regression suite for better performance and reporting

* Merge upstream

* Got the end-to-end working

* Merge upstream (2)

* Fixing a few broken tests.

* Undoing a bad merge.

* minor fix in torch-archiver command

* reverted postprocess removal

* updated mar files in model zoo to use updated handlers

* updated regression suite to use updated mar files

* suppressed pylint warning in UT

* fixed resnet-152 mar name and expected output

* updated inference tests data
- added tolerance value for resnet152 models

* Added custom handler in vgg11 example (pytorch#559)

* added custom handler for vgg11

* added readme for vgg11 example

* fixed typo in readme

* updated model zoo

* reverted changes for scripted vgg11 mar file

* added vgg11 model to regression test suite

* disabled pylint check in UT

* updated expected response for vgg11 inference in regression suite

* updated expected response for vgg11 inference in regression suite

* updated expected output for densenet scripted

* Fixing bad file format

* Fixed the 'no newman npm' issue in the regression test suite, using the solution suggested in PR 757

* Fixed the metrics bug pytorch#772 for test_envelopes

Co-authored-by: Shivam Shriwas <[email protected]>
Co-authored-by: Harsh Bafna <[email protected]>
Co-authored-by: dhaniram kshirsagar <[email protected]>
Co-authored-by: dhaniram-kshirsagar <[email protected]>
Co-authored-by: Henry Tappen <[email protected]>
Co-authored-by: harshbafna <[email protected]>
Co-authored-by: Aaqib <[email protected]>
Co-authored-by: Henry Tappen <[email protected]>
Co-authored-by: Geeta Chauhan <[email protected]>
10 people authored Nov 23, 2020
1 parent 40f4258 commit abcaf1a
Showing 20 changed files with 351 additions and 48 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -4,6 +4,8 @@ dist/
*__pycache__*
*.egg-info/
.idea
*htmlcov*
.coverage
.github/actions/
.github/.DS_Store
.DS_Store
.DS_Store
5 changes: 4 additions & 1 deletion docs/default_handlers.md
@@ -46,4 +46,7 @@ For more details see [examples](https://github.com/pytorch/serve/tree/master/exa
- [object_detector](https://github.com/pytorch/serve/tree/master/examples/object_detector/index_to_name.json)

# Contributing
If you'd like to edit or create a new default_handler class, make sure to run and update the unit tests in [unit_tests](https://github.com/pytorch/serve/tree/master/ts/torch_handler/unit_tests). As always, make sure to run [torchserve_sanity.py](https://github.com/pytorch/serve/tree/master/torchserve_sanity.py) before submitting.
If you'd like to edit or create a new default_handler class, you need to take the following steps:
1. Write a new class derived from BaseHandler and add it as a separate file in `ts/torch_handler/` (a sketch follows this list)
1. Update `model-archiver/model_packaging.py` to add in your class's name
1. Run and update the unit tests in [unit_tests](https://github.com/pytorch/serve/tree/master/ts/torch_handler/unit_tests). As always, make sure to run [torchserve_sanity.py](https://github.com/pytorch/serve/tree/master/torchserve_sanity.py) before submitting.
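
As a rough illustration of step 1, here is a minimal sketch only, assuming the refactored `BaseHandler` contract from this PR (`preprocess`/`inference`/`postprocess`); the class and file names are placeholders:

```python
# ts/torch_handler/my_handler.py  (hypothetical file name)
from ts.torch_handler.base_handler import BaseHandler


class MyHandler(BaseHandler):
    """Sketch of a new default handler derived from BaseHandler."""

    def preprocess(self, data):
        # Each request row carries its payload under "data" or "body";
        # convert the resulting list to whatever tensors your model expects.
        return [row.get("data") or row.get("body") for row in data]

    def postprocess(self, inference_output):
        # Return one entry per input row.
        return inference_output.tolist()
```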
18 changes: 18 additions & 0 deletions docs/request_envelopes.md
@@ -0,0 +1,18 @@
# Introduction

Many model serving systems provide a signature for request bodies. Examples include:

- [Seldon](https://docs.seldon.io/projects/seldon-core/en/v1.1.0/graph/protocols.html)
- [KFServing](https://github.com/kubeflow/kfserving/tree/master/docs)
- [Google Cloud AI Platform](https://cloud.google.com/ai-platform/prediction/docs/online-predict)

Data scientists use these multi-framework systems to manage deployments of many different models, possibly written in different languages and frameworks. The platforms offer additional analytics on top of model serving, including skew detection, explanations, and A/B testing. These platforms need a well-structured signature both to standardize calls across different frameworks and to understand the input data. To keep support for many frameworks simple, though, they pass the request body along unchanged to the underlying model server.

TorchServe currently has no fixed request-body signature. Envelopes let you translate automatically from the fixed signature required by your model orchestrator to a flat Python list.
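
As a purely illustrative example of that translation (the exact body depends on your orchestrator), the JSON envelope's round trip looks roughly like this:

```python
# Hypothetical round trip through the json envelope; the "instances" and
# "predictions" keys follow Google Cloud AI Platform's request/response format.
orchestrator_request = {"instances": [{"b64": "iVBORw..."}, {"b64": "R0lGOD..."}]}

# On the way in, the envelope unwraps the body into the flat list that
# the handler's preprocess() expects:
handler_input = orchestrator_request["instances"]

# On the way out, it wraps the handler's flat list of results back up:
handler_output = ["tabby", "persian"]
orchestrator_response = {"predictions": handler_output}
```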

# Usage
1. When you write a handler, always expect a plain Python list containing data ready to go into `preprocess`. Crucially, your handler code should look the same whether it runs locally or behind your model orchestrator.
1. When you deploy TorchServe behind a model orchestrator, make sure to set the corresponding `service_envelope` in your `config.properties` file. For example, if you're using Google Cloud AI Platform, which has a JSON format, you'd add `service_envelope=json` to your `config.properties` file (see the example after this list).
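
For example, a minimal `config.properties` might look like the sketch below; `service_envelope` is the key this change reads, while the address entries are illustrative defaults:

```properties
inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081
# Unwrap/re-wrap request bodies with the envelope in
# ts/torch_handler/request_envelope/json.py
service_envelope=json
```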

# Contributing
Add new envelope files under `ts/torch_handler/request_envelope`. Include only one class per file. The key used in `config.properties` is the name of the `.py` file the class lives in (see the sketch below).
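
Based on how `ts/model_loader.py` wires envelopes in this change (the first class found in the module is instantiated with the handler entry point, and its `handle` method takes the entry point's place), a new envelope could be sketched as follows; the file and class names are hypothetical:

```python
# ts/torch_handler/request_envelope/myformat.py  (hypothetical; the file
# stem "myformat" becomes the service_envelope value in config.properties)


class MyFormatEnvelope:
    """Sketch: the loader instantiates this as MyFormatEnvelope(entry_point)
    and then calls handle() in place of the raw handler."""

    def __init__(self, handle_fn):
        self._handle_fn = handle_fn

    def handle(self, data, context):
        flat_input = self._parse_input(data)        # orchestrator body -> flat list
        flat_output = self._handle_fn(flat_input, context)
        return self._format_output(flat_output)     # flat list -> orchestrator body

    def _parse_input(self, data):
        # Unwrap your orchestrator's request signature here.
        return data

    def _format_output(self, data):
        # Re-wrap results in your orchestrator's response signature here.
        return data
```
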
@@ -62,6 +62,7 @@ public static final class Model {
private String description;
private String modelVersion;
private String handler;
private String envelope;
private String requirementsFile;

public Model() {}
@@ -113,6 +114,14 @@ public String getHandler() {
public void setHandler(String handler) {
this.handler = handler;
}

public String getEnvelope() {
return envelope;
}

public void setEnvelope(String envelope) {
this.envelope = envelope;
}
}

public enum RuntimeType {
@@ -74,6 +74,7 @@ public final class ConfigManager {
private static final String TS_MAX_REQUEST_SIZE = "max_request_size";
private static final String TS_MAX_RESPONSE_SIZE = "max_response_size";
private static final String TS_DEFAULT_SERVICE_HANDLER = "default_service_handler";
private static final String TS_SERVICE_ENVELOPE = "service_envelope";
private static final String TS_MODEL_SERVER_HOME = "model_server_home";
private static final String TS_MODEL_STORE = "model_store";
private static final String TS_SNAPSHOT_STORE = "snapshot_store";
@@ -314,6 +315,10 @@ public String getTsDefaultServiceHandler() {
return getProperty(TS_DEFAULT_SERVICE_HANDLER, null);
}

public String getTsServiceEnvelope() {
return getProperty(TS_SERVICE_ENVELOPE, null);
}

public Properties getConfiguration() {
return (Properties) prop.clone();
}
@@ -43,10 +43,23 @@ protected void encode(ChannelHandlerContext ctx, BaseModelRequest msg, ByteBuf o
buf = handler.getBytes(StandardCharsets.UTF_8);
}

// TODO: this might be a bug. If handler isn't specified, this
// will repeat the model path
out.writeInt(buf.length);
out.writeBytes(buf);

out.writeInt(request.getGpuId());

String envelope = request.getEnvelope();
if (envelope != null) {
buf = envelope.getBytes(StandardCharsets.UTF_8);
} else {
buf = new byte[0];
}

out.writeInt(buf.length);
out.writeBytes(buf);

} else if (msg instanceof ModelInferenceRequest) {
out.writeByte('I');
ModelInferenceRequest request = (ModelInferenceRequest) msg;
@@ -11,6 +11,7 @@ public class ModelLoadModelRequest extends BaseModelRequest {
private String modelPath;

private String handler;
private String envelope;
private int batchSize;
private int gpuId;

@@ -19,6 +20,7 @@ public ModelLoadModelRequest(Model model, int gpuId) {
this.gpuId = gpuId;
modelPath = model.getModelDir().getAbsolutePath();
handler = model.getModelArchive().getManifest().getModel().getHandler();
envelope = model.getModelArchive().getManifest().getModel().getEnvelope();
batchSize = model.getBatchSize();
}

@@ -30,6 +32,10 @@ public String getHandler() {
return handler;
}

public String getEnvelope() {
return envelope;
}

public int getBatchSize() {
return batchSize;
}
@@ -155,6 +155,8 @@ private ModelArchive createModelArchive(
archive.getManifest().getModel().setHandler(configManager.getTsDefaultServiceHandler());
}

archive.getManifest().getModel().setEnvelope(configManager.getTsServiceEnvelope());

archive.validate();

return archive;
2 changes: 1 addition & 1 deletion test/postman/increased_timeout_inference.json
@@ -16,4 +16,4 @@
},
"tolerance":5
}
]
]
103 changes: 64 additions & 39 deletions ts/model_loader.py
@@ -34,7 +34,7 @@ class ModelLoader(object):
__metaclass__ = ABCMeta

@abstractmethod
def load(self, model_name, model_dir, handler, gpu_id, batch_size):
def load(self, model_name, model_dir, handler, gpu_id, batch_size, envelope=None):
"""
Load model from file.
@@ -43,6 +43,7 @@ def load(self, model_name, model_dir, handler, gpu_id, batch_size):
:param handler:
:param gpu_id:
:param batch_size:
:param envelope:
:return: Model
"""
# pylint: disable=unnecessary-pass
@@ -54,7 +55,7 @@ class TsModelLoader(ModelLoader):
TorchServe 1.0 Model Loader
"""

def load(self, model_name, model_dir, handler, gpu_id, batch_size):
def load(self, model_name, model_dir, handler, gpu_id, batch_size, envelope=None):
"""
Load TorchServe 1.0 model from file.
@@ -63,6 +64,7 @@ def load(self, model_name, model_dir, handler, gpu_id, batch_size):
:param handler:
:param gpu_id:
:param batch_size:
:param envelope:
:return:
"""
logging.debug("Loading model - working dir: %s", os.getcwd())
@@ -74,48 +76,71 @@ def load(self, model_name, model_dir, handler, gpu_id, batch_size):
with open(manifest_file) as f:
manifest = json.load(f)

function_name = None
try:
temp = handler.split(":", 1)
module_name = temp[0]
function_name = None if len(temp) == 1 else temp[1]
if module_name.endswith(".py"):
module_name = module_name[:-3]
module_name = module_name.split("/")[-1]
module = importlib.import_module(module_name)
# pylint: disable=unused-variable
except ImportError as e:
module_name = ".{0}".format(handler)
module = importlib.import_module(module_name, 'ts.torch_handler')
function_name = None
module, function_name = self._load_handler_file(handler)
except ImportError:
module = self._load_default_handler(handler)

if module is None:
raise ValueError("Unable to load module {}, make sure it is added to python path".format(module_name))
if function_name is None:
function_name = "handle"
if hasattr(module, function_name):
entry_point = getattr(module, function_name)
service = Service(model_name, model_dir, manifest, entry_point, gpu_id, batch_size)

service.context.metrics = metrics
# initialize model at load time
entry_point(None, service.context)
envelope_class = None
if envelope is not None:
envelope_class = self._load_default_envelope(envelope)

function_name = function_name or "handle"
if hasattr(module, function_name):
entry_point, initialize_fn = self._get_function_entry_point(module, function_name)
else:
model_class_definitions = list_classes_from_module(module)
if len(model_class_definitions) != 1:
raise ValueError("Expected only one class in custom service code or a function entry point {}".format(
model_class_definitions))

model_class = model_class_definitions[0]
model_service = model_class()
handle = getattr(model_service, "handle")
if handle is None:
raise ValueError("Expect handle method in class {}".format(str(model_class)))

service = Service(model_name, model_dir, manifest, model_service.handle, gpu_id, batch_size)
initialize = getattr(model_service, "initialize")
if initialize is not None:
model_service.initialize(service.context)
else:
raise ValueError("Expect initialize method in class {}".format(str(model_class)))
entry_point, initialize_fn = self._get_class_entry_point(module)

if envelope_class is not None:
envelope_instance = envelope_class(entry_point)
entry_point = envelope_instance.handle

service = Service(model_name, model_dir, manifest, entry_point, gpu_id, batch_size)
service.context.metrics = metrics
initialize_fn(service.context)

return service

def _load_handler_file(self, handler):
temp = handler.split(":", 1)
module_name = temp[0]
function_name = None if len(temp) == 1 else temp[1]
if module_name.endswith(".py"):
module_name = module_name[:-3]
module_name = module_name.split("/")[-1]
module = importlib.import_module(module_name)
return module, function_name

def _load_default_handler(self, handler):
module_name = ".{0}".format(handler)
module = importlib.import_module(module_name, 'ts.torch_handler')
return module

def _load_default_envelope(self, envelope):
module_name = ".{0}".format(envelope)
module = importlib.import_module(module_name, 'ts.torch_handler.request_envelope')
envelope_class = list_classes_from_module(module)[0]
return envelope_class

def _get_function_entry_point(self, module, function_name):
entry_point = getattr(module, function_name)
initialize_fn = lambda ctx: entry_point(None, ctx)
return entry_point, initialize_fn

def _get_class_entry_point(self, module):
model_class_definitions = list_classes_from_module(module)
if len(model_class_definitions) != 1:
raise ValueError("Expected only one class in custom service code or a function entry point {}".format(
model_class_definitions))

model_class = model_class_definitions[0]
model_service = model_class()

if not hasattr(model_service, "handle"):
raise ValueError("Expect handle method in class {}".format(str(model_class)))

return model_service.handle, model_service.initialize
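
For reference, a hypothetical call into the refactored loader, mirroring how `model_service_worker.py` below invokes it, threads the new `envelope` argument through; the argument values are illustrative (the worker normally obtains the loader via `ModelLoaderFactory.get_model_loader()`):

```python
from ts.model_loader import TsModelLoader

service = TsModelLoader().load(
    model_name="mnist",
    model_dir="/tmp/models/mnist",
    handler="image_classifier",
    gpu_id=None,          # CPU
    batch_size=1,
    envelope="json",      # None when no orchestrator envelope is configured
)
```
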
6 changes: 5 additions & 1 deletion ts/model_service_worker.py
@@ -63,6 +63,7 @@ def load_model(load_model_request):
"modelName" : "name", string
"gpu" : None if CPU else gpu_id, int
"handler" : service handler entry point if provided, string
"envelope" : name of wrapper/unwrapper of request data if provided, string
"batchSize" : batch size, int
}
@@ -73,6 +74,9 @@ def load_model(load_model_request):
model_dir = load_model_request["modelPath"].decode("utf-8")
model_name = load_model_request["modelName"].decode("utf-8")
handler = load_model_request["handler"].decode("utf-8") if load_model_request["handler"] else None
envelope = load_model_request["envelope"].decode("utf-8") if "envelope" in load_model_request else None
envelope = envelope if envelope is not None and len(envelope) > 0 else None

batch_size = None
if "batchSize" in load_model_request:
batch_size = int(load_model_request["batchSize"])
@@ -82,7 +86,7 @@
gpu = int(load_model_request["gpu"])

model_loader = ModelLoaderFactory.get_model_loader()
service = model_loader.load(model_name, model_dir, handler, gpu, batch_size)
service = model_loader.load(model_name, model_dir, handler, gpu, batch_size, envelope)

logging.debug("Model %s loaded.", model_name)

3 changes: 3 additions & 0 deletions ts/protocol/otf_message_handler.py
@@ -192,6 +192,9 @@ def _retrieve_load_msg(conn):
if gpu_id >= 0:
msg["gpu"] = gpu_id

length = _retrieve_int(conn)
msg["envelope"] = _retrieve_buffer(conn, length)

return msg


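Taken together, the encoder and decoder changes above extend the length-prefixed load message with one trailing envelope field. Below is a rough sketch of assembling that frame in Python; the field order follows the codec changes and the test fixtures below, and the `_field` helper is hypothetical:

```python
import struct


def _field(payload: bytes) -> bytes:
    # Length-prefixed field: 4-byte big-endian length, then the raw bytes.
    return struct.pack(">i", len(payload)) + payload


# Hypothetical load frame mirroring the unit-test fixtures below:
frame = (
    b"L"                          # 'L' = load-model command
    + _field(b"model_name")
    + _field(b"model_path")
    + struct.pack(">i", 1)        # batch size
    + _field(b"handler")
    + struct.pack(">i", 1)        # gpu id; -1 (b"\xff\xff\xff\xff") means no GPU
    + _field(b"envelope")         # new trailing field added by this PR
)
```
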
3 changes: 2 additions & 1 deletion ts/tests/unit_tests/test_model_service_worker.py
@@ -26,7 +26,8 @@ def socket_patches(mocker):
b"\x00\x00\x00\x0a", b"model_path",
b"\x00\x00\x00\x01",
b"\x00\x00\x00\x07", b"handler",
b"\x00\x00\x00\x01"
b"\x00\x00\x00\x01",
b"\x00\x00\x00\x08", b"envelope",
]
return mock_patch

12 changes: 8 additions & 4 deletions ts/tests/unit_tests/test_otf_codec_protocol.py
@@ -33,15 +33,17 @@ def test_retrieve_msg_unknown(self, socket_patches):

def test_retrieve_msg_load_gpu(self, socket_patches):
expected = {"modelName": b"model_name", "modelPath": b"model_path",
"batchSize": 1, "handler": b"handler", "gpu": 1}
"batchSize": 1, "handler": b"handler", "gpu": 1,
"envelope": b"envelope"}

socket_patches.socket.recv.side_effect = [
b"L",
b"\x00\x00\x00\x0a", b"model_name",
b"\x00\x00\x00\x0a", b"model_path",
b"\x00\x00\x00\x01",
b"\x00\x00\x00\x07", b"handler",
b"\x00\x00\x00\x01"
b"\x00\x00\x00\x01",
b"\x00\x00\x00\x08", b"envelope"
]
cmd, ret = codec.retrieve_msg(socket_patches.socket)

@@ -50,15 +52,17 @@ def test_retrieve_msg_load_no_gpu(self, socket_patches):

def test_retrieve_msg_load_no_gpu(self, socket_patches):
expected = {"modelName": b"model_name", "modelPath": b"model_path",
"batchSize": 1, "handler": b"handler"}
"batchSize": 1, "handler": b"handler",
"envelope": b"envelope"}

socket_patches.socket.recv.side_effect = [
b"L",
b"\x00\x00\x00\x0a", b"model_name",
b"\x00\x00\x00\x0a", b"model_path",
b"\x00\x00\x00\x01",
b"\x00\x00\x00\x07", b"handler",
b"\xFF\xFF\xFF\xFF"
b"\xFF\xFF\xFF\xFF",
b"\x00\x00\x00\x08", b"envelope"
]
cmd, ret = codec.retrieve_msg(socket_patches.socket)

Empty file.