forked from pytorch/serve

feat: Add custom metrics example with mtail exporter

jagadeesh committed on Sep 1, 2021 (commit 4197c59, 1 parent 8903ca1). 2 changed files, 123 additions, 0 deletions.
# Monitoring TorchServe custom metrics with the mtail metrics exporter and Prometheus

In this example, we show how to use a pre-trained custom MNIST model and export its custom metrics using mtail and Prometheus.

We used the following PyTorch example to train the basic MNIST model for digit recognition: https://github.com/pytorch/examples/tree/master/mnist

Run the commands given in the following steps from the parent directory of the root of the repository. For example, if you cloned the repository into /home/my_path/serve, run the steps from /home/my_path.

## Steps

- Step 1: By default, prediction time and handler time are the two custom metrics added in the base handler.
New metrics can be added to a handler in its handle method. In this example we export the `HandlerTime` and `PredictionTime` metrics using mtail.
```python
def handle(self, data, context):
    start_time = time.time()

    self.context = context
    metrics = self.context.metrics

    # <-------- Handler code -------->

    stop_time = time.time()
    metrics.add_time('HandlerTime', round((stop_time - start_time) * 1000, 2), None, 'ms')
    return output
```
Base handler with the `HandlerTime` custom metric - https://github.com/pytorch/serve/blob/master/ts/torch_handler/base_handler.py

Refer: [Custom Metrics](https://github.com/pytorch/serve/blob/master/docs/metrics.md#custom-metrics-api)
Refer: [Custom Handler](https://github.com/pytorch/serve/blob/master/docs/custom_service.md#custom-handlers)
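The timing pattern above can be exercised outside TorchServe with a small stand-in for the metrics object. `StubMetrics` below is a hypothetical stub for illustration only, not the real `ts.metrics` class; it only records what `add_time` would log.

```python
import time

class StubMetrics:
    """Stand-in for the TorchServe metrics object (illustration only)."""
    def __init__(self):
        self.entries = []

    def add_time(self, name, value, idx=None, unit='ms'):
        # Record the metric name, value and unit, as the real object would log them
        self.entries.append((name, value, unit))

metrics = StubMetrics()
start_time = time.time()
# <-------- handler work would run here -------->
stop_time = time.time()
metrics.add_time('HandlerTime', round((stop_time - start_time) * 1000, 2), None, 'ms')

name, value, unit = metrics.entries[0]
print(name, unit)  # HandlerTime ms
```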
- Step 2: Create a torch model archive using the torch-model-archiver utility to archive the above files.

```bash
torch-model-archiver --model-name mnist --version 1.0 --model-file examples/image_classifier/mnist/mnist.py --serialized-file examples/image_classifier/mnist/mnist_cnn.pt --handler examples/image_classifier/mnist/mnist_handler.py
```
- Step 3: Register the model on TorchServe using the above model archive file.

```bash
mkdir model_store
mv mnist.mar model_store/
torchserve --start --model-store model_store --models mnist=mnist.mar
```
- Step 4: Install [mtail](https://github.com/google/mtail/releases)

```bash
wget https://github.com/google/mtail/releases/download/v3.0.0-rc47/mtail_3.0.0-rc47_Linux_x86_64.tar.gz
tar -xvzf mtail_3.0.0-rc47_Linux_x86_64.tar.gz
chmod +x mtail
```
- Step 5: Create an mtail program. In this example we use a program that exports the default custom metrics.

Refer: [mtail Programming Guide](https://google.github.io/mtail/Programming-Guide.html).
- Step 6: Start the mtail exporter by running the command below

```bash
./mtail --progs examples/custom_metrics/torchserve_custom.mtail --logs logs/model_metrics.log
```

The mtail program parses the log file, extracts information by matching patterns, and exposes it as JSON, Prometheus metrics, and other formats. https://google.github.io/mtail/Interoperability.html
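The extraction that mtail performs can be sketched in Python. This is a minimal illustration, assuming the `HandlerTime` log line format that TorchServe writes to model_metrics.log; the regex is equivalent to the pattern used by the mtail program in this example.

```python
import re

# Regex equivalent to the HandlerTime pattern matched by the mtail program
HANDLER_PATTERN = re.compile(
    r'HandlerTime\.Milliseconds:(\d+\.\d+)'
    r'\|#ModelName:([a-zA-Z]+),Level:([a-zA-Z]+)'
    r'\|#hostname:([a-zA-Z0-9-]+),requestID:([a-zA-Z0-9-]+),timestamp:([0-9]+)'
)

# Sample line in the format TorchServe writes to logs/model_metrics.log
line = ('2021-08-27 21:15:03,376 - HandlerTime.Milliseconds:109.74'
        '|#ModelName:bert,Level:Model'
        '|#hostname:ubuntu-ThinkPad-E14,'
        'requestID:09ed6c2c-9380-480d-a61a-66bfea958c1d,timestamp:1630079103')

match = HANDLER_PATTERN.search(line)
handler_ms, model_name, level, host, request_id, ts = match.groups()
print(handler_ms, model_name, level)  # 109.74 bert Model
```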
- Step 7: Make an inference request

```bash
curl http://127.0.0.1:8080/predictions/mnist -T examples/image_classifier/mnist/test_data/0.png
```
The inference request logs the time taken for prediction to the model_metrics.log file.
mtail parses the file and serves the extracted metrics on port 3903:

http://localhost:3903
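The metrics served on port 3903 use the Prometheus text exposition format. As a rough sketch, here is a minimal parser for simple sample lines of that format; the sample lines below are illustrative, not actual mtail output.

```python
def parse_prometheus_line(line):
    """Parse one sample line of the Prometheus text exposition format."""
    if line.startswith('#') or not line.strip():
        return None  # HELP/TYPE comments and blank lines carry no sample
    name_part, value = line.rsplit(' ', 1)
    return name_part, float(value)

sample = [
    '# TYPE prediction_time gauge',
    'prediction_time 109.74',
    'request_count 1',
]
parsed = dict(filter(None, (parse_prometheus_line(l) for l in sample)))
print(parsed)  # {'prediction_time': 109.74, 'request_count': 1.0}
```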
- Step 8: Start Prometheus with the mtail target added to the scrape config

* Download [Prometheus](https://prometheus.io/download/)

* Add the mtail target to the scrape config in the config file
```yaml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090", "localhost:3903"]
```
* Start Prometheus with the config file
```bash
./prometheus --config.file prometheus.yml
```
The logs exported by mtail are scraped by Prometheus from port 3903.
`examples/custom_metrics/torchserve_custom.mtail`:
```
counter request_count
gauge prediction_time
gauge model_name
gauge level
gauge host_name
gauge request_id
gauge time_stamp

# Sample log
# 2021-08-27 21:15:03,376 - PredictionTime.Milliseconds:109.74|#ModelName:bert,Level:Model|#hostname:ubuntu-ThinkPad-E14,requestID:09ed6c2c-9380-480d-a61a-66bfea958c1d,timestamp:1630079103
# 2021-08-27 21:15:03,376 - HandlerTime.Milliseconds:109.74|#ModelName:bert,Level:Model|#hostname:ubuntu-ThinkPad-E14,requestID:09ed6c2c-9380-480d-a61a-66bfea958c1d,timestamp:1630079103

const HANDLER_PATTERN /HandlerTime\.Milliseconds:(\d+\.\d+)\|#ModelName:([a-zA-Z]+),Level:([a-zA-Z]+)\|#hostname:([a-zA-Z0-9-]+),requestID:([a-zA-Z0-9-]+),timestamp:([0-9]+)/

HANDLER_PATTERN {
  request_count++
  prediction_time = $1
  model_name = $2
  level = $3
  host_name = $4
  request_id = $5
  time_stamp = $6
}
```
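The action block's per-line behaviour can be simulated in Python. This sketch is an illustration of the program's semantics, not mtail itself; note that only `HandlerTime` lines match `HANDLER_PATTERN`, so the `PredictionTime` sample line leaves the exported variables untouched.

```python
import re

# Regex equivalent to HANDLER_PATTERN in the mtail program
HANDLER_PATTERN = re.compile(
    r'HandlerTime\.Milliseconds:(\d+\.\d+)\|#ModelName:([a-zA-Z]+),Level:([a-zA-Z]+)'
    r'\|#hostname:([a-zA-Z0-9-]+),requestID:([a-zA-Z0-9-]+),timestamp:([0-9]+)'
)

# Exported variables, mirroring the counter/gauge declarations in the program
state = {'request_count': 0, 'prediction_time': 0.0}

def process(line):
    """Apply the mtail action block to one log line."""
    m = HANDLER_PATTERN.search(line)
    if not m:
        return
    state['request_count'] += 1                   # counter: request_count++
    state['prediction_time'] = float(m.group(1))  # gauge: prediction_time = $1
    state['model_name'] = m.group(2)              # gauge: model_name = $2

logs = [
    '2021-08-27 21:15:03,376 - PredictionTime.Milliseconds:109.74|#ModelName:bert,Level:Model|#hostname:ubuntu-ThinkPad-E14,requestID:09ed6c2c-9380-480d-a61a-66bfea958c1d,timestamp:1630079103',
    '2021-08-27 21:15:03,376 - HandlerTime.Milliseconds:109.74|#ModelName:bert,Level:Model|#hostname:ubuntu-ThinkPad-E14,requestID:09ed6c2c-9380-480d-a61a-66bfea958c1d,timestamp:1630079103',
]
for line in logs:
    process(line)

print(state['request_count'])  # 1
```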