Commit `4197c59`: feat: Add custom metrics example with mtail exporter
jagadeesh committed Sep 1, 2021 (1 parent: `8903ca5`)
Showing 2 changed files with 123 additions and 0 deletions.

**examples/custom_metrics/README.md** (100 additions)
# Monitoring TorchServe custom metrics with the mtail metrics exporter and Prometheus

In this example, we show how to use a pre-trained custom MNIST model and export the custom metrics using mtail and Prometheus.

We used the following PyTorch example to train the basic MNIST model for digit recognition: https://github.com/pytorch/examples/tree/master/mnist

Run the commands given in the following steps from the parent directory of the root of the repository. For example, if you cloned the repository into /home/my_path/serve, run the steps from /home/my_path.

## Steps

- Step 1: By default, `HandlerTime` and `PredictionTime` are the two custom metrics added in the base handler.
New metrics can be added to the handler in the `handle` method. In this example we export `HandlerTime` and `PredictionTime` using mtail.

```python
def handle(self, data, context):
start_time = time.time()

self.context = context
metrics = self.context.metrics

# <-------- Handler code -------->

stop_time = time.time()
metrics.add_time('HandlerTime', round((stop_time - start_time) * 1000, 2), None, 'ms')
return output
```
Base handler with `HandlerTime` custom metric - https://github.com/pytorch/serve/blob/master/ts/torch_handler/base_handler.py

Refer: [Custom Metrics](https://github.com/pytorch/serve/blob/master/docs/metrics.md#custom-metrics-api)
Refer: [Custom Handler](https://github.com/pytorch/serve/blob/master/docs/custom_service.md#custom-handlers)
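To illustrate how the timing metric in the snippet above behaves, here is a minimal, runnable sketch. The `MockMetrics` class is a hypothetical stand-in for TorchServe's `context.metrics` object (which exposes `add_time` as shown above); the handler body is a placeholder.

```python
import time

class MockMetrics:
    """Hypothetical stand-in for TorchServe's context.metrics object."""
    def __init__(self):
        self.entries = []

    def add_time(self, name, value, idx=None, unit='ms'):
        # Record the metric in memory; TorchServe would instead
        # write it to logs/model_metrics.log.
        self.entries.append((name, value, unit))

def handle(data, metrics):
    start_time = time.time()
    output = [x * 2 for x in data]  # placeholder for real handler work
    stop_time = time.time()
    metrics.add_time('HandlerTime', round((stop_time - start_time) * 1000, 2), None, 'ms')
    return output

metrics = MockMetrics()
result = handle([1, 2, 3], metrics)
print(result)                  # [2, 4, 6]
print(metrics.entries[0][0])   # HandlerTime
```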

- Step 2: Create a torch model archive using the torch-model-archiver utility to archive the above files.

```bash
torch-model-archiver --model-name mnist --version 1.0 --model-file examples/image_classifier/mnist/mnist.py --serialized-file examples/image_classifier/mnist/mnist_cnn.pt --handler examples/image_classifier/mnist/mnist_handler.py
```

- Step 3: Register the model on TorchServe using the above model archive file.

```bash
mkdir model_store
mv mnist.mar model_store/
torchserve --start --model-store model_store --models mnist=mnist.mar
```

- Step 4: Install [mtail](https://github.com/google/mtail/releases)

```bash
wget https://github.com/google/mtail/releases/download/v3.0.0-rc47/mtail_3.0.0-rc47_Linux_x86_64.tar.gz
tar -xvzf mtail_3.0.0-rc47_Linux_x86_64.tar.gz
chmod +x mtail
```

- Step 5: Create an mtail program. In this example we use a program that exports the default custom metrics.

Refer: [mtail Programming Guide](https://google.github.io/mtail/Programming-Guide.html).

- Step 6: Start the mtail exporter by running the command below

```bash
./mtail --progs examples/custom_metrics/torchserve_custom.mtail --logs logs/model_metrics.log
```

The mtail program parses the log file, extracts information by matching patterns, and exposes it to Prometheus, JSON consumers, and other collectors. https://google.github.io/mtail/Interoperability.html

- Step 7: Make an inference request

```bash
curl http://127.0.0.1:8080/predictions/mnist -T examples/image_classifier/mnist/test_data/0.png
```

The inference request logs the time taken for prediction to the model_metrics.log file.
mtail parses the file and serves the extracted metrics on port 3903:

http://localhost:3903
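mtail serves the extracted metrics in the Prometheus text exposition format. The exact labels mtail attaches (such as `prog`) depend on its configuration, so the sample line below is illustrative; the small parser is just a sketch of how one line of that format is structured.

```python
import re

# Illustrative line in the Prometheus text exposition format; the actual
# labels emitted by mtail (e.g. `prog`) depend on its configuration.
sample = 'prediction_time{prog="torchserve_custom.mtail"} 109.74'

# Parse metric name, labels, and value from a single exposition line.
pattern = re.compile(r'^(?P<name>\w+)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>[0-9.]+)$')
match = pattern.match(sample)
print(match.group('name'))   # prediction_time
print(match.group('value'))  # 109.74
```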

- Step 8: Start Prometheus with the mtail target added to the scrape config

* Download [Prometheus](https://prometheus.io/download/)

* Add the mtail target to the scrape config in the config file

```yaml
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: "prometheus"

# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.

static_configs:
- targets: ["localhost:9090", "localhost:3903"]
```
* Start Prometheus with config file
```bash
./prometheus --config.file prometheus.yml
```

The metrics exported by mtail are scraped by Prometheus from port 3903.
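Once Prometheus is scraping the mtail target, the metrics defined in the mtail program can be queried through Prometheus's instant-query HTTP API at `/api/v1/query`. This sketch only builds the query URL for the `prediction_time` gauge; actually issuing the request assumes a Prometheus server running at localhost:9090.

```python
from urllib.parse import urlencode

# Prometheus exposes an instant-query HTTP API at /api/v1/query.
# Build a query URL for the `prediction_time` gauge exported by mtail;
# sending the request requires a running Prometheus at localhost:9090.
base = "http://localhost:9090/api/v1/query"
url = base + "?" + urlencode({"query": "prediction_time"})
print(url)  # http://localhost:9090/api/v1/query?query=prediction_time
```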
**examples/custom_metrics/torchserve_custom.mtail** (23 additions)
counter request_count
gauge prediction_time
gauge model_name
gauge level
gauge host_name
gauge request_id
gauge time_stamp

# Sample log
# 2021-08-27 21:15:03,376 - PredictionTime.Milliseconds:109.74|#ModelName:bert,Level:Model|#hostname:ubuntu-ThinkPad-E14,requestID:09ed6c2c-9380-480d-a61a-66bfea958c1d,timestamp:1630079103
# 2021-08-27 21:15:03,376 - HandlerTime.Milliseconds:109.74|#ModelName:bert,Level:Model|#hostname:ubuntu-ThinkPad-E14,requestID:09ed6c2c-9380-480d-a61a-66bfea958c1d,timestamp:1630079103

const HANDLER_PATTERN /HandlerTime\.Milliseconds:(\d+\.\d+)\|#ModelName:([a-zA-Z]+),Level:([a-zA-Z]+)\|#hostname:([a-zA-Z0-9-]+),requestID:([a-zA-Z0-9-]+),timestamp:([0-9]+)/

HANDLER_PATTERN{
request_count++
prediction_time = $1
model_name = $2
level = $3
host_name = $4
request_id = $5
time_stamp = $6
}
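The mtail pattern above can be sanity-checked offline. This Python sketch applies the same regular expression to the sample `HandlerTime` log line from the comments in the program and confirms the capture groups line up with the variables assigned in the action block.

```python
import re

# The same regular expression as HANDLER_PATTERN in torchserve_custom.mtail.
HANDLER_PATTERN = re.compile(
    r'HandlerTime\.Milliseconds:(\d+\.\d+)\|#ModelName:([a-zA-Z]+),Level:([a-zA-Z]+)'
    r'\|#hostname:([a-zA-Z0-9-]+),requestID:([a-zA-Z0-9-]+),timestamp:([0-9]+)'
)

# Sample log line from the comments in the mtail program.
line = ('2021-08-27 21:15:03,376 - HandlerTime.Milliseconds:109.74|#ModelName:bert,'
        'Level:Model|#hostname:ubuntu-ThinkPad-E14,requestID:'
        '09ed6c2c-9380-480d-a61a-66bfea958c1d,timestamp:1630079103')

match = HANDLER_PATTERN.search(line)
print(match.group(1))  # 109.74 -> prediction_time
print(match.group(2))  # bert   -> model_name
print(match.group(4))  # ubuntu-ThinkPad-E14 -> host_name
```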
