forked from pytorch/serve

feat: Add custom metrics example with mtail exporter

jagadeesh committed on Sep 1, 2021 (commit 4197c59, 1 parent 8903ca1). 2 changed files, 123 additions, 0 deletions.
# Monitoring TorchServe custom metrics with the mtail metrics exporter and Prometheus

In this example, we show how to use a pre-trained custom MNIST model and export its custom metrics using mtail and Prometheus.

We used the following PyTorch example to train the basic MNIST model for digit recognition: https://github.com/pytorch/examples/tree/master/mnist

Run the commands given in the following steps from the parent directory of the root of the repository. For example, if you cloned the repository into /home/my_path/serve, run the steps from /home/my_path.

## Steps

- Step 1: By default, prediction time and handler time are the two custom metrics added in the base handler.
New metrics can be added to a handler in its handle method. In this example we export the `HandlerTime` and `PredictionTime` metrics using mtail.
```python
def handle(self, data, context):
    start_time = time.time()

    self.context = context
    metrics = self.context.metrics

    # <-------- Handler code -------->

    stop_time = time.time()
    metrics.add_time('HandlerTime', round((stop_time - start_time) * 1000, 2), None, 'ms')
    return output
```
Base handler with the `HandlerTime` custom metric - https://github.com/pytorch/serve/blob/master/ts/torch_handler/base_handler.py

Refer: [Custom Metrics](https://github.com/pytorch/serve/blob/master/docs/metrics.md#custom-metrics-api)
Refer: [Custom Handler](https://github.com/pytorch/serve/blob/master/docs/custom_service.md#custom-handlers)
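The timing pattern above can be exercised outside TorchServe with a small stand-in for the metrics object. `StubMetrics` below is a hypothetical stub for illustration only, not the real `ts.metrics` class; it only records what `add_time` would log.

```python
import time

class StubMetrics:
    """Stand-in for the TorchServe metrics object (illustration only)."""
    def __init__(self):
        self.entries = []

    def add_time(self, name, value, idx=None, unit='ms'):
        # Record the metric name, value and unit, as the real object would log them
        self.entries.append((name, value, unit))

metrics = StubMetrics()
start_time = time.time()
# <-------- handler work would run here -------->
stop_time = time.time()
metrics.add_time('HandlerTime', round((stop_time - start_time) * 1000, 2), None, 'ms')

name, value, unit = metrics.entries[0]
print(name, unit)  # HandlerTime ms
```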
- Step 2: Create a torch model archive using the torch-model-archiver utility to archive the above files.

```bash
torch-model-archiver --model-name mnist --version 1.0 --model-file examples/image_classifier/mnist/mnist.py --serialized-file examples/image_classifier/mnist/mnist_cnn.pt --handler examples/image_classifier/mnist/mnist_handler.py
```
- Step 3: Register the model on TorchServe using the above model archive file.

```bash
mkdir model_store
mv mnist.mar model_store/
torchserve --start --model-store model_store --models mnist=mnist.mar
```
- Step 4: Install [mtail](https://github.com/google/mtail/releases)

```bash
wget https://github.com/google/mtail/releases/download/v3.0.0-rc47/mtail_3.0.0-rc47_Linux_x86_64.tar.gz
tar -xvzf mtail_3.0.0-rc47_Linux_x86_64.tar.gz
chmod +x mtail
```
- Step 5: Create an mtail program. In this example we use a program that exports the default custom metrics.

Refer: [mtail Programming Guide](https://google.github.io/mtail/Programming-Guide.html).
- Step 6: Start the mtail exporter by running the command below

```bash
./mtail --progs examples/custom_metrics/torchserve_custom.mtail --logs logs/model_metrics.log
```

The mtail program parses the log file, extracts information by matching patterns, and exposes it as JSON, Prometheus metrics, and other formats. https://google.github.io/mtail/Interoperability.html
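The extraction that mtail performs can be sketched in Python. This is a minimal illustration, assuming the `HandlerTime` log line format that TorchServe writes to model_metrics.log; the regex is equivalent to the pattern used by the mtail program in this example.

```python
import re

# Regex equivalent to the HandlerTime pattern matched by the mtail program
HANDLER_PATTERN = re.compile(
    r'HandlerTime\.Milliseconds:(\d+\.\d+)'
    r'\|#ModelName:([a-zA-Z]+),Level:([a-zA-Z]+)'
    r'\|#hostname:([a-zA-Z0-9-]+),requestID:([a-zA-Z0-9-]+),timestamp:([0-9]+)'
)

# Sample line in the format TorchServe writes to logs/model_metrics.log
line = ('2021-08-27 21:15:03,376 - HandlerTime.Milliseconds:109.74'
        '|#ModelName:bert,Level:Model'
        '|#hostname:ubuntu-ThinkPad-E14,'
        'requestID:09ed6c2c-9380-480d-a61a-66bfea958c1d,timestamp:1630079103')

match = HANDLER_PATTERN.search(line)
handler_ms, model_name, level, host, request_id, ts = match.groups()
print(handler_ms, model_name, level)  # 109.74 bert Model
```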
- Step 7: Make an inference request

```bash
curl http://127.0.0.1:8080/predictions/mnist -T examples/image_classifier/mnist/test_data/0.png
```
The inference request logs the time taken for prediction to the model_metrics.log file.
mtail parses the file and serves the extracted metrics on port 3903:

http://localhost:3903
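The metrics served on port 3903 use the Prometheus text exposition format. As a rough sketch, here is a minimal parser for simple sample lines of that format; the sample lines below are illustrative, not actual mtail output.

```python
def parse_prometheus_line(line):
    """Parse one sample line of the Prometheus text exposition format."""
    if line.startswith('#') or not line.strip():
        return None  # HELP/TYPE comments and blank lines carry no sample
    name_part, value = line.rsplit(' ', 1)
    return name_part, float(value)

sample = [
    '# TYPE prediction_time gauge',
    'prediction_time 109.74',
    'request_count 1',
]
parsed = dict(filter(None, (parse_prometheus_line(l) for l in sample)))
print(parsed)  # {'prediction_time': 109.74, 'request_count': 1.0}
```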
- Step 8: Start Prometheus with the mtail target added to the scrape config

* Download [Prometheus](https://prometheus.io/download/)

* Add the mtail target to the scrape config in the config file
```yaml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090", "localhost:3903"]
```
* Start Prometheus with the config file
```bash
./prometheus --config.file prometheus.yml
```
The logs exported by mtail are scraped by Prometheus from port 3903.
`examples/custom_metrics/torchserve_custom.mtail`:
```
counter request_count
gauge prediction_time
gauge model_name
gauge level
gauge host_name
gauge request_id
gauge time_stamp

# Sample log
# 2021-08-27 21:15:03,376 - PredictionTime.Milliseconds:109.74|#ModelName:bert,Level:Model|#hostname:ubuntu-ThinkPad-E14,requestID:09ed6c2c-9380-480d-a61a-66bfea958c1d,timestamp:1630079103
# 2021-08-27 21:15:03,376 - HandlerTime.Milliseconds:109.74|#ModelName:bert,Level:Model|#hostname:ubuntu-ThinkPad-E14,requestID:09ed6c2c-9380-480d-a61a-66bfea958c1d,timestamp:1630079103

const HANDLER_PATTERN /HandlerTime\.Milliseconds:(\d+\.\d+)\|#ModelName:([a-zA-Z]+),Level:([a-zA-Z]+)\|#hostname:([a-zA-Z0-9-]+),requestID:([a-zA-Z0-9-]+),timestamp:([0-9]+)/

HANDLER_PATTERN {
  request_count++
  prediction_time = $1
  model_name = $2
  level = $3
  host_name = $4
  request_id = $5
  time_stamp = $6
}
```
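The action block's per-line behaviour can be simulated in Python. This sketch is an illustration of the program's semantics, not mtail itself; note that only `HandlerTime` lines match `HANDLER_PATTERN`, so the `PredictionTime` sample line leaves the exported variables untouched.

```python
import re

# Regex equivalent to HANDLER_PATTERN in the mtail program
HANDLER_PATTERN = re.compile(
    r'HandlerTime\.Milliseconds:(\d+\.\d+)\|#ModelName:([a-zA-Z]+),Level:([a-zA-Z]+)'
    r'\|#hostname:([a-zA-Z0-9-]+),requestID:([a-zA-Z0-9-]+),timestamp:([0-9]+)'
)

# Exported variables, mirroring the counter/gauge declarations in the program
state = {'request_count': 0, 'prediction_time': 0.0}

def process(line):
    """Apply the mtail action block to one log line."""
    m = HANDLER_PATTERN.search(line)
    if not m:
        return
    state['request_count'] += 1                   # counter: request_count++
    state['prediction_time'] = float(m.group(1))  # gauge: prediction_time = $1
    state['model_name'] = m.group(2)              # gauge: model_name = $2

logs = [
    '2021-08-27 21:15:03,376 - PredictionTime.Milliseconds:109.74|#ModelName:bert,Level:Model|#hostname:ubuntu-ThinkPad-E14,requestID:09ed6c2c-9380-480d-a61a-66bfea958c1d,timestamp:1630079103',
    '2021-08-27 21:15:03,376 - HandlerTime.Milliseconds:109.74|#ModelName:bert,Level:Model|#hostname:ubuntu-ThinkPad-E14,requestID:09ed6c2c-9380-480d-a61a-66bfea958c1d,timestamp:1630079103',
]
for line in logs:
    process(line)

print(state['request_count'])  # 1
```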