Automate apache-bench benchmarking
Fix connection and spawning issues, refactor utils

Workaround for apt-update at beginning of instance launch

Fix random disconnect issue

Minor fixes for error propagation

Fix with vgg16 model
Nikhil Kulkarni committed May 17, 2021
1 parent a3da6d8 commit 2e7ab60
Showing 25 changed files with 2,500 additions and 3 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -8,4 +8,7 @@ dist/
.coverage
.github/actions/
.github/.DS_Store
.DS_Store
.DS_Store
frontend/server/src/main/java/org/pytorch/serve/grpc/
.vscode
.scratch/
5 changes: 4 additions & 1 deletion benchmarks/config.properties
@@ -2,4 +2,7 @@ inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081

number_of_netty_threads=32
job_queue_size=1000
job_queue_size=1000

vmargs=-Xmx4g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError
prefer_direct_buffer=True
2 changes: 1 addition & 1 deletion docker/Dockerfile.dev
@@ -52,7 +52,7 @@ ARG CUDA_VERSION
RUN if [ "$MACHINE_TYPE" = "gpu" ]; then export USE_CUDA=1; fi \
&& git clone https://github.com/pytorch/serve.git \
&& cd serve \
&& git checkout ${BRANCH_NAME} \
&& git checkout --track origin/release_0.4.0 \
&& if [ -z "$CUDA_VERSION" ]; then python ts_scripts/install_dependencies.py --environment=dev; else python ts_scripts/install_dependencies.py --environment=dev --cuda $CUDA_VERSION; fi \
&& python ts_scripts/install_from_src.py \
&& useradd -m model-server \
96 changes: 96 additions & 0 deletions test/benchmark/README.md
@@ -0,0 +1,96 @@
Note: the following benchmarking suite requires an AWS setup, and is an expensive operation involving several high-compute EC2 instances.

This apache-bench-based benchmark runs inference benchmarking across multiple batch sizes, as specified by the config in a *.yaml file. Each config represents one model.

Check out a sample vgg11 model config at the path: `tests/suite/vgg11.yaml`
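
To see which fields a model config actually defines before writing your own, a minimal sketch (assuming the path above is relative to `test/benchmark/` and that PyYAML from `requirements.txt` is installed) is:
```
import yaml

# Path assumed relative to test/benchmark/; adjust for your checkout.
with open("tests/suite/vgg11.yaml") as f:
    config = yaml.safe_load(f)

# Print the top-level keys rather than assuming a particular schema.
print(list(config.keys()))
```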

### Setup

* Ensure you have access to an AWS account, i.e., [set up](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) your environment so that awscli can access your account via either an IAM user or an IAM role. An IAM role is recommended for use with AWS. For the purposes of testing in your personal account, the following managed policies should suffice: <br>
-- [AmazonEC2ContainerRegistryFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess) <br>
-- [AmazonEC2FullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEC2FullAccess) <br>
-- [AmazonS3FullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonS3FullAccess) <br>
* [Create](https://docs.aws.amazon.com/cli/latest/reference/ecr/create-repository.html) an ECR repository with the name “torchserve-benchmark” in the us-west-2 region
* Ensure you have the [docker](https://docs.docker.com/get-docker/) client set up on your system (macOS or EC2)
* Adjust the following global variables to your preference in the file `serve/test/benchmark/tests/utils/__init__.py` (a sketch of these values follows this list) <br>
-- IAM_INSTANCE_PROFILE: this role is attached to all EC2 instances created as part of the benchmarking process. Create it as described [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#create-iam-role). The default role name is 'EC2Admin'. <br>
-- S3_BUCKET_BENCHMARK_ARTIFACTS: all temporary benchmarking artifacts, including server logs, are stored in this bucket. <br>
-- DEFAULT_DOCKER_DEV_ECR_REPO: the docker image used for benchmarking is pushed to this repo. <br>
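
A minimal sketch of what these globals might look like (the three variable names come from the list above; the bucket and ECR URI values are placeholders, not the repo's actual defaults):
```
# test/benchmark/tests/utils/__init__.py (sketch)
IAM_INSTANCE_PROFILE = "EC2Admin"  # role attached to every benchmark EC2 instance
S3_BUCKET_BENCHMARK_ARTIFACTS = "s3://<your-benchmark-bucket>/artifacts"  # placeholder
DEFAULT_DOCKER_DEV_ECR_REPO = "<account-id>.dkr.ecr.us-west-2.amazonaws.com/torchserve-benchmark"  # placeholder
```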

The following steps assume that the current working directory is serve/.

1. Create or reuse a Python virtual environment
```
python3 -m venv bvenv
source bvenv/bin/activate
```
2. Install the benchmarking requirements
```
pip install -r test/benchmark/requirements.txt
```
3. Make sure that your AWS account is set up correctly (a boto3 equivalent of this check is sketched after this step)
```
aws sts get-caller-identity
```
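If you prefer to run the same check from Python, an equivalent using boto3 (already listed in `test/benchmark/requirements.txt`) is:
```
import boto3

# Prints the account and ARN that the benchmark will run under.
identity = boto3.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])
```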
4. For each of the test files under `test/benchmark/tests/`, e.g., test_vgg11.py, set the list of instance types you want to test on:
```
INSTANCE_TYPES_TO_TEST = ["p3.8xlarge"]
```
5. The automation scripts use the TorchServe config from `benchmarks/config.properties`. Make changes to this file in your local checkout to apply them across all runs.
6. Finally, start the benchmark run as follows (run this inside a terminal multiplexer such as tmux or screen, as this is a long-running script):
```
python test/benchmark/run_benchmark.py
```
7. To run the tests for a particular model only, modify the `pytest_args` list in run_benchmark.py to include a `-k` filter, e.g. `["-k", "vgg11"]` for the vgg11 model, as sketched below.
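For example, a sketch of the modified list (the other arguments are the ones `run_benchmark.py` already passes):
```
# Limit the run to tests matching "vgg11" by appending a -k filter.
pytest_args = [
    "-s", "-rA", test_path, "-n=4", "--disable-warnings", "-v",
    "--execution-id", execution_id,
    "-k", "vgg11",
]
```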
8. To generate the benchmark report, modify the argument to `generate_comprehensive_report()` to point to the S3 bucket URI for the benchmark run (see the sketch after this step), then run the script as:
```
python report.py
```
The final benchmark report will be available in markdown format as `report.md` in the `serve/` folder.
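
A minimal sketch of the change in `report.py` (the exact signature of `generate_comprehensive_report()` may differ; the S3 URI is a placeholder built from the `ts-benchmark-run-<uuid>` execution id that `run_benchmark.py` generates):
```
# Point the report generator at the S3 prefix holding your run's artifacts.
generate_comprehensive_report("s3://<your-benchmark-bucket>/ts-benchmark-run-<uuid>")
```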

**Example report for the vgg11 model**


### Benchmark report

**vgg11 | eager_mode | c5.18xlarge | batch size 1**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 2.05 | 47419 | 54745 | 58781 | 48852.156 | 0.0 | 589.16 | 709.42 | 709.42 | 1905.05 | 1904.91 | 44589.48 | 1.09 |

**vgg11 | eager_mode | c5.18xlarge | batch size 8**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 8.11 | 12205 | 13162 | 14772 | 12334.135 | 0.0 | 3431.05 | 3525.94 | 3525.94 | 3872.42 | 3872.04 | 7958.16 | 53.27 |

**vgg11 | eager_mode | c5.18xlarge | batch size 4**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 5.55 | 17891 | 18936 | 19965 | 18017.484 | 0.0 | 2304.79 | 2412.98 | 2412.98 | 2820.51 | 2820.24 | 14423.28 | 52.17 |

**vgg11 | eager_mode | c5.18xlarge | batch size 2**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 3.51 | 28732 | 29900 | 30531 | 28520.545 | 0.0 | 748.97 | 1431.84 | 1431.84 | 2226.96 | 2226.79 | 25045.02 | 49.16 |

**vgg11 | scripted_mode | c5.18xlarge | batch size 1**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 2.06 | 48058 | 50794 | 51760 | 48618.091 | 0.0 | 874.51 | 1012.23 | 1012.23 | 1900.22 | 1900.11 | 44363.84 | 1.07 |

**vgg11 | scripted_mode | c5.18xlarge | batch size 4**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 5.50 | 18055 | 19159 | 19844 | 18171.083 | 0.0 | 2230.75 | 2316.4 | 2316.4 | 2846.7 | 2846.34 | 14550.68 | 51.29 |

**vgg11 | scripted_mode | c5.18xlarge | batch size 3**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 1 | 4.51 | 22138 | 23074 | 23792 | 22165.721 | 0.1 | 1804.03 | 2160.02 | 2160.02 | 2597.17 | 2597.01 | 18563.88 | 50.2 |

**vgg11 | scripted_mode | c5.18xlarge | batch size 2**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 3.47 | 28765 | 29849 | 30488 | 28781.227 | 0.0 | 1576.24 | 1758.28 | 1758.28 | 2249.52 | 2249.34 | 25210.43 | 46.77 |


13 changes: 13 additions & 0 deletions test/benchmark/requirements.txt
@@ -0,0 +1,13 @@
pytest
pyyaml
pytest-xdist
boto3
pytest-timeout
pytest-rerunfailures
retrying
fabric2
black
gitpython
docker
pandas
matplotlib
36 changes: 36 additions & 0 deletions test/benchmark/run_benchmark.py
@@ -0,0 +1,36 @@
import os
import random
import sys
import logging
import re
import uuid

import boto3
import pytest

from invoke import run
from invoke.context import Context

LOGGER = logging.getLogger(__name__)
LOGGER.setLevel(logging.DEBUG)
LOGGER.addHandler(logging.StreamHandler(sys.stdout))


def main():
    # Run this script from the root directory 'serve', it changes directory below as required
    os.chdir(os.path.join(os.getcwd(), "test", "benchmark"))

    execution_id = f"ts-benchmark-run-{str(uuid.uuid4())}"

    test_path = os.path.join(os.getcwd(), "tests")
    LOGGER.info(f"Running tests from directory: {test_path}")

    pytest_args = ["-s", "-rA", test_path, "-n=4", "--disable-warnings", "-v", "--execution-id", execution_id]

    LOGGER.info(f"Running pytest")

    pytest.main(pytest_args)


if __name__ == "__main__":
    main()