Automate apache-bench benchmarking
Fix connection and spawning issues, refactor utils

Workaround for apt-update at beginning of instance launch

Fix random disconnect issue

Minor fixes for error propagation

Fix with vgg16 model
Nikhil Kulkarni committed May 17, 2021
1 parent a3da6d8 commit 2e7ab60
Showing 25 changed files with 2,500 additions and 3 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -8,4 +8,7 @@ dist/
.coverage
.github/actions/
.github/.DS_Store
.DS_Store
.DS_Store
frontend/server/src/main/java/org/pytorch/serve/grpc/
.vscode
.scratch/
5 changes: 4 additions & 1 deletion benchmarks/config.properties
@@ -2,4 +2,7 @@ inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081

number_of_netty_threads=32
job_queue_size=1000
job_queue_size=1000

vmargs=-Xmx4g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError
prefer_direct_buffer=True
2 changes: 1 addition & 1 deletion docker/Dockerfile.dev
@@ -52,7 +52,7 @@ ARG CUDA_VERSION
RUN if [ "$MACHINE_TYPE" = "gpu" ]; then export USE_CUDA=1; fi \
&& git clone https://github.com/pytorch/serve.git \
&& cd serve \
&& git checkout ${BRANCH_NAME} \
&& git checkout --track origin/release_0.4.0 \
&& if [ -z "$CUDA_VERSION" ]; then python ts_scripts/install_dependencies.py --environment=dev; else python ts_scripts/install_dependencies.py --environment=dev --cuda $CUDA_VERSION; fi \
&& python ts_scripts/install_from_src.py \
&& useradd -m model-server \
96 changes: 96 additions & 0 deletions test/benchmark/README.md
@@ -0,0 +1,96 @@
Note: the following benchmarking suite requires an AWS setup, and is an expensive operation involving several high-compute EC2 instances.

This apache-bench-based benchmark runs inference benchmarking across multiple batch sizes, as specified by the config in a *.yaml file. Each config represents one model.

Check out a sample vgg11 model config at the path: `tests/suite/vgg11.yaml`
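
To see which fields a model config actually defines before writing your own, a minimal sketch (assuming the path above is relative to `test/benchmark/` and that PyYAML from `requirements.txt` is installed) is:
```
import yaml

# Path assumed relative to test/benchmark/; adjust for your checkout.
with open("tests/suite/vgg11.yaml") as f:
    config = yaml.safe_load(f)

# Print the top-level keys rather than assuming a particular schema.
print(list(config.keys()))
```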

### Setup

* Ensure you have access to an AWS account, i.e., [set up](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) your environment so that awscli can access your account via either an IAM user or an IAM role. An IAM role is recommended for use with AWS. For the purposes of testing in your personal account, the following managed policies should suffice: <br>
-- [AmazonEC2ContainerRegistryFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess) <br>
-- [AmazonEC2FullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEC2FullAccess) <br>
-- [AmazonS3FullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonS3FullAccess) <br>
* [Create](https://docs.aws.amazon.com/cli/latest/reference/ecr/create-repository.html) an ECR repository with the name “torchserve-benchmark” in the us-west-2 region
* Ensure you have the [docker](https://docs.docker.com/get-docker/) client set up on your system (macOS or EC2)
* Adjust the following global variables to your preference in the file `serve/test/benchmark/tests/utils/__init__.py` (a sketch of these values follows this list) <br>
-- IAM_INSTANCE_PROFILE: this role is attached to all EC2 instances created as part of the benchmarking process. Create it as described [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#create-iam-role). The default role name is 'EC2Admin'. <br>
-- S3_BUCKET_BENCHMARK_ARTIFACTS: all temporary benchmarking artifacts, including server logs, are stored in this bucket. <br>
-- DEFAULT_DOCKER_DEV_ECR_REPO: the docker image used for benchmarking is pushed to this repo. <br>
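
A minimal sketch of what these globals might look like (the three variable names come from the list above; the bucket and ECR URI values are placeholders, not the repo's actual defaults):
```
# test/benchmark/tests/utils/__init__.py (sketch)
IAM_INSTANCE_PROFILE = "EC2Admin"  # role attached to every benchmark EC2 instance
S3_BUCKET_BENCHMARK_ARTIFACTS = "s3://<your-benchmark-bucket>/artifacts"  # placeholder
DEFAULT_DOCKER_DEV_ECR_REPO = "<account-id>.dkr.ecr.us-west-2.amazonaws.com/torchserve-benchmark"  # placeholder
```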

The following steps assume that the current working directory is serve/.

1. Create or reuse a Python virtual environment
```
python3 -m venv bvenv
source bvenv/bin/activate
```
2. Install the benchmarking requirements
```
pip install -r test/benchmark/requirements.txt
```
3. Make sure that your AWS account is set up correctly (a boto3 equivalent of this check is sketched after this step)
```
aws sts get-caller-identity
```
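If you prefer to run the same check from Python, an equivalent using boto3 (already listed in `test/benchmark/requirements.txt`) is:
```
import boto3

# Prints the account and ARN that the benchmark will run under.
identity = boto3.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])
```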
4. For each of the test files under `test/benchmark/tests/`, e.g., test_vgg11.py, set the list of instance types you want to test on:
```
INSTANCE_TYPES_TO_TEST = ["p3.8xlarge"]
```
5. The automation scripts use the TorchServe config from `benchmarks/config.properties`. Make changes to this file in your local checkout to apply them across all runs.
6. Finally, start the benchmark run as follows (run this inside a terminal multiplexer such as tmux or screen, as this is a long-running script):
```
python test/benchmark/run_benchmark.py
```
7. To run the tests for a particular model only, modify the `pytest_args` list in run_benchmark.py to include a `-k` filter, e.g. `["-k", "vgg11"]` for the vgg11 model, as sketched below.
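For example, a sketch of the modified list (the other arguments are the ones `run_benchmark.py` already passes):
```
# Limit the run to tests matching "vgg11" by appending a -k filter.
pytest_args = [
    "-s", "-rA", test_path, "-n=4", "--disable-warnings", "-v",
    "--execution-id", execution_id,
    "-k", "vgg11",
]
```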
8. To generate the benchmark report, modify the argument to `generate_comprehensive_report()` to point to the S3 bucket URI for the benchmark run (see the sketch after this step), then run the script as:
```
python report.py
```
The final benchmark report will be available in markdown format as `report.md` in the `serve/` folder.
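
A minimal sketch of the change in `report.py` (the exact signature of `generate_comprehensive_report()` may differ; the S3 URI is a placeholder built from the `ts-benchmark-run-<uuid>` execution id that `run_benchmark.py` generates):
```
# Point the report generator at the S3 prefix holding your run's artifacts.
generate_comprehensive_report("s3://<your-benchmark-bucket>/ts-benchmark-run-<uuid>")
```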

**Example report for the vgg11 model**


### Benchmark report

**vgg11 | eager_mode | c5.18xlarge | batch size 1**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 2.05 | 47419 | 54745 | 58781 | 48852.156 | 0.0 | 589.16 | 709.42 | 709.42 | 1905.05 | 1904.91 | 44589.48 | 1.09 |

**vgg11 | eager_mode | c5.18xlarge | batch size 8**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 8.11 | 12205 | 13162 | 14772 | 12334.135 | 0.0 | 3431.05 | 3525.94 | 3525.94 | 3872.42 | 3872.04 | 7958.16 | 53.27 |

**vgg11 | eager_mode | c5.18xlarge | batch size 4**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 5.55 | 17891 | 18936 | 19965 | 18017.484 | 0.0 | 2304.79 | 2412.98 | 2412.98 | 2820.51 | 2820.24 | 14423.28 | 52.17 |

**vgg11 | eager_mode | c5.18xlarge | batch size 2**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 3.51 | 28732 | 29900 | 30531 | 28520.545 | 0.0 | 748.97 | 1431.84 | 1431.84 | 2226.96 | 2226.79 | 25045.02 | 49.16 |

**vgg11 | scripted_mode | c5.18xlarge | batch size 1**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 2.06 | 48058 | 50794 | 51760 | 48618.091 | 0.0 | 874.51 | 1012.23 | 1012.23 | 1900.22 | 1900.11 | 44363.84 | 1.07 |

**vgg11 | scripted_mode | c5.18xlarge | batch size 4**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 5.50 | 18055 | 19159 | 19844 | 18171.083 | 0.0 | 2230.75 | 2316.4 | 2316.4 | 2846.7 | 2846.34 | 14550.68 | 51.29 |

**vgg11 | scripted_mode | c5.18xlarge | batch size 3**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 1 | 4.51 | 22138 | 23074 | 23792 | 22165.721 | 0.1 | 1804.03 | 2160.02 | 2160.02 | 2597.17 | 2597.01 | 18563.88 | 50.2 |

**vgg11 | scripted_mode | c5.18xlarge | batch size 2**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 3.47 | 28765 | 29849 | 30488 | 28781.227 | 0.0 | 1576.24 | 1758.28 | 1758.28 | 2249.52 | 2249.34 | 25210.43 | 46.77 |


13 changes: 13 additions & 0 deletions test/benchmark/requirements.txt
@@ -0,0 +1,13 @@
pytest
pyyaml
pytest-xdist
boto3
pytest-timeout
pytest-rerunfailures
retrying
fabric2
black
gitpython
docker
pandas
matplotlib
36 changes: 36 additions & 0 deletions test/benchmark/run_benchmark.py
@@ -0,0 +1,36 @@
import os
import random
import sys
import logging
import re
import uuid

import boto3
import pytest

from invoke import run
from invoke.context import Context

LOGGER = logging.getLogger(__name__)
LOGGER.setLevel(logging.DEBUG)
LOGGER.addHandler(logging.StreamHandler(sys.stdout))


def main():
    # Run this script from the root directory 'serve', it changes directory below as required
    os.chdir(os.path.join(os.getcwd(), "test", "benchmark"))

    execution_id = f"ts-benchmark-run-{str(uuid.uuid4())}"

    test_path = os.path.join(os.getcwd(), "tests")
    LOGGER.info(f"Running tests from directory: {test_path}")

    pytest_args = ["-s", "-rA", test_path, "-n=4", "--disable-warnings", "-v", "--execution-id", execution_id]

    LOGGER.info(f"Running pytest")

    pytest.main(pytest_args)


if __name__ == "__main__":
    main()