forked from pytorch/serve
Commit
Fix connection and spawning issues, refactor utils

Workaround for apt-update at beginning of instance launch
Fix random disconnect issue
Minor fixes for error propagation
Fix with vgg16 model

Nikhil Kulkarni committed May 17, 2021
1 parent a3da6d8 · commit 2e7ab60

Showing 25 changed files with 2,500 additions and 3 deletions.

@@ -0,0 +1,96 @@
Note: the following benchmarking suite requires an AWS setup and is an expensive operation involving several high-compute EC2 instances.

This apache-bench based benchmark runs inference benchmarks across multiple batch sizes, as specified in a per-model `*.yaml` config file. Each config represents one model.

Check out a sample vgg11 model config at the path: `tests/suite/vgg11.yaml`
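
To sanity-check a model config before launching a run, a minimal sketch such as the one below can load and print it. It only assumes the sample path given above and the pyyaml dependency from `test/benchmark/requirements.txt`; nothing about the yaml schema itself is assumed.

```python
# Minimal sketch: load a per-model benchmark config and print its top-level
# sections. The path assumes the sample config above is relative to
# test/benchmark/ and that you are running from the serve/ root.
import yaml

with open("test/benchmark/tests/suite/vgg11.yaml") as f:
    config = yaml.safe_load(f)

for section, value in config.items():
    print(f"{section}: {value}")
```
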
### Setup

* Ensure you have access to an AWS account, i.e. [set up](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) your environment such that awscli can access your account via either an IAM user or an IAM role. An IAM role is recommended for use with AWS. For the purposes of testing in your personal account, the following managed permissions should suffice: <br>
-- [AmazonEC2ContainerRegistryFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess) <br>
-- [AmazonEC2FullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEC2FullAccess) <br>
-- [AmazonS3FullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonS3FullAccess) <br>
* [Create](https://docs.aws.amazon.com/cli/latest/reference/ecr/create-repository.html) an ECR repository with the name "torchserve-benchmark" in the us-west-2 region (see the sketch after this list)
* Ensure you have the [docker](https://docs.docker.com/get-docker/) client set up on your system - macOS/EC2
* Adjust the following global variables to your preference in the file `serve/test/benchmark/tests/utils/__init__.py` <br>
-- IAM_INSTANCE_PROFILE : this role is attached to all EC2 instances created as part of the benchmarking process. Create it as described [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#create-iam-role). The default role name is 'EC2Admin'. <br>
-- S3_BUCKET_BENCHMARK_ARTIFACTS : all temporary benchmarking artifacts, including server logs, will be stored in this bucket <br>
-- DEFAULT_DOCKER_DEV_ECR_REPO : the docker image used for benchmarking will be pushed to this repo <br>
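
If you prefer to script the account check and the ECR repository creation from the setup list above, a minimal boto3 sketch (boto3 is listed in `test/benchmark/requirements.txt`) could look like this; it assumes only the repository name and region stated above.

```python
# Minimal sketch: verify that your AWS credentials are visible to boto3 and
# create the "torchserve-benchmark" ECR repository in us-west-2.
import boto3

# Equivalent to `aws sts get-caller-identity` in the steps below.
identity = boto3.client("sts").get_caller_identity()
print(f"Running as {identity['Arn']} in account {identity['Account']}")

ecr = boto3.client("ecr", region_name="us-west-2")
try:
    ecr.create_repository(repositoryName="torchserve-benchmark")
    print("Created ECR repository 'torchserve-benchmark'")
except ecr.exceptions.RepositoryAlreadyExistsException:
    print("ECR repository 'torchserve-benchmark' already exists")
```
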
The following steps assume that the current working directory is `serve/`.

1. Create or use any Python virtual environment
```
python3 -m venv bvenv
source bvenv/bin/activate
```
2. Install the requirements for the benchmarking
```
pip install -r test/benchmark/requirements.txt
```
3. Make sure that you've set up your AWS account correctly
```
aws sts get-caller-identity
```
4. For each of the test files under `test/benchmark/tests/`, e.g., test_vgg11.py, set the list of instance types you want to test on:
```
INSTANCE_TYPES_TO_TEST = ["p3.8xlarge"]
```
5. The automation scripts use the ts-config from the following location: `benchmarks/config.properties`. Make changes to this file in your local checkout to apply them across all runs.
6. Finally, start the benchmark run as follows (run this in a terminal multiplexer such as tmux or screen, as this is a long-running script):
```
python test/benchmark/run_benchmark.py
```
7. To run the test for a particular model, modify the `pytest_args` list in run_benchmark.py to include `["-k", "vgg11"]` if that particular model is vgg11 (see the sketch after this list)
8. To generate the benchmark report, modify the argument to the function `generate_comprehensive_report()` to point to the S3 bucket URI for the benchmark run. Run the script as:
```
python report.py
```
The final benchmark report will be available in markdown format as `report.md` in the `serve/` folder.
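
Step 7 above asks you to edit the `pytest_args` list in run_benchmark.py; a minimal sketch of that change is shown below. For brevity it drops the `-n=4` xdist flag and the custom `--execution-id` option that the committed script also passes, so it illustrates the `-k` filter rather than being a drop-in replacement.

```python
# Minimal sketch of the step-7 change: run_benchmark.py builds a pytest_args
# list and hands it to pytest.main(); appending a "-k" keyword filter limits
# collection to a single model's test module (here tests/test_vgg11.py).
import os
import pytest

test_path = os.path.join(os.getcwd(), "tests")
pytest_args = ["-s", "-rA", test_path, "--disable-warnings", "-v"]

# The one-line addition suggested in step 7; swap "vgg11" for another model name.
pytest_args += ["-k", "vgg11"]

pytest.main(pytest_args)
```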

**Example report for the vgg11 model**
|
||
|
||
### Benchmark report | ||
|
||
**vgg11 | eager_mode | c5.18xlarge | batch size 1**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 2.05 | 47419 | 54745 | 58781 | 48852.156 | 0.0 | 589.16 | 709.42 | 709.42 | 1905.05 | 1904.91 | 44589.48 | 1.09 |

**vgg11 | eager_mode | c5.18xlarge | batch size 8**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 8.11 | 12205 | 13162 | 14772 | 12334.135 | 0.0 | 3431.05 | 3525.94 | 3525.94 | 3872.42 | 3872.04 | 7958.16 | 53.27 |

**vgg11 | eager_mode | c5.18xlarge | batch size 4**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 5.55 | 17891 | 18936 | 19965 | 18017.484 | 0.0 | 2304.79 | 2412.98 | 2412.98 | 2820.51 | 2820.24 | 14423.28 | 52.17 |

**vgg11 | eager_mode | c5.18xlarge | batch size 2**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 3.51 | 28732 | 29900 | 30531 | 28520.545 | 0.0 | 748.97 | 1431.84 | 1431.84 | 2226.96 | 2226.79 | 25045.02 | 49.16 |

**vgg11 | scripted_mode | c5.18xlarge | batch size 1**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 2.06 | 48058 | 50794 | 51760 | 48618.091 | 0.0 | 874.51 | 1012.23 | 1012.23 | 1900.22 | 1900.11 | 44363.84 | 1.07 |

**vgg11 | scripted_mode | c5.18xlarge | batch size 4**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 5.50 | 18055 | 19159 | 19844 | 18171.083 | 0.0 | 2230.75 | 2316.4 | 2316.4 | 2846.7 | 2846.34 | 14550.68 | 51.29 |

**vgg11 | scripted_mode | c5.18xlarge | batch size 3**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 1 | 4.51 | 22138 | 23074 | 23792 | 22165.721 | 0.1 | 1804.03 | 2160.02 | 2160.02 | 2597.17 | 2597.01 | 18563.88 | 50.2 |

**vgg11 | scripted_mode | c5.18xlarge | batch size 2**
| Benchmark |Model |Concurrency |Requests |TS failed requests |TS throughput |TS latency P50 |TS latency P90 |TS latency P99 |TS latency mean |TS error rate |Model_p50 |Model_p90 |Model_p99 |predict_mean |handler_time_mean |waiting_time_mean |worker_thread_mean |
|--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB | vgg11 | 100 | 1000 | 0 | 3.47 | 28765 | 29849 | 30488 | 28781.227 | 0.0 | 1576.24 | 1758.28 | 1758.28 | 2249.52 | 2249.34 | 25210.43 | 46.77 |

test/benchmark/requirements.txt
@@ -0,0 +1,13 @@
pytest
pyyaml
pytest-xdist
boto3
pytest-timeout
pytest-rerunfailures
retrying
fabric2
black
gitpython
docker
pandas
matplotlib

test/benchmark/run_benchmark.py
@@ -0,0 +1,36 @@
import os
import random
import sys
import logging
import re
import uuid

import boto3
import pytest

from invoke import run
from invoke.context import Context

# Log to stdout so progress is visible while the long-running benchmark runs
# inside tmux/screen.
LOGGER = logging.getLogger(__name__)
LOGGER.setLevel(logging.DEBUG)
LOGGER.addHandler(logging.StreamHandler(sys.stdout))


def main():
    # Run this script from the root directory 'serve', it changes directory below as required
    os.chdir(os.path.join(os.getcwd(), "test", "benchmark"))

    # Unique identifier for this benchmark run, passed to the tests via --execution-id.
    execution_id = f"ts-benchmark-run-{str(uuid.uuid4())}"

    test_path = os.path.join(os.getcwd(), "tests")
    LOGGER.info(f"Running tests from directory: {test_path}")

    # -n=4 distributes tests across four parallel workers via pytest-xdist;
    # --execution-id is a custom option, presumably registered by this suite's
    # pytest configuration rather than a built-in pytest flag.
    pytest_args = ["-s", "-rA", test_path, "-n=4", "--disable-warnings", "-v", "--execution-id", execution_id]

    LOGGER.info(f"Running pytest")

    pytest.main(pytest_args)


if __name__ == "__main__":
    main()