benchmarks

Latest commit: d2bcb72, Sep 30, 2023, "auto-format files + maxdup check in main"

Folders and files:

02_11_2021_18_53_25_bench_results
04_07_2022_02_31_09_bench_results_t4_intel_4
04_07_2022_02_46_36_bench_results_a10g_amd_4
12_11_2021_12_47_44_bench_results
2022-12-18-02:01:45
2022-12-22-20:09:12
25_10_2021_bench_results
29_01_2022_08_11_49_bench_results
a10g_amd_dec_1
a10g_amd_june30
a10g_epyc_27_11_2021
a10g_intel_dec_1
a10g_intel_interleave_8_july1
a10g_intel_july2_lz4
a10g_membench_july1
a10g_membench_july1_volatile
a10g_membench_july3_fixed
agressive_part_03_11_2021_14_43_07_bench_results
average
cuda-blockhash
cuda-blur
genpdf
hello_go
imageblur-bmp
imageblur
imagehash-modified
imagehash
json-compression-lz4
json-compression
jstest
nlp-assemblyscript
nlp-count-vectorizer
nlp-go
normal_part_27_10_2021_00_00_00_bench_results
pbkdf2
rsa-decrypt
rsa-keygen
rust-pdfwriter
scrypt
ssmconfig
syscallbench
t4_amd_dec_1
t4_amd_interleave8_july1
t4_amd_july2_lz4
t4_amd_june_28_2022
t4_intel_dec_1
t4_intel_july1
t4_intel_july2_latencytest
t4_july1_membench
t4_membench_july1_volatile
t4_membench_july3_fixed
README.md
a10g_compile_opt.sh
a10g_save_cached_bin.sh
amd_compile_opt.sh
amd_save_cached_bin.sh
eval_profiler.py
indirect-opt.txt
indirect.txt
latency_breakdown.png
latency_throughput.png
local_cached_bin.sh
make_figures.py
make_image.py
make_image_amd.py
nvbin.backup
nvcache.backup
run_all.sh
run_all_nvidia.sh
run_benchmarks_aws.py
run_cached_bin.sh
run_cuda.py
save_cached_bin.sh
slowcalls-opt.txt
slowcalls.txt
start_lz4_server.sh
start_lz4_server_wasmtime.sh
t4_compile_opt.sh
t4_save_cached_bin.sh

README.md

VectorVisor Evaluation

This subdirectory contains the evaluation materials for the USENIX ATC 2023 paper "VectorVisor: A Binary Translation Scheme for Throughput-Oriented GPU Acceleration".

There are two primary components to our evaluation:

VectorVisor, the vectorizing binary translator for GPUs (https://github.com/SamGinzburg/VectorVisor)
Our PGO (profile-guided optimization) instrumentation tool (https://github.com/SamGinzburg/vv-pgo-instrument)

VectorVisor can be built directly from source, although we also offer prepackaged Amazon AWS AMIs for cloud evaluation.

Building from source (local testing)

Prerequisites:

Ubuntu 18.04 LTS
CUDA 12 (NVIDIA driver version 525)
OpenCL C development headers & libraries
Stable Rust 1.6+

To confirm that the GPU driver and OpenCL are set up correctly, run "clinfo" and/or "nvidia-smi". Also ensure that "cargo" is in your $PATH.
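The check above can be scripted. The following is a minimal sketch, not part of the repository: the function name and the tool tuple are illustrative. It only tests whether each tool is on $PATH, so the output of "clinfo" or "nvidia-smi" still needs to be inspected by hand. Since the README says "and/or", a machine without an NVIDIA GPU may legitimately report "nvidia-smi" as missing.

```python
import shutil

# Tools named in the README: "clinfo" and "nvidia-smi" confirm the OpenCL and
# NVIDIA driver setup, and "cargo" is needed to build from source.
REQUIRED_TOOLS = ("clinfo", "nvidia-smi", "cargo")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on $PATH, in the order given."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from $PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on $PATH.")
```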

Examples of how we build VectorVisor from source can be seen in the "make_image.py" script (used to generate AWS AMI images).

Configuring the cloud environment

Our evaluation is conducted primarily on AWS, using the instances specified in the paper. To replicate our results, you need a valid AWS account with billing configured (for example, an associated credit card). You must also request a quota increase to at least 8 vCPUs for g4dn.* and g5.* class instances so that our benchmarks can run. A valid IAM role with permission to invoke AWS Systems Manager must be configured before running the benchmarks.

AWS CLI tools must be installed (https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html), and "us-east-1" should be set as the default region.
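One way to confirm the default region before launching anything is sketched below. It is a simplification under stated assumptions: it reads only the AWS_DEFAULT_REGION environment variable and the "[default]" section of the AWS CLI config file (located via AWS_CONFIG_FILE, else ~/.aws/config), and it ignores named profiles and other overrides. The function names are illustrative.

```python
import configparser
import os
from pathlib import Path


def region_from_config(text):
    """Return the region from the [default] section of AWS config text, or None."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.get("default", "region", fallback=None)


def configured_region(env=None):
    """Return the default region seen by this simplified lookup, or None if unset."""
    env = os.environ if env is None else env
    # The environment variable takes precedence over the config file.
    if env.get("AWS_DEFAULT_REGION"):
        return env["AWS_DEFAULT_REGION"]
    path = Path(env.get("AWS_CONFIG_FILE") or Path.home() / ".aws" / "config")
    if not path.is_file():
        return None
    return region_from_config(path.read_text())


if __name__ == "__main__":
    region = configured_region()
    if region != "us-east-1":
        print(f"Default region is {region!r}; this README expects 'us-east-1'.")
```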

Configuring the IAM Role

We automate our evaluation using Amazon SSM (Systems Manager), which requires a custom IAM role to be configured before running. The following steps can be performed to create the role for your account:

Create a new role by navigating to the IAM AWS service

Select the correct entity type (AWS service, EC2, Systems Manager)

Enter the role name, and set the correct SSM permission (AmazonSSMManagedInstanceCore)

Create the role

After the role is created, you can select the role from the list of IAM roles on your account, and obtain the Instance profile ARN. This ARN is the one needed as a CLI argument for the scripts that create AWS AMIs and run the benchmarks.
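A common slip at this step is to copy the role ARN instead of the instance profile ARN. The sketch below checks only the general shape of an IAM instance profile ARN (partition, 12-digit account ID, "instance-profile/" resource prefix). It does not confirm that the profile exists or has the right permissions, and the function name and example values are hypothetical.

```python
import re

# General shape of an IAM instance profile ARN:
#   arn:<partition>:iam::<12-digit account id>:instance-profile/<name or path/name>
_INSTANCE_PROFILE_ARN = re.compile(
    r"arn:aws[a-z-]*:iam::\d{12}:instance-profile/[\w+=,.@/-]+"
)


def is_instance_profile_arn(value):
    """Return True if value looks like an instance profile ARN (shape check only)."""
    return _INSTANCE_PROFILE_ARN.fullmatch(value) is not None


if __name__ == "__main__":
    # Hypothetical account ID and profile name, for illustration only.
    example = "arn:aws:iam::123456789012:instance-profile/ExampleSSMRole"
    print(example, "->", is_instance_profile_arn(example))
```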

Image Generation

We provide two pregenerated AMIs to save time (T4, A10G). The v520 AMI contains a product code, which prevents us from publicly sharing the generated AMI. However, all of these images (including the v520) can be regenerated from the command line.

Generating T4 or A10G images (we provide these images for you):

python3 make_image.py --gpu=t4 --awsarn=<your AWS ARN you obtained previously>
python3 make_image.py --gpu=a10g --awsarn=<your AWS ARN you obtained previously>

Generating the v520 image (mandatory for complete results):

python3 make_image_amd.py --awsarn=<your AWS ARN you obtained previously>

Generating the v520 image should take less than 12 hours. Once it completes, the AMI ID can be found in the EC2 console under Images > AMIs. The v520 AMI will have the name "vectorvisor-bench-image-amd".
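The AMI ID can also be looked up programmatically instead of through the console. The sketch below assumes boto3 is installed and credentials are configured; it uses the EC2 describe_images call filtered by the image name given above. The helper names are illustrative and not part of the repository. If several images share the name, it picks the most recently created one, which is an assumption about what you want.

```python
def newest_image_id(images):
    """Pick the most recently created image from a describe_images "Images" list."""
    if not images:
        return None
    # CreationDate is an ISO 8601 UTC string, so string order matches time order.
    return max(images, key=lambda image: image["CreationDate"])["ImageId"]


def find_v520_ami(name="vectorvisor-bench-image-amd", region="us-east-1"):
    """Return the ID of the newest self-owned AMI with the given name, or None."""
    import boto3  # imported lazily so newest_image_id works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_images(
        Owners=["self"],
        Filters=[{"Name": "name", "Values": [name]}],
    )
    return newest_image_id(response["Images"])
```

Calling find_v520_ami() makes a real AWS request, so it is left out of any automatic entry point here.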

Cloud Evaluation

For the USENIX ATC '23 AEC, we provide three precompiled AWS AMIs with VectorVisor and our PGO tools preinstalled. All benchmarks are precompiled and included in the machine images as well. The precompiled NVIDIA AMIs (for T4 and A10G instances) run Ubuntu 18.04 LTS, and the precompiled AMD AMI uses Amazon Linux.

The Python library boto3, which we use to automate AWS commands, must be installed via "pip3 install boto3".

T4 AMI = "ami-0900b09acc9fa9fcf"

A10G AMI = "ami-0ab8694e2c508a127"

v520 AMI = "your v520 AMI image"

The throughput and throughput/$ evaluation can be reproduced by running:

./run_all.sh <AMD v520 AMI> <T4 AMI> <A10G AMI> <AWS ARN>
./run_all.sh <AMD v520 AMI> ami-0900b09acc9fa9fcf ami-0ab8694e2c508a127 <AWS ARN>

where the AMI IDs are those given above, and the AWS instance ARN corresponds to the IAM role configured previously. The output will be written to a directory named after the current date and time.
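Because a full run is long and billed, it may be worth validating the arguments before starting. The sketch below only assembles the argument list in the order shown above and rejects values that do not look like AMI IDs ("ami-" followed by 8 or 17 hexadecimal digits), such as a placeholder left unreplaced. The function name is hypothetical, and nothing is executed.

```python
import re

# EC2 AMI IDs are "ami-" followed by 8 (older) or 17 (current) hex digits.
_AMI_ID = re.compile(r"ami-[0-9a-f]{8}(?:[0-9a-f]{9})?")


def run_all_command(v520_ami, t4_ami, a10g_ami, aws_arn, script="./run_all.sh"):
    """Build the argv for run_all.sh, in the argument order the README gives."""
    for ami in (v520_ami, t4_ami, a10g_ami):
        if _AMI_ID.fullmatch(ami) is None:
            raise ValueError(f"not an AMI ID: {ami!r}")
    return [script, v520_ami, t4_ami, a10g_ami, aws_arn]
```

The resulting list can be passed to subprocess.run(command, check=True) from the benchmarks directory once the values look right.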

If you wish to skip generating the v520 AMI you can instead run the following command to replicate the NVIDIA results:

./run_all_nvidia.sh <T4 AMI> <T4 AMI> <A10G AMI> <AWS ARN>
./run_all_nvidia.sh ami-0900b09acc9fa9fcf ami-0900b09acc9fa9fcf ami-0ab8694e2c508a127 <AWS ARN>

The end-to-end evaluation will take ~24 hours to run to completion.

Generating figures

The make_figures.py script regenerates all figures in the paper. It requires boto3, numpy, and matplotlib, which must be installed via pip3 in advance. The script takes as input the directory generated by the run_all.sh script, whose name is a timestamp of the form 2022-12-22-20:09:12. The 2022-12-22-20:09:12 directory in this repository contains the data used to generate all figures in the paper.

Example usage:

python3 make_figures.py --input=/path/to/directory/
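After several runs, the benchmarks directory holds more than one timestamped output directory. The sketch below picks the newest one, assuming names follow exactly the pattern seen in 2022-12-22-20:09:12 (year-month-day-hour:minute:second). That format is inferred from the example names rather than taken from run_all.sh, and the function names are illustrative.

```python
from datetime import datetime
from pathlib import Path

# Assumed naming pattern of run_all.sh output, e.g. 2022-12-22-20:09:12.
TIMESTAMP_FORMAT = "%Y-%m-%d-%H:%M:%S"


def parse_run_timestamp(name):
    """Return the datetime encoded in a directory name, or None if it does not match."""
    try:
        return datetime.strptime(name, TIMESTAMP_FORMAT)
    except ValueError:
        return None


def latest_results_dir(root="."):
    """Return the most recent timestamp-named directory under root, or None."""
    dated = []
    for path in Path(root).iterdir():
        stamp = parse_run_timestamp(path.name)
        if path.is_dir() and stamp is not None:
            dated.append((stamp, path))
    if not dated:
        return None
    return max(dated, key=lambda item: item[0])[1]
```

The returned path can then be passed to make_figures.py through its --input argument.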
