ytopt is a machine learning-based autotuning software package that uses Bayesian Optimization to find the best input parameter configurations for a given kernel, miniapp, or application, together with the best system configurations for a given HPC system.
ytopt accepts as input:
- A code-evaluation wrapper for performance measurement
- Tunable system parameters
- The corresponding parameter search space
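As a rough illustration of these three inputs, the sketch below pairs a hypothetical search space with an evaluation wrapper that times a stand-in workload. The parameter names (`block_size`, `unroll`), the `run_kernel` function, and the dictionary layout are assumptions made for this example only; in ytopt the space is defined with ConfigSpace and the wrapper would typically launch the real application.

```python
import time

# Hypothetical search space: each tunable parameter maps to its candidate values.
SEARCH_SPACE = {
    "block_size": [8, 16, 32, 64],
    "unroll": [1, 2, 4],
}


def run_kernel(block_size: int, unroll: int, n: int = 20_000) -> int:
    """Stand-in workload; a real wrapper would compile and run the target code."""
    total = 0
    for start in range(0, n, block_size):
        for i in range(start, min(start + block_size, n), unroll):
            total += i
    return total


def evaluate(config: dict) -> float:
    """Code-evaluation wrapper: run one configuration, return runtime in seconds."""
    for name, value in config.items():
        if value not in SEARCH_SPACE[name]:
            raise ValueError(f"{name}={value} is outside the search space")
    start = time.perf_counter()
    run_kernel(**config)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(evaluate({"block_size": 16, "unroll": 2}))
```

The wrapper returns a single number to minimize, which is the shape of feedback a tuner needs regardless of what the workload actually is.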
By sampling and evaluating a small number of input configurations, ytopt gradually builds a surrogate model of the input-output space. This process continues until the user-specified time limit or the maximum number of evaluations is reached.
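The sample, evaluate, and refit loop described above can be sketched with the standard library alone. This is not ytopt's implementation: the surrogate here is a deliberately simple nearest-neighbor predictor standing in for the Bayesian optimization model, and `objective`, `predict`, `search`, and the parameters `x` and `y` are hypothetical names.

```python
import random
import time


def objective(cfg: dict) -> float:
    """Hypothetical cost to minimize; stands in for a measured runtime."""
    return (cfg["x"] - 7) ** 2 + (cfg["y"] - 3) ** 2


def predict(history: list, cfg: dict) -> float:
    """Toy surrogate: predicted cost is the cost of the nearest evaluated point."""
    nearest = min(
        history,
        key=lambda rec: (rec[0]["x"] - cfg["x"]) ** 2 + (rec[0]["y"] - cfg["y"]) ** 2,
    )
    return nearest[1]


def search(max_evals: int = 30, max_time: float = 5.0, seed: int = 0):
    """Run the loop until the evaluation budget or the time budget is exhausted."""
    rng = random.Random(seed)

    def sample() -> dict:
        return {"x": rng.randint(0, 15), "y": rng.randint(0, 15)}

    history = []  # (configuration, measured cost) pairs seen so far
    start = time.monotonic()
    while len(history) < max_evals and time.monotonic() - start < max_time:
        if len(history) < 5:
            cfg = sample()  # initial random sampling
        else:
            # Rank a batch of random candidates with the surrogate and
            # evaluate only the one it predicts to be cheapest.
            candidates = [sample() for _ in range(20)]
            cfg = min(candidates, key=lambda c: predict(history, c))
        history.append((cfg, objective(cfg)))
    best_cfg, best_cost = min(history, key=lambda rec: rec[1])
    return best_cfg, best_cost, history


if __name__ == "__main__":
    best_cfg, best_cost, history = search()
    print(best_cfg, best_cost, len(history))
```

The point of the sketch is the control flow: only the chosen candidate is evaluated for real, so the expensive objective is called at most `max_evals` times.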
ytopt handles both unconstrained and constrained optimization problems, searches asynchronously, and can look ahead on iterations to adapt more effectively to new evaluations and steer the search toward promising configurations, leading to faster and more efficient convergence on the best solutions.
Internally, ytopt uses a manager-worker computational paradigm, where one node fits the surrogate model and generates new input configurations, and other nodes perform the computationally expensive evaluations and return the results to the manager node.
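A minimal single-machine sketch of the manager-worker pattern follows, using threads from the Python standard library in place of separate nodes. It is an illustration under stated assumptions, not ytopt's code: `propose`, `evaluate`, `manager`, and the `threads` parameter are hypothetical, and `propose` samples at random where a real manager would refit the surrogate model.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def evaluate(cfg: dict) -> float:
    """Worker task: stands in for an expensive application run."""
    return (cfg["threads"] - 6) ** 2 + 1.0


def propose(rng: random.Random, results: list) -> dict:
    """Manager step: a real manager would refit its model on `results` here."""
    return {"threads": rng.randint(1, 16)}


def manager(max_evals: int = 12, num_workers: int = 3, seed: int = 0) -> list:
    rng = random.Random(seed)
    results = []  # (configuration, cost) pairs returned by workers
    pending = {}  # future -> configuration currently being evaluated
    submitted = 0
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        while len(results) < max_evals:
            # Keep every worker busy instead of waiting for a full batch.
            while submitted < max_evals and len(pending) < num_workers:
                cfg = propose(rng, results)
                pending[pool.submit(evaluate, cfg)] = cfg
                submitted += 1
            # Resume as soon as any single evaluation finishes.
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                results.append((pending.pop(future), future.result()))
    return results


if __name__ == "__main__":
    for cfg, cost in manager():
        print(cfg, cost)
```

Waiting on the first completed evaluation, rather than on the whole batch, is what makes the search asynchronous: a slow run does not block new configurations from being issued.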
Additional documentation is available on Read the Docs.
ytopt requires the following components: ConfigSpace, CConfigSpace (optional), dh-scikit-optimize, and autotune.
- We recommend creating isolated Python environments on your local machine using conda, for example:
conda create --name ytune python=3.10
conda activate ytune
- Create a directory for ytopt:
mkdir ytopt
cd ytopt
- Install ConfigSpace:
git clone https://github.com/ytopt-team/ConfigSpace.git
cd ConfigSpace
pip install -e .
cd ..
- Install dh-scikit-optimize:
git clone https://github.com/ytopt-team/scikit-optimize.git
cd scikit-optimize
pip install -e .
cd ..
- Install autotune:
git clone -b version1 https://github.com/ytopt-team/autotune.git
cd autotune
pip install -e .
cd ..
- Install ytopt:
git clone -b main https://github.com/ytopt-team/ytopt.git
cd ytopt
pip install -e .
After ConfigSpace, dh-scikit-optimize, autotune, and ytopt have been installed successfully, the autotuning framework ytopt is ready to use.
- If needed, downgrade the protobuf package to 3.20.x or lower:
pip install protobuf==3.20
- If needed, install packaging
pip install packaging
- If needed, uninstall scikit-optimize to prevent import confusion with dh-scikit-optimize
pip uninstall scikit-optimize
- If you encounter an installation error involving the grpcio package (1.51.1), installing an older version should resolve it:
pip install grpcio==1.43.0
- If you encounter other installation errors, install psutil, setproctitle, mpich, and mpi4py first, then reinstall:
conda install -c conda-forge psutil
conda install -c conda-forge setproctitle
conda install -c conda-forge mpich
conda install -c conda-forge mpi4py
pip install -e .
- [Optional] Install CConfigSpace:
- Prerequisites: autotools and gsl
- Ubuntu:
sudo apt-get install autoconf automake libtool libgsl-dev
- MacOS:
brew install autoconf automake libtool gsl
- Build and install the library and Python bindings. The configure command accepts an optional --prefix= parameter to specify an install path other than the default (/usr/local). Depending on the chosen location, you may need elevated privileges to run make install.
git clone git@github.com:argonne-lcf/CCS.git
cd CCS
./autogen.sh
mkdir build
cd build
../configure
make
make install
cd ../bindings/python
pip install parglare==0.12.0
pip install -e .
- Set up the environment: for the Python bindings to find the CConfigSpace library, the path to the library install location (/usr/local/lib by default) must be appended to the LD_LIBRARY_PATH environment variable on Linux, while on MacOS the DYLD_LIBRARY_PATH environment variable serves the same purpose. Alternatively, the LIBCCONFIGSPACE_SO_ environment variable can be made to point to the installed libcconfigspace.so file on Linux or to the installed libcconfigspace.dylib on MacOS.
- [Optional] Install online tuning:
- The online tuning with transfer learning interface is built on Synthetic Data Vault (SDV):
- Install SDV:
cd ytopt
pip install -e .[online]
- On macOS, you may need to quote the extras specifier:
pip install -e ".[online]"
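Two of the installation notes above, the protobuf version ceiling and the library search path for CConfigSpace, can be checked from Python. The helper below is a hedged sketch using only the standard library; every function name in it is hypothetical and it is not part of ytopt.

```python
import os
import platform
import re
from importlib import metadata
from typing import Optional


def needs_protobuf_downgrade(version: str) -> bool:
    """Return True if `version` is newer than the 3.20.x series."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) > (3, 20)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def library_path_variable(system: str) -> str:
    """Name of the dynamic-library search path variable for the platform."""
    return "DYLD_LIBRARY_PATH" if system == "Darwin" else "LD_LIBRARY_PATH"


def append_library_path(current: str, lib_dir: str = "/usr/local/lib") -> str:
    """Append `lib_dir` to a search path value unless it is already present."""
    parts = [p for p in current.split(os.pathsep) if p]
    if lib_dir not in parts:
        parts.append(lib_dir)
    return os.pathsep.join(parts)


if __name__ == "__main__":
    version = installed_version("protobuf")
    if version and needs_protobuf_downgrade(version):
        print(f"protobuf {version} found; consider: pip install protobuf==3.20")
    var = library_path_variable(platform.system())
    print(f"export {var}={append_library_path(os.environ.get(var, ''))}")
```

The script only prints suggestions; it does not modify the environment or install anything.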
docs/
Sphinx documentation files
test/
scripts for running benchmark problems in the problems directory
ytopt/
scripts that contain the search implementations
ytopt/benchmark/
a set of problems the user can use to compare our different search algorithms or as examples to build their own problems
ytopt is typically run from the command line, for example:
python -m ytopt.search.ambs --evaluator ray --problem problem.Problem --max-evals=10 --learner RF
Where:
- The search variant is one of ambs (Asynchronous Model-Based Search) or async_search (run as an MPI process).
- The evaluator is the method of concurrent evaluations, and can be ray or subprocess.
- The problem is typically an autotune.TuningProblem instance. Specify the module path and instance name.
- --max-evals sets the maximum number of evaluations.
Depending on the search variant chosen, other command-line options may be provided. For example, the ytopt.search.ambs search method above was further customized by specifying the RF learning strategy.
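For scripted runs, the command line above can be assembled programmatically. The helper below is a sketch: `build_search_command` is a hypothetical name, it uses only the flags and values that appear in the example and the list above, and it does not check which options each search variant actually accepts.

```python
import shlex
import sys
from typing import List, Optional

SEARCH_VARIANTS = {"ambs", "async_search"}
EVALUATORS = {"ray", "subprocess"}


def build_search_command(
    search: str = "ambs",
    evaluator: str = "ray",
    problem: str = "problem.Problem",
    max_evals: int = 10,
    learner: Optional[str] = "RF",
) -> List[str]:
    """Assemble the argv list for a ytopt search run."""
    if search not in SEARCH_VARIANTS:
        raise ValueError(f"unknown search variant: {search}")
    if evaluator not in EVALUATORS:
        raise ValueError(f"unknown evaluator: {evaluator}")
    argv = [
        sys.executable, "-m", f"ytopt.search.{search}",
        "--evaluator", evaluator,
        "--problem", problem,
        f"--max-evals={max_evals}",
    ]
    if learner is not None:
        # Variant-specific option, shown in the example for ambs.
        argv += ["--learner", learner]
    return argv


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run to execute it.
    print(shlex.join(build_search_command()))
```

Building the command as a list rather than a string avoids shell quoting problems when the problem path or other values contain special characters.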
See the autotune docs for basic information on getting started with creating a TuningProblem instance.
See the ConfigSpace docs for guidance on defining input/output parameter spaces for problems.
Otherwise, browse the ytopt/benchmark directory for an extensive collection of examples.
- Autotuning the block matrix multiplication
- Autotuning the OpenMP version of XSBench
- Autotuning the OpenMP version of XSBench with constraints
- Autotuning the hybrid MPI/OpenMP version of XSBench
- Autotuning the hybrid MPI/OpenMP version of XSBench with constraints
- Autotuning the OpenMP version of convolution-2d with constraints
- (Optional) Autotuning the OpenMP version of XSBench online
The core ytopt team is at Argonne National Laboratory:
- Prasanna Balaprakash
- Romain Egele
- Paul Hovland
- Xingfu Wu
- Jaehoon Koo
- Brice Videau
The convolution-2d tutorial (source and python scripts) is contributed by:
- David Fridlander
- T. Randall, J. Koo, B. Videau, M. Kruse, X. Wu, P. Hovland, M. Hall, R. Ge, and P. Balaprakash. "Transfer-Learning-Based Autotuning Using Gaussian Copula". In 2023 International Conference on Supercomputing (ICS ’23), June 21–23, 2023, Orlando, FL, USA. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3577193.3593712.
- X. Wu, P. Balaprakash, M. Kruse, J. Koo, B. Videau, P. Hovland, V. Taylor, B. Geltz, S. Jana, and M. Hall, "ytopt: Autotuning Scientific Applications for Energy Efficiency at Large Scales", Cray User Group Conference 2023 (CUG’23), Helsinki, Finland, May 7-11, 2023. DOI: 10.48550/arXiv.2303.16245
- X. Wu, M. Kruse, P. Balaprakash, H. Finkel, P. Hovland, V. Taylor, and M. Hall, "Autotuning PolyBench benchmarks with LLVM Clang/Polly loop optimization pragmas using Bayesian optimization (extended version)," Concurrency and Computation. Practice and Experience, Volume 34, Issue 20, 2022. ISSN 1532-0626 DOI: 10.1002/cpe.6683
- J. Koo, P. Balaprakash, M. Kruse, X. Wu, P. Hovland, and M. Hall, "Customized Monte Carlo Tree Search for LLVM/Polly's Composable Loop Optimization Transformations," in Proceedings of 12th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS21), pages 82–93, 2021. DOI: 10.1109/PMBS54543.2021.00015
- X. Wu, M. Kruse, P. Balaprakash, H. Finkel, P. Hovland, V. Taylor, and M. Hall, "Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization," in Proceedings of 11th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS20), pages 61–70, 2020. DOI: 10.1109/PMBS51919.2020.00012
- P. Balaprakash, J. Dongarra, T. Gamblin, M. Hall, J. K. Hollingsworth, B. Norris, and R. Vuduc, "Autotuning in High-Performance Computing Applications," Proceedings of the IEEE, vol. 106, no. 11, 2018. DOI: 10.1109/JPROC.2018.2841200
- T. Nelson, A. Rivera, P. Balaprakash, M. Hall, P. Hovland, E. Jessup, and B. Norris, "Generating efficient tensor contractions for GPUs," in Proceedings of 44th International Conference on Parallel Processing, pages 969–978, 2015. DOI: 10.1109/ICPP.2015.106
- PROTEAS-TUNE, U.S. Department of Energy ASCR Exascale Computing Project (2018--Present)
- YTune: Autotuning Compiler Technology for Cross-Architecture Transformation and Code Generation, U.S. Department of Energy Exascale Computing Project (2016--2018)
- Scalable Data-Efficient Learning for Scientific Domains, U.S. Department of Energy 2018 Early Career Award funded by the Advanced Scientific Computing Research program within the DOE Office of Science (2018--Present)