Merge pull request huawei-noah#41 from huawei-noah/random_decompositions
Added RDUCB code
AntGro authored May 24, 2023
2 parents f050865 + 282efef commit e164b78
Showing 122 changed files with 15,170 additions and 1 deletion.
21 changes: 21 additions & 0 deletions RDUCB/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2020 Eric Han

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
36 changes: 36 additions & 0 deletions RDUCB/MLproject
@@ -0,0 +1,36 @@
name: HDBO

conda_env: hdbo.yml

entry_points:
  main:
    parameters:
      param_file: {type: path, default: config/default.yml}
    command: "LOGGING_TYPE=local CUDA_VISIBLE_DEVICES=-1 nice -1 python hdbo/main.py {param_file}"
  start:
    parameters:
      param_file: {type: path}
    command: "LOGGING_TYPE=server CUDA_VISIBLE_DEVICES=-1 python hdbo/main.py {param_file} /dev/shm"
  stop:
    parameters:
      exe_hash: {type: str}
    command: "kill `cat /tmp/mlflow-pid/{exe_hash}.pid`"
  stop_all:
    command: "for p in /tmp/mlflow-pid/*.pid; do kill `cat $p`; done"
  status:
    command: "ls /tmp/mlflow-pid"
  test:
    command: "mlflow run test"
  doc:
    command: "mlflow run doc"
  generate:
    command: "python hdbo/generation_script.py"
  profile:
    parameters:
      param_file: {type: path, default: config/default.yml}
      profile_file: {type: string, default: program.profile}
    command: "LOGGING_TYPE=local python -m cProfile -o {profile_file} hdbo/main.py {param_file}"
  view-profile:
    parameters:
      profile_file: {type: string, default: program.profile}
    command: "snakeviz {profile_file}"
73 changes: 73 additions & 0 deletions RDUCB/README.md
@@ -0,0 +1,73 @@
# Are Random Decompositions all we need in High Dimensional Bayesian Optimisation?
<p float="center">
<img src="./RDUCB/figures/Ablation_plot.pdf" width="400" />
<img src="./RDUCB/figures/Adversarial_function.pdf" width="400" />
</p>

This repository accompanies an [ICML 2023 publication](https://arxiv.org/pdf/2301.12844.pdf) by Juliusz Ziomek and Haitham Bou-Ammar.
The repository is largely based on code from [High-Dimensional Bayesian Optimization via Tree-Structured Additive Models](https://github.com/eric-vader/HD-BO-Additive-Models); as such, the code in this repository is released under the original MIT license (in the LICENSE file) with copyright held by Eric Han, except for the parts that have been added or substantially modified, which are released under an MIT license with copyright held by Huawei Technologies Co., Ltd. Such parts are clearly marked in the code by comments.

## Acknowledgements

1. The code here is largely derived from the [High-Dimensional Bayesian Optimization via Tree-Structured Additive Models](https://github.com/eric-vader/HD-BO-Additive-Models) repository by Eric Han, Ishank Arora and Jonathan Scarlett.
2. The code in this repository is also derived from the code base of [High-Dimensional Bayesian Optimization via Additive Models with Overlapping Groups](https://arxiv.org/pdf/1802.07028.pdf), supplied by Paul Rolland.
3. The code included in `hdbo/febo` is taken from [LineBO](https://github.com/kirschnj/LineBO). The paper accompanying the code is [Adaptive and Safe Bayesian Optimization in High Dimensions via One-Dimensional Subspaces](https://arxiv.org/abs/1902.03229).
4. The NAS-Bench-101 datasets included in `data/fcnet` are taken from [nas_benchmarks](https://github.com/automl/nas_benchmarks). The paper accompanying the code is [NAS-Bench-101: Towards Reproducible Neural Architecture Search](https://arxiv.org/pdf/1902.09635.pdf).
5. The lpsolve datasets included in `data/mps` are taken from the benchmark dataset in [MIPLIB 2017](https://miplib.zib.de/download.html).

## Installation

We implemented all algorithms in Python 3.8.3.

Minimum system requirements:

* `Linux 5.4.12-100`
* 5 GB of disk space (needed for the NAS-Bench-101 and lpsolve datasets; around 100 MB otherwise)

Prepare your environment:

1. If you don't have it already, [install Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html).
2. [Install MLflow](https://mlflow.org/) either on your system or in the base Conda environment: `pip install mlflow`.
3. Build and test the environment by running `mlflow run .` (this may take a while).

Optional steps:
1. (Optional) Run `bash data/setup.sh` to download the data from [NAS Benchmarks](https://github.com/automl/nas_benchmarks) into your home directory. You may skip this step if you are not running the lpsolve or NAS-Bench-101 datasets.
2. (Optional) Run `pip install -U pandas matplotlib seaborn` in your base environment to plot the results. You may skip this step if you do not need the plotting script.

## Running the experiments from the paper

You can run any of the provided configurations by passing its config file via the `param_file` parameter; for example, to run RDUCB on LassoBench:

```
mlflow run . -P param_file=config/LassoBench/rducb.yml
```

All the parameters of the run should be specified in the `.yml` file. However, to facilitate batch jobs, you can pass command-line arguments that override the seed or the subproblem. For example:

```
mlflow run . -P param_file=config/LassoBench/rducb.yml -P seed=1 -P sub_benchmark="pick_data:diabetes"
```
will run the experiment as specified in `config/LassoBench/rducb.yml`, but will override the seed with 1 and `pick_data` with `diabetes`. See `example.sh` for an example workflow; a minimal batch-run sketch is also shown below.
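For instance, a minimal sketch of a batch run over several seeds (the actual `example.sh` shipped with the repository may differ) could look like:

```
# Hypothetical batch loop: run RDUCB on LassoBench with several seeds.
for seed in 1 2 3 4 5; do
    mlflow run . -P param_file=config/LassoBench/rducb.yml -P seed=$seed
done
```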


## Visualising the results
To visualise an experiment, you can run `mlflow ui` and click on the experiment to inspect its metrics.

Alternatively, to include multiple baselines in one plot and to produce confidence intervals, you can use the provided `plot.py` script. See `example.sh` for an example workflow that runs and plots multiple runs.

## Troubleshooting
If you have problems with the SSL certificate, a quick fix is to comment out line 39 in `hdbo.yml`, which skips the installation of LassoBench. If you want to use this benchmark, you can install LassoBench manually. To do that, first activate your mlflow environment (you can find it in the list of conda environments shown by `conda info --envs`). Then run:
```
git clone https://github.com/ksehic/LassoBench.git
cd LassoBench/
```
Next, comment out line 7 in `setup.py` (in the LassoBench folder) and run:
```
wget https://github.com/QB3/sparse-ho/archive/master.zip
unzip master.zip
cd sparse-ho-master
python3 -m pip install -e .
cd ..
python3 -m pip install -e .
```
Everything should now be installed.
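Optionally, as a quick sanity check (this assumes the package is importable as `LassoBench`, following its upstream README; the module name is not defined in this repository):

```
# Hypothetical check that the manual LassoBench install succeeded.
python3 -c "import LassoBench; print('LassoBench is importable')"
```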
27 changes: 27 additions & 0 deletions RDUCB/config/LPSolve/rducb.yml
@@ -0,0 +1,27 @@
algorithm_type:
  Algorithm:
    algorithm: RDUCB
    algorithm_random_seed: 0
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 1
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
    size_of_random_graph: 0.2
data_type:
  LPSolve:
    infinite: 1.0e+30
    max_floor: 500.0
    mps_filename: mtest4ma
    problem_type: MpsLoader
    time_limit: 5
    # input the home directory of the LPSolve data if it is not at root
    #home_dir:
n_iter: 200
n_rand: 10
26 changes: 26 additions & 0 deletions RDUCB/config/LPSolve/rembo.yml
@@ -0,0 +1,26 @@
algorithm_type:
  Algorithm:
    algorithm: Rembo
    algorithm_random_seed: 0
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 15
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
data_type:
  LPSolve:
    infinite: 1.0e+30
    max_floor: 500.0
    mps_filename: mtest4ma
    problem_type: MpsLoader
    time_limit: 5
    # input the home directory of the LPSolve data if it is not at root
    #home_dir:
n_iter: 200
n_rand: 10
26 changes: 26 additions & 0 deletions RDUCB/config/LPSolve/tree.yml
@@ -0,0 +1,26 @@
algorithm_type:
  Algorithm:
    algorithm: Tree
    algorithm_random_seed: 0
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 15
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
data_type:
  LPSolve:
    infinite: 1.0e+30
    max_floor: 500.0
    mps_filename: mtest4ma
    problem_type: MpsLoader
    time_limit: 5
    # input the home directory of the LPSolve data if it is not at root
    #home_dir:
n_iter: 200
n_rand: 10
24 changes: 24 additions & 0 deletions RDUCB/config/LassoBench/rducb.yml
@@ -0,0 +1,24 @@
algorithm_type:
  Algorithm:
    algorithm: RDUCB
    algorithm_random_seed: 2
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 1
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
    size_of_random_graph: 0.2
data_type:
  LassoBenchlib:
    problem_type: LassoBenchLoader
    fidelity: 4
    pick_data: 'dna'
    grid_size: 1000
n_iter: 1000
n_rand: 10
23 changes: 23 additions & 0 deletions RDUCB/config/LassoBench/rembo.yml
@@ -0,0 +1,23 @@
algorithm_type:
  Algorithm:
    algorithm: Rembo
    algorithm_random_seed: 2
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 15
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
data_type:
  LassoBenchlib:
    problem_type: LassoBenchLoader
    fidelity: 4
    pick_data: 'dna'
    grid_size: 1000
n_iter: 1000
n_rand: 10
23 changes: 23 additions & 0 deletions RDUCB/config/LassoBench/tree.yml
@@ -0,0 +1,23 @@
algorithm_type:
  Algorithm:
    algorithm: Tree
    algorithm_random_seed: 2
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 15
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
data_type:
  LassoBenchlib:
    problem_type: LassoBenchLoader
    fidelity: 4
    pick_data: 'dna'
    grid_size: 1000
n_iter: 1000
n_rand: 10
24 changes: 24 additions & 0 deletions RDUCB/config/NAS/random.yml
@@ -0,0 +1,24 @@
algorithm_type:
  Algorithm:
    algorithm: Random
    algorithm_random_seed: 1
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: -1
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
data_type:
  NAS:
    bench_type: FcnetLoader
    data_random_seed: 3
    fcnet_filename: protein_structure
    # input the home directory of the NAS data if it is not at root
    #home_dir:
n_iter: 200
n_rand: 10
25 changes: 25 additions & 0 deletions RDUCB/config/NAS/rducb.yml
@@ -0,0 +1,25 @@
algorithm_type:
  Algorithm:
    algorithm: RDUCB
    algorithm_random_seed: 1
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 1
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
    size_of_random_graph: 0.2
data_type:
  NAS:
    bench_type: FcnetLoader
    data_random_seed: 3
    fcnet_filename: slice_localization
    # input the home directory of the NAS data if it is not at root
    #home_dir:
n_iter: 200
n_rand: 10
24 changes: 24 additions & 0 deletions RDUCB/config/NAS/tree.yml
@@ -0,0 +1,24 @@
algorithm_type:
  Algorithm:
    algorithm: Tree
    algorithm_random_seed: 1
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 15
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
data_type:
  NAS:
    bench_type: FcnetLoader
    data_random_seed: 3
    fcnet_filename: protein_structure
    # input the home directory of the NAS data if it is not at root
    #home_dir:
n_iter: 200
n_rand: 10
24 changes: 24 additions & 0 deletions RDUCB/config/default.yml
@@ -0,0 +1,24 @@
algorithm_type:
  Algorithm:
    algorithm: Rembo
    algorithm_random_seed: 1
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 250
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 15
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.01
    param_n_iter: 16
data_type:
  Hpolib:
    aug_dimension: 14
    data_random_seed: 2
    fn_noise_var: 0
    grid_size: 150
    hpo_fn: Hartmann6Aug
n_iter: 10
n_rand: 10
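This default configuration is what the `main` entry point in `MLproject` picks up when no `param_file` is given, so the environment-test command from the installation section exercises exactly this file:

```
# Runs the main entry point with its default param_file, config/default.yml.
mlflow run .
```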