Merge pull request huawei-noah#41 from huawei-noah/random_decompositions
Added RDUCB code
AntGro authored May 24, 2023
2 parents f050865 + 282efef commit e164b78
Showing 122 changed files with 15,170 additions and 1 deletion.
21 changes: 21 additions & 0 deletions RDUCB/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2020 Eric Han

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
36 changes: 36 additions & 0 deletions RDUCB/MLproject
@@ -0,0 +1,36 @@
name: HDBO

conda_env: hdbo.yml

entry_points:
  main:
    parameters:
      param_file: {type: path, default: config/default.yml}
    command: "LOGGING_TYPE=local CUDA_VISIBLE_DEVICES=-1 nice -1 python hdbo/main.py {param_file}"
  start:
    parameters:
      param_file: {type: path}
    command: "LOGGING_TYPE=server CUDA_VISIBLE_DEVICES=-1 python hdbo/main.py {param_file} /dev/shm"
  stop:
    parameters:
      exe_hash: {type: str}
    command: "kill `cat /tmp/mlflow-pid/{exe_hash}.pid`"
  stop_all:
    command: "for p in /tmp/mlflow-pid/*.pid; do kill `cat $p`; done"
  status:
    command: "ls /tmp/mlflow-pid"
  test:
    command: "mlflow run test"
  doc:
    command: "mlflow run doc"
  generate:
    command: "python hdbo/generation_script.py"
  profile:
    parameters:
      param_file: {type: path, default: config/default.yml}
      profile_file: {type: string, default: program.profile}
    command: "LOGGING_TYPE=local python -m cProfile -o {profile_file} hdbo/main.py {param_file}"
  view-profile:
    parameters:
      profile_file: {type: string, default: program.profile}
    command: "snakeviz {profile_file}"
73 changes: 73 additions & 0 deletions RDUCB/README.md
@@ -0,0 +1,73 @@
# Are Random Decompositions all we need in High Dimensional Bayesian Optimisation?
<p float="center">
<img src="./RDUCB/figures/Ablation_plot.pdf" width="400" />
<img src="./RDUCB/figures/Adversarial_function.pdf" width="400" />
</p>

This repository accompanies an [ICML 2023 publication](https://arxiv.org/pdf/2301.12844.pdf) by Juliusz Ziomek and Haitham Bou-Ammar.
The repository is largely based on code from [High-Dimensional Bayesian Optimization via Tree-Structured Additive Models](https://github.com/eric-vader/HD-BO-Additive-Models); as such, the code in this repository is released under the original MIT license (in the LICENSE file) with copyright held by Eric Han, except for the parts that have been added or substantially modified, which are released under an MIT license with copyright held by Huawei Technologies Co., Ltd. Such parts are clearly marked in the code by comments.

## Acknowledgements

1. The code here is largely derived from the [High-Dimensional Bayesian Optimization via Tree-Structured Additive Models](https://github.com/eric-vader/HD-BO-Additive-Models) repository by Eric Han, Ishank Arora and Jonathan Scarlett.
2. The code in this repository is also derived from the code base of [High-Dimensional Bayesian Optimization via Additive Models with Overlapping Groups](https://arxiv.org/pdf/1802.07028.pdf), supplied by Paul Rolland.
3. The code included in `hdbo/febo` is taken from [LineBO](https://github.com/kirschnj/LineBO). The paper accompanying the code is [Adaptive and Safe Bayesian Optimization in High Dimensions via One-Dimensional Subspaces](https://arxiv.org/abs/1902.03229).
4. The NAS-Bench-101 datasets included in `data/fcnet` are taken from [nas_benchmarks](https://github.com/automl/nas_benchmarks). The paper accompanying the code is [NAS-Bench-101: Towards Reproducible Neural Architecture Search](https://arxiv.org/pdf/1902.09635.pdf).
5. The lpsolve datasets included in `data/mps` are taken from the benchmark dataset in [MIPLIB 2017](https://miplib.zib.de/download.html).

## Installation

We implemented all algorithms in Python 3.8.3.

Minimum system requirements:

* `Linux 5.4.12-100`
* 5 GB of disk space (needed for the NAS-Bench-101 and lpsolve datasets; around 100 MB otherwise)

Prepare your environment:

1. If you don't have it already, [install Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html).
2. [Install MLflow](https://mlflow.org/) either on your system or in the base Conda environment: `pip install mlflow`.
3. Build and test the environment by running `mlflow run .` (this may take a while).

Optional steps:
1. (Optional) Run `bash data/setup.sh` to download the data from [NAS Benchmarks](https://github.com/automl/nas_benchmarks) into your home directory. You may skip this step if you are not running the lpsolve or NAS-Bench-101 datasets.
2. (Optional) Run `pip install -U pandas matplotlib seaborn` in your base environment to plot the results. You may skip this step if you do not need the plotting script.

## Running the experiments from the paper

You can run any of the provided configurations by passing its config file via the `param_file` parameter; for example, to run RDUCB on LassoBench:

```
mlflow run . -P param_file=config/LassoBench/rducb.yml
```

All the parameters of the run should be specified in the `.yml` file. However, to facilitate batch jobs, you can pass command-line arguments that override the seed or the subproblem. For example:

```
mlflow run . -P param_file=config/LassoBench/rducb.yml -P seed=1 -P sub_benchmark="pick_data:diabetes"
```
will run the experiment as specified in `config/LassoBench/rducb.yml`, but will override the seed with 1 and `pick_data` with `diabetes`. See `example.sh` for an example workflow; a minimal batch-run sketch is also shown below.
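For instance, a minimal sketch of a batch run over several seeds (the actual `example.sh` shipped with the repository may differ) could look like:

```
# Hypothetical batch loop: run RDUCB on LassoBench with several seeds.
for seed in 1 2 3 4 5; do
    mlflow run . -P param_file=config/LassoBench/rducb.yml -P seed=$seed
done
```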


## Visualising the results
To visualise an experiment, you can run `mlflow ui` and click on the experiment to inspect its metrics.

Alternatively, to include multiple baselines in one plot and to produce confidence intervals, you can use the provided `plot.py` script. See `example.sh` for an example workflow that runs and plots multiple runs.

## Troubleshooting
If you have problems with the SSL certificate, a quick fix is to comment out line 39 in `hdbo.yml`, which skips the installation of LassoBench. If you want to use this benchmark, you can install LassoBench manually. To do that, first activate your mlflow environment (you can find it in the list of conda environments shown by `conda info --envs`). Then run:
```
git clone https://github.com/ksehic/LassoBench.git
cd LassoBench/
```
Next, comment out line 7 in `setup.py` (in the LassoBench folder) and run:
```
wget https://github.com/QB3/sparse-ho/archive/master.zip
unzip master.zip
cd sparse-ho-master
python3 -m pip install -e .
cd ..
python3 -m pip install -e .
```
Everything should now be installed.
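Optionally, as a quick sanity check (this assumes the package is importable as `LassoBench`, following its upstream README; the module name is not defined in this repository):

```
# Hypothetical check that the manual LassoBench install succeeded.
python3 -c "import LassoBench; print('LassoBench is importable')"
```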
27 changes: 27 additions & 0 deletions RDUCB/config/LPSolve/rducb.yml
@@ -0,0 +1,27 @@
algorithm_type:
  Algorithm:
    algorithm: RDUCB
    algorithm_random_seed: 0
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 1
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
    size_of_random_graph: 0.2
data_type:
  LPSolve:
    infinite: 1.0e+30
    max_floor: 500.0
    mps_filename: mtest4ma
    problem_type: MpsLoader
    time_limit: 5
    # input the home directory of the LPSolve data if it is not at root
    #home_dir:
n_iter: 200
n_rand: 10
26 changes: 26 additions & 0 deletions RDUCB/config/LPSolve/rembo.yml
@@ -0,0 +1,26 @@
algorithm_type:
  Algorithm:
    algorithm: Rembo
    algorithm_random_seed: 0
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 15
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
data_type:
  LPSolve:
    infinite: 1.0e+30
    max_floor: 500.0
    mps_filename: mtest4ma
    problem_type: MpsLoader
    time_limit: 5
    # input the home directory of the LPSolve data if it is not at root
    #home_dir:
n_iter: 200
n_rand: 10
26 changes: 26 additions & 0 deletions RDUCB/config/LPSolve/tree.yml
@@ -0,0 +1,26 @@
algorithm_type:
  Algorithm:
    algorithm: Tree
    algorithm_random_seed: 0
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 15
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
data_type:
  LPSolve:
    infinite: 1.0e+30
    max_floor: 500.0
    mps_filename: mtest4ma
    problem_type: MpsLoader
    time_limit: 5
    # input the home directory of the LPSolve data if it is not at root
    #home_dir:
n_iter: 200
n_rand: 10
24 changes: 24 additions & 0 deletions RDUCB/config/LassoBench/rducb.yml
@@ -0,0 +1,24 @@
algorithm_type:
  Algorithm:
    algorithm: RDUCB
    algorithm_random_seed: 2
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 1
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
    size_of_random_graph: 0.2
data_type:
  LassoBenchlib:
    problem_type: LassoBenchLoader
    fidelity: 4
    pick_data: 'dna'
    grid_size: 1000
n_iter: 1000
n_rand: 10
23 changes: 23 additions & 0 deletions RDUCB/config/LassoBench/rembo.yml
@@ -0,0 +1,23 @@
algorithm_type:
  Algorithm:
    algorithm: Rembo
    algorithm_random_seed: 2
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 15
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
data_type:
  LassoBenchlib:
    problem_type: LassoBenchLoader
    fidelity: 4
    pick_data: 'dna'
    grid_size: 1000
n_iter: 1000
n_rand: 10
23 changes: 23 additions & 0 deletions RDUCB/config/LassoBench/tree.yml
@@ -0,0 +1,23 @@
algorithm_type:
  Algorithm:
    algorithm: Tree
    algorithm_random_seed: 2
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 15
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
data_type:
  LassoBenchlib:
    problem_type: LassoBenchLoader
    fidelity: 4
    pick_data: 'dna'
    grid_size: 1000
n_iter: 1000
n_rand: 10
24 changes: 24 additions & 0 deletions RDUCB/config/NAS/random.yml
@@ -0,0 +1,24 @@
algorithm_type:
  Algorithm:
    algorithm: Random
    algorithm_random_seed: 1
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: -1
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
data_type:
  NAS:
    bench_type: FcnetLoader
    data_random_seed: 3
    fcnet_filename: protein_structure
    # input the home directory of the NAS data if it is not at root
    #home_dir:
n_iter: 200
n_rand: 10
25 changes: 25 additions & 0 deletions RDUCB/config/NAS/rducb.yml
@@ -0,0 +1,25 @@
algorithm_type:
  Algorithm:
    algorithm: RDUCB
    algorithm_random_seed: 1
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 1
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
    size_of_random_graph: 0.2
data_type:
  NAS:
    bench_type: FcnetLoader
    data_random_seed: 3
    fcnet_filename: slice_localization
    # input the home directory of the NAS data if it is not at root
    #home_dir:
n_iter: 200
n_rand: 10
24 changes: 24 additions & 0 deletions RDUCB/config/NAS/tree.yml
@@ -0,0 +1,24 @@
algorithm_type:
  Algorithm:
    algorithm: Tree
    algorithm_random_seed: 1
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 100
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 15
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.1
    param_n_iter: 16
data_type:
  NAS:
    bench_type: FcnetLoader
    data_random_seed: 3
    fcnet_filename: protein_structure
    # input the home directory of the NAS data if it is not at root
    #home_dir:
n_iter: 200
n_rand: 10
24 changes: 24 additions & 0 deletions RDUCB/config/default.yml
@@ -0,0 +1,24 @@
algorithm_type:
  Algorithm:
    algorithm: Rembo
    algorithm_random_seed: 1
    eps: -1
    exploration_weight: 'lambda t: 0.5 * np.log(2*t)'
    graphSamplingNumIter: 250
    initial_kernel_params:
      lengthscale: 0.1
      variance: 0.5
    learnDependencyStructureRate: 15
    lengthscaleNumIter: 2
    max_eval: -4
    noise_var: 0.01
    param_n_iter: 16
data_type:
  Hpolib:
    aug_dimension: 14
    data_random_seed: 2
    fn_noise_var: 0
    grid_size: 150
    hpo_fn: Hartmann6Aug
n_iter: 10
n_rand: 10
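This default configuration is what the `main` entry point in `MLproject` picks up when no `param_file` is given, so the environment-test command from the installation section exercises exactly this file:

```
# Runs the main entry point with its default param_file, config/default.yml.
mlflow run .
```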