Implementation of ALOE: Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration
This code implements ALOE. The following demonstration shows how to perform training and inference.
Navigate to the root of the project and run:
pip install -r requirements.txt
pip install -e .
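As a quick sanity check (assuming the installed package is named aloe, as the folder layout below suggests), the package should now be importable from Python:
# Sanity check: the editable install should make the `aloe` package importable.
import aloe
print("aloe imported from", getattr(aloe, "__file__", "<unknown>"))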
All the data and pretrained model dumps can be downloaded via the following link:
https://console.cloud.google.com/storage/browser/research_public_share/aloe_neurips2020
Please place the downloaded folder into the gcloud folder and organize the directories as follows:
aloe/
|___aloe/ # source code
| |___common # common implementations
| |___...
|
|___setup.py
|
|___gcloud/
| |___data/ # data
| |___results/ # model dumps
|
|...
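As a quick check (run from the project root) that the downloaded dumps sit where the scripts expect them, you can verify the two gcloud subfolders exist; this is just a convenience sketch:
# Verify the expected gcloud layout described above (run from the project root).
import os
for sub in ("gcloud/data", "gcloud/results"):
    print(sub, "->", "found" if os.path.isdir(sub) else "MISSING")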
First navigate to the synthetic experiment folder, then run the training script:
cd aloe/synthetic
./run_varlen.sh
You can also modify the above script to use different datasets by changing the data= field.
After training, you can draw samples from the learned sampler and visualize the learned heat map with the following command (assuming you have trained for 500 epochs):
./run_varlen.sh -phase plot -epoch_load 500
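For reference, the heat map can be thought of as a 2D density of drawn samples. The sketch below is illustrative only: the plotting phase of run_varlen.sh produces its own figures, and the samples.npy file name is hypothetical.
# Illustrative only: render a density heat map from drawn 2D samples.
# `samples.npy` is a hypothetical dump of shape (n, 2).
import numpy as np
import matplotlib.pyplot as plt
samples = np.load("samples.npy")
plt.hist2d(samples[:, 0], samples[:, 1], bins=100)
plt.colorbar()
plt.savefig("heatmap.png", dpi=150)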
You can also check the MMD on the held-out set via
./run_varlen.sh -phase check_{KERNEL} -epoch_load 500
where KERNEL can be either linear or hamming.
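For reference, below is a minimal NumPy sketch of an MMD estimate between two sets of discrete samples under a linear or Hamming-style kernel. It assumes fixed-length samples and is not the repository's implementation; the actual kernels used for variable-length sequences may differ.
# Illustrative (biased) MMD^2 estimate between sample sets x and y of shape (n, d).
import numpy as np

def kernel(a, b, kind="hamming"):
    if kind == "linear":
        return a.astype(float) @ b.astype(float).T           # inner-product kernel
    return (a[:, None, :] == b[None, :, :]).mean(axis=-1)    # fraction of matching coordinates

def mmd2(x, y, kind="hamming"):
    return kernel(x, x, kind).mean() + kernel(y, y, kind).mean() - 2.0 * kernel(x, y, kind).mean()

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(100, 32))
y = rng.integers(0, 2, size=(100, 32))
print(mmd2(x, y, "hamming"), mmd2(x, y, "linear"))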
For the software fuzzing experiments, the first step is to obtain raw seed inputs from OSS-Fuzz. Please follow the instructions there to set up Docker and obtain the corresponding libFuzzer binary and seed inputs.
For simplicity, we show how to run our code with the libpng target.
Please create a folder with the target name under aloe/fuzz/data/fuzz-seeds. We have included seed inputs of libpng for convenience.
Next, run the following script to cook the raw data into a binary format that can be used by ALOE:
cd aloe/fuzz
./run_binary_data.sh
This will create a cooked data folder under aloe/fuzz/data/fuzz-cooked.
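Conceptually, the cooking step turns each raw seed file into a sequence of byte tokens. The sketch below only illustrates that idea; the actual binary format written by run_binary_data.sh may differ, and the paths are examples.
# Illustrative sketch: each raw seed file becomes a sequence of byte tokens in [0, 255].
# The real cooked format produced by run_binary_data.sh may differ.
import os
seed_dir = "data/fuzz-seeds/libpng"   # raw seed inputs, one file per seed
sequences = []
for name in sorted(os.listdir(seed_dir)):
    with open(os.path.join(seed_dir, name), "rb") as f:
        sequences.append(list(f.read()))  # byte values as integer tokens
lengths = [len(s) for s in sequences]
print(f"{len(sequences)} seeds, lengths {min(lengths)}..{max(lengths)}")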
Use the following script for training:
cd aloe/fuzz
./run_varlen.sh
You can also modify the data= field to train on other software.
Use the following script for generating inputs:
cd aloe/fuzz
./run_fuzz.sh
You can also modify the script to specify the number of files to be generated.
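The generated inputs are eventually consumed as a corpus of individual files by the coverage tooling. Purely as an illustration (run_fuzz.sh handles this itself; the output location and variable names below are hypothetical), writing byte sequences out as files looks like:
# Illustrative only: dump generated byte sequences as individual corpus files.
# `generated` stands in for the sampler's output; names and paths are hypothetical.
import os
out_dir = "generated-inputs/libpng"
os.makedirs(out_dir, exist_ok=True)
generated = [[137, 80, 78, 71, 13, 10], [0, 1, 2, 3]]   # placeholder byte sequences
for i, seq in enumerate(generated):
    with open(os.path.join(out_dir, f"input-{i:06d}"), "wb") as f:
        f.write(bytes(seq))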
To evaluate the coverage of the generated inputs, you need to go through the Docker image built by OSS-Fuzz. Please follow the guidelines here: https://github.com/google/oss-fuzz
For the program synthesis (RFill) experiments, the synthetic training/validation/test datasets are provided; please see the content under the gcloud/data folder.
We have provided the model dumps of the learned sampler. Please run:
./evaluate_editor.sh
You can also modify the eval_method arg to use different beam sizes, or the eval_topk arg to evaluate different top-k accuracies.
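For clarity, top-k accuracy here presumably means a task counts as solved if any of the top-k beam candidates is consistent with its held-out examples. A minimal sketch with hypothetical helper names (not the repository's evaluation code):
# Illustrative top-k accuracy over beam-search outputs.
# `beams[i]` holds ranked candidate programs for task i;
# `is_correct(prog, task)` would check a program against the task's I/O examples.
def topk_accuracy(beams, tasks, is_correct, k=1):
    hits = sum(
        any(is_correct(prog, task) for prog in candidates[:k])
        for task, candidates in zip(tasks, beams)
    )
    return hits / max(len(tasks), 1)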
If you want to train the sampler from scratch, please first run supervised learning to pretrain the initial sampler q0:
cd aloe/rfill
./run_supervised.sh
You can stop the script once the performance plateaus.
Move the pretrained models (encoder-xxx.ckpt, decoder-xxx.ckpt) into the result folder of the editor and rename the epoch index to -1, i.e., encoder--1.ckpt and decoder--1.ckpt.
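The copy/rename step can be scripted along these lines (the directory paths and the source epoch index are placeholders; substitute the actual result folders):
# Copy the pretrained q0 checkpoints into the editor's result folder under epoch index -1.
# Paths and the source epoch below are placeholders.
import shutil
supervised_dir = "path/to/supervised/results"   # where run_supervised.sh saved its dumps
editor_dir = "path/to/editor/results"           # result folder used by run_editor.sh
epoch = 100                                     # epoch index of the checkpoint to reuse
for part in ("encoder", "decoder"):
    shutil.copy(f"{supervised_dir}/{part}-{epoch}.ckpt",
                f"{editor_dir}/{part}--1.ckpt")  # note the double dash: epoch index -1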
Then train the editor in the following way:
./run_editor.sh
It will load the pretrained q0 and train the editor part.