This repository contains code for training and evaluating ML architectures on the Compositional Freebase Questions (CFQ) dataset.
The dataset can be downloaded from the following URL:
This library requires Python3 and the following Python3 libraries:
We recommend getting pip3 and then running the following command, which will install all required libraries in one go:
sudo pip3 install absl-py tensorflow tensor2tensor
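After installation, one quick way to confirm that the three libraries are visible to Python is to look up their import names. The sketch below is not part of this repository. It assumes the usual import names `absl`, `tensorflow`, and `tensor2tensor` for the pip packages listed above, and the helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Assumed import names for the pip packages absl-py, tensorflow, tensor2tensor.
REQUIRED_MODULES = ["absl", "tensorflow", "tensor2tensor"]


def missing_modules(names):
    """Return the names in `names` that Python cannot locate for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries were found.")
```

The check only locates the modules without importing them, so it does not verify that the installed versions are compatible with each other.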
First download the CFQ dataset (link above), and ensure the dataset and the splits directory are in the same directory as this library (e.g. by unpacking the file in the library directory). In order to train and evaluate a model, run the following:
bash run_experiment.sh
This will download the dataset, preprocess it, train an LSTM model with attention on the random split of the CFQ dataset, and then evaluate the trained model directly.
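Before launching the script, the directory layout described above (dataset and splits directory next to the library) can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the file name `dataset.json` and the directory name `splits` are guesses for illustration, and `check_layout` is a hypothetical helper, not something this repository provides.

```python
from pathlib import Path

# Assumed names: the text above only says "the dataset" and "the splits directory".
DATASET_FILE = "dataset.json"  # hypothetical file name
SPLITS_DIR = "splits"          # assumed directory name


def check_layout(library_dir, dataset_file=DATASET_FILE, splits_dir=SPLITS_DIR):
    """Return a list of problems found; an empty list means the layout looks fine."""
    root = Path(library_dir)
    problems = []
    if not (root / dataset_file).is_file():
        problems.append(f"missing dataset file: {root / dataset_file}")
    if not (root / splits_dir).is_dir():
        problems.append(f"missing splits directory: {root / splits_dir}")
    return problems


if __name__ == "__main__":
    # Run from the library directory; prints nothing if both entries exist.
    for problem in check_layout("."):
        print(problem)
```

Adjust the two constants if the unpacked archive uses different names.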
NOTE: This may take a long time and consume a lot of memory. It was tested on a machine with 6-core/12-hyperthread CPUs at 3.7 GHz and 64 GB of RAM, where it took about 20 hours. Preprocessing alone consumes roughly 20 GB of RAM. The run time can be reduced significantly by running TensorFlow with GPU support.
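Given the memory figure in the note above, it can be useful to compare the machine's physical RAM against it before starting a long run. The following is a rough sketch, not part of this repository: it relies on `os.sysconf`, which is only available on Unix-like systems, it treats the quoted figure as 20 GiB, and the helper names `total_ram_bytes` and `has_enough_ram` are hypothetical.

```python
import os

GIB = 1024 ** 3
REQUIRED_GIB = 20  # rough preprocessing footprint quoted in the note above


def total_ram_bytes():
    """Return physical RAM in bytes, or None where os.sysconf cannot report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing (e.g. Windows) or the names are unsupported.
        return None


def has_enough_ram(total_bytes, required_gib=REQUIRED_GIB):
    """True if total_bytes covers required_gib; an unknown total counts as False."""
    return total_bytes is not None and total_bytes >= required_gib * GIB


if __name__ == "__main__":
    total = total_ram_bytes()
    if total is None:
        print("Could not determine physical RAM on this platform.")
    elif has_enough_ram(total):
        print("Physical RAM looks sufficient for preprocessing.")
    else:
        print("Physical RAM is below the quoted preprocessing footprint.")
```

This compares total installed memory, not memory currently free, so it is only a coarse pre-flight check.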
The expected accuracy using the default settings of the script is 97.4 +/- 0.3. For the expected accuracies on the other splits, please see the paper.
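To compare a finished run against the figure above, a simple range check is enough. The sketch below assumes the measured accuracy is expressed in the same units as the quoted 97.4, and it treats the +/- 0.3 as a hard band, which is a simplification if that value is a standard deviation or confidence interval. The name `within_expected` is hypothetical.

```python
EXPECTED_ACCURACY = 97.4  # quoted above for the default settings
TOLERANCE = 0.3           # quoted +/- value, treated here as a hard band


def within_expected(accuracy, expected=EXPECTED_ACCURACY, tolerance=TOLERANCE):
    """Return True if `accuracy` lies within `tolerance` of `expected`."""
    return abs(accuracy - expected) <= tolerance


if __name__ == "__main__":
    measured = 97.5  # placeholder value; replace with the accuracy from your run
    print("within expected range:", within_expected(measured))
```

A run slightly outside the band is not necessarily a failure, since results vary between training runs.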
In order to run a different model or try a different split, simply modify the parameters in run_experiment.sh. See that file for additional details.