MultiETSC implements Automated Machine Learning (AutoML) for Early Time Series Classification (ETSC). It simultaneously optimizes for both earliness and accuracy using the multi-objective algorithm configurator MO-ParamILS. The search space of this optimization consists of the set of ETSC algorithms and their hyper-parameters.
MultiETSC produces a set of configurations (i.e., algorithm choices and hyper-parameter settings) that optimally trade off earliness and accuracy for a specific data set. This lets you choose the best trade-off point for the problem at hand, knowing exactly how much earliness a bit more accuracy would cost, and vice versa.
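To make the notion of an optimal trade-off concrete, here is a minimal Python sketch (not part of MultiETSC) of the underlying idea: keeping only the non-dominated (earliness, error rate) pairs, where both objectives are minimized and the sample values are purely illustrative:

```python
# Minimal illustration (not part of MultiETSC): a configuration is kept
# only if no other configuration is at least as good in both objectives
# and strictly better in one. Both objectives are minimized.

def pareto_front(points):
    """Return the non-dominated (earliness, error_rate) pairs."""
    return [
        p for p in points
        if not any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
    ]

# Illustrative (earliness, error rate) values for four configurations:
candidates = [(0.0, 0.46), (0.72, 0.0), (0.10, 0.11), (0.18, 0.18)]
print(pareto_front(candidates))
# (0.18, 0.18) is dropped: (0.10, 0.11) is better on both objectives.
```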
For a more detailed description of MultiETSC, check out our Project Page and look for the "Papers" section.
When using our code or results, please cite our work using the following BibTeX entry:
@article{OttEtAl21,
  author    = "Ottervanger, Gilles and Baratchi, Mitra and Hoos, Holger H.",
  year      = "2021, to appear",
  pages     = "(25 manuscript pages)",
  title     = "{MultiETSC}: Automated Machine Learning for Early Time Series Classification",
  journal   = "Accepted for publication in Data Mining and Knowledge Discovery",
  publisher = "Springer",
  tags      = "Early Classification, Time Series Classification, Automated Machine Learning"
}
MultiETSC is built on a set of original implementations of ETSC algorithms, as well as algorithm configurators, each with its own dependencies. To set up all dependencies of MultiETSC, run the following command:
$ make build
This script may instruct you to install additional required software or packages.
The main script, MultiETSC/main, can be used to find optimal algorithm configurations for a specific dataset. MultiETSC uses the training set for algorithm configuration and can report test performance on a specified test set. The configuration phase uses a 5-fold cross-validation protocol, which requires the training set to contain at least 5 examples of each class. Having been developed with the UCR Archive in mind, MultiETSC runs on the vast majority of UCR datasets out of the box.
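As a quick sanity check before running the configurator, the following hedged Python sketch (not shipped with MultiETSC) counts the examples per class in a training file, assuming the usual UCR .tsv layout with the class label in the first tab-separated column:

```python
# Hedged sketch (not part of MultiETSC): verify that a UCR-style training
# file contains at least 5 examples of each class, as required by the
# 5-fold cross-validation protocol. Assumes one series per line with the
# class label in the first tab-separated column.
from collections import Counter

def has_enough_examples(path, min_count=5):
    with open(path) as f:
        labels = Counter(line.split("\t", 1)[0] for line in f if line.strip())
    too_small = {c: n for c, n in labels.items() if n < min_count}
    if too_small:
        print(f"Classes with fewer than {min_count} examples: {too_small}")
    return not too_small

print(has_enough_examples("test/data/Coffee_TRAIN.tsv"))
```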
We have included a few UCR datasets for testing, which can be used to run a simple instance of MultiETSC. The following command runs on the Coffee dataset, using 60 seconds for algorithm configuration and a maximum running time per algorithm of 1 second. Note that this is meant as a very short example; in our experiments, we used 7200 seconds of configurator time with a cutoff of 180 seconds, which might still be considered little time.
$ MultiETSC/main --dataset test/data/Coffee_TRAIN.tsv --test test/data/Coffee_TEST.tsv --timeout 60 --cutoff 1
After several lines of progress output, this command returns the following result:
Running test evaluation:
Result: status, time, [earliness, error rate], 0, configuration
Result: SUCCESS, 0.0142195, [0, 0.464286], 0, -algorithm 'fixed/run.py' -percLen '0.0'
Result: SUCCESS, 0.010000, [0.719905, 0.000000], 0, -algorithm 'ECTS/bin/ects' -min_support '0.0' -version 'loose'
Result: SUCCESS, 0.119657, [0.101399, 0.107143], 0, -algorithm 'fixed/run.py' -percLen '0.1'
Result: SUCCESS, 0.119354, [0.178322, 0.178571], 0, -algorithm 'fixed/run.py' -percLen '0.18'
Shown here is the test evaluation of the four selected configurations (algorithm plus hyper-parameter setting). Note that, while all four are non-dominated on the validation data, some may be dominated when evaluated on the test data; the fourth configuration is an example of this.
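If you want to post-process these results programmatically, a small parsing sketch like the following may help; the line format is assumed from the example output above, and the helper is hypothetical rather than part of MultiETSC:

```python
# Hypothetical helper (not part of MultiETSC): parse "Result:" lines of the
# assumed format
#   Result: status, time, [earliness, error rate], 0, configuration
# into structured records, e.g. to select a trade-off point automatically.
import re

RESULT = re.compile(
    r"Result: (?P<status>\w+), (?P<time>[\d.]+), "
    r"\[(?P<earliness>[\d.]+), (?P<error>[\d.]+)\], \d+, (?P<config>.+)"
)

def parse_results(lines):
    return [m.groupdict() for m in map(RESULT.match, lines) if m]

log_lines = [
    "Result: SUCCESS, 0.119657, [0.101399, 0.107143], 0, "
    "-algorithm 'fixed/run.py' -percLen '0.1'",
]
for rec in parse_results(log_lines):
    print(float(rec["earliness"]), float(rec["error"]), rec["config"])
```

Records parsed this way could, for instance, be fed into the Pareto filter sketched earlier to re-check dominance on the test-set objectives.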