atoml is a tool for generating smoke and metamorphic tests for machine learning software. The tool is currently under development. The aim of atoml is to make machine learning software more robust, i.e., to ensure that it works correctly even if the data is extreme.
Because this is an early research prototype, parts of the code are still unrefined, e.g., the configuration of the MySQL database we use is hardcoded.
atoml is a command-line tool that accepts the following options.
 -f,--file <arg>         input file in yaml format (mandatory)
 -i,--iterations <arg>   number of iterations used by smoke tester (default: 1)
 -m,--features <arg>     number of features for each generated test set (default: 10)
 -mysql                  the results are stored in a local MySQL database if this flag is used
 -n,--instances <arg>    number of instances generated for each test set (default: 100)
 -t,--timeout <arg>      timeout parameter for the created tests in seconds (default: 21600)
 -nomorph                no metamorphic tests are generated if this flag is used
 -nosmoke                no smoke tests are generated if this flag is used
A call to atoml may look like this:
java -jar atoml.jar -f testdata/descriptions.yml
Atoml can generate tests for Weka, Scikit-Learn, and Apache Spark MLlib. Other frameworks may follow (see Experimental Features). Atoml currently works with all classification and clustering algorithms defined by those frameworks.
Atoml automatically generates tests from a description of the algorithm. Algorithms are described by the machine learning framework, the package and class in which they are implemented, the features they support (e.g., double, float, categorical), a set of properties that they should fulfill, and the definition of their hyperparameters. The supported features and the properties are used to decide which test cases are generated for the algorithm, i.e., which input data is suitable, which smoke tests can be executed, and which metamorphic relations should be fulfilled. The hyperparameters are used to derive tests for different combinations of hyperparameters. A full grid search over the hyperparameters is not possible, because the number of allowed combinations grows exponentially with the number of parameters. Instead, atoml currently varies one parameter at a time: all parameters except one keep their default value, and different values are tested for the remaining parameter.
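The one-parameter-at-a-time strategy can be sketched as follows. This is a minimal illustration in Python, not atoml's actual implementation: the function name `one_at_a_time`, the dictionary layout, and the decision to include the all-defaults configuration once are assumptions made for this example.

```python
from math import prod


def one_at_a_time(params):
    """Yield configurations in which at most one parameter deviates from its default.

    `params` maps a parameter name to a (default, candidate_values) pair.
    This layout is hypothetical and chosen only for this sketch.
    """
    defaults = {name: default for name, (default, _) in params.items()}
    yield dict(defaults)  # the all-defaults configuration
    for name, (default, values) in params.items():
        for value in values:
            if value == default:
                continue  # already covered by the all-defaults configuration
            config = dict(defaults)
            config[name] = value
            yield config


# Three hyperparameters of the scikit-learn DecisionTreeClassifier.
params = {
    "criterion": ("gini", ["gini", "entropy"]),
    "splitter": ("best", ["best", "random"]),
    "max_depth": (2, [1, 2, 3, 4, 5]),
}

configs = list(one_at_a_time(params))
grid_size = prod(len(values) for _, values in params.values())

print(len(configs))  # 7: grows with the sum of the value counts
print(grid_size)     # 20: grows with the product of the value counts
```

Even with only three parameters the full grid is almost three times larger, and the gap widens quickly as more parameters are added.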
All the above information is defined in a YAML file. The descriptions.yml file contains examples for classifier definitions as well as a description of the YAML dialect. Here is an example for the definition of tests for the DecisionTreeClassifier from scikit-learn.
name: SKLEARN_DecisionTreeClassifier
framework: sklearn
type: classification
package: sklearn.tree
class: DecisionTreeClassifier
features: double
properties:
  same: score_exact
  scramble: score_exact
  reorder: score_exact
  const: score_exact
  opposite: score_exact
parameters:
  criterion:
    type: values
    values: [gini, entropy]
    default: gini
  splitter:
    type: values
    values: [best, random]
    default: best
  min_samples_split:
    type: integer
    min: 2
    max: 10
    stepsize: 2
    default: 2
  max_depth:
    type: integer
    min: 1
    max: 5
    stepsize: 1
    default: 2
  min_samples_leaf:
    type: integer
    min: 1
    max: 13
    stepsize: 4
    default: 1
  min_weight_fraction_leaf:
    type: double
    min: 0.0
    max: 0.5
    stepsize: 0.25
    default: 0.0
  max_features:
    type: values
    values: [auto, sqrt, log2, 0.1, 0.5, 0.8, None]
    default: None
  max_leaf_nodes:
    type: integer
    min: 10
    max: 20
    stepsize: 5
    default: None
  min_impurity_decrease:
    type: double
    min: 0.0
    max: 0.4
    stepsize: 0.2
    default: 0.0
  class_weight:
    type: values
    values: [balanced, None]
    default: None
  ccp_alpha:
    type: double
    min: 0.0
    max: 0.4
    stepsize: 0.2
    default: 0.0
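To illustrate how the range-based parameter definitions above could translate into concrete values, here is a small Python sketch. It assumes that `min` and `max` are both inclusive and that values are spaced by `stepsize`; this README does not spell that out, and the helper `expand_range` is hypothetical rather than part of atoml.

```python
def expand_range(minimum, maximum, stepsize):
    """Return minimum, minimum + stepsize, ... up to and including maximum.

    Assumption: both bounds are inclusive. Values are computed from an index
    instead of repeated addition to avoid accumulating floating-point error.
    """
    if stepsize <= 0:
        raise ValueError("stepsize must be positive")
    # The small epsilon guards against a quotient that lands just below an integer.
    count = int((maximum - minimum) / stepsize + 1e-9) + 1
    return [minimum + i * stepsize for i in range(count)]


print(expand_range(1, 13, 4))        # min_samples_leaf -> [1, 5, 9, 13]
print(expand_range(10, 20, 5))       # max_leaf_nodes -> [10, 15, 20]
print(expand_range(0.0, 0.5, 0.25))  # min_weight_fraction_leaf -> [0.0, 0.25, 0.5]
```

Under this reading, the choice of `stepsize` directly controls how many tests are derived for a parameter.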
Atoml generates the tests such that they can be executed with the test runner that is already available for the programming language in which the tests are defined, i.e., JUnit for Weka and Spark MLlib, and the built-in Python unittest for sklearn.
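As an illustration of what a smoke test and a metamorphic test can look like with the built-in Python unittest module, consider the following sketch. It is not code generated by atoml: the `MajorityClassifier` stand-in, the helper `make_data`, the test names, and the interpretation of the metamorphic relation (reordering the training instances must not change the predictions) are assumptions made to keep the example self-contained.

```python
import random
import unittest
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in for a real classifier: predicts the most frequent training label."""

    def fit(self, X, y):
        counts = Counter(y)
        # Iterate labels in sorted order so ties do not depend on instance order.
        self.label_ = max(sorted(counts), key=counts.get)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def make_data(n_instances=100, n_features=10, scale=1.0, seed=42):
    """Generate random numeric features in [-scale, scale] and binary labels."""
    rng = random.Random(seed)
    X = [[rng.uniform(-scale, scale) for _ in range(n_features)]
         for _ in range(n_instances)]
    y = [rng.randint(0, 1) for _ in range(n_instances)]
    return X, y


class ClassifierTests(unittest.TestCase):
    def test_smoke_extreme_values(self):
        # Smoke test: training and prediction on very large values must not raise.
        X, y = make_data(scale=1e300)
        predictions = MajorityClassifier().fit(X, y).predict(X)
        self.assertEqual(len(predictions), len(X))

    def test_metamorphic_reorder(self):
        # Metamorphic test: shuffling the training instances must not change predictions.
        X, y = make_data()
        expected = MajorityClassifier().fit(X, y).predict(X)
        order = list(range(len(X)))
        random.Random(7).shuffle(order)
        X_shuffled = [X[i] for i in order]
        y_shuffled = [y[i] for i in order]
        actual = MajorityClassifier().fit(X_shuffled, y_shuffled).predict(X)
        self.assertEqual(expected, actual)
```

Saved as a module, such a file can be run with `python -m unittest`. The smoke test only checks that nothing crashes, while the metamorphic test compares two runs against each other, so neither needs a known correct output.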
Atoml also provides the option to store results in a MySQL/MariaDB database. In this mode, the assertions of the test runners are replaced with code that stores the results in the database. The file create_mysql_schema.sql contains the code to generate the schema for the database (note that atoml currently requires a predefined user to write test results into the database). There is also a simple dashboard that can be used to visualize the results. We recommend creating a virtual environment for the dashboard as follows.
cd dashboard
python3 -m venv .
source bin/activate
pip install -r requirements.txt
deactivate
The dashboard can then be started.
source bin/activate
python main.py
If SQL errors regarding Public Key Retrieval occur, it may be necessary to add the client option "allowPublicKeyRetrieval=true" to the mysql-connector so that the client can automatically request the public key from the server. Do not use this option if the database is not a local database, since it can be exploited for a man-in-the-middle attack.
Caret support: Test generation for the R-based Caret framework is currently limited to smoke tests for classification algorithms.
Predictions output: If the flag '-predictions' is added to the atoml call, the generated smoke tests save the predictions on the test data and, separately, the predictions on the training data in two csv files. This is currently only available for classification algorithms from caret, sklearn, spark, and weka.
AIToolBox support: Test generation for classification and clustering algorithms supported on Linux is available. The '-mysql' option is currently not supported.
atoml is licensed under the Apache License, Version 2.0.