What is AutoML? Automated Machine Learning (AutoML) provides methods and processes to make machine learning available to non-experts, to improve the efficiency of machine learning, and to accelerate machine learning research.
Personal note: AutoML algorithms will bridge the gap and automate several key processes, but they will not let a practitioner do serious research or solve business or product problems easily. The importance of this field lies in advancing each subfield, whether HPO, NAS, etc. These targeted advances can help us solve specific issues. Take HPO, for example: we can use it to save time and money on redundant parameter searches, especially for resource-heavy algorithms such as deep learning (think GPU costs).
Personal thoughts on optimization: be advised that optimizing a problem does not guarantee a good result. You may overfit your problem in ways you are not aware of, beyond traditional overfitting, and better accuracy doesn't guarantee a better result (for example, if your dataset is imbalanced or needs imputing, cleaning, etc.).
Always examine the data and results in order to see if they are correct.
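To make the point about accuracy concrete, here is a minimal sketch on made-up labels (95 negatives, 5 positives; the numbers are an assumption for illustration only). A "model" that always predicts the majority class scores high accuracy while never finding a single positive example.

```python
# Toy illustration: high accuracy can hide a useless model on imbalanced data.
# The labels below are invented for the example.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of examples of class `positive` that were predicted as such."""
    preds_for_class = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_class:
        return 0.0
    return sum(p == positive for p in preds_for_class) / len(preds_for_class)

y_true = [0] * 95 + [1] * 5   # imbalanced ground truth
y_pred = [0] * 100            # always predict the majority class

acc = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, positive=1)
# Balanced accuracy: the mean of the per-class recalls.
balanced_acc = (recall(y_true, y_pred, 1) + recall(y_true, y_pred, 0)) / 2

print(f"accuracy={acc:.2f} minority_recall={minority_recall:.2f} "
      f"balanced_accuracy={balanced_acc:.2f}")
```

An optimizer that only sees accuracy would happily settle on this model, which is one reason to look at the data and at several metrics before trusting the result.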
AutoML.org's GitHub: it hosts a backup of the following projects.
AutoML.org is a joint effort between two universities, Freiburg and Hannover. Their website curates information regarding:
- HPO - hyperparameter optimization
- NAS - neural architecture search
- Meta Learning - learning across datasets, warmstarting of HPO and NAS etc.
AutoML aims to automate these processes:
- Preprocess and clean the data.
- Select and construct appropriate features.
- Select an appropriate model family.
- Optimize model hyperparameters.
- Postprocess machine learning models.
- Critically analyze the results obtained.
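Two of the steps above, selecting a model family and optimizing its hyperparameters, can be searched jointly. The following is a minimal sketch using plain random search on a synthetic 1-D dataset. The data generator, the two toy model families, and all names (`SEARCH_SPACE`, `random_search`, etc.) are assumptions made up for illustration; real AutoML tools use far richer search spaces and smarter search strategies.

```python
import random

def make_data(n, rng):
    """Synthetic 1-D data: label is 1 when x > 0.6, with 10% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.6)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data

def fit_threshold(train, params):
    """Toy family 1: predict 1 above a fixed threshold (ignores training data)."""
    t = params["threshold"]
    return lambda x: int(x > t)

def fit_knn(train, params):
    """Toy family 2: k-nearest-neighbours majority vote."""
    k = params["k"]
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        votes = sum(y for _, y in nearest)
        return int(2 * votes > k)
    return predict

# Each family maps to (fit function, hyperparameter sampler).
SEARCH_SPACE = {
    "threshold": (fit_threshold, lambda rng: {"threshold": rng.uniform(0.0, 1.0)}),
    "knn": (fit_knn, lambda rng: {"k": rng.choice([1, 3, 5, 7, 9, 15])}),
}

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def random_search(train, valid, n_trials, rng):
    """Sample (family, hyperparameters) pairs and keep the best on validation."""
    best = None
    for _ in range(n_trials):
        family = rng.choice(sorted(SEARCH_SPACE))
        fit, sample = SEARCH_SPACE[family]
        params = sample(rng)
        score = accuracy(fit(train, params), valid)
        if best is None or score > best["score"]:
            best = {"family": family, "params": params, "score": score}
    return best

rng = random.Random(0)
train, valid, test = make_data(200, rng), make_data(100, rng), make_data(100, rng)
best = random_search(train, valid, n_trials=30, rng=rng)

# Re-check the winner on data the search never saw.
fit, _ = SEARCH_SPACE[best["family"]]
test_score = accuracy(fit(train, best["params"]), test)
print(best, "test accuracy:", test_score)
```

The separate test split is there on purpose: the validation score is what the search optimized, so it tends to be optimistic, and comparing it with the test score is a cheap way to critically analyze the result.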
Historically, AFAIK, AutoML started out as several separate methods, each optimizing one of the ML processes listed above. IINM, the Auto-WEKA paper (2012) was the first step in aggregating these ideas into a publicly available working solution.
The following is referenced from AutoML.org:
- AutoWEKA is an approach for the simultaneous selection of a machine learning algorithm and its hyperparameters; combined with the WEKA package it automatically yields good models for a wide variety of data sets.
- Auto-sklearn is an extension of AutoWEKA built on the Python library scikit-learn; it is a drop-in replacement for regular scikit-learn classifiers and regressors.
- TPOT is a data-science assistant which optimizes machine learning pipelines using genetic programming.
- H2O AutoML provides automated model selection and ensembling for the H2O machine learning and data analytics platform.
- TransmogrifAI is an AutoML library running on top of Spark.
- MLBoX is an AutoML library with three components: preprocessing, optimisation and prediction.
- MLJar - automated machine learning for tabular data. mljar builds a complete machine learning pipeline: perform exploratory analysis, search for a signal in the data, and discover relationships between features with AutoML. Train top ML models with advanced feature engineering, many algorithms, hyperparameter tuning, ensembling, and stacking. Deploy your models in the cloud or use them locally. It offers:
  - advanced feature engineering
  - algorithm selection and tuning
  - automatic documentation
  - ML explanations
- Hyperopt, including the TPE algorithm
- Sequential Model-based Algorithm Configuration (SMAC)
- Spearmint
- BOHB: Bayesian Optimization combined with HyperBand
- RoBO – Robust Bayesian Optimization framework
- SMAC3 – a Python re-implementation of the SMAC algorithm
- Auto-PyTorch
- AutoKeras
- DEvol
- HyperAS: a combination of Keras and Hyperopt
- talos: Hyperparameter Scanning and Optimization for Keras
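Several of the tools above (BOHB in particular) build on HyperBand, whose core building block is successive halving: try many configurations on a small budget, keep the best fraction, and give the survivors more budget. The sketch below shows only that building block in plain Python. It is not BOHB, not HyperBand's full bracket schedule, and not the API of any listed library; `validation_loss` is a hypothetical stand-in for actually training a model, and the learning-rate search space is invented for the example.

```python
import random

def validation_loss(config, budget, rng):
    """Hypothetical stand-in for training `config` for `budget` epochs.

    The loss is lowest near lr = 0.1 and improves as the budget grows.
    """
    final_loss = (config["lr"] - 0.1) ** 2
    noise = rng.gauss(0, 0.01) / budget
    return final_loss + 1.0 / budget + noise

def successive_halving(configs, min_budget, eta, rng):
    """Keep the best 1/eta of the configs each round, multiplying the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    total_cost = 0  # total epochs spent across all evaluations
    while len(survivors) > 1:
        total_cost += len(survivors) * budget
        ranked = sorted(survivors, key=lambda c: validation_loss(c, budget, rng))
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0], total_cost

rng = random.Random(0)
configs = [{"lr": rng.uniform(0.001, 1.0)} for _ in range(27)]
winner, total_cost = successive_halving(configs, min_budget=1, eta=3, rng=rng)
print("winner:", winner, "total epochs spent:", total_cost)
```

With 27 configurations and `eta=3` the rounds go 27, 9, 3 configurations at budgets 1, 3, 9, for 81 epochs in total, versus 243 if every configuration had been trained for 9 epochs. This is the kind of saving on redundant searches that matters when each epoch costs GPU time.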