GitHub - Huaxu007/Hyperactive: A hyperparameter optimization and meta-learning toolbox for convenient and fast prototyping of machine-/deep-learning models.

A hyperparameter optimization and meta-learning toolbox for convenient and fast prototyping of machine-learning models.


Hyperactive is primarily a hyperparameter optimization toolkit that aims to simplify the model-selection and -tuning process. You can use any machine- or deep-learning package, and it is not necessary to learn new syntax. Hyperactive offers high versatility in model optimization because of two characteristics:

- You can define any kind of model in the objective function. It just has to return a score/metric that gets maximized.
- The search space accepts not just 'int', 'float' or 'str' as data types but also functions, classes or any Python objects.
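
As a rough illustration of the second point, the sketch below puts plain Python functions into a search space next to integers. It does not use Hyperactive itself: `random_search` is a hypothetical stand-in for an optimizer, and `square`, `cube` and the parameter names are made up for this example.

```python
import random


def square(x):
    return x * x


def cube(x):
    return x * x * x


# Search space mixing integers and function objects (hypothetical names).
search_space = {
    "x": list(range(-5, 6)),
    "transform": [square, cube, abs],
}


def objective(para):
    # Any Python code may run here; only the returned score matters.
    # The score is maximized, so the distance to the target value 9 is negated.
    value = para["transform"](para["x"])
    return -abs(value - 9)


def random_search(objective, search_space, n_iter, seed=0):
    # Hypothetical stand-in for an optimizer: sample, evaluate, keep the best.
    rng = random.Random(seed)
    best_para, best_score = None, float("-inf")
    for _ in range(n_iter):
        para = {name: rng.choice(values) for name, values in search_space.items()}
        score = objective(para)
        if score > best_score:
            best_para, best_score = para, score
    return best_para, best_score


best_para, best_score = random_search(objective, search_space, n_iter=200)
print(best_para["transform"].__name__, best_para["x"], best_score)
```

Because the objective function only sees a parameter dictionary, the optimizer does not need to know whether a value is a number, a function or a class.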

Main features • Installation • API reference • Roadmap • Citation • License

Hyperactive features a collection of optimization algorithms that can be used for a variety of optimization problems. The following table shows listings of the capabilities of Hyperactive, where each of the items links to an example:

Optimization Techniques
- Local Search
- Global Search
- Population Methods
- Sequential Methods

Tested and Supported Packages
- Machine Learning
- Deep Learning
- Parallel Computing

Optimization Applications
- Feature Engineering
- Machine Learning
- Deep Learning
- Meta-data
- Miscellaneous
  - Test Functions
  - Fit Gaussian Curves

The examples above are not necessarily built on realistic datasets or training procedures. Their purpose is fast execution of the proposed solution and giving the user ideas for interesting use cases.

Installation

The most recent version of Hyperactive is available on PyPI:

pip install hyperactive

Minimal example

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_diabetes
from hyperactive import Hyperactive

# load_boston was removed in scikit-learn 1.2, so another built-in
# regression dataset is used here as a stand-in
data = load_diabetes()
X, y = data.data, data.target


# define the model in a function
def model(opt):
    # pass the suggested parameter to the machine learning model
    gbr = GradientBoostingRegressor(n_estimators=opt["n_estimators"])
    scores = cross_val_score(gbr, X, y, cv=3)

    # return a single numerical value, which gets maximized
    return scores.mean()


# create the search space: it determines the ranges of parameters
# you want the optimizer to search through
search_space = {"n_estimators": list(range(10, 200, 5))}

# start the optimization run
hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50)
hyper.run()

Hyperactive API reference

Hyperactive(verbosity, distribution)

verbosity = ["progress_bar", "print_results", "print_times"]
- (list, False)
- The verbosity list determines which parts of the optimization information are printed to the command line.

distribution = {"multiprocessing": {"initializer": tqdm.set_lock, "initargs": (tqdm.get_lock(),),}}
- (str, dict, callable)
- Access the parallel processing in three ways:
  - Via a str "multiprocessing" or "joblib" to choose one of the two.
  - Via a dictionary with one key "multiprocessing" or "joblib" and a value that holds the input arguments of Pool or Parallel, respectively. The default argument is a good example of this.
  - Via your own parallel processing function that will be used instead of those for multiprocessing and joblib. The wrapper function must work similarly to the following two functions:

Multiprocessing:

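from multiprocessing import Pool


def multiprocessing_wrapper(process_func, search_processes_paras, **kwargs):
    # one worker process per search process
    n_jobs = len(search_processes_paras)

    # kwargs are passed on to Pool (e.g. initializer, initargs)
    pool = Pool(n_jobs, **kwargs)
    results = pool.map(process_func, search_processes_paras)

    return results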

Joblib:

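from joblib import Parallel, delayed


def joblib_wrapper(process_func, search_processes_paras, **kwargs):
    # one job per search process
    n_jobs = len(search_processes_paras)

    # each info_dict is unpacked into keyword arguments of process_func
    jobs = [
        delayed(process_func)(**info_dict)
        for info_dict in search_processes_paras
    ]
    # kwargs are passed on to Parallel
    results = Parallel(n_jobs=n_jobs, **kwargs)(jobs)

    return results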
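
A custom function passed as `distribution` would need the same call signature as the two wrappers above. The following is only a sketch of what such a function might look like, using a thread pool from the standard library. `thread_wrapper` and `fake_search_process` are hypothetical names, and passing each element of `search_processes_paras` as a single argument mirrors the multiprocessing wrapper rather than any documented contract.

```python
from concurrent.futures import ThreadPoolExecutor


def thread_wrapper(process_func, search_processes_paras, **kwargs):
    # One worker per search process, as in the wrappers above.
    n_jobs = len(search_processes_paras)

    # kwargs are forwarded to the executor (e.g. thread_name_prefix).
    with ThreadPoolExecutor(max_workers=n_jobs, **kwargs) as executor:
        # executor.map returns the results in input order.
        results = list(executor.map(process_func, search_processes_paras))

    return results


def fake_search_process(info_dict):
    # Hypothetical stand-in for a single search process.
    return info_dict["nth_process"] * 10


print(thread_wrapper(fake_search_process, [{"nth_process": 0}, {"nth_process": 1}]))
```

Threads are used here only to keep the sketch free of pickling concerns; for CPU-bound objective functions a process-based backend is usually the better fit.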

.add_search(objective_function, search_space, n_iter, optimizer, n_jobs, initialize, max_score, random_state, memory, memory_warm_start)

objective_function
- (callable)
- The objective function defines the optimization problem. The optimization algorithm will try to maximize the numerical value that is returned by the objective function by trying out different parameters from the search space.
search_space
- (dict)
- Defines the space where the optimization algorithm can search for the best parameters for the given objective function.
n_iter
- (int)
- The number of iterations that will be performed during the optimization run. Each iteration consists of the optimization step, which decides the next parameter to be evaluated, and the evaluation step, which runs the objective function with the chosen parameter and returns the score.
optimizer = "default"
- (object)
- Instance of optimization class that can be imported from Hyperactive. "default" corresponds to the random search optimizer. The following classes can be imported and used:
  - HillClimbingOptimizer
  - StochasticHillClimbingOptimizer
  - RepulsingHillClimbingOptimizer
  - RandomSearchOptimizer
  - RandomRestartHillClimbingOptimizer
  - RandomAnnealingOptimizer
  - SimulatedAnnealingOptimizer
  - ParallelTemperingOptimizer
  - ParticleSwarmOptimizer
  - EvolutionStrategyOptimizer
  - BayesianOptimizer
  - TreeStructuredParzenEstimators
  - DecisionTreeOptimizer
  - EnsembleOptimizer
- Example:
```
...

opt_hco = HillClimbingOptimizer(epsilon=0.08)
hyper = Hyperactive()
hyper.add_search(..., optimizer=opt_hco)
hyper.run()

...
```
n_jobs = 1
- (int)
- Number of jobs to run in parallel. Those jobs are optimization runs that work independently of one another (no information sharing). If n_jobs == -1, the maximum available number of CPU cores is used.
initialize = {"grid": 4, "random": 2, "vertices": 4}
- (dict)
- The initialization dictionary automatically determines a number of parameters that will be evaluated in the first n iterations (n is the sum of the values in initialize). The initialize keywords are the following:
  - grid
    - Initializes positions in a grid-like pattern. Positions that cannot be put into a grid are randomly positioned.
  - vertices
    - Initializes positions at the vertices of the search space. Positions that cannot be placed at a vertex are randomly positioned.
  - random
    - Number of randomly initialized positions.
  - warm_start
    - List of parameter dictionaries that marks additional start points for the optimization run.
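
To make the `vertices` and `random` keywords more concrete, here is a small sketch of how such start positions could be generated for a search space of value lists. This is an illustration under assumptions, not Hyperactive's actual initialization code: `vertex_positions` and `random_positions` are hypothetical helpers, and a vertex is taken to mean a combination of the first and last value of every dimension.

```python
import itertools
import random


def vertex_positions(search_space, n):
    # Corners of the search space: first or last value in every dimension.
    names = list(search_space)
    ends = [(search_space[name][0], search_space[name][-1]) for name in names]
    corners = [dict(zip(names, combo)) for combo in itertools.product(*ends)]
    # This sketch simply truncates; it does not fill up with random positions.
    return corners[:n]


def random_positions(search_space, n, seed=0):
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in search_space.items()}
        for _ in range(n)
    ]


search_space = {"x1": list(range(0, 10)), "x2": list(range(0, 5))}
initialize = {"vertices": 4, "random": 2}

positions = vertex_positions(search_space, initialize["vertices"])
positions += random_positions(search_space, initialize["random"])
print(positions[:4])
```

With two dimensions there are exactly four corners, so the first four positions cover all of them and the remaining two are drawn at random.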
max_score = None
- (float, None)
- Maximum score until the optimization stops. The score will be checked after each completed iteration.
random_state = None
- (int, None)
- Random state for random processes in the random, numpy and scipy modules.
memory = True
- (bool)
- Whether or not to use the "memory"-feature. The memory is a dictionary, which gets filled with parameters and scores during the optimization run. If the optimizer encounters a parameter that is already in the dictionary it just extracts the score instead of reevaluating the objective function (which can take a long time).
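
The idea behind the memory can be sketched as a dictionary lookup wrapped around the objective function. This is a simplified illustration, not Hyperactive's implementation: `with_memory` and `slow_objective` are hypothetical names, and the sketch assumes that all parameter values are hashable.

```python
def with_memory(objective_function):
    memory = {}  # maps a hashable form of the parameters to their score

    def wrapped(para):
        # Parameter names are unique, so sorting the items by name is safe.
        key = tuple(sorted(para.items()))
        if key not in memory:
            # Only evaluate the objective function for unseen parameters.
            memory[key] = objective_function(para)
        return memory[key]

    wrapped.memory = memory
    return wrapped


calls = []


def slow_objective(para):
    calls.append(para)  # record every real evaluation
    return -(para["x"] - 3) ** 2


cached = with_memory(slow_objective)
cached({"x": 1})
cached({"x": 1})  # second call is answered from the dictionary
print(len(calls))
```

The second call with the same parameters does not reach `slow_objective`, which is where the time saving comes from when evaluations are expensive.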

memory_warm_start = None
- (pandas dataframe, None)
- Pandas dataframe that contains score and parameter information that will be automatically loaded into the memory-dictionary.
- Example:

| score | x1  | x2  | x... |
|-------|-----|-----|------|
| 0.756 | 0.1 | 0.2 | ...  |
| 0.823 | 0.3 | 0.1 | ...  |
| ...   | ... | ... | ...  |
| ...   | ... | ... | ...  |

.run(max_time)

max_time = None
- (float, None)
- Maximum number of seconds until the optimization stops. The time will be checked after each completed iteration.
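
The interplay of `n_iter`, `max_score` and `max_time` can be pictured as a loop that checks both limits after every completed iteration. The sketch below is a simplified stand-in and not Hyperactive's actual run loop; `run_search` is a hypothetical helper that proposes random parameters.

```python
import random
import time


def run_search(objective, search_space, n_iter, max_score=None, max_time=None, seed=0):
    rng = random.Random(seed)
    start = time.monotonic()
    best_score = float("-inf")
    n_done = 0

    for _ in range(n_iter):
        # optimization step: decide which parameters to evaluate next
        para = {name: rng.choice(values) for name, values in search_space.items()}
        # evaluation step: run the objective function
        score = objective(para)
        best_score = max(best_score, score)
        n_done += 1

        # both limits are checked after each completed iteration
        if max_score is not None and best_score >= max_score:
            break
        if max_time is not None and time.monotonic() - start >= max_time:
            break

    return best_score, n_done


def objective(para):
    return -abs(para["x"])


best_score, n_done = run_search(
    objective, {"x": list(range(-3, 4))}, n_iter=1000, max_score=0
)
print(best_score, n_done)
```

Because the limits are only checked between iterations, a single long evaluation can overrun `max_time`; the run stops once that evaluation has finished.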

.best_para(objective_function)

objective_function
- (callable)

returns: dictionary
- Parameter dictionary of the best score of the given objective_function found in the previous optimization run.
- Example:
```
{
  'x1': 0.2, 
  'x2': 0.3,
}
```

.best_score(objective_function)

objective_function
- (callable)

returns: int or float
- Numerical value of the best score of the given objective_function found in the previous optimization run.

.results(objective_function)

objective_function
- (callable)

returns: Pandas dataframe
- The dataframe contains the score, parameter information, iteration times and evaluation times of the given objective_function found in the previous optimization run.
- Example:

| score | x1  | x2  | x... | eval_times | iter_times |
|-------|-----|-----|------|------------|------------|
| 0.756 | 0.1 | 0.2 | ...  | 0.953      | 1.123      |
| 0.823 | 0.3 | 0.1 | ...  | 0.948      | 1.101      |
| ...   | ... | ... | ...  | ...        | ...        |
| ...   | ... | ... | ...  | ...        | ...        |

Optimizers

HillClimbingOptimizer

- epsilon=0.05
- distribution="normal"
- n_neighbours=3
- rand_rest_p=0.03

RepulsingHillClimbingOptimizer

- epsilon=0.05
- distribution="normal"
- n_neighbours=3
- rand_rest_p=0.03
- repulsion_factor=3

SimulatedAnnealingOptimizer

- epsilon=0.05
- distribution="normal"
- n_neighbours=3
- rand_rest_p=0.03
- p_accept=0.1
- norm_factor="adaptive"
- annealing_rate=0.975
- start_temp=1

RandomSearchOptimizer

RandomRestartHillClimbingOptimizer

- epsilon=0.05
- distribution="normal"
- n_neighbours=3
- rand_rest_p=0.03
- n_iter_restart=10

RandomAnnealingOptimizer

- epsilon=0.05
- distribution="normal"
- n_neighbours=3
- rand_rest_p=0.03
- annealing_rate=0.975
- start_temp=1

ParallelTemperingOptimizer

- n_iter_swap=10
- rand_rest_p=0.03

ParticleSwarmOptimizer

- inertia=0.5
- cognitive_weight=0.5
- social_weight=0.5
- temp_weight=0.2
- rand_rest_p=0.03

EvolutionStrategyOptimizer

- mutation_rate=0.7
- crossover_rate=0.3
- rand_rest_p=0.03

BayesianOptimizer

- gpr=gaussian_process["gp_nonlinear"]
- xi=0.03
- warm_start_smbo=None
- rand_rest_p=0.03

TreeStructuredParzenEstimators

- gamma_tpe=0.5
- warm_start_smbo=None
- rand_rest_p=0.03

DecisionTreeOptimizer

- tree_regressor="extra_tree"
- xi=0.01
- warm_start_smbo=None
- rand_rest_p=0.03

Roadmap

v2.0.0 ✔️

Change API

v2.1.0 ✔️

Save memory of evaluations for later runs (long term memory)
Warm start sequence based optimizers with long term memory
Gaussian process regressors from various packages (gpy, sklearn, GPflow, ...) via wrapper

v2.2.0 ✔️

Add basic dataset meta-features to long term memory
Add helper-functions for memory
- connect two different model/dataset hashes
- split two different model/dataset hashes
- delete memory of model/dataset
- return best known model for dataset
- return search space for best model
- return best parameter for best model

v2.3.0 ✔️

Tree-structured Parzen Estimator
Decision Tree Optimizer
Add "max_sample_size" and "skip_retrain" parameters for SMBO to decrease optimization time

v3.0.0 ✔️

New API
- expand usage of objective-function
- No passing of training data into Hyperactive
- Removing "long term memory"-support (better to do in separate package)
- More intuitive selection of optimization strategies and parameters
- Separate optimization algorithms into other package
- expand api so that optimizer parameter can be changed at runtime
- add extensive testing procedure (similar to Gradient-Free-Optimizers)

v3.1.0

New implementation of dashboard for visualization of search-data

v3.2.0

New implementation of "long term memory" for search-data storage and usage

Experimental algorithms

The following algorithms are of my own design and, to my knowledge, do not yet exist in the technical literature. If any of these algorithms already exists, please let me know by opening an issue.

Random Annealing

A combination of simulated annealing and random search.
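
One possible reading of this description is a hill-climbing search whose step size is scaled by a temperature: early on, the steps are so large that the search behaves much like random search, and as the temperature anneals it turns into a local search. The sketch below follows that reading on a one-dimensional grid of values. It is an assumption-laden illustration, not the library's implementation: `random_annealing` is a hypothetical helper, and `epsilon`, `annealing_rate` and `start_temp` are names borrowed from the RandomAnnealingOptimizer parameter list, with values chosen for this example and no claim that they act the same way there.

```python
import random


def random_annealing(objective, values, n_iter, epsilon=0.05,
                     annealing_rate=0.975, start_temp=10.0, seed=0):
    rng = random.Random(seed)
    size = len(values)
    temp = start_temp

    pos = rng.randrange(size)
    score = objective(values[pos])

    for _ in range(n_iter):
        # the step size shrinks together with the temperature
        sigma = max(1.0, epsilon * temp * size)
        new_pos = int(round(rng.gauss(pos, sigma)))
        new_pos = min(max(new_pos, 0), size - 1)  # clip to the grid
        new_score = objective(values[new_pos])

        # hill-climbing acceptance: only move to better or equal positions
        if new_score >= score:
            pos, score = new_pos, new_score

        temp *= annealing_rate

    return values[pos], score


def parabola(x):
    return -(x - 2.0) ** 2


values = [i / 10 for i in range(-100, 101)]
best_x, best_score = random_annealing(parabola, values, n_iter=300)
print(best_x, best_score)
```

Unlike simulated annealing, this sketch never accepts a worse position; the temperature only controls how far the next candidate may lie from the current one.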

References

[dto] Scikit-Optimize

Citing Hyperactive

@Misc{hyperactive2019,
  author =   {{Simon Blanke}},
  title =    {{Hyperactive}: A hyperparameter optimization and meta-learning toolbox for machine-/deep-learning models.},
  howpublished = {\url{https://github.com/SimonBlanke}},
  year = {since 2019}
}
