H2O.ai provides an open source platform for automated machine learning: H2O-3
H2O is an open source, in-memory, distributed, fast, and scalable machine learning and predictive analytics platform. It lets you build machine learning models on big data and put those models into production in an enterprise environment.
- Follow the steps in the user guide to deploy FfDL.
- Add some data. Either follow the user guide to store the data locally, or host the data in a cloud storage bucket and pull it at runtime.
- Edit manifest.yaml to use the settings that you want.
- NOTE: Allocate at least 4x as much memory as the size of the dataset you plan to run H2O on.
- EXAMPLE: 1.5 GB dataset --> 6.0 GB of memory allocated
- Once FfDL is deployed in your Kubernetes cluster, use the CLI or GUI to deploy H2O
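The memory guideline in the NOTE above can be checked with a quick calculation. The following is only a minimal sketch: the function name `recommended_memory_gb` and its `factor` parameter are hypothetical helpers made up for illustration and are not part of FfDL or H2O.

```python
def recommended_memory_gb(dataset_gb: float, factor: float = 4.0) -> float:
    """Return the minimum memory (in GB) suggested for a dataset of the given size.

    The default factor of 4 follows the "at least 4x" guideline in the NOTE;
    this helper is hypothetical and not part of FfDL or H2O.
    """
    if dataset_gb <= 0:
        raise ValueError("dataset_gb must be positive")
    return dataset_gb * factor


# The example from the list above: a 1.5 GB dataset
print(recommended_memory_gb(1.5))  # 6.0
```

The result is a lower bound, so round it up when choosing the memory setting in manifest.yaml.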
Sample deployment scripts are hosted under: FfDL/community/FfDL-H2Oai
If you need a sample dataset, you can pull this toy dataset:
- Train Set: s3://h2o-public-test-data/smalldata/higgs/higgs_train_10k.csv
- Test Set: s3://h2o-public-test-data/smalldata/higgs/higgs_test_5k.csv
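As a rough illustration of pulling the toy dataset at runtime, the sketch below loads both files with the H2O Python client. It assumes the `h2o` package is installed in the training image and that the H2O cluster can read `s3://` paths, which depends on how S3 access is configured in your environment. The function name `load_higgs_frames` is a hypothetical helper and is not taken from the sample scripts.

```python
# Dataset locations as listed above
TRAIN_URI = "s3://h2o-public-test-data/smalldata/higgs/higgs_train_10k.csv"
TEST_URI = "s3://h2o-public-test-data/smalldata/higgs/higgs_test_5k.csv"


def load_higgs_frames():
    """Start (or connect to) a local H2O instance and import the toy dataset.

    Hypothetical helper: assumes the h2o package is installed and that
    the cluster is able to read s3:// paths.
    """
    import h2o  # imported here so the module loads even without h2o installed

    h2o.init()
    train = h2o.import_file(TRAIN_URI)
    test = h2o.import_file(TEST_URI)
    return train, test


if __name__ == "__main__":
    train, test = load_higgs_frames()
    print(train.shape, test.shape)
```

If the `s3://` scheme is not available in your setup, storing the files locally as described in the user guide and passing local paths to `h2o.import_file` is an alternative.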