QCon 2015

Materials for the QCon San Francisco 2015 workshop. The goal for the day is to learn how to use Spark, H2O, and Sparkling Water to build smart applications driven by machine learning models. The tutorials cover:

How to clean and munge data in Spark and H2O.
How to read in multiple datasets and join them to provide more features to the machine learning process.
How to use MLlib in conjunction with H2O's library of algorithms, via Sparkling Water, to get the best of both platforms.
How to integrate the scoring engine from your Sparkling Water script into Spark Streaming to produce real-time predictions.
How to deploy smarter applications on top of Spark.
How to deploy simple models.
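
As a rough illustration of the "join datasets to provide more features" idea above, here is a minimal sketch in plain Python. It uses no Spark or H2O, and it is not the workshop's code. The record fields (`loan_id`, `zip`, `amount`, `median_income`), the sample values, and the helper `left_join` are all hypothetical, chosen only to show how a second dataset can add a column to the first.

```python
# Plain-Python illustration of joining two datasets on a shared key.
# All field names and records below are made up for this example.

def left_join(left, right, key):
    """Attach the fields of the matching `right` record to each `left` record."""
    # Lookup table: key value -> right-hand record.
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        merged = dict(index.get(row[key], {}))  # empty when there is no match
        merged.update(row)  # left-hand values win on a name clash
        joined.append(merged)
    return joined

loans = [
    {"loan_id": 1, "zip": "94105", "amount": 5000},
    {"loan_id": 2, "zip": "10001", "amount": 12000},
    {"loan_id": 3, "zip": "00000", "amount": 700},
]
regions = [
    {"zip": "94105", "median_income": 98000},
    {"zip": "10001", "median_income": 81000},
]

# Each loan keeps its own fields and gains `median_income` when its zip is known.
features = left_join(loans, regions, "zip")
for row in features:
    print(row)
```

Rows without a match (the third loan here) simply lack the extra field. In a real pipeline those gaps would have to be handled explicitly, for example as missing values, before training.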

Outline

Spark & Sparkling Water Introduction
- H2O and Spark intro
- Sparkling Water intro
- Installation and setup of Spark
  - Running Spark shell
- Installation and setup of Sparkling Water
- Basic architecture and overview of functionalities
- Hands-on demonstration of Sparkling Water
  - Running Sparkling Shell
Simple Spam Detector
- Use Spark to tokenize text
- Use MLlib's TF-IDF model to transform the data into a table
- Build a GBM model to label incoming text as spam or not spam (ham)
Ask Craig(list) Application
- Build a classifier to label job descriptions with the appropriate industry category
- Deploy it as a Spark application
Standalone application concepts
- Deploy the classification model inside Spark Streaming
Spark Streaming and Model Deployment
- Loading a saved H2O binary model
- Exposing the model via a Spark stream
Spark Streaming and Model Deployment #2
- Using an exported POJO model in a Spark stream
Final Application
- Assembling the final application: combining the front end and back end
Lending Club Example
- A smart app predicting loan interest
- Off-line training pipeline driven from R
- POJO models exposed via REST API
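
To make the first two steps of the Simple Spam Detector item concrete (tokenize the text, then turn it into a TF-IDF table), here is a minimal sketch in plain Python. It is not MLlib and not the workshop's code: it builds an explicit vocabulary instead of hashing terms, it assumes the smoothed weight `log((N + 1) / (df + 1))`, and the helper names `tokenize` and `tf_idf` and the sample messages are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    # Smoothed inverse document frequency (assumed form, see lead-in).
    idf = {term: math.log((n_docs + 1) / (df + 1)) for term, df in doc_freq.items()}

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)  # raw term frequency within this document
        rows.append({term: count * idf[term] for term, count in counts.items()})
    return rows

# Made-up messages, for illustration only.
messages = [
    "Free prize, claim your free prize now",
    "Are we still meeting for lunch",
    "Claim your lunch voucher now",
]

rows = tf_idf(messages)
for message, row in zip(messages, rows):
    top_terms = sorted(row, key=row.get, reverse=True)[:3]
    print(message, "->", top_terms)
```

Terms that are frequent in one message but rare across the collection get the largest weights, which is what makes the resulting table useful as input to a classifier such as the GBM mentioned in the outline.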

Requirements

Mac OS X or Linux
Java 7
Spark 1.5+
Sparkling Water 1.5.6
IntelliJ IDEA development environment
Scala SDK 2.10.4 for IDEA (can be fetched from the Ivy cache)
Maven dependencies (fetched by Gradle)

Goals

Get familiar with Spark
Understand Sparkling Water
Combine the power of Spark MLlib and the Sparkling Water library to write machine learning flows
Write a standalone Spark/Sparkling Water application
Deploy applications on a Spark cluster
Deploy models

