Commit
PUB-511: Add tutorial for GBM
arnocandel committed May 3, 2014
1 parent 9889302 commit c233d98
Showing 11 changed files with 107 additions and 6 deletions.
10 changes: 10 additions & 0 deletions lib/resources/tutorials/gbm.iris/step1.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
<p>This tutorial shows how to use the GBM method in H<sub>2</sub>O for model training and classification.</p>

<p>GBM is a method for regression and classification that builds a forest of gradient boosted trees. The trees are built from a training dataset and can be used to make predictions on a test dataset.</p>

<p>
Vocabulary:
<ul>
<li><a href="http://en.wikipedia.org/wiki/Gradient_boosting">Gradient boosting</a></li>
</ul>
</p>
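The idea behind gradient boosting can be sketched in a few lines of Java. The example below is a minimal illustration for one-dimensional regression with squared loss, where each "tree" is a single-split stump fitted to the current residuals. It is not H2O's GBM implementation, and every class and method name in it is hypothetical.

```java
// Illustrative only: gradient boosting for 1-D regression with squared loss.
// This is NOT H2O's GBM implementation; all names here are hypothetical.
final class BoostingSketch {

  // A regression stump: one split on the single feature, one constant per side.
  static final class Stump {
    final double threshold, left, right;

    Stump(double threshold, double left, double right) {
      this.threshold = threshold;
      this.left = left;
      this.right = right;
    }

    double predict(double x) {
      return x <= threshold ? left : right;
    }
  }

  // Fits the stump that minimizes squared error against the given targets.
  static Stump fitStump(double[] x, double[] target) {
    Stump best = null;
    double bestErr = Double.POSITIVE_INFINITY;
    for (double t : x) { // try every observed value as a split threshold
      double sumL = 0, sumR = 0;
      int nL = 0, nR = 0;
      for (int i = 0; i < x.length; i++) {
        if (x[i] <= t) { sumL += target[i]; nL++; } else { sumR += target[i]; nR++; }
      }
      Stump s = new Stump(t, nL > 0 ? sumL / nL : 0, nR > 0 ? sumR / nR : 0);
      double err = 0;
      for (int i = 0; i < x.length; i++) {
        double d = target[i] - s.predict(x[i]);
        err += d * d;
      }
      if (err < bestErr) { bestErr = err; best = s; }
    }
    return best;
  }

  // Returns the training-set predictions after adding ntrees stumps.
  static double[] boost(double[] x, double[] y, int ntrees, double learnRate) {
    double[] pred = new double[y.length]; // the initial model predicts 0 everywhere
    double[] residual = new double[y.length];
    for (int t = 0; t < ntrees; t++) {
      // For squared loss, the negative gradient is simply the residual.
      for (int i = 0; i < y.length; i++) residual[i] = y[i] - pred[i];
      Stump s = fitStump(x, residual);
      // Add a shrunken copy of the new stump to the ensemble.
      for (int i = 0; i < y.length; i++) pred[i] += learnRate * s.predict(x[i]);
    }
    return pred;
  }

  public static void main(String[] args) {
    double[] x = {1, 2, 3, 4};
    double[] y = {1, 1, 3, 3};
    System.out.println(java.util.Arrays.toString(boost(x, y, 20, 0.2)));
  }
}
```

Each round fits a small tree to what the current ensemble still gets wrong, and the learning rate controls how much of that correction is applied. More trees or a larger learning rate fit the training data more closely.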
16 changes: 16 additions & 0 deletions lib/resources/tutorials/gbm.iris/step2.html
@@ -0,0 +1,16 @@
<p>
We will use the common <a href='http://archive.ics.uci.edu/ml/datasets/Iris'>Iris dataset</a> for training.
</p>

<p>
The Iris dataset is provided <a href="/datasets/iris.csv">here</a>. Please download it and save it to your disk.
</p>

<p>
To upload the dataset into the H<sub>2</sub>O application, use the menu option <span class='label mref'>Data &gt; Upload</span> or the direct <a href="/Upload.html" target="_blank">link</a>.
</p>
<p>
Alternatively, data can be imported from a URL directly into the H<sub>2</sub>O application.
Use the menu option <span class='label mref'>Data &gt; Import Files</span> or the direct
<a href="/2/ImportFiles2.html?path=https%3A%2F%2Fraw.github.com%2F0xdata%2Fh2o%2Fmaster%2Fsmalldata%2Firis%2Firis.csv" target="_blank">link</a>.
</p>
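The direct import link carries the dataset URL as a percent-encoded `path` query parameter. The following Java sketch shows one way such a link could be assembled. Only the `/2/ImportFiles2.html` endpoint, the `path` parameter, and the dataset URL are taken from the link above; the class and method names are hypothetical and not part of H2O.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of H2O: rebuilds the import link used in step 2.
final class ImportLinkSketch {

  // Endpoint and parameter name are copied from the tutorial's direct link.
  static String importLink(String datasetUrl) {
    try {
      // URLEncoder percent-encodes ':' and '/' so the URL travels as one query value.
      return "/2/ImportFiles2.html?path=" + URLEncoder.encode(datasetUrl, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available on a Java platform, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(
        importLink("https://raw.github.com/0xdata/h2o/master/smalldata/iris/iris.csv"));
  }
}
```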
5 changes: 5 additions & 0 deletions lib/resources/tutorials/gbm.iris/step3.html
@@ -0,0 +1,5 @@
<p>
<!--To parse the uploaded dataset (most likely already done), use the menu option <span class='label mref'>Data &gt; Parse</span> or direct <a href="/2/Parse2.query?source_key=iris.csv" target="_blank">link</a>.-->
The uploaded dataset was automatically parsed during the previous step.
You can always use the menu option <span class='label mref'>Data &gt; Parse</span> to parse data.
</p>
6 changes: 6 additions & 0 deletions lib/resources/tutorials/gbm.iris/step4.html
@@ -0,0 +1,6 @@
<p>
The parsed dataset can be inspected via the menu option <span class='label mref'>Data &gt; Inspect</span> or via the direct
<a href="/2/Inspect2.html?src_key=iris.hex" target="_blank">link</a>.
</p>

<p>The inspect view shows the columns and rows of the dataset. For each column, it displays the type, arity, basic statistical information (min/max/mean/variance), and the number of missing or invalid rows. The whole dataset can be explored row by row if desired.</p>
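To make the per-column summary concrete, here is a small Java sketch that computes the minimum, maximum, mean, variance, and missing-value count for one numeric column. It only illustrates the kind of statistics listed above. How H2O computes them is not shown in this commit, and the class name, the use of `NaN` for missing values, and the choice of sample variance are all assumptions.

```java
// Hypothetical illustration, not H2O code: a per-column summary of the kind
// the inspect view displays.
final class ColumnSummarySketch {
  final double min, max, mean, variance;
  final int missing;

  ColumnSummarySketch(double[] column) {
    double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY, sum = 0;
    int n = 0, na = 0;
    for (double v : column) {
      if (Double.isNaN(v)) { na++; continue; } // assumption: NaN marks a missing value
      lo = Math.min(lo, v);
      hi = Math.max(hi, v);
      sum += v;
      n++;
    }
    double m = n > 0 ? sum / n : Double.NaN;
    double ss = 0; // sum of squared deviations from the mean
    for (double v : column) {
      if (!Double.isNaN(v)) ss += (v - m) * (v - m);
    }
    min = n > 0 ? lo : Double.NaN;
    max = n > 0 ? hi : Double.NaN;
    mean = m;
    variance = n > 1 ? ss / (n - 1) : Double.NaN; // assumption: sample variance
    missing = na;
  }

  public static void main(String[] args) {
    ColumnSummarySketch s = new ColumnSummarySketch(new double[] {5.1, 4.9, Double.NaN, 4.7});
    System.out.println(s.min + " " + s.max + " " + s.mean + " " + s.variance + " " + s.missing);
  }
}
```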
12 changes: 12 additions & 0 deletions lib/resources/tutorials/gbm.iris/step5.html
@@ -0,0 +1,12 @@
<p>
Now that we have loaded the Iris dataset into H<sub>2</sub>O, we can build a GBM
model. For this purpose, please use the menu option <span class='label
mref'>Model &gt; GBM</span> or the direct <a
href="/2/GBM.query?source=iris.hex&destination_key=model&response=4&ntrees=20&ignore=&learn_rate=0.2"
target="_blank">link</a>.
</p>
<p>
The GBM method has multiple tuning parameters that can affect the model it will
build. For example, you can increase the number of trees, the maximum tree
depth or the learning rate.
</p>
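The direct link in this step passes the tuning parameters as query parameters. The Java sketch below rebuilds that query string from a parameter map, which makes it easy to try other values such as a different `ntrees` or `learn_rate`. The parameter names and values are copied from the link; the class, its methods, and the comments on what each parameter means are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of H2O: assembles the GBM query used in step 5.
final class GbmQuerySketch {

  static String encode(String s) {
    try {
      return URLEncoder.encode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always supported
    }
  }

  // Joins the parameters in insertion order as key=value pairs.
  static String query(String endpoint, Map<String, String> params) {
    StringBuilder sb = new StringBuilder(endpoint);
    char sep = '?';
    for (Map.Entry<String, String> e : params.entrySet()) {
      sb.append(sep).append(encode(e.getKey())).append('=').append(encode(e.getValue()));
      sep = '&';
    }
    return sb.toString();
  }

  // Parameter names and values as they appear in the tutorial link.
  static Map<String, String> tutorialParams() {
    Map<String, String> p = new LinkedHashMap<>();
    p.put("source", "iris.hex");        // key of the parsed dataset
    p.put("destination_key", "model");  // key under which the model is stored
    p.put("response", "4");             // presumably the index of the response column
    p.put("ntrees", "20");              // number of trees
    p.put("ignore", "");                // no ignored columns
    p.put("learn_rate", "0.2");         // learning rate
    return p;
  }

  public static void main(String[] args) {
    Map<String, String> p = tutorialParams();
    System.out.println(query("/2/GBM.query", p));
    p.put("ntrees", "50"); // try a larger ensemble
    System.out.println(query("/2/GBM.query", p));
  }
}
```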
4 changes: 4 additions & 0 deletions lib/resources/tutorials/gbm.iris/step6.html
@@ -0,0 +1,4 @@
<p>
Running GBM produces a model. Its progress is displayed
<a href='/2/GBMModelView.html?_modelKey=model' target='_blank'>here</a>.
</p>
4 changes: 4 additions & 0 deletions lib/resources/tutorials/gbm.iris/step7.html
@@ -0,0 +1,4 @@
<p>
The model can be used to make predictions on a test set via
the menu option <span class='label mref'>Score &gt; Predict</span> or the direct <a href="/2/Predict.query?model=model&data=iris.hex&prediction=pred" target="_blank">link</a>. The direct link reuses <code>iris.hex</code> as the test set.
</p>
4 changes: 4 additions & 0 deletions lib/resources/tutorials/gbm.iris/step8.html
@@ -0,0 +1,4 @@
<p>
The prediction can now be scored via
the menu option <span class='label mref'>Score &gt; Confusion Matrix</span> or the direct <a href="/2/ConfusionMatrix.query?actual=iris.hex&vactual=4&predict=pred&vpredict=predict" target="_blank">link</a>.
</p>
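For readers unfamiliar with confusion matrices, the Java sketch below shows how one can be computed from actual and predicted class labels. It is a generic illustration, not the code behind H2O's Confusion Matrix page. The class and method names are hypothetical, and the labels and predictions in `main` are made-up toy values using the Iris class names.

```java
import java.util.Arrays;
import java.util.List;

// Generic illustration, not H2O code: a confusion matrix from label arrays.
final class ConfusionMatrixSketch {

  // counts[i][j] = rows whose actual class is labels.get(i)
  // and whose predicted class is labels.get(j).
  static int[][] confusionMatrix(List<String> labels, String[] actual, String[] predicted) {
    if (actual.length != predicted.length) {
      throw new IllegalArgumentException("actual and predicted must have the same length");
    }
    int[][] counts = new int[labels.size()][labels.size()];
    for (int r = 0; r < actual.length; r++) {
      int i = labels.indexOf(actual[r]);
      int j = labels.indexOf(predicted[r]);
      if (i < 0 || j < 0) {
        throw new IllegalArgumentException("unknown label in row " + r);
      }
      counts[i][j]++;
    }
    return counts;
  }

  // Fraction of rows on the diagonal, i.e. predicted correctly.
  static double accuracy(int[][] counts) {
    int correct = 0, total = 0;
    for (int i = 0; i < counts.length; i++) {
      for (int j = 0; j < counts[i].length; j++) {
        total += counts[i][j];
        if (i == j) correct += counts[i][j];
      }
    }
    return total == 0 ? Double.NaN : (double) correct / total;
  }

  public static void main(String[] args) {
    List<String> labels = Arrays.asList("Iris-setosa", "Iris-versicolor", "Iris-virginica");
    // Toy data, invented for this example.
    String[] actual = {"Iris-setosa", "Iris-setosa", "Iris-versicolor",
                       "Iris-versicolor", "Iris-virginica", "Iris-virginica"};
    String[] predicted = {"Iris-setosa", "Iris-setosa", "Iris-versicolor",
                          "Iris-virginica", "Iris-virginica", "Iris-virginica"};
    int[][] cm = confusionMatrix(labels, actual, predicted);
    for (int[] row : cm) System.out.println(Arrays.toString(row));
    System.out.println("accuracy = " + accuracy(cm));
  }
}
```

Rows correspond to actual classes and columns to predicted classes, so off-diagonal cells show which classes the model confuses with each other.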
1 change: 1 addition & 0 deletions src/main/java/water/api/RequestServer.java
@@ -128,6 +128,7 @@ public enum API_VERSION {
     // Help and Tutorials
     Request.addToNavbar(registerRequest(new Documentation()), "H2O Documentation", "Help", USE_NEW_TAB);
     Request.addToNavbar(registerRequest(new Tutorials()), "Tutorials Home", "Help", USE_NEW_TAB);
+    Request.addToNavbar(registerRequest(new TutorialGBM()), "GBM Tutorial", "Help", USE_NEW_TAB);
     Request.addToNavbar(registerRequest(new TutorialDeepLearning()),"Deep Learning Tutorial", "Help", USE_NEW_TAB);
     Request.addToNavbar(registerRequest(new TutorialRFIris()), "Random Forest Tutorial", "Help", USE_NEW_TAB);
     Request.addToNavbar(registerRequest(new TutorialGLMProstate()), "GLM Tutorial", "Help", USE_NEW_TAB);
33 changes: 33 additions & 0 deletions src/main/java/water/api/TutorialGBM.java
@@ -0,0 +1,33 @@
package water.api;

/**
 * Basic page introducing tutorial for GBM on Iris
 */
public class TutorialGBM extends TutorialWorkflow {

  private final transient TutorWorkflow _wf;
  private final static String[][] TUTORIAL_STEPS = new String[][]{
    /*               Title     Short Summary              File containing step description */
    new String[] { "Step 1", "Introduction", "/tutorials/gbm.iris/step1.html" },
    new String[] { "Step 2", "Dataset inhale", "/tutorials/gbm.iris/step2.html" },
    new String[] { "Step 3", "Parsing the dataset", "/tutorials/gbm.iris/step3.html" },
    new String[] { "Step 4", "Inspecting the dataset", "/tutorials/gbm.iris/step4.html" },
    new String[] { "Step 5", "Building the model", "/tutorials/gbm.iris/step5.html" },
    new String[] { "Step 6", "Inspecting the model", "/tutorials/gbm.iris/step6.html" },
    new String[] { "Step 7", "Predict on a test set", "/tutorials/gbm.iris/step7.html" },
    new String[] { "Step 8", "Scoring the prediction", "/tutorials/gbm.iris/step8.html" },
  };

  public TutorialGBM() {
    _wf = new TutorWorkflow("GBM Tutorial");
    int i = 1;
    for (String[] info : TUTORIAL_STEPS) {
      _wf.addStep(i++, new FileTutorStep(info));
    }
  }

  @Override
  protected TutorWorkflow getWorkflow() {
    return _wf;
  }
}
18 changes: 12 additions & 6 deletions src/main/java/water/api/Tutorials.java
@@ -19,25 +19,31 @@ public class Tutorials extends HTMLOnlyRequest {
     + "</div>"
     + "<div class='row'>"

-    + "<div class='span3 col'>"
+    + "<div class='span2 col'>"
     + " <h2>Random Forest</h2>"
-    + "<p>Random Forest is a classical machine learning algorithm for classification and regression. Learn how to use it with H<sub>2</sub>O.</it></p>"
+    + "<p>Random Forest is a classical machine learning method for classification and regression. Learn how to use it with H<sub>2</sub>O.</it></p>"
     + "<a href='/TutorialRFIris.html' class='btn btn-primary'>Try it!</a>"
     + "</div>"

-    + "<div class='span3 col'>"
+    + "<div class='span2 col'>"
+    + " <h2>GBM</h2>"
+    + "<p>GBM uses gradient boosted regression trees for highly predictive regression and classification.</p>"
+    + "<a href='/TutorialGBM.html' class='btn btn-primary'>Try it!</a>"
+    + "</div>"
+
+    + "<div class='span2 col'>"
     + "<h2>GLM</h2>"
     + "<p>Generalized linear model is a generalization of linear regression. Experience its unique power on top of H<sub>2</sub>O.</p>"
     + "<a href='/TutorialGLMProstate.html' class='btn btn-primary'>Try it!</a>"
     + "</div>"

-    + "<div class='span3 col'>"
-    + "<h2>K-means</h2>"
+    + "<div class='span2 col'>"
+    + "<h2>K-Means</h2>"
     + "<p>Perform cluster analysis with H<sub>2</sub>O. It employs K-means, a highly scalable clustering algorithm.</p>"
     + "<a href='/TutorialKMeans.html' class='btn btn-primary'>Try it!</a>"
     + "</div>"

-    + "<div class='span3 col'>"
+    + "<div class='span2 col'>"
     + "<h2>Deep Learning</h2>"
     + "<p>H<sub>2</sub>O's distributed Deep Learning models high-level abstractions in data with deep artificial neural networks.</p>"
     + "<a href='/TutorialDeepLearning.html' class='btn btn-primary'>Try it!</a>"