Merge branch 'master' of github.com:h2oai/h2o
jessica0xdata committed Feb 6, 2015
2 parents 694efac + ec9e3e0 commit 644a1a2
Showing 15 changed files with 68 additions and 146 deletions.
6 changes: 3 additions & 3 deletions R/H2OTestDemo.R
@@ -8,7 +8,7 @@ localH2O = new("H2OClient", ip = "localhost", port = 54321)
# h2o.checkClient(localH2O)

# Test using prostate cancer data set
prostate.hex = h2o.importURL(localH2O, path = "https://raw.github.com/0xdata/h2o/master/smalldata/logreg/prostate.csv", key = "prostate.hex")
prostate.hex = h2o.importURL(localH2O, path = "https://raw.github.com/h2oai/h2o/master/smalldata/logreg/prostate.csv", key = "prostate.hex")
prostate.sum = summary(prostate.hex)
print(prostate.sum)
prostate.glm = h2o.glm(y = "CAPSULE", x = c("AGE","RACE","PSA","DCAPS"), data = prostate.hex, family = "binomial", nfolds = 10, alpha = 0.5)
@@ -38,13 +38,13 @@ for(i in 1:length(glm_test.hex))


#Test of GLMGrid using prostate cancer data set
prostate.hex = h2o.importURL(localH2O, path = "https://raw.github.com/0xdata/h2o/master/smalldata/logreg/prostate.csv", key = "prostate.hex")
prostate.hex = h2o.importURL(localH2O, path = "https://raw.github.com/h2oai/h2o/master/smalldata/logreg/prostate.csv", key = "prostate.hex")
prostate.sum = summary(prostate.hex)
prostate.glmgrid = h2o.glmgrid(y = "CAPSULE", x = c("AGE","RACE","PSA","DCAPS"), data = prostate.hex, family = "binomial", nfolds = 10, alpha = c(0.2,0.5,1),lambda=c(1e-4,1))
print(prostate.glmgrid)

# Test of PCA using prostate cancer data set
prostate.hex = h2o.importURL(localH2O, path = "https://raw.github.com/0xdata/h2o/master/smalldata/logreg/prostate.csv", key = "prostate.hex")
prostate.hex = h2o.importURL(localH2O, path = "https://raw.github.com/h2oai/h2o/master/smalldata/logreg/prostate.csv", key = "prostate.hex")
prostate.pca = h2o.prcomp(prostate.hex)
print(prostate.pca)
summary(prostate.pca)
117 changes: 16 additions & 101 deletions R/README.txt
@@ -1,108 +1,36 @@
H2O in R
------------

--------

These instructions assume you are using R 2.13.0 or later.

**STEP 1**

The download package can be obtained by clicking the "Download H2O" button at http://0xdata.com/h2o.

Unzip the downloaded h2o zip file


**STEP 2: Console Users and Studio Users should follow the same steps: **

In the R console install the package by

1. Visiting http://0xdata.com/downloadtable/
2. Choosing the version of H2O appropriate for their environment
3. Copying the R command shown below the downloadable zip file on the download page for the version of their choice and pasting it into their R console.


Correctly following the above steps will return output similar to the following:

trying URL 'http://s3.amazonaws.com/h2o-release/h2o/master/1247/R/bin/macosx/contrib/3.0/h2o_2.3.0.1247.tgz'
Content type 'application/x-tar; charset=binary' length 36702378 bytes (35.0 Mb)
opened URL
==================================================
downloaded 35.0 Mb


**STEP 3**

Start an instance of H2O. If you have questions about how to do this see the notes provided at the bottom of the page for starting from a zip file.

If users choose to not start an instance of H2O prior to attempting to connect to H2O through R, an instance will be started automatically for them at ip: localhost, port: 54321.

*Users should be aware that in order for H2O to successfully run through R, an instance of H2O must also simultaneously be running. If the instance of H2O is stopped, the R program will no longer run, and work done will be lost.*


**STEP 4**

Load the H2O package in the R environment, then start the connection between R and H2O at ip: localhost and port: 54321.


>library(h2o)

>localH2O = h2o.init()



**STEP 6**
**STEP 1: Unzip**

Here is an example of using the above object in an H2O call in R
If you are reading this, you probably already downloaded an h2o zip file and unzipped it.
(Note: Obtain a zip file from the Download button at http://h2o.ai)


**STEP 2: Install the H2O R Package**

>irisPath = system.file("extdata", "iris.csv", package="h2o")
>iris.hex = h2o.importFile(localH2O, path = irisPath, key = "iris.hex")
>summary(iris.hex)
In a terminal window, type:

R CMD INSTALL h2o*.gz


**STEP 3: Start a connection to H2O**

Load the H2O package in the R environment. Start the connection between R and H2O (with defaults at ip: localhost and port: 54321).
Look at the help for h2o.init() for additional information about how to start and connect to H2O.

Getting started from a zip file
-------------------------------
> library(h2o)
> localH2O = h2o.init()


1. Download the latest release of H2O as a .zip file from the H2O website http://0xdata.com/h2O/.
**STEP 4: Drive H2O from R**

2. From your terminal change your working directory to the same directory where your .zip file is saved.

3. From your terminal, unzip the .zip file. For example:


unzip h2o-1.7.0.520.zip

4. At the prompt enter the following commands. (Choose a unique name (use the -name option) for yourself if other people might be running H2O in your network.)


cd h2o-1.7.0.520
java -Xmx1g -jar h2o.jar -name mystats-cloud

5. Wait a few moments; output similar to the following will appear in your terminal window:



03:05:45.311 main INFO WATER: ----- H2O started -----
03:05:45.312 main INFO WATER: Build git branch: master
03:05:45.312 main INFO WATER: Build git hash: f253798433c109b19acd14cb973b45f255c59f3f
03:05:45.312 main INFO WATER: Build git describe: f253798
03:05:45.312 main INFO WATER: Build project version: 1.7.0.520
03:05:45.313 main INFO WATER: Built by: 'jenkins'
03:05:45.313 main INFO WATER: Built on: 'Thu Sep 12 00:01:52 PDT 2013'
03:05:45.313 main INFO WATER: Java availableProcessors: 8
03:05:45.321 main INFO WATER: Java heap totalMemory: 0.08 gb
03:05:45.321 main INFO WATER: Java heap maxMemory: 0.99 gb
03:05:45.322 main INFO WATER: ICE root: '/tmp/h2o-tomk'
03:05:45.364 main INFO WATER: Internal communication uses port: 54322
+ Listening for HTTP and REST traffic on http://192.168.1.52:54321/
03:05:45.409 main INFO WATER: H2O cloud name: 'mystats-cloud'
03:05:45.409 main INFO WATER: (v1.7.0.520) 'mystats-cloud' on /192.168.1.52:54321, discovery address /236.151.114.91:60567
03:05:45.411 main INFO WATER: Cloud of size 1 formed [/192.168.1.52:54321]
03:05:45.543 main INFO WATER: Log dir: '/tmp/h2o-tomk/h2ologs'
> irisPath = system.file("extdata", "iris.csv", package="h2o")
> iris.hex = h2o.importFile(localH2O, irisPath)
> summary(iris.hex)



Expand All @@ -126,16 +54,3 @@ memory). For best performance, Xmx should be 4x the size of your
data, but never more than the total amount of memory on your
computer. For larger data sets, running on a server or service
with more memory available for computing is recommended.
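
The sizing rule above is simple arithmetic, and a worked version may help. The Python sketch below is only an illustration: the function name `suggest_xmx_gb` and its arguments are hypothetical and not part of H2O. It takes four times the data size and caps the result at the machine's total memory.

```python
def suggest_xmx_gb(data_gb, total_memory_gb):
    """Return a heap size in GB: 4x the data size, capped at total memory.

    Hypothetical helper for illustration only; not part of the H2O package.
    """
    if data_gb < 0 or total_memory_gb <= 0:
        raise ValueError("data size must be >= 0 and total memory must be > 0")
    return min(4.0 * data_gb, float(total_memory_gb))


if __name__ == "__main__":
    # A 1.5 GB data set on a machine with 16 GB of memory.
    heap_gb = suggest_xmx_gb(1.5, 16)
    print("java -Xmx%dg -jar h2o.jar" % heap_gb)
```

When the cap is what determines the result, the data set is large relative to the machine, which is the situation where the text recommends moving to a server with more memory.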
8 changes: 4 additions & 4 deletions R/h2o-DESCRIPTION.template
@@ -2,11 +2,11 @@ Package: h2o
Type: Package
Title: H2O R Interface
Version: SUBST_PROJECT_VERSION
Date: 2014-05-15
Date: 2015-02-05
Author: Anqi Fu, Spencer Aiello, Ariel Rao, Tom Kraljevic and Petr Maj,
with contributions from the 0xdata team
with contributions from the H2O team
Maintainer: Tom Kraljevic <[email protected]>
Description: Run H2O via its REST API from within R.
Description: Run H2O via its REST interface from within R.
License: Apache License (== 2.0)
Depends: R (>= 2.13.0), statmod, survival, stats, graphics, utils, methods
Imports: RCurl, rjson, tools
@@ -15,4 +15,4 @@ Collate: Wrapper.R Internal.R Classes.R ParseImport.R models.R Algorithms.R
NeedsCompilation: no
SystemRequirements: Java (>= 1.6)
Suggests: plyr
URL: http://www.0xdata.com
URL: http://www.h2o.ai
21 changes: 10 additions & 11 deletions R/h2o-package.template
@@ -13,36 +13,35 @@ This is a package for running H2O via its REST API from within R. To communicate
Package: \tab h2o\cr
Type: \tab Package\cr
Version: \tab SUBST_PROJECT_VERSION\cr
Date: \tab 2014-05-15\cr
Date: \tab 2015-02-05\cr
License: \tab Apache License (== 2.0)\cr
Depends: \tab R (>= 2.13.0), RCurl, rjson, statmod, tools, methods, utils\cr
}

This package allows the user to run basic H2O commands using R commands. In order to use it, you must first have H2O running (See \href{http://docs.0xdata.com/newuser/quickstart_jar.html}{How to Start H2O}). To run H2O on your local machine, call \code{h2o.init} without any arguments, and H2O will be automatically launched on \url{http://127.0.0.1:54321}, where the IP is "127.0.0.1" and the port is 54321. If H2O is running on a cluster, you must provide the IP and port of the remote machine as arguments to the h2o.init() call.
This package allows the user to run basic H2O commands using R commands. In order to use it, you must first have H2O running (See \href{http://docs.h2o.ai/newuser/quickstart_jar.html}{How to Start H2O}). To run H2O on your local machine, call \code{h2o.init} without any arguments, and H2O will be automatically launched on http://127.0.0.1:54321, where the IP is "127.0.0.1" and the port is 54321. If H2O is running on a cluster, you must provide the IP and port of the remote machine as arguments to the h2o.init() call.

H2O supports a number of standard statistical models, such as GLM, K-means, and Random Forest classification. For example, to run GLM, call \code{\link{h2o.glm}} with the H2O parsed data and parameters (response variable, error distribution, etc.) as arguments. The operation is carried out on the server where H2O is running and where the data object lives, not within the R environment.

Note that no actual data is stored in the R workspace, and no actual work is carried out by R. R only saves the named objects, which uniquely identify the data set, model, etc. on the server. When the user makes a request, R queries the server via the REST API, which returns a JSON response with the relevant information that R then displays in the console.
}
\author{
Anqi Fu, Tom Kraljevic and Petr Maj, with contributions from the 0xdata team
Anqi Fu, Tom Kraljevic and Petr Maj, with contributions from the H2O team

Maintainer: Ariel Rao <ariel@0xdata.com>
Maintainer: Tom Kraljevic <tomk@0xdata.com>
}
\references{
\itemize{
\item \href{http://www.0xdata.com}{0xdata Homepage}
\item \href{http://docs.0xdata.com}{H2O Documentation}
\item \href{https://github.com/0xdata/h2o}{H2O on Github}
\item \href{http://www.h2o.ai}{H2O Homepage}
\item \href{http://docs.h2o.ai}{H2O Documentation}
\item \href{https://github.com/h2oai/h2o}{H2O on Github}
}
}
\keyword{ package }
\examples{
# Check connection with H2O and ensure local H2O R package matches server version.
# Optionally, ask for startH2O to start H2O if it's not already running.
# Note that for startH2O to work, the IP must be 127.0.0.1 or localhost with port 54321.
# Connect to an instance of H2O (after creating it, if needed).
# See the help for h2o.init() for more details.
library(h2o)
localH2O = h2o.init(ip = "127.0.0.1", port = 54321, startH2O = TRUE)
localH2O = h2o.init()

# Import iris dataset into H2O and print summary
irisPath = system.file("extdata", "iris.csv", package = "h2o")
4 changes: 2 additions & 2 deletions R/h2o-package/man/h2o.deeplearning.Rd
@@ -112,10 +112,10 @@ h2o.deeplearning(x = 1:4, y = 5, data = iris.hex, activation = "Tanh",
\dontrun{
# DeepLearning variable importance
# Also see:
# https://github.com/0xdata/h2o/blob/master/R/tests/testdir_demos/runit_demo_VI_all_algos.R
# https://github.com/h2oai/h2o/blob/master/R/tests/testdir_demos/runit_demo_VI_all_algos.R
data.hex = h2o.importFile(
localH2O,
path = "https://raw.github.com/0xdata/h2o/master/smalldata/bank-additional-full.csv",
path = "https://raw.github.com/h2oai/h2o/master/smalldata/bank-additional-full.csv",
key = "data.hex")
myX = 1:20
myY="y"
4 changes: 2 additions & 2 deletions R/h2o-package/man/h2o.gbm.Rd
@@ -112,10 +112,10 @@ h2o.gbm(y = dependent, x = independent, data = australia.hex, n.trees = 3, inter

# GBM variable importance
# Also see:
# https://github.com/0xdata/h2o/blob/master/R/tests/testdir_demos/runit_demo_VI_all_algos.R
# https://github.com/h2oai/h2o/blob/master/R/tests/testdir_demos/runit_demo_VI_all_algos.R
data.hex = h2o.importFile(
localH2O,
path = "https://raw.github.com/0xdata/h2o/master/smalldata/bank-additional-full.csv",
path = "https://raw.github.com/h2oai/h2o/master/smalldata/bank-additional-full.csv",
key = "data.hex")
myX = 1:20
myY="y"
4 changes: 2 additions & 2 deletions R/h2o-package/man/h2o.glm.Rd
@@ -153,10 +153,10 @@ h2o.glm(y = "VOL", x = myX, data = prostate.hex, family = "gaussian",
\dontrun{
# GLM variable importance
# Also see:
# https://github.com/0xdata/h2o/blob/master/R/tests/testdir_demos/runit_demo_VI_all_algos.R
# https://github.com/h2oai/h2o/blob/master/R/tests/testdir_demos/runit_demo_VI_all_algos.R
data.hex = h2o.importFile(
localH2O,
path = "https://raw.github.com/0xdata/h2o/master/smalldata/bank-additional-full.csv",
path = "https://raw.github.com/h2oai/h2o/master/smalldata/bank-additional-full.csv",
key = "data.hex")
myX = 1:20
myY="y"
2 changes: 1 addition & 1 deletion R/h2o-package/man/h2o.importURL.Rd
@@ -55,7 +55,7 @@ If \code{parse = TRUE}, the function returns an object of class \code{\linkS4cla
library(h2o)
localH2O = h2o.init()
prostate.hex = h2o.importURL(localH2O, path = paste("https://raw.github.com",
"0xdata/h2o/master/smalldata/logreg/prostate.csv", sep = "/"), key = "prostate.hex")
"h2oai/h2o/master/smalldata/logreg/prostate.csv", sep = "/"), key = "prostate.hex")
class(prostate.hex)
summary(prostate.hex)
}
2 changes: 1 addition & 1 deletion R/h2o-package/man/h2o.pcr.Rd
@@ -80,7 +80,7 @@ localH2O = h2o.init()
# Run PCR on Prostate Data
prostate.hex = h2o.importURL(localH2O, path = paste("https://raw.github.com",
"0xdata/h2o/master/smalldata/logreg/prostate.csv", sep = "/"), key = "prostate.hex")
"h2oai/h2o/master/smalldata/logreg/prostate.csv", sep = "/"), key = "prostate.hex")
h2o.pcr(x = c("AGE","RACE","PSA","DCAPS"), y = "CAPSULE", data = prostate.hex, family = "binomial",
nfolds = 0, alpha = 0.5, ncomp = 2)
}
4 changes: 2 additions & 2 deletions R/h2o-package/man/h2o.randomForest.Rd
@@ -106,10 +106,10 @@ h2o.randomForest(y = 5, x = c(2,3,4), data = iris.hex, ntree = 50, depth = 100)
\dontrun{
# RF variable importance
# Also see:
# https://github.com/0xdata/h2o/blob/master/R/tests/testdir_demos/runit_demo_VI_all_algos.R
# https://github.com/h2oai/h2o/blob/master/R/tests/testdir_demos/runit_demo_VI_all_algos.R
data.hex = h2o.importFile(
localH2O,
path = "https://raw.github.com/0xdata/h2o/master/smalldata/bank-additional-full.csv",
path = "https://raw.github.com/h2oai/h2o/master/smalldata/bank-additional-full.csv",
key = "data.hex")
myX = 1:20
myY="y"
2 changes: 1 addition & 1 deletion R/h2o-package/man/h2o.saveAll.Rd
@@ -22,7 +22,7 @@ Returns paths of model objects saved.
library(h2o)
localH2O = h2o.init()
prostate.hex = h2o.importFile(localH2O, path = paste("https://raw.github.com",
"0xdata/h2o/master/smalldata/logreg/prostate.csv", sep = "/"), key = "prostate.hex")
"h2oai/h2o/master/smalldata/logreg/prostate.csv", sep = "/"), key = "prostate.hex")
prostate.glm = h2o.glm(y = "CAPSULE", x = c("AGE","RACE","PSA","DCAPS"),
data = prostate.hex, family = "binomial", nfolds = 10, alpha = 0.5)
prostate.gbm = h2o.gbm(y = "CAPSULE", x = c("AGE","RACE","PSA","DCAPS"), n.trees=3,
2 changes: 1 addition & 1 deletion R/h2o-package/man/h2o.saveModel.Rd
@@ -23,7 +23,7 @@ Returns path of model object saved.
library(h2o)
localH2O = h2o.init()
prostate.hex = h2o.importFile(localH2O, path = paste("https://raw.github.com",
"0xdata/h2o/master/smalldata/logreg/prostate.csv", sep = "/"), key = "prostate.hex")
"h2oai/h2o/master/smalldata/logreg/prostate.csv", sep = "/"), key = "prostate.hex")
prostate.glm = h2o.glm(y = "CAPSULE", x = c("AGE","RACE","PSA","DCAPS"),
data = prostate.hex, family = "binomial", nfolds = 10, alpha = 0.5)
h2o.saveModel(object = prostate.glm, dir = "/Users/UserName/Desktop", save_cv = TRUE, force = TRUE)
1 change: 1 addition & 0 deletions packaging/index.html
@@ -358,6 +358,7 @@ <h2>Run H<sub>2</sub>O on Hadoop in just 3 steps.</h2>
<div id="documentation">
<h2>Documentation</h2>
<ul>
<li><a id="ddocrecent" href="https://github.com/h2oai/h2o/blob/master/CHANGES.md">Recent Changes</a></li>
<li><a id="ddocall" href="docs-website/index.html">Full Documentation</a></li>
<li><a id="ddochadoop" href="docs-website/deployment/hadoop_tutorial.html">Hadoop Documentation</a></li>
<li><a id="ddocr" href="docs-website/bits/h2o_package.pdf">R User Documentation</a></li>
2 changes: 2 additions & 0 deletions py/h2o_methods.py
@@ -602,6 +602,7 @@ def create_frame(self, timeoutSecs=120, **kwargs):
'randomize': None,
'value': None,
'real_range': None,
'binary_fraction': None,
'categorical_fraction': None,
'factors': None,
'integer_fraction': None,
@@ -610,6 +611,7 @@ def create_frame(self, timeoutSecs=120, **kwargs):
'binary_ones_fraction': None,
'missing_fraction': None,
'response_factors': None,
'has_response': None,
}
browseAlso = kwargs.pop('browseAlso', False)
check_params_update_kwargs(params_dict, kwargs, 'create_frame', print_params=True)
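
A defaults dictionary like `params_dict` typically acts as a whitelist of accepted request parameters, which would explain why the two new keys are added here. The real `check_params_update_kwargs` is not shown in this diff, so the Python sketch below is a hypothetical stand-in (named `update_known_params` to avoid confusion with it). It shows one plausible way such a check can work: unknown keys are rejected, known keys overwrite the `None` defaults, and entries left at `None` are dropped.

```python
def update_known_params(params_dict, kwargs, function_name):
    """Copy recognized keyword arguments into params_dict.

    Hypothetical stand-in, not the actual h2o helper: raises on keys that
    are missing from params_dict, then returns only the entries that were set.
    """
    for key, value in kwargs.items():
        if key not in params_dict:
            raise ValueError("%s: unknown parameter %r" % (function_name, key))
        params_dict[key] = value
    # Drop parameters still at their None default so they are not sent.
    return {k: v for k, v in params_dict.items() if v is not None}


if __name__ == "__main__":
    # Illustrative defaults only; the real dictionary has many more keys.
    defaults = {"binary_fraction": None, "has_response": None, "cols": None}
    sent = update_known_params(
        defaults, {"binary_fraction": 0.25, "cols": 100}, "create_frame")
    print(sorted(sent.items()))
```

Under this assumed behavior, a caller passing `binary_fraction` before the key was added to the dictionary would have been rejected rather than silently ignored.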
35 changes: 20 additions & 15 deletions py/testdir_multi_jvm/test_KMeans_create_frame_fvec.py
@@ -59,6 +59,7 @@ def test_KMeans_create_frame_fvec(self):
'cols': 10
}
h2o_util.pickRandParams(cfParamDict, params)
b = params.get('binary_fraction', None)
i = params.get('integer_fraction', None)
c = params.get('categorical_fraction', None)
r = params.get('randomize', None)
@@ -67,21 +68,25 @@ def test_KMeans_create_frame_fvec(self):
# h2o does some strict checking on the combinations of these things
# fractions have to add up to <= 1 and only be used if randomize
# h2o default randomize=1?
if r:
if not i:
i = 0
if not c:
c = 0
if (i and c) and (i + c) >= 1.0:
c = 1.0 - i
params['integer_fraction'] = i
params['categorical_fraction'] = c
params['value'] = None

else:
params['randomize'] = 0
params['integer_fraction'] = 0
params['categorical_fraction'] = 0
if not b:
b = 0
if not i:
i = 0
if not c:
c = 0

# force a good combo by decreasing two of them (b and i) a little at a time
while (i + b + c) > 1.0:
print "Trying to find a good mix of fractional", b, i, c
b = max(0, b - 0.13)
i = max(0, i - 0.17)
# what's left
c = 1.0 - (i + b)

params['binary_fraction'] = b
params['integer_fraction'] = i
params['categorical_fraction'] = c
params['value'] = None


kwargs = params.copy()
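
The balancing loop in this hunk can be isolated into a small function for experimentation. The Python sketch below is an assumption-laden variant, not the test's code: the name `balance_fractions` is hypothetical, and the categorical share is additionally clamped at zero, because the original expression `1.0 - (i + b)` would be negative whenever `i + b` still exceeds 1 after a reduction step.

```python
def balance_fractions(binary, integer, categorical):
    """Shrink the three column-type fractions until they sum to at most 1.0.

    Sketch only: reuses the step sizes from the test (0.13 and 0.17) but,
    unlike the test, never lets the categorical fraction drop below zero.
    """
    # Treat unset (None) fractions as zero, as the test does.
    b = binary or 0.0
    i = integer or 0.0
    c = categorical or 0.0
    while b + i + c > 1.0:
        b = max(0.0, b - 0.13)
        i = max(0.0, i - 0.17)
        # Give the categorical columns whatever room is left, never below zero.
        c = max(0.0, 1.0 - (b + i))
    return b, i, c


if __name__ == "__main__":
    print(balance_fractions(0.9, 0.9, 0.5))
```

The loop terminates because each pass either brings the total down to 1.0 or strictly lowers `b` and `i`, which are bounded below by zero.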