added lots of stuff/ data options are great now/ users like more docs
dearirenelang committed May 8, 2014
1 parent d9b6037 commit 8d53f5e
Showing 15 changed files with 354 additions and 116 deletions.
125 changes: 12 additions & 113 deletions h2o-docs/source/userguide/data.rst
@@ -3,120 +3,19 @@
Data
=====

Ingesting Data
---------------
Ingesting data is the process of moving data from outside of H\ :sub:`2`\ O into
the running instance of H\ :sub:`2`\ O. To ingest data, start from the **Data**
drop down menu and select the appropriate option. Options and their uses are
described below.

.. toctree::
   :maxdepth: 1

   inspect
   dataviewall
   datasummary
   dataparse
   datainspect
   dataimportfiles
   dataexportfiles
   dataquantiles
   datauploadfiles
   quantiles

**Import Files:**

In the path field, specify an absolute path to the
file, for example: Users/UserName/Work/dataset.csv. Press **Submit**.

On the resulting screen the specified path will appear as a
highlighted link. Clicking on the path automatically parses the
data.

**Import URL:**

Copy the URL where the raw data are displayed into the URL
field. Users may wish to specify a Key; if none is given, one is usually
assigned based on the original file name, and the URL becomes part of
the .hex key. For example, original data can be found at:
http://archive.ics.uci.edu/ml/machine-learning-databases/internet_ads/ad.data

Once the data are imported, users are automatically sent to the
Import URL page, where they can click on the key, which leads to the
Inspect page. Users should not be concerned at this point
if the data do not look as expected; this is corrected when the data are
parsed.

**Import S3:**

In the field marked Bucket, give the path to an existing AWS bucket
where the data are stored.

**Upload:**

Click on the **Select File** button. A menu of files on the
computer or working directory will appear. Select the appropriate
file and click **Choose**. When returned to the H\ :sub:`2`\ O
screen, press **Upload**.



Parsing Data
------------

Once data are ingested, they are available to H\ :sub:`2`\ O, but are
not yet in a format that H\ :sub:`2`\ O can process. Converting the data to
an H\ :sub:`2`\ O usable format is called parsing.

After ingestion, users are directed to a **Request Parse** screen. To
parse data, users can leave most options at their defaults. For example, H\ :sub:`2`\ O
automatically determines separators in data sets. For most data
formats users are automatically redirected to a page to request
parse, where they can simply press Submit. Exceptions to this are
noted below. Once data are parsed, a .hex key is displayed for the
user. This .hex key is used to refer to the data set in all H\ :sub:`2`\ O
analysis, and should be noted. It can also be found at a later time
through the Admin menu by selecting Jobs, or through the **Data**
menu by choosing **View All.**

**Import URL:**

Click on "Parse into .hex format" displayed at the top of
the inspect page after data are inhaled. Import URL takes users
directly to parse.

**Parser Behavior**

The data type in each column must be consistent. For example, when
data are alpha-coded categorical, all entries must be alpha or
alphanumeric. If numeric entries are detected by the parser, the
column will not be processed; it will register all entries as
NA. This is also true when NA entries are included in columns
consisting of numeric data. Columns of alpha-coded categorical
variables containing NA entries will register NA as a distinct
factor level. When missing data are coded as periods or dots in the
original data set, those entries are converted to zero.


Other Data Capabilities
-----------------------

Each of the following actions can be found in the Data drop down
menu.

**Inspect:**

Used to view an ingested or parsed data set. Select Inspect
from the **Data** drop down menu. In the Key field, enter the key or .hex key
associated with the desired data.

**View All:**

Used to view all data sets that have been ingested or
parsed into H\ :sub:`2`\ O. To remove a data set from H\ :sub:`2`\ O,
click on the red X next to the data set key.

**Summary:**

Used to display descriptive statistics and histograms of
any columns within a specific data set. Specify data by the
associated .hex key in the Key field, and select variables of
interest from the resulting list of variables. Summary can be found
under the **Model** drop down menu.




Data Manipulation
------------------

Users who wish to manipulate their data after they have been parsed into
H\ :sub:`2`\ O have a set of tools to do so via H\ :sub:`2`\ O + R, as
sketched below.
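
A minimal sketch of this workflow with the h2o R package is shown below. The
file path, frame name, and column name are hypothetical, and argument names
vary somewhat across H\ :sub:`2`\ O releases, so treat this as illustrative
rather than definitive.

.. code-block:: r

   library(h2o)
   h2o.init()                           # connect to (or start) a local H2O instance

   # Ingest and parse a CSV file; the result is an H2O frame referenced
   # by a .hex key on the cluster.
   df <- h2o.importFile(path = "/Users/UserName/Work/dataset.csv")

   df$label <- as.factor(df$label)      # treat a numeric column as categorical
   summary(df)                          # per-column summary, computed in H2O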

21 changes: 21 additions & 0 deletions h2o-docs/source/userguide/dataexportfiles.rst
@@ -0,0 +1,21 @@
.. _DataExport:

Data: Export Files
====================

Data files can be exported to S3, HDFS, or NFS.

**Src key**

The key associated with the data to be exported.


**Path**

The S3, HDFS, NFS, or URL path to which the data will be
exported.

**Force**

A checkbox option that, when checked, will overwrite existing
files.
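
For users working from R, a rough equivalent of this screen is
``h2o.exportFile``. The sketch below assumes a parsed frame named ``df`` and
a hypothetical export path; the ``force`` argument mirrors the Force
checkbox.

.. code-block:: r

   library(h2o)
   h2o.init()

   df <- h2o.importFile(path = "/Users/UserName/Work/dataset.csv")

   # Write the frame back out; force = TRUE overwrites an existing file,
   # matching the Force checkbox in the browser UI.
   h2o.exportFile(df, path = "/data/exports/dataset_export.csv", force = TRUE)
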
19 changes: 19 additions & 0 deletions h2o-docs/source/userguide/dataimportfiles.rst
@@ -0,0 +1,19 @@


Data: Import Files
====================

In the path field, specify an absolute path to the
file, for example: Users/UserName/Work/dataset.csv. Press **Submit**.

On the resulting screen the specified path will appear as a
highlighted link. Clicking on the path automatically parses the
data.

Import Files also enables users to import data from S3 or a URL.


**Path**

The S3, HDFS, NFS, or URL path from which the data will be
imported.
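
A rough R analogue of this screen is ``h2o.importFile``. The sketch below is
illustrative only: the local path is hypothetical, the URL is the example
data set referenced elsewhere in this guide, and argument names vary between
H2O releases.

.. code-block:: r

   library(h2o)
   h2o.init()

   # Import from a local absolute path (as in the Path field above).
   local_df <- h2o.importFile(path = "/Users/UserName/Work/dataset.csv")

   # The same call generally accepts S3 locations and URLs as well.
   url_df <- h2o.importFile(
     path = "http://archive.ics.uci.edu/ml/machine-learning-databases/internet_ads/ad.data")
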
16 changes: 16 additions & 0 deletions h2o-docs/source/userguide/datainspect.rst
@@ -0,0 +1,16 @@


Data: Inspect Request
=======================


**Src Key**

The source key for the parsed data (with keys usually ending in
.hex).

Once the source key has been specified, an inspect table displaying the parsed
data is returned to the user. Basic summary information is given at
the top, along with click-button options to specify whether each column in the
data is a factor or numeric. For more information, visit :ref:`InspectReturn`.

61 changes: 61 additions & 0 deletions h2o-docs/source/userguide/dataparse.rst
@@ -0,0 +1,61 @@
.. _DataParse:

Data: Parse
====================

Once data are ingested, they are available to H\ :sub:`2`\ O, but are
not yet in a format that H\ :sub:`2`\ O can process. Converting the data to
an H\ :sub:`2`\ O usable format is called parsing.

Parser Behavior
------------------

The data type in each column must be consistent. For example, when
data are alpha-coded categorical, all entries must be alpha or
alphanumeric. If numeric entries are detected by the parser, the
column will not be processed; it will register all entries as
NA. This is also true when NA entries are included in columns
consisting of numeric data. Columns of alpha-coded categorical
variables containing NA entries will register NA as a distinct
factor level. When missing data are coded as periods or dots in the
original data set, those entries are converted to zero.

**In general, options can be left at their defaults and the parser just works.**

**Parser Type**
A drop down menu that allows users to specify whether data are formatted as
CSV, XLS, or SVMlight. This option is best left at its default; the
parser recognizes data formats with rare exceptions.

**Separator**
A list of common separators is given; however, this option is best
left at its default.

**Header**
Checkbox to be checked if the first line of the file being parsed is
a header (includes column names or indices).

**Header From File**
Specify a file key if the header for the data to be parsed is found
in another file that has already been imported to H2O.

**Exclude**
A comma separated list of columns to be omitted from parse.

**Source Key**
The file key associated with the imported data to be parsed.

**Destination Key**
An optional user-specified name by which the parsed data will be referenced
later in modeling. If left at the default, a destination key is
automatically assigned as the original file name with a .hex suffix.

**Preview**
Auto-generated preview of parsed data.

**Delete on done**
A checkbox indicating whether the imported data should be deleted once
parsed. In general, this option is recommended: retaining the unparsed data
takes memory resources but does not aid in modeling, because unparsed data
cannot be acted on by H2O.

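For reference, several of these options can also be set when importing from
R. The sketch below is illustrative only: the path and destination key are
hypothetical, and the argument names (``destination_frame``, ``header``,
``sep``) follow recent h2o R package releases and may differ in older
versions.

.. code-block:: r

   library(h2o)
   h2o.init()

   # Import and parse in one step, overriding a few parser defaults:
   #   - destination_frame plays the role of the Destination Key field
   #   - header and sep correspond to the Header checkbox and Separator menu
   df <- h2o.importFile(path = "/Users/UserName/Work/dataset.csv",
                        destination_frame = "dataset.hex",
                        header = TRUE,
                        sep = ",")
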
48 changes: 48 additions & 0 deletions h2o-docs/source/userguide/dataquantiles.rst
@@ -0,0 +1,48 @@
.. _DataQuantiles:

Data: Quantiles (Request)
==========================

**Source Key**

The key associated with the data set of interest.

**Column**

The column of interest.

**Quantile**

A value bounded on the interval (0, 1) giving the fraction of the data
that should fall below the returned value. For instance, if the
quantile .25 is requested, the value returned will be the value
within the range of the column of data below which 25% of the data
fall.

**Max Qbins**

The number of bins into which the column should be split before the
quantile is calculated. As the number of bins approaches the number
of observations, the approximate solution approaches the exact
solution.

**Multiple Pass**

Only three possible entries:

*0*: Calculate the best approximation of the requested quantile in
one pass.

*1*: Return the exact result (with a maximum of 16 passes).

*2*: Return both a single-pass approximation and the multi-pass exact
answer.

**Interpolation Type**

When the quantile falls between two values in the data, it is necessary
to interpolate the true value of the quantile. This can be done by
mean interpolation or linear interpolation.

*2*: Mean interpolation
*7*: Linear interpolation

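From R, a similar request can be made on a parsed column with
``h2o.quantile``. The frame and column names below are hypothetical, and the
binning and interpolation behavior follows whatever defaults the installed
h2o package applies.

.. code-block:: r

   library(h2o)
   h2o.init()

   df <- h2o.importFile(path = "/Users/UserName/Work/dataset.csv")

   # Request the 25th, 50th, and 75th percentiles of one numeric column.
   h2o.quantile(df$value, probs = c(0.25, 0.50, 0.75))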


31 changes: 31 additions & 0 deletions h2o-docs/source/userguide/datasummary.rst
@@ -0,0 +1,31 @@


Data: Summary (Request)
==========================

Summary returns a detailed column-by-column summary of parsed
data. For more information on the returned information, see
:ref:`Summary`.

**Source**

The .hex key associated with the data to be summarized.

**Cols**

If a summary of only a subset of columns is desired, specify that subset
here. The default is to return a summary for all columns.

**Max Ncols**

The maximum number of columns to be summarized.

**Max Qbins**

The number of bins for quantiles. When large data are parsed, they
are also binned and distributed across a cluster. When data are
multimodal (or otherwise distinctly shaped), increasing the number
of bins will allocate fewer data points to each bin and thus
increase the accuracy of the quantiles returned. Increasing the
number of bins for extremely large data can slow results depending
on the memory allocated to computational tasks.
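
From R, a rough equivalent of this request is the ``summary`` method on a
parsed frame; restricting it to a few columns plays the same role as the
Cols option. The column names here are hypothetical.

.. code-block:: r

   library(h2o)
   h2o.init()

   df <- h2o.importFile(path = "/Users/UserName/Work/dataset.csv")

   summary(df)                          # summarize all columns
   summary(df[, c("age", "income")])    # summarize only a chosen subset of columns
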
9 changes: 9 additions & 0 deletions h2o-docs/source/userguide/datauploadfiles.rst
@@ -0,0 +1,9 @@


Data: Upload Files
====================

Upload Files enables users to upload data from their local computer
or server. Click on *Select File*, and an upload helper will appear to
walk users through their file structure to find the data to be
uploaded and parsed.
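
A rough R analogue pushes a file from the machine running the R client up to
the H2O cluster; ``h2o.uploadFile`` serves this purpose, shown here with a
hypothetical local path.

.. code-block:: r

   library(h2o)
   h2o.init()

   # Upload (rather than import) sends the file through the client, which is
   # convenient when the H2O cluster cannot see the client's filesystem.
   df <- h2o.uploadFile(path = "/Users/UserName/Downloads/dataset.csv")
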
9 changes: 9 additions & 0 deletions h2o-docs/source/userguide/dataviewall.rst
@@ -0,0 +1,9 @@


Data: View All
==================

Users can view all keys and associated data by selecting the **View
All** option from the **Data** drop down menu. Keys are listed in the
far left column, and can be removed from the cluster by clicking on
the large red X next to the key name.
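
A rough R analogue of this screen is shown below; ``h2o.ls`` lists the keys
currently held by the cluster, and ``h2o.rm`` removes one, much like the red
X in the browser. The key name is hypothetical, and the exact signature of
``h2o.rm`` has changed between H2O releases.

.. code-block:: r

   library(h2o)
   h2o.init()

   h2o.ls()                 # list all keys and associated data in the cluster
   h2o.rm("dataset.hex")    # remove a frame by key, like clicking the red X
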
2 changes: 1 addition & 1 deletion h2o-docs/source/userguide/general.rst
@@ -24,7 +24,7 @@
of this documentation. Advanced users may find additional documentation on
running in specialized environments helpful: :ref:`Developer`.

For multinode clusters utilizing several servers, it is strongly
recommended that all servers and nodes be symmetric and identically
configured. For example, allocating different amounts of memory to
nodes in the same cluster can adversely impact performance.
