forked from h2oai/h2o-2
added lots of stuff/ data options are great now/ users like more docs
1 parent d9b6037, commit 8d53f5e
Showing 15 changed files with 354 additions and 116 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
.. _DataExport: | ||
|
||
Data: Export Files | ||
==================== | ||
|
||
Data files can be exported to S3, HDFS or NFS. | ||
|
||
**Src key** | ||
|
||
The key associated with the data to be exported. | ||
|
||
|
||
**Path** | ||
|
||
The S3, HDFS, NFS, or URL path to which the data are to be | ||
exported. | ||
|
||
**Force** | ||
|
||
A checkbox option that, when checked, will overwrite existing | ||
files. |
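
The overwrite rule behind **Force** can be pictured with a small local-file sketch. The helper name `export_text` and its parameters are hypothetical and are not part of H2O; the sketch only illustrates the checkbox semantics described above for a plain file path.

```python
import os


def export_text(data, path, force=False):
    """Write data to path, refusing to replace an existing file unless force is set."""
    # Hypothetical helper for illustration only; not an H2O function.
    if os.path.exists(path) and not force:
        raise FileExistsError(f"{path} already exists; pass force=True to overwrite")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(data)
    return path
```

Calling `export_text` twice on the same path raises `FileExistsError` the second time unless `force=True` is passed, which mirrors leaving the checkbox unchecked versus checked.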
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
|
||
|
||
Data: Import Files | ||
==================== | ||
|
||
In the path field, specify an absolute path to the | ||
file. For example: /Users/UserName/Work/dataset.csv. Press submit. | ||
|
||
On the resulting screen the specified path will appear as a | ||
highlighted link. Clicking on the path automatically parses the | ||
data. | ||
|
||
Import Files also enables users to import data from S3 or from a URL. | ||
|
||
|
||
**Path** | ||
|
||
The S3, HDFS, NFS, or URL path from which the data are to be | ||
imported. |
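
Because the **Path** field accepts several kinds of location, it can help to check which kind a given string is before submitting it. The following is a minimal sketch using only the Python standard library. The function name `classify_path` and the scheme names in `REMOTE_KINDS` are assumptions made for illustration; the schemes a particular H2O build accepts may differ.

```python
from urllib.parse import urlparse

# Assumed scheme names, for illustration only.
REMOTE_KINDS = {
    "s3": "S3",
    "s3n": "S3",
    "hdfs": "HDFS",
    "http": "URL",
    "https": "URL",
}


def classify_path(path):
    """Return a rough label for the kind of location a path string names."""
    scheme = urlparse(path).scheme.lower()
    if not scheme:
        # No scheme: treat it as a plain file path (local disk or NFS mount).
        return "file"
    return REMOTE_KINDS.get(scheme, "unknown")
```

For example, `classify_path("/Users/UserName/Work/dataset.csv")` yields `"file"`, while a string starting with `hdfs://` yields `"HDFS"`.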
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
|
||
|
||
Data: Inspect Request | ||
======================= | ||
|
||
|
||
**Src Key** | ||
|
||
The source key for the parsed data (with keys usually ending in | ||
.hex). | ||
|
||
Once the source key has been specified, an inspect table displaying the | ||
parsed data is returned to the user. Basic summary information is given | ||
at the top, along with buttons for designating a column as a factor or | ||
as numeric. For more information visit :ref:`InspectReturn`. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
.. _DataParse: | ||
|
||
Data: Parse | ||
==================== | ||
|
||
Once data are ingested, they are available to H\ :sub:`2`\ O, but are | ||
not yet in a format that H\ :sub:`2`\ O can process. Converting the data to | ||
an H\ :sub:`2`\ O usable format is called parsing. | ||
|
||
Parser Behavior | ||
------------------ | ||
|
||
The data type in each column must be consistent. For example, when | ||
data are alpha-coded categorical, all entries must be alphabetic or | ||
alphanumeric. If the parser detects numeric entries in such a column, | ||
the column is not processed, and all of its entries are registered as | ||
NA. This is also true when NA entries are included in columns | ||
consisting of numeric data. In columns of alpha-coded categorical | ||
variables, NA entries are registered as a distinct | ||
factor level. When missing data are coded as periods or dots in the | ||
original data set, those entries are converted to zero. | ||
|
||
**In general, options can be left at their defaults and the parser just works.** | ||
|
||
**Parser Type** | ||
A drop-down menu that lets users specify whether data are formatted as | ||
CSV, XLS, or SVMLight. This option is best left at its default; the | ||
parser recognizes data formats with rare exceptions. | ||
|
||
**Separator** | ||
A list of common separators is given; however, this option is best | ||
left at its default. | ||
|
||
**Header** | ||
Check this box if the first line of the file being parsed is | ||
a header (that is, it contains column names or indices). | ||
|
||
**Header From File** | ||
Specify a file key if the header for the data to be parsed is found | ||
in another file that has already been imported to H2O. | ||
|
||
**Exclude** | ||
A comma-separated list of columns to be omitted from the parse. | ||
|
||
**Source Key** | ||
The file key associated with the imported data to be parsed. | ||
|
||
**Destination Key** | ||
An optional user-specified name by which the parsed data can be | ||
referenced later in modeling. If left at its default, a destination key | ||
of the form "original file name.hex" is assigned automatically. | ||
|
||
**Preview** | ||
Auto-generated preview of parsed data. | ||
|
||
**Delete on done** | ||
A checkbox indicating whether imported data should be deleted once | ||
parsed. In general, this option is recommended: retaining the unparsed | ||
data consumes memory but does not aid in modeling, because unparsed | ||
data can't be acted on by H2O. | ||
|
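
Since the parser expects each column to hold one consistent type, a quick check of a file before import can flag columns that mix numeric and non-numeric entries. The sketch below is a user-side pre-check written with the Python standard library only. It does not reproduce the H2O parser, and the names `find_mixed_columns` and `is_number`, as well as the set of strings treated as missing, are assumptions chosen for illustration.

```python
import csv
import io


def is_number(text):
    """Return True if text parses as a float (note: 'nan' and 'inf' also parse)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def find_mixed_columns(csv_text, missing=("", "NA")):
    """List the header names of columns that mix numeric and text entries."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    kinds = {name: set() for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            cell = cell.strip()
            if cell in missing:
                # Assumed missing-value markers are skipped, not classified.
                continue
            kinds[name].add("numeric" if is_number(cell) else "text")
    return [name for name in header if len(kinds[name]) > 1]
```

A column reported by `find_mixed_columns` is one whose entries are worth cleaning up before parsing, given the consistency requirement described under Parser Behavior.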
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
.. _DataQuantiles: | ||
|
||
Data: Quantiles (Request) | ||
========================== | ||
|
||
**Source Key** | ||
|
||
The key associated with the data set of interest. | ||
|
||
**Column** | ||
|
||
The column of interest. | ||
|
||
**Quantile** | ||
|
||
A value p in the open interval (0,1). The quantile returned is the | ||
value below which a fraction p of the data fall. For instance, if the | ||
quantile .25 is requested, the value returned will be the value | ||
within the range of the column of data below which 25% of the data | ||
fall. | ||
|
||
**Max Qbins** | ||
|
||
The number of bins into which the column should be split before the | ||
quantile is calculated. As the number of bins approaches the number | ||
of observations the approximate solution approaches the exact | ||
solution. | ||
|
||
**Multiple Pass** | ||
|
||
Only three entries are possible: | ||
*0*: Calculate the best approximation of the requested quantile in | ||
one pass. | ||
*1*: Return the exact result (with a maximum of 16 passes). | ||
*2*: Return both a single-pass approximation and a multi-pass exact | ||
answer. | ||
|
||
**Interpolation Type** | ||
|
||
When the quantile falls between two in-data values, it is necessary | ||
to interpolate the true value of the quantile. This can be done by | ||
mean interpolation, or linear interpolation. | ||
|
||
*2*: Mean interpolation | ||
*7*: Linear interpolation | ||
|
||
|
||
|
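
The two interpolation types can be illustrated with a short exact computation on sorted data. This is a minimal sketch, not H2O's implementation: it assumes that the codes *2* and *7* follow the numbering of the quantile types in R's `quantile` function (type 2 averages the two neighboring values when the cut falls exactly between them; type 7 interpolates linearly). The function name `exact_quantile` is hypothetical.

```python
import math


def exact_quantile(values, p, interpolation_type=7):
    """Exact sample quantile for p in (0, 1), using R-style type 2 or type 7."""
    if not 0 < p < 1:
        raise ValueError("p must lie in the open interval (0, 1)")
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("values must not be empty")

    if interpolation_type == 7:
        # Linear interpolation between the two order statistics around (n - 1) * p.
        h = (n - 1) * p
        lo = math.floor(h)
        hi = min(lo + 1, n - 1)
        return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

    if interpolation_type == 2:
        # Mean interpolation: average the two neighbors only when n * p is a whole number.
        h = n * p
        j = math.floor(h)
        if h == j:
            return (xs[max(j - 1, 0)] + xs[min(j, n - 1)]) / 2
        return xs[j]

    raise ValueError("interpolation_type must be 2 or 7")
```

For the data `[1, 2, 3, 4]` and a quantile of .25, this sketch gives 1.5 with type 2 and 1.75 with type 7, which shows how the choice matters when the quantile falls between two in-data values.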
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
|
||
|
||
Data: Summary (Request) | ||
========================== | ||
|
||
Summary returns a detailed column-by-column summary of parsed | ||
data. For more on the returned information, see | ||
:ref:`Summary`. | ||
|
||
**Source** | ||
|
||
The .hex key associated with the data to be summarized. | ||
|
||
**Cols** | ||
|
||
If a subset of columns is desired, specify that subset | ||
here. The default is to return a summary for all columns. | ||
|
||
**Max Ncols** | ||
|
||
The maximum number of columns to be summarized. | ||
|
||
**Max Qbins** | ||
|
||
The number of bins for quantiles. When large data are parsed, they | ||
are also binned and distributed across a cluster. When data are | ||
multimodal (or otherwise distinctly shaped), increasing the number | ||
of bins will allocate fewer data points to each bin and thus | ||
increase the accuracy of the quantiles returned. Increasing the | ||
number of bins for extremely large data can slow results depending | ||
on the memory allocated to computational tasks. |
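
The trade-off described for **Max Qbins** can be seen in a toy single-pass approximation. The sketch below is not H2O's algorithm. It assumes equal-width bins between the column minimum and maximum and assumes values are spread evenly inside each bin; the name `binned_quantile` is hypothetical.

```python
def binned_quantile(values, p, max_qbins):
    """Approximate the p-quantile from a histogram with max_qbins equal-width bins."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return float(lo)
    width = (hi - lo) / max_qbins

    # Count how many values land in each bin; the maximum goes in the last bin.
    counts = [0] * max_qbins
    for v in values:
        idx = min(int((v - lo) / width), max_qbins - 1)
        counts[idx] += 1

    # Walk the bins until the running count reaches the target rank.
    target = p * len(values)
    seen = 0
    for i, count in enumerate(counts):
        if count > 0 and seen + count >= target:
            # Assume values are evenly spread inside the bin.
            fraction = (target - seen) / count
            return lo + (i + fraction) * width
        seen += count
    return float(hi)
```

The estimate always lies inside the bin that contains the target rank, so its error is limited by the bin width. More bins mean narrower bins and a tighter estimate, at the cost of a larger `counts` array to hold and scan.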
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
|
||
|
||
Data: Upload Files | ||
==================== | ||
|
||
Upload Files enables users to upload data from their local computer | ||
or server. Click *Select File*, and an upload helper appears to | ||
walk users through their file structure to find the data to be | ||
uploaded and parsed. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
|
||
|
||
Data: View All | ||
================== | ||
|
||
Users can view all keys and associated data by selecting the **View | ||
All** option from the **Data** drop-down menu. Keys are listed in the | ||
far-left column and can be removed from the cluster by clicking | ||
the large red X next to the key name. |