From 43dfc84f883822ea27b6e312d4353bf301c2e7ef Mon Sep 17 00:00:00 2001
From: Xiangrui Meng <meng@databricks.com>
Date: Wed, 27 Aug 2014 01:19:48 -0700
Subject: [PATCH] [SPARK-2830][MLLIB] doc update for 1.1

1. renamed mllib-basics to mllib-data-types
1. renamed mllib-stats to mllib-statistics
1. moved random data generation to the bottom of mllib-stats
1. updated toc accordingly

atalwalkar

Author: Xiangrui Meng <meng@databricks.com>

Closes #2151 from mengxr/mllib-doc-1.1 and squashes the following commits:

0bd79f3 [Xiangrui Meng] add mllib-data-types
b64a5d7 [Xiangrui Meng] update the content list of basic statistics in mllib-guide
f625cc2 [Xiangrui Meng] move mllib-basics to mllib-data-types
4d69250 [Xiangrui Meng] move random data generation to the bottom of statistics
e64f3ce [Xiangrui Meng] move mllib-stats.md to mllib-statistics.md
---
 docs/{mllib-basics.md => mllib-data-types.md} |   4 +-
 docs/mllib-dimensionality-reduction.md        |   4 +-
 docs/mllib-guide.md                           |   9 +-
 docs/{mllib-stats.md => mllib-statistics.md}  | 156 +++++++++---------
 4 files changed, 87 insertions(+), 86 deletions(-)
 rename docs/{mllib-basics.md => mllib-data-types.md} (99%)
 rename docs/{mllib-stats.md => mllib-statistics.md} (99%)
diff --git a/docs/mllib-basics.md b/docs/mllib-data-types.md
similarity index 99%
rename from docs/mllib-basics.md
rename to docs/mllib-data-types.md
index 8752df412950a..101dc2f8695f3 100644
--- a/docs/mllib-basics.md
+++ b/docs/mllib-data-types.md
@@ -1,7 +1,7 @@
 ---
 layout: global
-title: Basics - MLlib
-displayTitle: <a href="mllib-guide.html">MLlib</a> - Basics
+title: Data Types - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Data Types
 ---
 
 * Table of contents
diff --git a/docs/mllib-dimensionality-reduction.md b/docs/mllib-dimensionality-reduction.md
index 9f2cf6d48ec75..21cb35b4270ca 100644
--- a/docs/mllib-dimensionality-reduction.md
+++ b/docs/mllib-dimensionality-reduction.md
@@ -11,7 +11,7 @@ displayTitle: <a href="mllib-guide.html">MLlib</a> - Dimensionality Reduction
 of reducing the number of variables under consideration.
 It can be used to extract latent features from raw and noisy features
 or compress data while maintaining the structure.
-MLlib provides support for dimensionality reduction on the <a href="mllib-basics.html#rowmatrix">RowMatrix</a> class.
+MLlib provides support for dimensionality reduction on the <a href="mllib-data-types.html#rowmatrix">RowMatrix</a> class.
 
 ## Singular value decomposition (SVD)
 
@@ -58,7 +58,7 @@ passes, $O(n)$ storage on each executor, and $O(n k)$ storage on the driver.
 ### SVD Example
  
 MLlib provides SVD functionality to row-oriented matrices, provided in the
-<a href="mllib-basics.html#rowmatrix">RowMatrix</a> class. 
+<a href="mllib-data-types.html#rowmatrix">RowMatrix</a> class. 
 
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 4d4198b9e0452..d3a510b3c17c6 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -7,12 +7,13 @@ MLlib is Spark's scalable machine learning library consisting of common learning
 including classification, regression, clustering, collaborative
 filtering, dimensionality reduction, as well as underlying optimization primitives, as outlined below:
 
-* [Data types](mllib-basics.html)
-* [Basic statistics](mllib-stats.html)
-  * random data generation  
-  * stratified sampling
+* [Data types](mllib-data-types.html)
+* [Basic statistics](mllib-statistics.html)
   * summary statistics
+  * correlations
+  * stratified sampling
   * hypothesis testing
+  * random data generation  
 * [Classification and regression](mllib-classification-regression.html)
   * [linear models (SVMs, logistic regression, linear regression)](mllib-linear-methods.html)
   * [decision trees](mllib-decision-tree.html)
diff --git a/docs/mllib-stats.md b/docs/mllib-statistics.md
similarity index 99%
rename from docs/mllib-stats.md
rename to docs/mllib-statistics.md
index 511a9fbf710cc..c4632413991f1 100644
--- a/docs/mllib-stats.md
+++ b/docs/mllib-statistics.md
@@ -1,7 +1,7 @@
 ---
 layout: global
-title: Statistics Functionality - MLlib
-displayTitle: <a href="mllib-guide.html">MLlib</a> - Statistics Functionality 
+title: Basic Statistics - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Basic Statistics 
 ---
 
 * Table of contents
@@ -25,7 +25,7 @@ displayTitle: <a href="mllib-guide.html">MLlib</a> - Statistics Functionality
 \newcommand{\zero}{\mathbf{0}}
 \]`
 
-## Summary Statistics 
+## Summary statistics 
 
 We provide column summary statistics for `RDD[Vector]` through the function `colStats` 
 available in `Statistics`.
@@ -104,81 +104,7 @@ print summary.numNonzeros()
 
 </div>
 
-## Random data generation
-
-Random data generation is useful for randomized algorithms, prototyping, and performance testing.
-MLlib supports generating random RDDs with i.i.d. values drawn from a given distribution:
-uniform, standard normal, or Poisson.
-
-<div class="codetabs">
-<div data-lang="scala" markdown="1">
-[`RandomRDDs`](api/scala/index.html#org.apache.spark.mllib.random.RandomRDDs) provides factory
-methods to generate random double RDDs or vector RDDs.
-The following example generates a random double RDD, whose values follows the standard normal
-distribution `N(0, 1)`, and then map it to `N(1, 4)`.
-
-{% highlight scala %}
-import org.apache.spark.SparkContext
-import org.apache.spark.mllib.random.RandomRDDs._
-
-val sc: SparkContext = ...
-
-// Generate a random double RDD that contains 1 million i.i.d. values drawn from the
-// standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.
-val u = normalRDD(sc, 1000000L, 10)
-// Apply a transform to get a random double RDD following `N(1, 4)`.
-val v = u.map(x => 1.0 + 2.0 * x)
-{% endhighlight %}
-</div>
-
-<div data-lang="java" markdown="1">
-[`RandomRDDs`](api/java/index.html#org.apache.spark.mllib.random.RandomRDDs) provides factory
-methods to generate random double RDDs or vector RDDs.
-The following example generates a random double RDD, whose values follows the standard normal
-distribution `N(0, 1)`, and then map it to `N(1, 4)`.
-
-{% highlight java %}
-import org.apache.spark.SparkContext;
-import org.apache.spark.api.JavaDoubleRDD;
-import static org.apache.spark.mllib.random.RandomRDDs.*;
-
-JavaSparkContext jsc = ...
-
-// Generate a random double RDD that contains 1 million i.i.d. values drawn from the
-// standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.
-JavaDoubleRDD u = normalJavaRDD(jsc, 1000000L, 10);
-// Apply a transform to get a random double RDD following `N(1, 4)`.
-JavaDoubleRDD v = u.map(
-  new Function<Double, Double>() {
-    public Double call(Double x) {
-      return 1.0 + 2.0 * x;
-    }
-  });
-{% endhighlight %}
-</div>
-
-<div data-lang="python" markdown="1">
-[`RandomRDDs`](api/python/pyspark.mllib.random.RandomRDDs-class.html) provides factory
-methods to generate random double RDDs or vector RDDs.
-The following example generates a random double RDD, whose values follows the standard normal
-distribution `N(0, 1)`, and then map it to `N(1, 4)`.
-
-{% highlight python %}
-from pyspark.mllib.random import RandomRDDs
-
-sc = ... # SparkContext
-
-# Generate a random double RDD that contains 1 million i.i.d. values drawn from the
-# standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.
-u = RandomRDDs.uniformRDD(sc, 1000000L, 10)
-# Apply a transform to get a random double RDD following `N(1, 4)`.
-v = u.map(lambda x: 1.0 + 2.0 * x)
-{% endhighlight %}
-</div>
-
-</div>
-
-## Correlations calculation
+## Correlations
 
 Calculating the correlation between two series of data is a common operation in Statistics. In MLlib
 we provide the flexibility to calculate pairwise correlations among many series. The supported 
@@ -455,3 +381,77 @@ for (ChiSqTestResult result : featureTestResults) {
 </div>
 
 </div>
+
+## Random data generation
+
+Random data generation is useful for randomized algorithms, prototyping, and performance testing.
+MLlib supports generating random RDDs with i.i.d. values drawn from a given distribution:
+uniform, standard normal, or Poisson.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+[`RandomRDDs`](api/scala/index.html#org.apache.spark.mllib.random.RandomRDDs) provides factory
+methods to generate random double RDDs or vector RDDs.
+The following example generates a random double RDD whose values follow the standard normal
+distribution `N(0, 1)`, and then maps it to `N(1, 4)`.
+
+{% highlight scala %}
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.random.RandomRDDs._
+
+val sc: SparkContext = ...
+
+// Generate a random double RDD that contains 1 million i.i.d. values drawn from the
+// standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.
+val u = normalRDD(sc, 1000000L, 10)
+// Apply a transform to get a random double RDD following `N(1, 4)`.
+val v = u.map(x => 1.0 + 2.0 * x)
+{% endhighlight %}
+</div>
+
+<div data-lang="java" markdown="1">
+[`RandomRDDs`](api/java/index.html#org.apache.spark.mllib.random.RandomRDDs) provides factory
+methods to generate random double RDDs or vector RDDs.
+The following example generates a random double RDD whose values follow the standard normal
+distribution `N(0, 1)`, and then maps it to `N(1, 4)`.
+
+{% highlight java %}
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.DoubleFunction;
+import static org.apache.spark.mllib.random.RandomRDDs.*;
+
+JavaSparkContext jsc = ...
+
+// Generate a random double RDD that contains 1 million i.i.d. values drawn from the
+// standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.
+JavaDoubleRDD u = normalJavaRDD(jsc, 1000000L, 10);
+// Apply a transform to get a random double RDD following `N(1, 4)`.
+JavaDoubleRDD v = u.mapToDouble(
+  new DoubleFunction<Double>() {
+    public double call(Double x) {
+      return 1.0 + 2.0 * x;
+    }
+  });
+{% endhighlight %}
+</div>
+
+<div data-lang="python" markdown="1">
+[`RandomRDDs`](api/python/pyspark.mllib.random.RandomRDDs-class.html) provides factory
+methods to generate random double RDDs or vector RDDs.
+The following example generates a random double RDD whose values follow the standard normal
+distribution `N(0, 1)`, and then maps it to `N(1, 4)`.
+
+{% highlight python %}
+from pyspark.mllib.random import RandomRDDs
+
+sc = ... # SparkContext
+
+# Generate a random double RDD that contains 1 million i.i.d. values drawn from the
+# standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.
+u = RandomRDDs.normalRDD(sc, 1000000L, 10)
+# Apply a transform to get a random double RDD following `N(1, 4)`.
+v = u.map(lambda x: 1.0 + 2.0 * x)
+{% endhighlight %}
+</div>
+
+</div>