[SPARK-34041][PYTHON][DOCS] Miscellaneous cleanup for new PySpark documentation

### What changes were proposed in this pull request?

This PR proposes to:
- Add a link to the PySpark quick start (Getting Started) docs under "Programming Guides" in the main Spark docs
- Rename `ML` / `MLlib` to `MLlib (DataFrame-based)` / `MLlib (RDD-based)` in the API reference pages
- Mention the other user guides as well, because guides such as [ML](http://spark.apache.org/docs/latest/ml-guide.html) and [SQL](http://spark.apache.org/docs/latest/sql-programming-guide.html) are shared with other languages and also apply to PySpark.
- Mention the other migration guides as well, because PySpark can be affected by the changes they describe.

### Why are the changes needed?

For better documentation.

### Does this PR introduce _any_ user-facing change?

It fixes user-facing docs. However, these docs have not been released yet.

### How was this patch tested?

Manually tested by running:

```bash
cd docs
SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll serve --watch
```
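Rendering the site locally checks the navigation changes. To review the external hyperlinks this change adds to the `.rst` pages without a full build, one option is to list them straight from the sources. The following is a minimal sketch in Bash, assuming standard `grep`/`sed`/`sort` and links written in the `` `Title <URL>`_ `` form used in these pages; `list_rst_links` is a hypothetical helper, not part of Spark's build scripts.

```bash
#!/usr/bin/env bash
set -euo pipefail

# list_rst_links (hypothetical helper, not part of Spark):
# reads reStructuredText on stdin and prints the unique targets of
# inline external links written as `Title <URL>`_
list_rst_links() {
  grep -oE '<https?://[^>]*>`_' |
    sed -e 's/^<//' -e 's/>`_$//' |
    sort -u
}
```

For example, running `list_rst_links < python/docs/source/user_guide/index.rst` from the repository root should print each guide URL linked from that page once, which makes typos in the `spark.apache.org/docs/latest/...` paths easy to spot.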

Closes apache#31082 from HyukjinKwon/SPARK-34041.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
HyukjinKwon committed Jan 8, 2021
1 parent 7b06acc commit aa388cf
Showing 7 changed files with 36 additions and 10 deletions.
1 change: 1 addition & 0 deletions docs/_layouts/global.html
@@ -84,6 +84,7 @@
<a class="dropdown-item" href="ml-guide.html">MLlib (Machine Learning)</a>
<a class="dropdown-item" href="graphx-programming-guide.html">GraphX (Graph Processing)</a>
<a class="dropdown-item" href="sparkr.html">SparkR (R on Spark)</a>
<a class="dropdown-item" href="api/python/getting_started/index.html">PySpark (Python on Spark)</a>
</div>
</li>

2 changes: 2 additions & 0 deletions docs/index.md
@@ -113,6 +113,8 @@ options for deployment:
* [Spark Streaming](streaming-programming-guide.html): processing data streams using DStreams (old API)
* [MLlib](ml-guide.html): applying machine learning algorithms
* [GraphX](graphx-programming-guide.html): processing graphs
* [SparkR](sparkr.html): processing data with Spark in R
* [PySpark](api/python/getting_started/index.html): processing data with Spark in Python

**API Docs:**

3 changes: 3 additions & 0 deletions python/docs/source/getting_started/index.rst
@@ -21,6 +21,9 @@ Getting Started
===============

This page summarizes the basic steps required to set up and get started with PySpark.
There are more guides shared with other languages, such as
`Quick Start <http://spark.apache.org/docs/latest/quick-start.html>`_ in Programming Guides
at `the Spark documentation <http://spark.apache.org/docs/latest/index.html#where-to-go-from-here>`_.

.. toctree::
   :maxdepth: 2
12 changes: 10 additions & 2 deletions python/docs/source/migration_guide/index.rst
@@ -21,8 +21,6 @@ Migration Guide
===============

This page describes the migration guide specific to PySpark.
Many items of other migration guides can also be applied when migrating PySpark to higher versions because PySpark internally shares other components.
Please also refer other migration guides such as `Migration Guide: SQL, Datasets and DataFrame <http://spark.apache.org/docs/latest/sql-migration-guide.html>`_.

.. toctree::
   :maxdepth: 2
@@ -33,3 +31,13 @@ Please also refer other migration guides such as `Migration Guide: SQL, Datasets
   pyspark_2.2_to_2.3
   pyspark_1.4_to_1.5
   pyspark_1.0_1.2_to_1.3


Many items in other migration guides also apply when migrating PySpark to a higher version, because PySpark internally shares components with the rest of Spark.
Please also refer to the other migration guides:

- `Migration Guide: Spark Core <http://spark.apache.org/docs/latest/core-migration-guide.html>`_
- `Migration Guide: SQL, Datasets and DataFrame <http://spark.apache.org/docs/latest/sql-migration-guide.html>`_
- `Migration Guide: Structured Streaming <http://spark.apache.org/docs/latest/ss-migration-guide.html>`_
- `Migration Guide: MLlib (Machine Learning) <http://spark.apache.org/docs/latest/ml-migration-guide.html>`_

12 changes: 6 additions & 6 deletions python/docs/source/reference/pyspark.ml.rst
@@ -16,11 +16,11 @@
under the License.
ML
==
MLlib (DataFrame-based)
=======================

ML Pipeline APIs
----------------
Pipeline APIs
-------------

.. currentmodule:: pyspark.ml

@@ -188,8 +188,8 @@ Clustering
PowerIterationClustering


ML Functions
----------------------------
Functions
---------

.. currentmodule:: pyspark.ml.functions

4 changes: 2 additions & 2 deletions python/docs/source/reference/pyspark.mllib.rst
@@ -16,8 +16,8 @@
under the License.
MLlib
=====
MLlib (RDD-based)
=================

Classification
--------------
12 changes: 12 additions & 0 deletions python/docs/source/user_guide/index.rst
@@ -20,9 +20,21 @@
User Guide
==========

This page is the guide for PySpark users and covers PySpark-specific topics.

.. toctree::
   :maxdepth: 2

   arrow_pandas
   python_packaging


There are more guides shared with other languages in Programming Guides
at `the Spark documentation <http://spark.apache.org/docs/latest/index.html#where-to-go-from-here>`_.

- `RDD Programming Guide <http://spark.apache.org/docs/latest/rdd-programming-guide.html>`_
- `Spark SQL, DataFrames and Datasets Guide <http://spark.apache.org/docs/latest/sql-programming-guide.html>`_
- `Structured Streaming Programming Guide <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html>`_
- `Spark Streaming Programming Guide <http://spark.apache.org/docs/latest/streaming-programming-guide.html>`_
- `Machine Learning Library (MLlib) Guide <http://spark.apache.org/docs/latest/ml-guide.html>`_
