Merge pull request apache#430 from pwendell/pyspark-guide
Minor improvements to PySpark docs
mateiz committed Jan 30, 2013
2 parents d12330b + 3f945e3 commit 55327a2
Showing 2 changed files with 10 additions and 2 deletions.
11 changes: 9 additions & 2 deletions docs/python-programming-guide.md
@@ -67,13 +67,20 @@ The script automatically adds the `pyspark` package to the `PYTHONPATH`.

# Interactive Use

The `pyspark` script launches a Python interpreter that is configured to run PySpark jobs.
When run without any input files, `pyspark` launches a shell that can be used to explore data interactively, which is a simple way to learn the API:
The `pyspark` script launches a Python interpreter that is configured to run PySpark jobs. To use `pyspark` interactively, first build Spark, then launch it directly from the command line without any options:

{% highlight bash %}
$ sbt/sbt package
$ ./pyspark
{% endhighlight %}

The Python shell can be used to explore data interactively and is a simple way to learn the API:

{% highlight python %}
>>> words = sc.textFile("/usr/share/dict/words")
>>> words.filter(lambda w: w.startswith("spar")).take(5)
[u'spar', u'sparable', u'sparada', u'sparadrap', u'sparagrass']
>>> help(pyspark) # Show all pyspark functions
{% endhighlight %}
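The `filter`/`take` pattern in the shell session above mirrors ordinary Python sequence operations. A plain-Python sketch of what that RDD expression computes, using a small in-memory word list in place of `/usr/share/dict/words` (the list here is illustrative, not from the source):

```python
# A small stand-in for the dictionary file used in the shell session above.
words = ["spar", "sparable", "spare", "spark", "apache", "python"]

# filter(...).take(5) on an RDD corresponds to filtering a list,
# then taking up to the first five matches.
matches = [w for w in words if w.startswith("spar")][:5]
print(matches)  # ['spar', 'sparable', 'spare', 'spark']
```

The difference is that the RDD version distributes the filter across the cluster and only pulls the first five results back to the driver.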

By default, the `pyspark` shell creates a SparkContext that runs jobs locally.
1 change: 1 addition & 0 deletions python/pyspark/shell.py
@@ -4,6 +4,7 @@
This file is designed to be launched as a PYTHONSTARTUP script.
"""
import os
import pyspark
from pyspark.context import SparkContext

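Because `shell.py` runs as a `PYTHONSTARTUP` script, any names it defines (such as the `sc` SparkContext) land directly in the interactive session's namespace. A plain-Python sketch of that mechanism, using a hypothetical stand-in startup file rather than the real `shell.py`:

```python
import os
import tempfile

# A stand-in startup script, playing the role of python/pyspark/shell.py.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("sc = 'pretend SparkContext'\n")
    startup_path = f.name

# The interpreter effectively executes the PYTHONSTARTUP file in the
# interactive session's global namespace before showing the first prompt.
namespace = {}
with open(startup_path) as f:
    exec(compile(f.read(), startup_path, "exec"), namespace)

print(namespace["sc"])  # pretend SparkContext
os.remove(startup_path)
```

Note that Python only honors `PYTHONSTARTUP` for interactive sessions on a terminal, which is why `./pyspark` is the supported way to get a preconfigured shell.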

