Spark Structured Streaming

A short course on Spark's new, experimental Structured Streaming features, by The Data Incubator and O'Reilly Strata. You can purchase the accompanying videos on the O'Reilly website.

Installation

To run this tutorial, you need Apache Spark and Jupyter. Install them as follows:

  1. Download and install Apache Spark 2.0.0 by following the instructions here. You may first have to install Hadoop.
  2. Install Jupyter:

pip install jupyter
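Step 1 above can be sketched as a short shell script. The archive URL layout, the Hadoop build suffix, and the install location are assumptions here, so adjust them for your mirror and system:

```shell
#!/usr/bin/env sh
# Sketch of a local Apache Spark 2.0.0 setup.
# The archive.apache.org URL layout and the hadoop2.7 build suffix are assumptions.
SPARK_VERSION=2.0.0
HADOOP_VERSION=2.7
SPARK_PKG="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}"
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_PKG}.tgz"

# Download and unpack into $HOME (run these two lines manually):
#   curl -LO "$SPARK_URL"
#   tar -xzf "${SPARK_PKG}.tgz" -C "$HOME"

# Point SPARK_HOME at the unpacked directory; the toree install
# command below relies on this variable being set.
export SPARK_HOME="$HOME/$SPARK_PKG"
echo "SPARK_HOME=$SPARK_HOME"
```

Put the `export SPARK_HOME=...` line in your shell profile so the Makefile targets and the Toree kernel can find the installation in later sessions.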

Optional

To be able to run the interactive code cells, create an Apache Toree kernel:

jupyter toree install --spark_opts='--master=local[2] --executor-memory 4g --driver-memory 4g' \
    --kernel_name=apache_toree --interpreters=PySpark,SparkR,Scala,SQL --spark_home=$SPARK_HOME

Otherwise, you can copy and paste the cells into a Spark shell, which you can start by running

make spark-shell

Starting the Course

To start the course, run

make notebook

and open the Overview.ipynb notebook. Note that the notebook server may bind to a higher port number if 9000 is already in use.

If you want to play with Spark directly, you can also run

make spark-shell

Credits: The Spark project template is based on https://github.com/nfo/spark-project-template.
