Spark Structured Streaming

A short course on Spark's new, experimental Structured Streaming features, by The Data Incubator and O'Reilly Strata. You can purchase the accompanying videos on the O'Reilly website.

Installation

To run this tutorial, you need Apache Spark and Jupyter. Install them as follows:

  1. Download and install Apache Spark 2.0.0 by following the instructions here. You may first have to install Hadoop.
  2. Install Jupyter:

pip install jupyter
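Step 1 above can be sketched as a short shell script. The archive URL layout, the Hadoop build suffix, and the install location are assumptions here, so adjust them for your mirror and system:

```shell
#!/usr/bin/env sh
# Sketch of a local Apache Spark 2.0.0 setup.
# The archive.apache.org URL layout and the hadoop2.7 build suffix are assumptions.
SPARK_VERSION=2.0.0
HADOOP_VERSION=2.7
SPARK_PKG="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}"
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_PKG}.tgz"

# Download and unpack into $HOME (run these two lines manually):
#   curl -LO "$SPARK_URL"
#   tar -xzf "${SPARK_PKG}.tgz" -C "$HOME"

# Point SPARK_HOME at the unpacked directory; the toree install
# command below relies on this variable being set.
export SPARK_HOME="$HOME/$SPARK_PKG"
echo "SPARK_HOME=$SPARK_HOME"
```

Put the `export SPARK_HOME=...` line in your shell profile so the Makefile targets and the Toree kernel can find the installation in later sessions.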

Optional

To be able to run the interactive code cells, create an Apache Toree kernel:

jupyter toree install --spark_opts='--master=local[2] --executor-memory 4g --driver-memory 4g' \
    --kernel_name=apache_toree --interpreters=PySpark,SparkR,Scala,SQL --spark_home=$SPARK_HOME

Otherwise, you can copy and paste the cells into a Spark shell, which you can start by running

make spark-shell

Starting the Course

To start the course, run

make notebook

and open the Overview.ipynb notebook. Note that the notebook server may bind to a higher port number if 9000 is already in use.

If you want to play with Spark directly, you can also run

make spark-shell

Credits: The Spark project template is based on https://github.com/nfo/spark-project-template.
