RDDs, DataFrames and Datasets in Apache Spark

This repo contains the source for my 2016 Northeast Scala Symposium talk, "RDDs, DataFrames and Datasets in Apache Spark", which I updated (a little) for Apache Spark 2.0 and gave again at a Philly Area Scala Enthusiasts (PHASE) Meetup in June 2016 (http://www.meetup.com/scala-phase/events/229870987/).

Slides: You can see the actual deck, in action, here.

Video: The talk at the Northeast Scala Symposium was recorded. The video is here.

The Git tag nescala captures the code and presentation as given at the Northeast Scala Symposium.

The tag phase captures the code and presentation as given at the PHASE Meetup.

The presentation is in the presentation directory. The demo notebooks are in the demo directory, in runnable source form. Also in demo is a file called notebooks.dbc, which can be loaded directly into Databricks. Feel free to sign up for the free Databricks Community Edition and try the notebooks yourself.

The presentation is built with Reveal.js, augmented with some custom build code. To build the presentation, you can run rake from the top level.

The presentation will end up in dist/index.html.

Preparing to build the slides

Install NodeJS and npm.
Install the LESS preprocessor: npm install -g less
Install Bower: npm install -g bower
Run bower install locally.
Make sure you have a version of Ruby 2 installed. (This stuff has been tested with 2.2.3.)
Install Bundler: gem install bundler
Use Bundler to install the required Ruby gems: bundle install
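
Before running the build, it can help to confirm that the tools from the steps above are actually on your PATH. The following is a minimal sketch, not part of this repo: the helper name missing_tools is hypothetical, and the binary names (node, lessc, bower, bundle) are assumptions about how those packages are typically installed. On some systems the Node binary is called nodejs instead.

```shell
#!/bin/sh
# Print the names of any given tools that are not found on PATH.
# `missing_tools` is a hypothetical helper, not something this repo provides.
missing_tools() {
  missing=""
  for tool in "$@"; do
    # `command -v` is the POSIX way to ask whether a command is available.
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Strip the leading space before printing (prints an empty line if nothing is missing).
  echo "${missing# }"
}

# Tools the preparation steps rely on. The binary names are assumptions;
# adjust them if your installation uses different ones.
echo "missing: $(missing_tools node npm lessc bower ruby bundle rake)"
```

If the line printed after "missing:" is empty, every tool was found; otherwise, revisit the corresponding step above.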

Building the slides

Once you've successfully completed preparation, building the slide deck is as simple as:

$ rake

Rake will build dist/index.html, a Reveal.js slide show. Just open the file in your browser, and away you go.

Installing the slide show

If you want to install the slide show somewhere (e.g., a web server), copy the entire dist directory (presumably renaming it).
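
As a rough sketch of the copy-and-rename step, assuming a POSIX shell: the helper name install_slides and the destination path in the usage comment are hypothetical, so substitute whatever location your web server actually serves.

```shell
#!/bin/sh
# Copy a built slide show directory to a new location under a new name.
# `install_slides` is a hypothetical helper, not part of this repo.
install_slides() {
  src=$1   # the built directory (dist)
  dest=$2  # the new location and name for the copy
  if [ ! -f "$src/index.html" ]; then
    echo "no index.html in $src; run rake first" >&2
    return 1
  fi
  if [ -e "$dest" ]; then
    # Copying onto an existing directory would nest the copy inside it.
    echo "$dest already exists; refusing to overwrite" >&2
    return 1
  fi
  # Create the parent directory if needed, then copy the whole tree.
  mkdir -p "$(dirname "$dest")" && cp -R "$src" "$dest"
}

# Example (the destination path is an assumption; use your own):
# install_slides dist /var/www/html/spark-talk
```

Copying the whole directory, rather than just index.html, keeps the scripts and stylesheets the slide show loads alongside it.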

Making PDFs

To create PDF versions of the slides, open the HTML slides in Chrome or Chromium. Then, tack ?print-pdf on the end of the URL, and print the result. See the Reveal.js documentation for details.
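
If you open the slides straight from disk, the URL to paste into Chrome looks like the output of the sketch below. The helper name print_pdf_url is hypothetical, and the sketch assumes the path contains no characters that need URL encoding (spaces, for example).

```shell
#!/bin/sh
# Build a file:// URL for a local slide deck with ?print-pdf appended.
# `print_pdf_url` is a hypothetical helper, not part of this repo.
print_pdf_url() {
  # Resolve the directory that holds the HTML file to an absolute path.
  dir=$(cd "$(dirname "$1")" && pwd) || return 1
  echo "file://$dir/$(basename "$1")?print-pdf"
}

# Example, run from the top level after building:
# print_pdf_url dist/index.html
```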
