This repo contains the source for my 2016 Northeast Scala Symposium talk, RDDs, DataFrames and Datasets in Apache Spark, which I updated (a little) for Apache Spark 2.0 and gave again at a Philly Area Scala Enthusiasts (PHASE) Meetup in June, 2016 (http://www.meetup.com/scala-phase/events/229870987/).
Slides: You can see the actual deck, in action, here.
Video: The talk at the Northeast Scala Symposium was recorded. The video is here.
The Git tag nescala captures the code and presentation as given at the Northeast Scala Symposium.
The tag phase captures the code and presentation as given at the PHASE Meetup.
The presentation is in the presentation directory. The demo notebooks
are in the demo directory, in runnable source form. Also in demo is a
file called notebooks.dbc, which can be loaded directly into Databricks.
Feel free to sign up for the free
Databricks Community Edition and try them yourself.
The presentation is built with Reveal.js, augmented with some custom
build code. To build the presentation, run rake from the top level.
The presentation will end up in dist/index.html.
- Install NodeJS and npm.
- Install the LESS preprocessor: npm install -g less
- Install Bower: npm install -g bower
- Run bower install locally.
- Make sure you have a version of Ruby 2 installed. (This stuff has been tested with 2.2.3.)
- Install Bundler: gem install bundler
- Use Bundler to install the required Ruby gems: bundle install
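Before running the build, it can help to confirm that the tools from the steps above are actually on your PATH. The following is a minimal sketch, not part of this repo: the function name missing_tools is hypothetical, and the binary names are assumptions (for instance, the less package installs a command called lessc, and some systems name the NodeJS binary nodejs rather than node). It only reports what is missing; it installs nothing.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): print each named command
# that cannot be found on PATH, one per line. Prints nothing if all exist.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumed binary names for the tools listed in the preparation steps.
missing_tools node npm lessc bower ruby bundle rake
```

If the last line prints nothing, every tool was found and rake should at least be able to start.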
Once you've successfully completed preparation, building the slide deck is as simple as:
$ rake
Rake will build dist/index.html, a Reveal.js slide show. Just
open the file in your browser, and away you go.
If you want to install the slide show somewhere (e.g., a web server), copy
the entire dist directory (presumably renaming it).
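The copy-and-rename step might look something like the sketch below. The function name install_slides and the destination path in the usage comment are hypothetical; adjust them for your own server. The function refuses to overwrite an existing target, which is a design choice of this sketch rather than anything the build requires.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): copy a built slide directory
# to a new location, renaming it in the process.
install_slides() {
    src=$1
    dest=$2
    # The build must have produced index.html in the source directory.
    if [ ! -f "$src/index.html" ]; then
        echo "no index.html in $src; run rake first" >&2
        return 1
    fi
    # Do not clobber an existing file or directory.
    if [ -e "$dest" ]; then
        echo "$dest already exists" >&2
        return 1
    fi
    # With a nonexistent destination, cp -R creates it as a copy of src.
    cp -R "$src" "$dest"
}

# Example (the destination is a placeholder):
#   install_slides dist /var/www/html/spark-talk
```

The parent directory of the destination has to exist already; cp will not create it.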
To create PDF versions of the slides, open the HTML slides in Chrome or
Chromium. Then, tack ?print-pdf on the end of the URL, and print the result.
See the Reveal.js documentation for details.
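As a small illustration of the URL change, here is a sketch that builds the print URL from a slide URL. The function name print_pdf_url is hypothetical. One detail worth noting: if the URL you copied from the browser ends in a slide fragment such as #/3, the query string has to go before the fragment, so this sketch simply drops the fragment.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): turn a slide URL into the
# URL used for PDF printing by dropping any "#..." fragment and
# appending ?print-pdf.
print_pdf_url() {
    base=${1%%#*}   # strip everything from the first '#' onward
    printf '%s?print-pdf\n' "$base"
}

# Example for a locally built deck, run from the top level of the repo:
print_pdf_url "file://$PWD/dist/index.html"
```

Paste the printed URL into Chrome or Chromium and use the browser's print dialog to save the PDF.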