This repo contains example code for practicing Spark.
I learnt from Learning Spark 2.0, which is a really good resource for beginners.
The code is contained in notebooks, covering:
- basic DataFrame manipulations
- read/write operations
- basic use of MLlib to predict the house price in California

To get started:
- download the datasets:
  ./download.sh
- run the notebooks in
  /notebook