A curated list of Apache Spark resources that developers may find useful, covering different use cases. Entries are ordered alphabetically within each category.
Inspired by the awesome list thing.
Apache Spark is a cluster computing platform designed to be a fast, general-purpose engine for large-scale data processing.
- Supports a wide range of workloads, including MapReduce-style batch processing, machine learning, graph processing, and streaming
- Spark's basic abstraction is the RDD (Resilient Distributed Dataset)
- RDDs are immutable, partitioned collections of elements that can be operated on in parallel (a short Scala sketch follows this list)
- Ships with a rich standard library, including Spark SQL, Spark Streaming, MLlib, and GraphX (illustrated in the second sketch below)
- Offers APIs in Scala, Java, Python, and R, with a unified development and deployment environment for all of them
- Whichever of those languages you are most comfortable in, you work against the same cluster runtime, so code prototyped interactively scales to production
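To make the RDD bullet concrete, here is a minimal, self-contained Scala sketch. The Spark calls (`SparkContext`, `parallelize`, `filter`, `map`, `reduce`) are standard API; the application name, master URL, and input values are placeholders chosen for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddSketch {
  def main(args: Array[String]): Unit = {
    // Run locally with 4 worker threads; on a real cluster the master URL
    // would point at a standalone master, YARN, or Mesos instead.
    val conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[4]")
    val sc   = new SparkContext(conf)

    // parallelize() distributes the collection across 4 partitions, the
    // units Spark processes in parallel.
    val numbers = sc.parallelize(1 to 100, numSlices = 4)

    // RDDs are immutable: filter() and map() return new RDDs rather than
    // modifying `numbers` in place.
    val evens   = numbers.filter(_ % 2 == 0)
    val squares = evens.map(n => n * n)

    // Transformations are lazy; work happens only when an action such as
    // reduce() forces evaluation.
    println(s"sum of squared evens = ${squares.reduce(_ + _)}")

    sc.stop()
  }
}
```

The same pipeline translates almost line for line into Python, Java, or R, which is the unified-API point above.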
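A similarly hedged sketch of one standard-library component, the RDD-based MLlib statistics API: `Statistics.colStats` is real MLlib API, while the sample vectors are invented.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.Statistics

object MllibSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("mllib-sketch").setMaster("local[2]"))

    // A tiny, made-up dataset of feature vectors, distributed as an RDD.
    val observations = sc.parallelize(Seq(
      Vectors.dense(1.0, 10.0),
      Vectors.dense(2.0, 20.0),
      Vectors.dense(3.0, 30.0)))

    // colStats() computes column-wise summary statistics in parallel
    // across the partitions of the RDD.
    val summary = Statistics.colStats(observations)
    println(s"mean = ${summary.mean}, variance = ${summary.variance}")

    sc.stop()
  }
}
```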
## Books

- Databricks Spark Reference Applications
- Databricks Spark Knowledge Base
- Getting Started with Apache Spark
- Mastering Apache Spark
## Courses

- Advanced Distributed Machine Learning with Spark
- Advanced Spark for Data Science and Data Engineering
- Big Data Analysis with Spark
- Distributed Machine Learning with Spark
- Introduction to Spark
- Spark Fundamentals I
- Spark Fundamentals II
- Spark Mini Course
- Spark Overview
- Spark Programming with Python
- Spark Summit Training
## Tutorials and Resources

- Data Scientists Guide
- Intro to Apache Spark
- Spark CLI - AMP Camp
- Spark Internals
- Spark RDD Examples
- Spark Resources
- Spark Tutorial
- SparkHub
## Packages

- Deep Spark
- Developer Resources
- FiloDB
- Spark Cookbook
- Spark IndexedRDD
- Spark OpenTSDB
- Spark Packages
- Spark Timeseries
- Sparkle
- Sparkling