Starred repositories
Apache Spark - A unified analytics engine for large-scale data processing
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and with APIs
State of the Art Natural Language Processing
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Examples for High Performance Spark
My Insight Data Engineering Fellowship project. I implemented a big data processing pipeline, based on the lambda architecture, that aggregates Twitter and US stock market data for user sentiment anal…
This repository contains code for Spark Streaming