od3n/serverless-datalake-on-aws

 
 

Building Serverless Data Lakes on AWS

Forked from the original workshop by Unni Pillai

Architecture Diagram

Learning outcomes from this workshop

  • Design serverless data lake architecture
  • Build a data processing pipeline and Data Lake using Amazon S3 for storing data
  • Use Amazon Kinesis for real-time streaming data
  • Use AWS Glue to automatically catalog datasets
  • Run interactive ETL scripts in an Amazon SageMaker Jupyter notebook connected to an AWS Glue development endpoint
  • Query data using Amazon Athena & visualize it using Amazon QuickSight

Pre-requisites:

  • You need access to an AWS account with AdministratorAccess
  • This lab should be executed in the us-east-1 region
  • It is best to follow the links from this guide and open them in a new tab
  • Run this lab in a modern browser
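
The region requirement above can be checked before starting. The following is a minimal sketch, not part of the original workshop: `check_lab_region` is a hypothetical helper, and it assumes you pass it a boto3 session (or any object exposing a `region_name` attribute, as `boto3.session.Session` does).

```python
# Hypothetical pre-flight check; the helper name is illustrative only.
REQUIRED_REGION = "us-east-1"  # region named in the pre-requisites


def check_lab_region(session):
    """Return (ok, message) for a boto3-style session.

    `session` only needs a `region_name` attribute, which
    boto3.session.Session provides (it may be None if unset).
    """
    region = session.region_name
    if region == REQUIRED_REGION:
        return True, f"Region is {region}; ready to start the lab."
    return False, f"Region is {region!r}; switch to {REQUIRED_REGION} first."


# Example usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   ok, message = check_lab_region(boto3.session.Session())
#   print(message)
```

The check only reads local configuration, so it makes no AWS API calls and incurs no charges.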

Syllabus

| Content | Link |
|---|---|
| Lab 1: Ingest and Storage | Open Lab ▶️ |
| Lab 2: Glue Data Catalog | Open Lab ▶️ |
| Lab 3: Serverless Spark on Glue | Open Lab ▶️ |
| Lab 5: Visualize Data | Open Lab ▶️ |

Clean Up

Make sure you bring down or delete all resources created as part of this lab.

Failing to do this will result in incurring AWS usage charges.

Resources to delete
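
As one illustration of the clean-up step, the sketch below empties and deletes a single S3 bucket. It is an assumption-laden example rather than the workshop's own procedure: `empty_and_delete_bucket` is a hypothetical helper, the bucket name is a placeholder, and it assumes an unversioned bucket. The other resources created in the labs still need to be removed separately.

```python
# Hypothetical clean-up helper; not taken from the workshop materials.
def empty_and_delete_bucket(s3_client, bucket_name):
    """Delete all objects in an unversioned bucket, then the bucket.

    `s3_client` is expected to behave like boto3.client("s3").
    Returns the number of objects deleted.
    """
    deleted = 0
    # list_objects_v2 returns at most 1000 keys per page, which matches
    # the per-request limit of delete_objects.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(
                Bucket=bucket_name, Delete={"Objects": keys}
            )
            deleted += len(keys)
    # A bucket can only be deleted once it is empty.
    s3_client.delete_bucket(Bucket=bucket_name)
    return deleted


# Example usage (bucket name is a placeholder, not from the workshop):
#   import boto3
#   empty_and_delete_bucket(boto3.client("s3"), "my-datalake-bucket")
```

Deletion is irreversible, so double-check the bucket name before running anything like this against a real account.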

About

Forked from 2019 AWS summit workshop content
