Local Data Science & Data Engineering Stack. This project creates a flexible data science stack, ready to use locally, built on open source technologies.

malganis35/local-de-stack

 
 


Local Data Science & Engineering Stack

This repo was created in conjunction with a blog post at kengbailey.com detailing my local data engineering setup.

This repo is intended to create a production-ready data and machine learning stack. The stack is composed of:

  • MinIO: S3-compatible object storage
  • PostgreSQL: structured SQL database
  • MongoDB: semi-structured data
  • DBeaver/CloudBeaver: UI to browse databases
  • Airflow: orchestrator
  • MLflow: experiment tracking
  • Homer: homepage of the stack
  • JupyterHub: exploration in Jupyter notebooks
  • VSCode Server: online IDE
  • RStudio Server: online IDE
  • Grafana: monitoring and visualization of data and model drift
  • Superset: data visualization & BI stack
  • Label Studio: data labeling tool

How to install Docker?

Install Docker Desktop (mac, windows, linux)

Install Docker commandline (linux)

How to install Docker Compose?

Install Docker Compose
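
Once both are installed, it can help to confirm that the tools are reachable from your shell before starting the stack. The snippet below is a minimal sketch, not part of this repo: `require_cmd` is a hypothetical helper name, and it relies only on the standard `command -v` shell builtin.

```shell
# Hypothetical helper (not shipped with this repo).
# Returns 0 if the named command is on PATH; otherwise prints a message
# to stderr and returns 1.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing: $1" >&2
  return 1
}
```

For example, `require_cmd docker && docker --version && docker compose version` prints the installed versions, and `require_cmd make` checks that the `make` targets used below can run.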

Commands to start and stop services?

  1. Set up all containers:
make start-all
  2. Open the homepage:
make run
  3. Stop all services:
make stop-all

Commands to start and stop a single service?

Run and view logs

docker compose -f postgres-compose.yaml up 

Run in detached mode

docker compose -f postgres-compose.yaml up -d

Stop

docker compose -f postgres-compose.yaml down
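
The three commands above follow one pattern, so they can be wrapped in a small helper. This is a sketch under assumptions: `compose_file` and `compose_cmd` are hypothetical names, and the `<service>-compose.yaml` naming is only confirmed here for `postgres`; check the repo for the actual file names of other services. The helper prints the command instead of running it, so it can be reviewed first.

```shell
# Hypothetical helpers (not shipped with this repo).

# Build the compose file name from a service name.
# Assumption: files are named <service>-compose.yaml, as for postgres.
compose_file() {
  printf '%s-compose.yaml\n' "$1"
}

# Print the docker compose command for a service and an action.
# Actions: up (foreground with logs), up-detached, down.
compose_cmd() {
  local service="$1" action="$2" file
  file="$(compose_file "$service")"
  case "$action" in
    up)          echo "docker compose -f $file up" ;;
    up-detached) echo "docker compose -f $file up -d" ;;
    down)        echo "docker compose -f $file down" ;;
    *)           echo "unknown action: $action" >&2; return 1 ;;
  esac
}
```

Calling `compose_cmd postgres up-detached` prints the detached-mode command shown above; copy and run the printed line once it looks right.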

Important notes

To connect Grafana and CloudBeaver to the PostgreSQL server, use these settings:

host=postgres
username= *see .env file*
password= *see .env file*
database=postgres

If you use DBeaver on Windows with the whole stack set up in WSL, use:

host=localhost
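
The settings above can also be combined into a single connection URI for command-line clients. The sketch below is illustrative only: `load_env` and `pg_uri` are hypothetical helpers, the variable names `POSTGRES_USER` and `POSTGRES_PASSWORD` are assumptions (check the `.env` file for the real ones), and port 5432 is the PostgreSQL default, which may differ from the port this stack publishes.

```shell
# Hypothetical helpers (not shipped with this repo).

# Load KEY=value pairs from an env file into the current shell.
load_env() {
  set -a        # export every variable assigned while sourcing
  . "$1"
  set +a
}

# Build a PostgreSQL connection URI.
# Assumptions: POSTGRES_USER / POSTGRES_PASSWORD variable names, port 5432.
# The host defaults to "postgres" (the name used between containers);
# pass "localhost" when connecting from Windows to a stack running in WSL.
pg_uri() {
  local host="${1:-postgres}"
  printf 'postgresql://%s:%s@%s:5432/postgres\n' \
    "$POSTGRES_USER" "$POSTGRES_PASSWORD" "$host"
}
```

After `load_env ./.env`, running `pg_uri localhost` prints a URI that can be passed to a client such as `psql`. A password containing characters like `@` or `/` would need percent-encoding first, which this sketch does not handle.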
