This repo accompanies a blog post at kengbailey.com detailing my local data engineering setup.
It is intended to provide a production-ready data and machine learning stack. The stack is composed of:
- MinIO: S3-compatible object storage
- PostgreSQL: Structured SQL Database
- MongoDB: Semi-structured Data
- DBeaver/CloudBeaver: UI for browsing databases
- Airflow: Workflow orchestrator
- MLflow: Experiment tracking
- Homer: Homepage of the Stack
- JupyterHub: Exploration in Jupyter notebooks
- VSCode Server: Online IDE
- RStudio Server: Online IDE
- Grafana: Monitoring and visualization of data and model drift
- Superset: Data Visualization & BI stack
- Label Studio: Data labeling tool
Install Docker Desktop (macOS, Windows, Linux)
Install the Docker command line (Linux)
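Before starting the stack, it can help to confirm that Docker and the Compose plugin are actually usable from your shell. The following is a minimal sketch; `have_cmd` and `check_docker` are hypothetical helper names, not part of this repo.

```shell
# Return success if the given command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report whether Docker and the Compose plugin are usable.
# Hypothetical helper for illustration only.
check_docker() {
  if ! have_cmd docker; then
    echo "docker: not found"
    return 1
  fi
  if ! docker compose version >/dev/null 2>&1; then
    echo "docker compose: not available"
    return 1
  fi
  echo "docker and docker compose: ok"
}
```

Run `check_docker` after sourcing the snippet; a non-zero exit status means one of the prerequisites is missing.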
- Set up all containers:
make start-all
- Open the homepage:
make run
- Stop all services:
make stop-all
Run a single service in the foreground and view its logs (PostgreSQL shown here):
docker compose -f postgres-compose.yaml up
Run in detached mode:
docker compose -f postgres-compose.yaml up -d
Stop the service:
docker compose -f postgres-compose.yaml down
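The per-service commands above can be wrapped in small shell helpers. This is only a sketch: it assumes every service has a compose file named after the `<service>-compose.yaml` pattern seen for PostgreSQL, which may not hold for all services in this repo. The function names are hypothetical.

```shell
# Map a service name to its compose file, e.g. "postgres" -> "postgres-compose.yaml".
# ASSUMPTION: every service follows the <service>-compose.yaml naming pattern.
compose_file_for() {
  printf '%s-compose.yaml\n' "$1"
}

# Start one service in detached mode (hypothetical helper, not part of the repo).
start_service() {
  docker compose -f "$(compose_file_for "$1")" up -d
}

# Stop one service and remove its containers (hypothetical helper).
stop_service() {
  docker compose -f "$(compose_file_for "$1")" down
}
```

With these sourced, `start_service postgres` runs the same detached-mode command shown above, and `stop_service postgres` runs the matching `down`.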
To connect Grafana and CloudBeaver to the PostgreSQL server, use these settings:
host=postgres
username= *see .env file*
password= *see .env file*
database=postgres
If you use DBeaver on Windows while the whole stack runs in WSL, use:
host=localhost
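If a client accepts a connection URI instead of separate fields, the settings above can be assembled from the .env file. The sketch below assumes the credentials are stored under the keys `POSTGRES_USER` and `POSTGRES_PASSWORD` and that the server listens on PostgreSQL's default port 5432; check your .env and compose file, since neither is confirmed here. It handles only plain `KEY=value` lines (no quoting, no URL-escaping of special characters).

```shell
# Print the value of KEY from an env file made of KEY=value lines.
read_env_var() {
  key="$1"
  file="${2:-.env}"
  # Take the last matching line and strip the "KEY=" prefix.
  grep -E "^${key}=" "$file" | tail -n 1 | cut -d '=' -f 2-
}

# Build a PostgreSQL connection URI for the given host.
# ASSUMPTIONS: key names POSTGRES_USER / POSTGRES_PASSWORD, port 5432.
pg_uri() {
  host="$1"
  envfile="${2:-.env}"
  user="$(read_env_var POSTGRES_USER "$envfile")"
  pass="$(read_env_var POSTGRES_PASSWORD "$envfile")"
  printf 'postgresql://%s:%s@%s:5432/postgres\n' "$user" "$pass" "$host"
}
```

For example, `pg_uri postgres` would give the URI for containers on the same Docker network, and `pg_uri localhost` the one for a client running on the host.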