This project's main focus is the complete automation of Data Engineering (DE) tool installation and configuration.
Instead of spending frustrating hours on tool installation and configuration, you can clone this repo and run the included Ansible playbook. This repository puts you five commands away from writing ELT code with a modern data infrastructure tool stack.
This repository leverages Ansible's declarative Infrastructure as Code (IaC) to install and configure DE tools.
The goal of this project is to automate deployment of production-ready Data Engineering tools on any EC2 instance.
Downloading the playbook to install Data Engineering tools on your server.
git clone https://github.com/angelddaz/de-devtools ~/de-devtools
# see ./Makefile for make
cd ~/de-devtools && make local
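If you rerun the setup on a machine that already has the repository, git clone stops with an error because the target directory already exists. A minimal sketch of a rerunnable clone step, assuming a hypothetical helper named clone_if_missing (it is not part of this repository):

```shell
# clone_if_missing REPO_URL TARGET_DIR
# Hypothetical helper, not part of de-devtools: clone only when TARGET_DIR
# does not already contain a git checkout.
clone_if_missing() {
  repo_url="$1"
  target_dir="$2"
  if [ -d "$target_dir/.git" ]; then
    echo "already cloned: $target_dir"
  else
    git clone "$repo_url" "$target_dir"
  fi
}

# Usage, mirroring the commands above:
#   clone_if_missing https://github.com/angelddaz/de-devtools ~/de-devtools
#   cd ~/de-devtools && make local
```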
- Operating System: Ubuntu 18.04
- Languages
  - Python 3
  - PostgreSQL 10
- Open Source Software
  - Airflow
  - Spark (Work in Progress)
  - Presto (Work in Progress)
  - dbt (Work in Progress)
- Cloud
  - AWS [S3, DynamoDB, Lambda] (Works in Progress)
All tools are open source or offer free tiers. Reference: https://free-for.dev/#/
Ubuntu:18.04 Operating System
apt dependencies for the Ansible playbook:
sudo apt-get update && sudo apt-get install -y software-properties-common git make
# add the Ansible PPA before installing, so apt pulls Ansible from it
sudo apt-add-repository --yes --update ppa:ansible/ansible
sudo apt-get install -y ansible
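Before running the playbook, it can help to confirm that the dependencies actually ended up on your PATH. The following is a small sketch, assuming a hypothetical helper named missing_tools (not part of this repository), that reports which of the required commands are absent:

```shell
# missing_tools CMD...
# Hypothetical helper, not part of de-devtools: print the commands from the
# argument list that are not found on PATH, separated by spaces.
missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Strip the leading space left by the concatenation above.
  printf '%s\n' "${missing# }"
}

# Check the three tools installed by the apt commands above.
not_found="$(missing_tools git make ansible)"
if [ -n "$not_found" ]; then
  echo "Missing: $not_found"
else
  echo "git, make, and ansible are all installed."
fi
```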
PostgreSQL Database Object Conflicts: Make sure you do not have a local Postgres database or role called airflow.
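One way to check for such conflicts is to list the existing roles and databases and look for an exact match on airflow. This is only a sketch: check_airflow_conflicts and has_name are hypothetical helpers (not part of this repository), and the sketch assumes psql is installed, the PostgreSQL service is already running, and the postgres superuser is reachable through sudo.

```shell
# has_name NAME
# Succeeds when NAME appears as a whole line on standard input.
has_name() {
  grep -Fxq -- "$1"
}

# check_airflow_conflicts
# Hypothetical helper, not part of de-devtools. Assumes psql is installed,
# the PostgreSQL service is running, and the postgres superuser is
# reachable through sudo.
check_airflow_conflicts() {
  status=0
  # -t prints rows only, -A turns off alignment, -c runs a single command.
  if sudo -u postgres psql -tAc "SELECT rolname FROM pg_roles" | has_name airflow; then
    echo "conflict: role airflow already exists"
    status=1
  fi
  if sudo -u postgres psql -tAc "SELECT datname FROM pg_database" | has_name airflow; then
    echo "conflict: database airflow already exists"
    status=1
  fi
  return "$status"
}

# Usage:
#   check_airflow_conflicts && echo "no conflicts found"
```

The whole-line match matters here: a role such as airflow_old should not be reported as a conflict.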
- Make sure your PostgreSQL Service is running
sudo service postgresql start
- Build the downloaded and configured puckel docker image
cd ~/de-devtools/docker-airflow
docker build .
- Run a preconfigured container. Use the LocalExecutor if you are new to Airflow or are running on a single server or computer.
# working directory: ~/de-devtools/docker-airflow
# choose local
docker-compose -f docker-compose-LocalExecutor.yml up -d
# or choose Celery
docker-compose -f docker-compose-CeleryExecutor.yml up -d
- (Optional) Ease of life config:
Make an alias in your ~/.bashrc file for easier CLI usage of airflow [subcommand]:
alias airflow='docker run --rm -it puckel/docker-airflow airflow'
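The executor choice in the steps above comes down to which compose file is passed to docker-compose. A small sketch, assuming a hypothetical compose_file helper (not part of this repository), that maps a short executor name to the file names used above:

```shell
# compose_file EXECUTOR
# Hypothetical helper, not part of de-devtools: print the compose file for
# the "local" or "celery" executor, or fail on anything else.
compose_file() {
  case "$1" in
    local)  echo "docker-compose-LocalExecutor.yml" ;;
    celery) echo "docker-compose-CeleryExecutor.yml" ;;
    *)      echo "unknown executor: $1" >&2; return 1 ;;
  esac
}

# Usage, from ~/de-devtools/docker-airflow:
#   docker-compose -f "$(compose_file local)" up -d
```

Once the alias is in your ~/.bashrc and the shell has been reloaded, a quick check such as airflow version should print the Airflow version from inside a throwaway container.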
Building a docker image and running a docker container.
# build the image
make
# run a container from that image
make container
Following Ansible syntax best practices with the built-in linter
ansible-lint main.yml
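If you want the lint step to give a clear message on machines where the linter is missing, a thin wrapper is one option. This is a sketch only: lint_playbook is a hypothetical name, not something this repository ships.

```shell
# lint_playbook PLAYBOOK
# Hypothetical wrapper, not part of de-devtools: run ansible-lint on
# PLAYBOOK, or return 127 with a hint when ansible-lint is not installed.
lint_playbook() {
  if ! command -v ansible-lint >/dev/null 2>&1; then
    echo "ansible-lint not found; install it before linting $1" >&2
    return 127
  fi
  ansible-lint "$1"
}

# Usage, from the repository root:
#   lint_playbook main.yml && echo "lint passed"
```

Because ansible-lint exits with a nonzero status when it finds rule violations, the wrapper can be chained with && as shown in the usage comment.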