The task is to build analytical storage on Vertica using the Data Vault storage model. Source data is kept in the Amazon S3 service. The data pipeline follows the sequence S3 → localhost (in a Docker container) → Vertica STG → Vertica DDS and is implemented with Apache Airflow.
- The STG layer has been developed for storing raw data from the source.
- The DDS layer has been developed to parse and structure the raw data according to the Data Vault model.
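The S3 → STG step can be sketched as a single Airflow DAG, for example as below. This is only an illustrative sketch: the bucket name, file name, target schema and table, and environment-variable names are assumptions, not the project's actual settings.

```python
# A minimal sketch of the S3 -> localhost -> Vertica STG step as an Airflow DAG.
# Bucket, file, schema/table names and environment-variable names are placeholders.
import os

import boto3
import pendulum
import vertica_python
from airflow.decorators import dag, task

S3_BUCKET = "sprint6-data"   # assumed bucket name
LOCAL_PATH = "/data"         # local directory inside the Airflow container
FILENAME = "users.csv"       # assumed source file

VERTICA_CONN = {
    "host": os.getenv("VERTICA_HOST"),
    "port": int(os.getenv("VERTICA_PORT", "5433")),
    "user": os.getenv("VERTICA_USER"),
    "password": os.getenv("VERTICA_PASSWORD"),
    "database": os.getenv("VERTICA_DB"),
}


@dag(schedule=None, start_date=pendulum.datetime(2023, 1, 1), catchup=False)
def s3_to_stg():
    @task
    def download_from_s3(filename: str) -> str:
        """Fetch a raw file from S3 into the local container."""
        s3 = boto3.client(
            "s3",
            endpoint_url=os.getenv("S3_ENDPOINT_URL"),
            aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
            aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
        )
        local_file = f"{LOCAL_PATH}/{filename}"
        s3.download_file(S3_BUCKET, filename, local_file)
        return local_file

    @task
    def load_to_stg(local_file: str) -> None:
        """COPY the downloaded file into the Vertica STG layer."""
        with vertica_python.connect(**VERTICA_CONN) as conn:
            cur = conn.cursor()
            with open(local_file, "rb") as f:
                # EXAMPLE__STAGING.users is a placeholder STG table.
                cur.copy("COPY EXAMPLE__STAGING.users FROM STDIN DELIMITER ','", f)
            conn.commit()

    load_to_stg(download_from_s3(FILENAME))


s3_to_stg()
```

The DDS layer would then be filled from STG by separate tasks that run INSERT ... SELECT statements into the hub, link, and satellite tables of the Data Vault model.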
- Clone the repository to your local machine:
git clone https://github.com/{{ username }}/de-project-sprint-6.git
- Change to the project directory:
cd de-project-sprint-6
- Run docker-compose:
docker-compose up -d
- After the container starts, you will have access to:
- Airflow
localhost:3000/airflow
- Database
Vertica (connection credentials are secured and not stored in the repository)
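Once vertica_python is installed (see the steps below), the database connection can be verified with a small check like the sketch below; the environment-variable names are assumptions, since the real credentials are not published in the repository.

```python
# A minimal Vertica connectivity check (a sketch, assuming credentials
# are supplied via environment variables; the variable names are placeholders).
import os

import vertica_python

conn_info = {
    "host": os.getenv("VERTICA_HOST", "localhost"),
    "port": int(os.getenv("VERTICA_PORT", "5433")),
    "user": os.getenv("VERTICA_USER"),
    "password": os.getenv("VERTICA_PASSWORD"),
    "database": os.getenv("VERTICA_DB"),
}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
```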
- Create a virtual environment:
python3 -m venv venv
- Activate the virtual environment:
source venv/bin/activate
- Update pip to the latest version:
pip install --upgrade pip
- Install vertica_python in the container:
pip install vertica_python
- Install the Airflow Vertica provider (hooks) in the container:
pip install apache-airflow-providers-vertica[common.sql]
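With the provider installed, Airflow tasks can query Vertica through VerticaHook. A minimal sketch, assuming an Airflow connection with id vertica_default has been created (Admin → Connections) and using a placeholder table name:

```python
# A sketch of reading from Vertica inside an Airflow task via VerticaHook.
# "vertica_default" and the table name are assumptions for illustration.
from airflow.decorators import task
from airflow.providers.vertica.hooks.vertica import VerticaHook


@task
def check_stg_row_count() -> int:
    """Return the row count of an (assumed) STG table."""
    hook = VerticaHook(vertica_conn_id="vertica_default")
    rows = hook.get_records("SELECT COUNT(*) FROM EXAMPLE__STAGING.users;")
    return rows[0][0]
```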
Repository structure:
- /src/dags: Airflow DAGs
- /pics/: images used in the documentation
- /data/: local storage for data downloaded from S3