Workflow orchestration with Mage
1. Download the project from mage-ai/mage-zoomcamp.
git clone https://github.com/mage-ai/mage-zoomcamp.git mage-project
cd mage-project
rm -rf .git
Then remove magic-zoomcamp from .gitignore so that the generated project can be put under git version control.
cp dev.env .env
docker compose build
docker compose up
Navigate to http://localhost:6789. You should see:
- "Help improve Mage": a prompt asking your permission to contribute usage statistics to help improve the developer experience.
- "example_pipeline": the default example pipeline.
Mage also initializes a new Mage repository. It appears in your project under the name magic-zoomcamp. If you changed the variable PROJECT_NAME in the .env file, it is named whatever you set it to.
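The naming rule above can be sketched in a few lines of Python. This is only an illustration of the lookup, not how Mage or Docker Compose actually parses the file: `parse_env` and `repo_name` are hypothetical helpers, and the fallback assumes that magic-zoomcamp is the name used when PROJECT_NAME is not set.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "value # note".
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[key.strip()] = value
    return values


def repo_name(env_text, default="magic-zoomcamp"):
    """Return the folder name the Mage repository is expected to get."""
    return parse_env(env_text).get("PROJECT_NAME") or default


print(repo_name("PROJECT_NAME=my-project\n"))  # my-project
print(repo_name("POSTGRES_PORT=5432\n"))       # magic-zoomcamp
```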
This repository should have the following structure:
.
├── mage_data
│ └── magic-zoomcamp
├── magic-zoomcamp
│ ├── __pycache__
│ ├── charts
│ ├── custom
│ ├── data_exporters
│ ├── data_loaders
│ ├── dbt
│ ├── extensions
│ ├── interactions
│ ├── pipelines
│ ├── scratchpads
│ ├── transformers
│ ├── utils
│ ├── __init__.py
│ ├── io_config.yaml
│ ├── metadata.yaml
│ └── requirements.txt
├── Dockerfile
├── README.md
├── dev.env
├── .env
├── docker-compose.yml
└── requirements.txt
If you do not use the Postgres-related part of docker-compose.yml (as in mage-ai/compose-quickstart), I think you can remove it, but I have not tried this.
Before starting, a few things need to be set up.
Add a dev profile to io_config.yaml that uses the Postgres service from docker-compose.yml:
dev:
  # PostgreSQL
  POSTGRES_CONNECT_TIMEOUT: 10
  POSTGRES_DBNAME: "{{ env_var('POSTGRES_DBNAME') }}"
  POSTGRES_SCHEMA: "{{ env_var('POSTGRES_SCHEMA') }}"
  POSTGRES_USER: "{{ env_var('POSTGRES_USER') }}"
  POSTGRES_PASSWORD: "{{ env_var('POSTGRES_PASSWORD') }}"
  POSTGRES_HOST: "{{ env_var('POSTGRES_HOST') }}"
  POSTGRES_PORT: "{{ env_var('POSTGRES_PORT') }}"
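To make the `env_var(...)` placeholders in the dev profile less opaque, here is a rough Python sketch of the substitution they imply. Mage does its own rendering of io_config.yaml; the regex below is a simplified stand-in, `render` and `render_profile` are hypothetical names, and the sample values are made up.

```python
import re

# Matches {{ env_var('NAME') }} with single or double quotes.
ENV_VAR = re.compile(
    r"\{\{\s*env_var\(\s*['\"]([A-Za-z_][A-Za-z0-9_]*)['\"]\s*\)\s*\}\}"
)


def render(value, env):
    """Replace each env_var placeholder in a string with its value from env."""
    def lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]
    return ENV_VAR.sub(lookup, value)


def render_profile(profile, env):
    """Render every string value of a profile; leave other types untouched."""
    return {
        key: render(val, env) if isinstance(val, str) else val
        for key, val in profile.items()
    }


dev = {
    "POSTGRES_CONNECT_TIMEOUT": 10,
    "POSTGRES_DBNAME": "{{ env_var('POSTGRES_DBNAME') }}",
    "POSTGRES_HOST": "{{ env_var('POSTGRES_HOST') }}",
    "POSTGRES_PORT": "{{ env_var('POSTGRES_PORT') }}",
}
# Example values only; the real ones are expected to come from .env.
env = {"POSTGRES_DBNAME": "postgres", "POSTGRES_HOST": "postgres", "POSTGRES_PORT": "5432"}
print(render_profile(dev, env))
```

A missing variable raises a KeyError here, which is a deliberate choice for the sketch: a silently empty host or password is harder to debug than an early failure.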
Update line 23 of docker-compose.yml, ~/Documents/secrets/personal-gcp.json:/home/src/personal-gcp.json, to match the path of your Google credentials JSON file.
Update the Google section of io_config.yaml as follows:
# Google
GOOGLE_SERVICE_ACC_KEY_FILEPATH: "/home/src/personal-gcp.json" # replace the filename depending on your settings in docker-compose.yml
GOOGLE_LOCATION: US # Optional
Update .env:
GOOGLE_APPLICATION_CREDENTIALS=/home/src/personal-gcp.json # replace filename
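The three settings above have to agree on one container path, and a one-character difference (a hyphen versus an underscore, say) is easy to miss. A small check, sketched under the assumption of a Unix-style `host:container` volume entry, might look like this; `container_path` and `mismatched_settings` are hypothetical helpers, not part of Mage or Docker.

```python
def container_path(volume):
    """Return the container side of a 'host:container[:mode]' volume entry."""
    parts = volume.split(":")
    if len(parts) < 2:
        raise ValueError(f"not a host:container mapping: {volume!r}")
    return parts[1]


def mismatched_settings(volume, settings):
    """Return the names of settings that do not point at the mounted file."""
    expected = container_path(volume)
    return sorted(name for name, path in settings.items() if path != expected)


volume = "~/Documents/secrets/personal-gcp.json:/home/src/personal-gcp.json"
settings = {
    "GOOGLE_SERVICE_ACC_KEY_FILEPATH": "/home/src/personal-gcp.json",  # io_config.yaml
    "GOOGLE_APPLICATION_CREDENTIALS": "/home/src/personal-gcp.json",   # .env
}
print(mismatched_settings(volume, settings))  # []
```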
Restart Docker Compose:
docker compose stop
docker compose rm
docker compose build # because docker-compose.yml changed
docker compose up
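Since this restart sequence comes up again after every configuration change, it can be wrapped in a small script. The sketch below is one possible wrapper, not something the project ships. It deviates from the manual steps in two labeled ways: `rm --force` skips the confirmation prompt and `up --detach` keeps the script from blocking. It only prints the commands unless `run=True` is passed.

```python
import subprocess

# The same four steps as above, in order.
RESTART_STEPS = [
    ["docker", "compose", "stop"],
    ["docker", "compose", "rm", "--force"],   # --force: do not ask for confirmation
    ["docker", "compose", "build"],
    ["docker", "compose", "up", "--detach"],  # --detach: run in the background
]


def restart(run=False, cwd="."):
    """Print each step; execute it only when run=True."""
    for step in RESTART_STEPS:
        print("$", " ".join(step))
        if run:
            subprocess.run(step, cwd=cwd, check=True)
    return [" ".join(step) for step in RESTART_STEPS]


restart()  # dry run: prints the commands without touching Docker
```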
Steps, following the homework:
Run docker compose up and navigate to http://localhost:6789.
Choose Python > API, then update the block to match the file.
Choose Python > Generic (no template), then update the block to match the file.
Choose Python > PostgreSQL, then update the block to match the file.
Use a "Data Loader" to check the result. Choose SQL with these options:
- Connection: PostgreSQL
- Profile: dev
- Use raw SQL
Then update the block to match the file.
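The load, transform, export and check steps above can be pictured end to end with a stdlib-only sketch. It is not the homework code and does not use Mage's block templates: the sample data, the column names and the "drop rows without passengers" rule are hypothetical, and sqlite3 stands in for the Postgres dev profile so the example runs anywhere.

```python
import csv
import io
import re
import sqlite3

SAMPLE_CSV = """VendorID,passenger_count,trip_distance
1,1,2.5
2,0,1.1
2,3,7.0
"""


def load(text):
    """Loader step: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def to_snake_case(name):
    """Insert '_' before an uppercase letter that follows a lowercase one."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def transform(rows):
    """Transformer step: rename columns and drop rows without passengers."""
    renamed = [{to_snake_case(k): v for k, v in row.items()} for row in rows]
    return [r for r in renamed if int(r["passenger_count"]) > 0]


def export(rows, conn, table="trips"):
    """Exporter step: replace the table's contents with the given rows."""
    columns = list(rows[0])
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})",
        [tuple(r[c] for c in columns) for r in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
export(transform(load(SAMPLE_CSV)), conn)
# Check step: the kind of raw SQL one could run in the SQL block.
print(conn.execute("SELECT COUNT(*) FROM trips").fetchone()[0])  # 2
```

Keeping each step a plain function that takes and returns rows mirrors how the blocks hand data to each other, which makes each one easy to test on its own.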
4-2-1 GCP Services Up
cd ../../week_1_basics_n_setup/1_terraform_gcp/terraform
terraform apply
# when no longer in use, don't forget to run:
# terraform destroy
4-2-2 Add project environment variable
Update .env:
PROJECT_ID=xxxxx # replace
BUCKET_NAME=xxxxx # replace
Restart Docker Compose:
docker compose stop
docker compose rm
docker compose build # rebuild after the configuration change
docker compose up
4-2-3 Add Data exporter
Choose Python > Generic (no template), then update the block to match the file.
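A generic exporter of this kind needs PROJECT_ID and BUCKET_NAME at run time. As a hedged sketch of that dependency (not the homework's exporter), the helper below validates both settings and builds the `gs://` target. `require`, `gcs_uri` and the object name are hypothetical, and it is an assumption that the values from .env reach the container's environment.

```python
def require(env, name):
    """Return a non-empty setting or fail with a clear message."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is missing; add it to .env and restart")
    return value


def gcs_uri(env, object_key):
    """Build the gs:// URI an exporter would write to."""
    bucket = require(env, "BUCKET_NAME")
    require(env, "PROJECT_ID")  # checked here even though the URI omits it
    return f"gs://{bucket}/{object_key.lstrip('/')}"


# Inside the container one would pass os.environ; a literal dict keeps
# the sketch runnable without any setup.
example_env = {"PROJECT_ID": "my-project", "BUCKET_NAME": "my-bucket"}
print(gcs_uri(example_env, "nyc_taxi_data.parquet"))  # gs://my-bucket/nyc_taxi_data.parquet
```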
4-2-4 Check "Tree"
All exporters should be under the transformer.
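The same check can be expressed in code. The sketch below assumes a pipeline described as a mapping from block name to its `type` and `upstream_blocks`, which is roughly the shape I would expect in the pipeline's metadata, though that is an assumption here; the block names are made up and `exporters_not_under` is a hypothetical helper.

```python
def exporters_not_under(blocks, transformer):
    """Return names of exporter blocks that lack `transformer` as an upstream."""
    return sorted(
        name
        for name, block in blocks.items()
        if block["type"] == "data_exporter"
        and transformer not in block["upstream_blocks"]
    )


# Hypothetical block names; only the shape matters.
blocks = {
    "load_taxi": {"type": "data_loader", "upstream_blocks": []},
    "clean_taxi": {"type": "transformer", "upstream_blocks": ["load_taxi"]},
    "to_postgres": {"type": "data_exporter", "upstream_blocks": ["clean_taxi"]},
    "to_gcs": {"type": "data_exporter", "upstream_blocks": ["clean_taxi"]},
}
print(exporters_not_under(blocks, "clean_taxi"))  # []
```

An empty list means every exporter hangs off the transformer; any name that is returned points at a block wired to the wrong parent.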
4-3-1 GCP Services Up (can be skipped if 4-2 is already done)
cd ../../week_1_basics_n_setup/1_terraform_gcp/terraform
terraform apply
# when no longer in use, don't forget to run:
# terraform destroy
4-3-2 Add project environment variable
Update .env:
DATASET_NAME=nyc_taxi_data
Restart Docker Compose:
docker compose stop
docker compose rm
docker compose build # rebuild after the configuration change
docker compose up
4-3-3 Add Data exporter
Choose Python > Google BigQuery, then update the block to match the file.
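BigQuery tables are addressed as `project.dataset.table`, so the exporter needs PROJECT_ID and DATASET_NAME together. A minimal sketch of assembling that identifier is shown below; `bigquery_table_id` is a hypothetical helper and `green_taxi` is a made-up table name, not one taken from the homework.

```python
def bigquery_table_id(env, table):
    """Join project, dataset and table into 'project.dataset.table'."""
    parts = [env.get("PROJECT_ID", ""), env.get("DATASET_NAME", ""), table]
    if not all(parts):
        raise ValueError("PROJECT_ID, DATASET_NAME and a table name are all required")
    return ".".join(parts)


example_env = {"PROJECT_ID": "my-project", "DATASET_NAME": "nyc_taxi_data"}
print(bigquery_table_id(example_env, "green_taxi"))  # my-project.nyc_taxi_data.green_taxi
```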
4-3-4 Check "Tree"
All exporters should be under the transformer.
About the source code: go back to VS Code and you can see:
.
├── mage_data
│ └── magic-zoomcamp
├── magic-zoomcamp
│ ├── __pycache__
│ ├── charts
│ ├── custom
│ ├── data_exporters // data_exporters
│ ├── data_loaders // data_loaders
│ ├── dbt
│ ├── extensions
│ ├── interactions
│ ├── pipelines // pipelines
│ ├── scratchpads
│ ├── transformers // transformers
│ ├── utils
│ ├── __init__.py
│ ├── io_config.yaml
│ ├── metadata.yaml
│ └── requirements.txt
├── Dockerfile
├── README.md
├── dev.env
├── .env // project environment variables (project name, postgres...)
├── docker-compose.yml
└── requirements.txt
For more details about the project structure, see the official website.
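To see which block files the UI steps produced, one can list the Python files in the annotated folders. The sketch below is a convenience script under the assumption that each block is saved as a `.py` file in its folder; `list_blocks` is a hypothetical helper, and the demo runs on a throwaway directory rather than a real project.

```python
from pathlib import Path
import tempfile

BLOCK_DIRS = ("data_loaders", "transformers", "data_exporters")


def list_blocks(repo):
    """Map each block folder to the sorted names of its Python files."""
    result = {}
    for folder in BLOCK_DIRS:
        path = Path(repo) / folder
        files = path.glob("*.py") if path.is_dir() else []
        result[folder] = sorted(p.name for p in files if p.name != "__init__.py")
    return result


# Demo on a temporary directory so the sketch runs without a Mage project.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data_loaders").mkdir()
    (Path(tmp) / "data_loaders" / "load_taxi.py").write_text("# loader\n")
    print(list_blocks(tmp))
```

For a real project, the call would be `list_blocks("magic-zoomcamp")`, or whatever PROJECT_NAME is set to.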