Skip to content

Alexey-Samodurov/mybi-dbt-showcase

 
 

Repository files navigation

myBI Market + dbt demo project

This repo guides you through building analyics for myBI Market with Modern Data Stack:

  • myBI Connect (Extract - Load tool)
  • Clickhouse (Database)
  • dbt (Transformations) + mybi-dbt-core module
  • Metabase (Business Intelligence)
  • Github Actions (Continuous Integration + Deployment)

Table of Contents

  1. Configure environment
  2. Install and configure mybi_dbt_core package
  3. Configure data sources
  4. Build staging layer models
  5. Model Data Marts
  6. Set up metrics layer
  7. Visualize on a dashboard
  8. Publish dbt project docs
  9. Introduce Continuous Integration

Configure environment

  1. Install prerequisites:

  2. Fork & Clone this repository and open in IDE

  3. Spin up Docker containers

    All the services are configured via Docker containers.

    • Clickhouse
    • Metabase
    • Cube
    # launch containers: clickhouse, metabase, cube
    docker-compose build --no-cache
    docker-compose up -d
  4. Open dev container with dbt installed

    devcontainer open .
    
    # test connections
    dbt --version
    dbt debug

    Spin up Docker containers

Install and configure mybi_dbt_core package

  1. Install module via packages.yml

    dbt clean # clean temp files
    dbt deps # install dependencies (modules)
  2. ✅ Enable only relevant data models (and disable the rest)

    We will use specific data sources:

    Confirm with command: dbt ls --resource-type model -s tag:staging

    Confirm specfic data sources enabled

  3. ✅ Assign variables

    Certain variable values have to be set:

    • source database connection details
    • database and schema name to find mybi source data
    • specific account_id values to be included
  4. ✅ Turn on custom schema management

    I use generate_schema_name.sql macro to set target schema names:

    Renders a schema name given a custom schema name. In production, this macro will render out the overriden schema name for a model. Otherwise, the default schema specified in the active target is used.

    Take a look at custom_schema_management.sql macro to find out more.

Configure data sources

  1. Create Clickhouse database of PostgreSQL Database Engine with source data:

    dbt run-operation init_source_data

    Initialized myBI source database

Build staging layer models

Staging models are alredy configured for you in mybi_dbt_core package:

All you need to do is just build these models in one command:

dbt build -s tag:staging

Build staging models

Model Data Marts

With staging models in place we now can proceed to data modeling.

  1. Intermediate models include:

    dbt build -s tag:intermediate
  2. Data Marts include:

    dbt build --full-refresh -s tag:marts

    Take a look at the project graph (DAG):

    Data marts graph

Set up metrics layer

dbt Metrics

Cube Metrics

Cube Playground

Visualize on a dashboard

Now we are ready to visualize key metrics on a dashboard.

I have configured Clickhouse connection and prepared Metabase dashboard which you can access at http://localhost:3000/dashboard/1-mybi-tutorial:

Explore data from Metabase dashboard

You may explore data from Metabase yourself or even build your own dashboard.

Publish dbt project docs

dbt Docs can be easily served locally on http://localhost:8080:

dbt docs generate
dbt docs serve

Or you may access pre-build version from Github Pages:

Access dbt project docs

Introduce Continuous Integration

Let's say you want to introduce some code changes. How do you ensure data quality?

You protect your main branch and require Pull Requests to have:

Continuous Integration Checks

This way you make sure to deploy high quality and functional code.

Contributing

If you have any questions or comments please create an issue.

About

myBI Connect + dbt showcase project

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Dockerfile 100.0%