Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DAG for exporting retool data #550

Merged
merged 19 commits into from
Dec 16, 2024
Merged

Conversation

amishas157
Copy link
Contributor

@amishas157 amishas157 commented Dec 6, 2024

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with the jira ticket associated with the PR.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated the README with the added features, breaking changes, new instructions on how to use the repository.

What

This PR adds DAG to fetch custom data from retool API. It triggers command in stellar-internal , pull data created/updated within given time range.
Existing batch_id is deleted and then re-created(This is to keep idempotent results while retries)

Why

To support new data ingestion

Known limitations

Not a limitation, but open to renaming the dag if it too generic. Probable options:

  • external_data_dag_retool
  • export_retool
  • external_data_dag_daily (This can be used to import any external data which runs at daily cadence)

Testing

Tested the airflow in test-hubble and records look loaded in BQ table

@amishas157 amishas157 force-pushed the patch/fetch-retool-data branch 3 times, most recently from bbcf345 to c1aa6df Compare December 6, 2024 23:28
udpate image

udpate image

udpate image

udpate image
@amishas157 amishas157 force-pushed the patch/fetch-retool-data branch from c1aa6df to ef7e871 Compare December 6, 2024 23:32
@amishas157 amishas157 force-pushed the patch/fetch-retool-data branch from 4f4b420 to 1102681 Compare December 11, 2024 22:49
@amishas157 amishas157 force-pushed the patch/fetch-retool-data branch 3 times, most recently from 64605ef to b95a518 Compare December 11, 2024 23:48
update xcom value

update

set xcom input

simplify

rename
@amishas157 amishas157 force-pushed the patch/fetch-retool-data branch from b95a518 to c632660 Compare December 12, 2024 00:03
@amishas157 amishas157 force-pushed the patch/fetch-retool-data branch from 4090b59 to 7ee9e0d Compare December 13, 2024 21:46
@amishas157 amishas157 force-pushed the patch/fetch-retool-data branch from 75c4c6e to 2461095 Compare December 13, 2024 22:37
@amishas157 amishas157 force-pushed the patch/fetch-retool-data branch 3 times, most recently from f0e9186 to d09b178 Compare December 16, 2024 21:40
@amishas157 amishas157 force-pushed the patch/fetch-retool-data branch from d09b178 to 1576b35 Compare December 16, 2024 22:00
update

move

update

update
@amishas157 amishas157 force-pushed the patch/fetch-retool-data branch from 1576b35 to 5920807 Compare December 16, 2024 22:07
@@ -315,6 +315,7 @@
"schema_filepath": "/home/airflow/gcs/dags/schemas/",
"sentry_dsn": "https://[email protected]/6190849",
"sentry_environment": "development",
"stellar_etl_internal_image_name": "amishastellar/stellar-etl-internal:cd53bcf70",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is temporary. Once we have build setup for stellar-etl, will switch to stellar workspace.

@@ -0,0 +1,163 @@
[
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took this JSON schema from bigquery after creating the table

@amishas157 amishas157 marked this pull request as ready for review December 16, 2024 22:27
@amishas157 amishas157 requested a review from a team as a code owner December 16, 2024 22:27
dag = DAG(
"external_data_dag",
default_args=get_default_dag_args(),
start_date=datetime(2024, 12, 5, 14, 30),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend updating this start date. Generally because this is set to run "0 22 * * *" you don't want to set the hour::minutes part of the datetime

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 9a4e60f

)


def get_insert_to_bq_task(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function different than https://github.com/stellar/stellar-etl-airflow/blob/master/dags/stellar_etl_airflow/build_bq_insert_job_task.py or anything else in ./dags/stellar_etl_ariflow/build_*?

If it's different it might be worthwhile to separate this out into either utils.py or another *.py file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, it is different since it is combination of delete and insert. Whereas the one in build_bq_insert_job_task is just insertion

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's fine then. Yeah a nit to move to a separate helper file. Also recommend renaming the function to be more descriptive of it deleting and inserting instead of just get_insert_to_bq_task. Maybe just add delete to the function name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I moved it to dags/stellar_etl_airflow/build_del_ins_operator.py in 9a4e60f

Also renamed the method to create_export_del_insert_operator. Essentially create_export_del_insert_operator is specific type of create_del_ins_task

image = "{{ var.value.stellar_etl_internal_image_name }}"

output_filepath = ""
if use_gcs:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: technically the gcs if/case could be split out into a separate function or interface so that it's easy to add support for other cloud storage systems. BUT I don't think we'll ever need support outside of GCS for this so it's fine to leave this as is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I think in general we could try to write the operators as classes, which can help with overriding.. Right now, it is more functional.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly one day we'll probably refactor all of stellar-etl-airflow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤞

Copy link
Contributor

@chowbao chowbao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM left some minor comments

@amishas157 amishas157 merged commit afc5bb1 into master Dec 16, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants