Commit
Add db retention support (MarquezProject#2486)
* Add db migration to add cascade deletion on `fk`s Signed-off-by: wslulciuc <[email protected]>
* Add `DbDataRetention` and `dataRetentionInDays` config Signed-off-by: wslulciuc <[email protected]>
* Add `DbRetentionJob` Signed-off-by: wslulciuc <[email protected]>
* Add `DbRetentionCommand` Signed-off-by: wslulciuc <[email protected]>
* Add `frequencyMins` config for runs and rename `dbRetentionInDays` Signed-off-by: wslulciuc <[email protected]>
* Add docs to `DbRetentionJob` and minor renaming Signed-off-by: wslulciuc <[email protected]>
* Wrap `DbRetention.retentionOnDbOrError()` in `try/catch` Signed-off-by: wslulciuc <[email protected]>
* Add docs to DbRetention Signed-off-by: wslulciuc <[email protected]>
* continued: Add docs to `DbRetention` Signed-off-by: wslulciuc <[email protected]>
* Add handling of `errorOnDbRetention` Signed-off-by: wslulciuc <[email protected]>
* Add docs to `DbException` and `DbRetentionException` Signed-off-by: wslulciuc <[email protected]>
* `info` -> `debug` when inserting column lineage Signed-off-by: wslulciuc <[email protected]>
* Remove `dbRetention.enabled` Signed-off-by: wslulciuc <[email protected]>
* Update handling of `StatementException` Signed-off-by: wslulciuc <[email protected]>
* Minor changes Signed-off-by: wslulciuc <[email protected]>
* Add `docs/faq.md` Signed-off-by: wslulciuc <[email protected]>
* continued: `Add docs/faq.md` Signed-off-by: wslulciuc <[email protected]>
* continued: Add `docs/faq.md` Signed-off-by: wslulciuc <[email protected]>
* continued: Add `docs/faq.md` Signed-off-by: wslulciuc <[email protected]>
* Define `DEFAULT_RETENTION_DAYS` constant in `DbRetention` Signed-off-by: wslulciuc <[email protected]>
* Make chunk size in retention query configurable Signed-off-by: wslulciuc <[email protected]>
* Remove `DATA_RETENTION_IN_DAYS` from `MarquezConfig` Signed-off-by: wslulciuc <[email protected]>
* Update docs for chunk size config Signed-off-by: wslulciuc <[email protected]>
* Remove error log from `DbRetention.retentionOnDbOrError()` Signed-off-by: wslulciuc <[email protected]>
* Use `LOOP` for retention Signed-off-by: wslulciuc <[email protected]>
* continued: Use `LOOP` for retention Signed-off-by: wslulciuc <[email protected]>
* Use `numberOfRowsPerBatch` Signed-off-by: wslulciuc <[email protected]>
* Use `--number-of-rows-per-batch` Signed-off-by: wslulciuc <[email protected]>
* Add pause to prevent lock timeouts Signed-off-by: wslulciuc <[email protected]>
* Add `FOR UPDATE SKIP LOCKED` Signed-off-by: wslulciuc <[email protected]>
* Add `sql()` Signed-off-by: wslulciuc <[email protected]>
* Add `--dry-run` Signed-off-by: wslulciuc <[email protected]>
* Add `jdbi3-testcontainers` Signed-off-by: wslulciuc <[email protected]>
* Remove shortened flag args Signed-off-by: wslulciuc <[email protected]>
* Use `marquez.db.DbRetention.DEFAULT_DRY_RUN` Signed-off-by: wslulciuc <[email protected]>
* Add DbRetention.retentionOnRuns() Signed-off-by: wslulciuc <[email protected]>
* Add `DbMigration.migrateDbOrError(DataSource)` Signed-off-by: wslulciuc <[email protected]>
* Add `TestingDb` Signed-off-by: wslulciuc <[email protected]>
* Add `DbTest` Signed-off-by: wslulciuc <[email protected]>
* Add `testRetentionOnDbOrError_withDatasetsOlderThanXDays()` Signed-off-by: wslulciuc <[email protected]>
* Remove `jobs.DbRetentionConfig.dryRun` Signed-off-by: wslulciuc <[email protected]>
* Add `--dry-run` option to `faq.md` Signed-off-by: wslulciuc <[email protected]>
* continued: Add --dry-run option to faq.md Signed-off-by: wslulciuc <[email protected]>
* continued: `Add testRetentionOnDbOrError_withDatasetsOlderThanXDays` Signed-off-by: wslulciuc <[email protected]>
* Fix retention query for datasets and dataset versions Signed-off-by: wslulciuc <[email protected]>
* Add test for retention on dataset versions Signed-off-by: wslulciuc <[email protected]>
* Add comments to tests Signed-off-by: wslulciuc <[email protected]>
* Add `testRetentionOnDbOrErrorWithDatasetVersionsOlderThanXDays_skipIfVersionAsInputForRun()` Signed-off-by: wslulciuc <[email protected]>
* Add `testRetentionOnDbOrErrorWithJobsOlderThanXDays()` Signed-off-by: wslulciuc <[email protected]>
* Add `testRetentionOnDbOrErrorWithJobVersionsOlderThanXDays()` Signed-off-by: wslulciuc <[email protected]>
* Add tests for dry run Signed-off-by: wslulciuc <[email protected]>
* Add testRetentionOnDbOrErrorWithRunsOlderThanXDays() Signed-off-by: wslulciuc <[email protected]>
* Add `testRetentionOnDbOrErrorWithOlEventsOlderThanXDays()` Signed-off-by: wslulciuc <[email protected]>
* continued: `Add testRetentionOnDbOrErrorWithOlEventsOlderThanXDays()` Signed-off-by: wslulciuc <[email protected]>
* Add `javadocs` to `DbRetention` Signed-off-by: wslulciuc <[email protected]>
* Run tests in order of retention Signed-off-by: wslulciuc <[email protected]>

---------

Signed-off-by: wslulciuc <[email protected]>
Co-authored-by: Harel Shein <[email protected]>
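Several of the changes listed in this commit message (a retention window in days, a configurable number of rows per batch, a loop that repeats until nothing is left, and a `--dry-run` option) describe a batched retention pass. The following is a minimal in-memory sketch of that idea, not the code from this commit: the class `RetentionSketch`, the `Row` type, and the `applyRetention` method and its parameters are all hypothetical, and a plain `List` stands in for a database table. The actual SQL (`LOOP`, `FOR UPDATE SKIP LOCKED`) and the pause between batches are not reproduced here.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration of batched retention; not Marquez's DbRetention. */
public class RetentionSketch {

  /** A stand-in for a table row, holding only what retention needs. */
  static final class Row {
    final String name;
    final Instant updatedAt;

    Row(String name, Instant updatedAt) {
      this.name = name;
      this.updatedAt = updatedAt;
    }
  }

  /**
   * Removes rows older than {@code retentionDays}, at most {@code rowsPerBatch}
   * per pass, repeating until a pass removes nothing. With {@code dryRun} set,
   * the table is left untouched and eligible rows are only counted.
   *
   * @return the number of rows deleted (or that would be deleted in a dry run)
   */
  static int applyRetention(
      List<Row> table, int retentionDays, int rowsPerBatch, boolean dryRun, Instant now) {
    if (rowsPerBatch <= 0) {
      throw new IllegalArgumentException("rowsPerBatch must be positive");
    }
    Instant cutoff = now.minus(Duration.ofDays(retentionDays));

    if (dryRun) {
      // Count only; nothing is removed.
      return (int) table.stream().filter(r -> r.updatedAt.isBefore(cutoff)).count();
    }

    int totalDeleted = 0;
    while (true) {
      int deletedInBatch = 0;
      Iterator<Row> rows = table.iterator();
      // Delete up to rowsPerBatch expired rows in this pass.
      while (rows.hasNext() && deletedInBatch < rowsPerBatch) {
        if (rows.next().updatedAt.isBefore(cutoff)) {
          rows.remove();
          deletedInBatch++;
        }
      }
      if (deletedInBatch == 0) {
        break; // Nothing left to delete.
      }
      totalDeleted += deletedInBatch;
      // A database-backed job might pause here between batches
      // (the commit message mentions a pause to prevent lock timeouts).
    }
    return totalDeleted;
  }
}
```

Calling `applyRetention` with `dryRun` set to `true` before a real run gives a count of what would be removed without touching the data, which is the usual reason for offering such a flag.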