Parallelize Iceberg materialized view base table freshness retrieval #24734

Open
wants to merge 1 commit into
base: master

@chenjian2664 chenjian2664 commented Jan 17, 2025

Description

This change parallelizes the retrieval of base table freshness for Iceberg
materialized views.

Benchmark on a materialized view with 20 base tables, running
REFRESH MATERIALIZED VIEW mv :

  • Assuming an average table load time of 10ms, refresh time decreases
    from 560ms to 310ms.
  • Assuming an average table load time of 100ms, refresh time decreases
    by more than 1s.
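
The difference between the sequential and parallel retrieval described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name, `loadSnapshotId`, `fetchSequentially`, `fetchInParallel`, the simulated 10ms load, and the pool size of 8 are all hypothetical stand-ins chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelFreshnessSketch
{
    // Hypothetical stand-in for loading a base table and reading its current snapshot ID.
    static long loadSnapshotId(String tableName)
    {
        try {
            Thread.sleep(10); // simulated table load time (assumption)
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return tableName.hashCode();
    }

    // Loads each table one after another; total time grows with the number of tables.
    static Map<String, Long> fetchSequentially(List<String> tables, Function<String, Long> loader)
    {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String table : tables) {
            result.put(table, loader.apply(table));
        }
        return result;
    }

    // Submits every load to the executor first so the loads overlap, then collects the results.
    static Map<String, Long> fetchInParallel(List<String> tables, Function<String, Long> loader, ExecutorService executor)
    {
        Map<String, CompletableFuture<Long>> futures = new LinkedHashMap<>();
        for (String table : tables) {
            futures.put(table, CompletableFuture.supplyAsync(() -> loader.apply(table), executor));
        }
        Map<String, Long> result = new LinkedHashMap<>();
        futures.forEach((table, future) -> result.put(table, future.join()));
        return result;
    }

    public static void main(String[] args)
    {
        List<String> tables = IntStream.range(0, 20)
                .mapToObj(i -> "base_table_" + i)
                .collect(Collectors.toList());
        ExecutorService executor = Executors.newFixedThreadPool(8); // pool size is an assumption
        try {
            long start = System.nanoTime();
            fetchSequentially(tables, ParallelFreshnessSketch::loadSnapshotId);
            long sequentialMillis = (System.nanoTime() - start) / 1_000_000;

            start = System.nanoTime();
            fetchInParallel(tables, ParallelFreshnessSketch::loadSnapshotId, executor);
            long parallelMillis = (System.nanoTime() - start) / 1_000_000;

            System.out.println("sequential: " + sequentialMillis + "ms, parallel: " + parallelMillis + "ms");
        }
        finally {
            executor.shutdown();
        }
    }
}
```

Both methods return the same per-table results; only the wall-clock time differs. The timings printed by this sketch cover only the simulated loads, so they are not comparable to the end-to-end refresh times in the benchmark above.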

@cla-bot cla-bot bot added the cla-signed label Jan 17, 2025
@github-actions github-actions bot added the iceberg Iceberg connector label Jan 17, 2025
@chenjian2664 chenjian2664 force-pushed the iceberg_mv_fresh branch 2 times, most recently from 4b54953 to f3dd46e Compare January 17, 2025 04:43
@chenjian2664 chenjian2664 changed the title from "parallel fetch matrialized view freshness in Iceberg" to "parallel fetch table change info for materialized view freshness in Iceberg" Jan 17, 2025

This pull request has gone a while without any activity. Tagging for triage help: @mosabua

@github-actions github-actions bot added the stale label Feb 11, 2025
@github-actions github-actions bot removed the stale label Feb 12, 2025
@ebyhr ebyhr requested review from SemionPar and pajaks February 19, 2025 09:15
@chenjian2664

Similar to 64d51fc

@chenjian2664 chenjian2664 changed the title from "parallel fetch table change info for materialized view freshness in Iceberg" to "Parallelize Iceberg materialized view base table freshness retrieval" Feb 21, 2025
Previously, base table freshness was retrieved sequentially, which could make
materialized view refreshes slow, especially when base tables changed
frequently or loaded slowly.
This change parallelizes the retrieval, improving refresh performance for
those workloads.

Benchmark results for a materialized view with 20 base tables, using
`REFRESH MATERIALIZED VIEW`:
* Avg table load time: 10ms → Refresh time reduced from 560ms to 310ms.
* Avg table load time: 100ms → Refresh time reduced by more than 1s.