Feature Request / Improvement
Background
Currently, Iceberg's FileIO interface only provides deleteFile(String path), which deletes one file per call. For workloads that remove many files (e.g., metadata cleanup or expired-data removal), this can be inefficient and incur unnecessary costs, especially on cloud object storage, where each deletion is a separate request.
Many storage systems (such as AWS S3, Google Cloud Storage, and Azure Blob Storage) provide APIs for batch file deletions. Leveraging these capabilities could improve performance and reduce API call overhead.
I discovered this potential improvement while working on Apache Polaris: batch deletion support could help optimize catalog operations such as deleting metadata and statistics files.
Proposal
Introduce a new method to FileIO:
void deleteFiles(Iterable<String> paths)
The method would maintain backward compatibility through a default implementation that calls deleteFile() once per path.
Implementations such as S3FileIO could then override it to use the storage service's native batch-delete API.
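A minimal sketch of how the proposed default method could look, assuming a simplified FileIO that declares only the two deletion methods (the real interface has more). RecordingFileIO and BatchingFileIO are hypothetical implementations that record the requests they would send instead of touching real storage, and the s3://bucket/... paths are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

class DeleteFilesSketch {
  // Simplified stand-in for Iceberg's FileIO, reduced to the deletion methods discussed here.
  interface FileIO {
    void deleteFile(String path);

    // Proposed addition: the default deletes paths one at a time, so existing
    // implementations keep working without any changes.
    default void deleteFiles(Iterable<String> paths) {
      for (String path : paths) {
        deleteFile(path);
      }
    }
  }

  // Hypothetical implementation that records the storage requests it would issue.
  static class RecordingFileIO implements FileIO {
    final List<String> requests = new ArrayList<>();

    @Override
    public void deleteFile(String path) {
      requests.add("DELETE " + path);
    }
  }

  // Hypothetical implementation that overrides deleteFiles to send one batch request,
  // standing in for a storage service's native bulk-delete API.
  static class BatchingFileIO extends RecordingFileIO {
    @Override
    public void deleteFiles(Iterable<String> paths) {
      List<String> batch = new ArrayList<>();
      paths.forEach(batch::add);
      if (!batch.isEmpty()) {
        requests.add("BATCH_DELETE " + batch);
      }
    }
  }

  public static void main(String[] args) {
    List<String> paths = List.of("s3://bucket/a.avro", "s3://bucket/b.avro", "s3://bucket/c.avro");

    RecordingFileIO oneByOne = new RecordingFileIO();
    oneByOne.deleteFiles(paths); // uses the default: one request per path

    BatchingFileIO batched = new BatchingFileIO();
    batched.deleteFiles(paths); // uses the override: a single request

    System.out.println(oneByOne.requests.size() + " vs " + batched.requests.size());
  }
}
```

Running main prints "3 vs 1". Because the default only delegates to deleteFile(), existing implementations compile and behave exactly as before; only implementations that choose to override deleteFiles change their request pattern.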
Expected Benefits
Reduce the number of storage API calls when deleting multiple files.
Improve performance for workloads that require bulk file deletions.
Ensure compatibility with all existing FileIO implementations.
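To make the reduction in API calls concrete, the following sketch splits a list of paths into batches sized for one bulk-delete request each. It assumes Amazon S3's DeleteObjects limit of 1,000 keys per request; other services impose their own limits. The class name BatchDeletePlanner and the paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class BatchDeletePlanner {
  // Amazon S3's DeleteObjects API accepts at most 1,000 keys per request.
  static final int S3_MAX_KEYS_PER_REQUEST = 1000;

  // Splits paths into consecutive batches of at most batchSize entries;
  // each batch would become one bulk-delete request.
  static List<List<String>> partition(Iterable<String> paths, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<String>> batches = new ArrayList<>();
    List<String> current = new ArrayList<>();
    for (String path : paths) {
      current.add(path);
      if (current.size() == batchSize) {
        batches.add(current);
        current = new ArrayList<>();
      }
    }
    if (!current.isEmpty()) {
      batches.add(current); // trailing partial batch
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> paths = new ArrayList<>();
    for (int i = 0; i < 2500; i++) {
      paths.add("s3://bucket/metadata/file-" + i + ".avro");
    }
    List<List<String>> batches = partition(paths, S3_MAX_KEYS_PER_REQUEST);
    // Without batching: one request per path. With batching: one request per batch.
    System.out.println(paths.size() + " single deletes vs " + batches.size() + " batch requests");
  }
}
```

For 2,500 paths this prints "2500 single deletes vs 3 batch requests" (batches of 1,000, 1,000 and 500).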
I am happy to contribute this feature under the community’s guidance. Since this involves extending an existing interface, I would appreciate any feedback and insights from the community.
Query engine
None
Willingness to contribute
I can contribute this improvement/feature independently
I would be willing to contribute this improvement/feature with guidance from the Iceberg community
I cannot contribute this improvement/feature at this time