Add batch file deletion support to FileIO #12387

Closed
2 of 3 tasks
danielhumanmod opened this issue Feb 23, 2025 · 1 comment
Labels
improvement PR that improves existing functionality

Comments


danielhumanmod commented Feb 23, 2025

Feature Request / Improvement

Background

Currently, Iceberg's FileIO interface only provides deleteFile(String path), which deletes files one at a time. However, for workloads that need to remove many files (e.g., metadata cleanup, expired data removal), this can be inefficient and incur unnecessary costs, especially for cloud object storage services.

Many storage systems (such as AWS S3, Google Cloud Storage, and Azure Blob Storage) provide APIs for batch file deletions. Leveraging these capabilities could improve performance and reduce API call overhead.
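
To make the API-call saving concrete, here is a minimal sketch of how a list of paths could be split into batches before being handed to a bulk-delete endpoint. The class name `DeleteBatches` and the `partition` helper are hypothetical, not part of Iceberg. The batch size is a parameter because each service has its own per-request limit (S3's DeleteObjects call, for example, accepts up to 1000 keys).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not an Iceberg class.
final class DeleteBatches {
  private DeleteBatches() {}

  // Splits paths into consecutive batches of at most batchSize entries,
  // so each batch can be sent as a single bulk-delete request.
  static List<List<String>> partition(List<String> paths, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<String>> batches = new ArrayList<>();
    for (int start = 0; start < paths.size(); start += batchSize) {
      int end = Math.min(start + batchSize, paths.size());
      // Copy the sub-list so each batch is independent of the input list.
      batches.add(new ArrayList<>(paths.subList(start, end)));
    }
    return batches;
  }
}
```

With a batch size of 1000, deleting 2500 files would take 3 requests instead of 2500 single-file calls.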

I came across this potential improvement while working on Apache Polaris: batch deletion support could help optimize catalog operations such as deleting metadata and statistics files.

Proposal

Introduce a new method to FileIO:

void deleteFiles(Iterable<String> paths) 
  • It maintains backward compatibility by offering a default implementation that loops over deleteFile().
  • Implementations such as S3FileIO can override this method for optimized batch operations.
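
A minimal sketch of what the two bullets above could look like in practice. `SketchFileIO`, `LoopingFileIO`, and `BatchingFileIO` are hypothetical stand-ins written for this example, not the real Iceberg `FileIO` or `S3FileIO`, and the "batch" here is only simulated in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for FileIO, reduced to the two methods discussed here.
interface SketchFileIO {
  void deleteFile(String path);

  // Proposed addition: the default falls back to one-at-a-time deletion,
  // so existing implementations keep working without changes.
  default void deleteFiles(Iterable<String> paths) {
    for (String path : paths) {
      deleteFile(path);
    }
  }
}

// Implements only deleteFile and inherits the looping default.
class LoopingFileIO implements SketchFileIO {
  final List<String> deleted = new ArrayList<>();
  int calls = 0; // counts simulated storage round trips

  @Override
  public void deleteFile(String path) {
    calls++;
    deleted.add(path);
  }
}

// Overrides deleteFiles to simulate a single bulk request.
class BatchingFileIO extends LoopingFileIO {
  @Override
  public void deleteFiles(Iterable<String> paths) {
    calls++; // one simulated round trip for the whole batch
    for (String path : paths) {
      deleted.add(path);
    }
  }
}
```

Callers would use `deleteFiles` the same way against either implementation; only the number of underlying storage calls differs.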

Expected Benefits

  • Reduce the number of storage API calls when deleting multiple files.
  • Improve performance for workloads that require bulk file deletions.
  • Ensure compatibility with all existing FileIO implementations.

I am happy to contribute this feature under the community’s guidance. Since this involves extending an existing interface, I would appreciate any feedback and insights from the community.

Query engine

None

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time
@danielhumanmod danielhumanmod added the improvement PR that improves existing functionality label Feb 23, 2025
@danielhumanmod
Author

I just found that this is already supported in #12154, so I am closing this issue.
