
Option to use 'deep' caching for small files, metadata-only caching for large ones #3550

Open
notestaff opened this issue Jan 15, 2023 · 1 comment



notestaff commented Jan 15, 2023

TL;DR: add an option to use 'deep' caching for files below a certain size, and normal metadata-based caching for larger files.

Git-versioned files often get their metadata updated (e.g. when switching branches) without any change to their content. Other files (e.g. config files generated by earlier pipeline steps) can likewise have their metadata change while the content stays the same; in neither case should the metadata change trigger recomputation. Currently, avoiding recomputation requires turning on 'deep' caching, but that is infeasible when some input files are large. Some code in nextflow.util.CacheHelper uses heuristics to infer whether deep caching should be used ("if it's an asset in the repo it's likely small"), but using the actual file size would be more reliable. A global parameter could control the size threshold below which a file's content gets hashed.
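
A minimal sketch of the proposed rule, in Java with illustrative names (`hashForCache`, `DEEP_HASH_MAX_SIZE`, and the 1 MiB threshold are assumptions made up for this example, not existing Nextflow API):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

public class SizeAwareHasher {

    // Hypothetical global parameter: content-hash files at or below this size.
    // 1 MiB is just an example value; the real threshold would be configurable.
    static final long DEEP_HASH_MAX_SIZE = 1024 * 1024;

    static String hashForCache(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        long size = Files.size(file);
        if (size <= DEEP_HASH_MAX_SIZE) {
            // 'deep' mode: hash the content, so a metadata-only change
            // (e.g. a branch switch touching the timestamp) keeps the same key
            md.update(Files.readAllBytes(file));
        } else {
            // 'standard' mode: hash path + size + mtime, avoiding a full
            // read of large files
            String meta = file.toAbsolutePath() + "|" + size + "|"
                    + Files.getLastModifiedTime(file).toMillis();
            md.update(meta.getBytes());
        }
        return HexFormat.of().formatHex(md.digest());
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.writeString(tmp, "hello");
        System.out.println(hashForCache(tmp)); // small file -> content hash
    }
}
```

The size threshold bounds the worst-case hashing cost per file while keeping cache keys stable for the small, frequently-touched files (repo assets, generated configs) where metadata-only changes are most common.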

stale bot commented Aug 12, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
