Embedded cache DB alternative #2774
Comments
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I guess this one is resolved by the cloud cache (#4097). I know we were also considering Parquet, but Parquet is a columnar storage format, so it wouldn't be a good fit for the task cache. Instead, any cache backend should be a true key-value store, or at least row-based.
Although I see you were looking into LMDB. That should be a good choice. I can look into it if you want.
I gave LMDB a try in the past and was not really convinced: weird API, native OS dependencies, and it does not even seem to be really maintained any more. Maybe we should try a thin wrapper over classic SQLite or DuckDB.
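To make the "thin wrapper over SQLite" idea concrete, here is a minimal sketch in Python using only the standard `sqlite3` module. The class name `SqliteKVStore`, the `cache` table and its `put`/`get`/`delete` methods are hypothetical names chosen for illustration; they are not Nextflow's cache API, and a real implementation would live on the JVM.

```python
import sqlite3
from typing import Optional


class SqliteKVStore:
    """Hypothetical thin key-value wrapper over SQLite (illustration only)."""

    def __init__(self, path: str = ":memory:") -> None:
        # One table with a primary-key index gives row-based key lookups.
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(key BLOB PRIMARY KEY, value BLOB NOT NULL)"
        )
        self._conn.commit()

    def put(self, key: bytes, value: bytes) -> None:
        # The connection context manager commits on success, rolls back on error.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
                (key, value),
            )

    def get(self, key: bytes) -> Optional[bytes]:
        row = self._conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def delete(self, key: bytes) -> None:
        with self._conn:
            self._conn.execute("DELETE FROM cache WHERE key = ?", (key,))

    def close(self) -> None:
        self._conn.close()
```

Usage would look like `store.put(b"task-hash", b"serialized-metadata")` followed by `store.get(b"task-hash")`. Whether this is fast and stable enough on the file systems where LevelDB struggles is exactly what would still need to be measured.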
That's too bad, LMDB seems to have really good performance. SQLite could be a good option. Probably not DuckDB though, since it is designed for OLAP. It could be useful to export a cache DB to DuckDB for downstream analytics, but not during the pipeline execution.
Only a benchmark can really tell |
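As a starting point for such a benchmark, a rough sketch in Python with the standard `sqlite3` and `time` modules could time bulk writes and point lookups. The function name, the `cache` table, the key format and the 256-byte payload are all assumptions made for illustration; they do not reflect real Nextflow task metadata, and the in-memory default says nothing about behaviour on a shared or network file system.

```python
import sqlite3
import time


def time_sqlite_writes_and_reads(n: int = 10_000, path: str = ":memory:") -> dict:
    """Time n inserts and n point lookups against a SQLite key-value table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache "
        "(key BLOB PRIMARY KEY, value BLOB NOT NULL)"
    )
    # Synthetic rows: hypothetical keys and a fixed-size dummy payload.
    rows = [(f"task-{i}".encode(), b"x" * 256) for i in range(n)]

    start = time.perf_counter()
    with conn:  # single transaction for all writes
        conn.executemany(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)", rows
        )
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    hits = 0
    for key, _ in rows:
        found = conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if found:
            hits += 1
    read_s = time.perf_counter() - start

    conn.close()
    return {"n": n, "hits": hits, "write_s": write_s, "read_s": read_s}
```

Passing a file path on the target file system instead of `":memory:"`, and running the same workload against the other candidate stores, would give numbers that can actually be compared.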
Bug report
Nextflow task metadata is stored in a local embedded key-value database based on LevelDB.
This provides good performance. However, the LevelDB store has stability issues on specific hardware and file systems, which are a blocking factor for the affected users. See for example: #2377, #403, #351 and #309.
The goal of this issue is to explore the use of lmdbjava as an alternative storage backend for Nextflow task metadata.