Embedded cache DB alternative #2774

Open
pditommaso opened this issue Apr 7, 2022 · 6 comments

Comments

@pditommaso
Member

Bug report

Nextflow task metadata is stored in a local embedded key-value database based on LevelDB.

This provides good performance. However, the LevelDB store has stability issues on specific hardware and file systems, which are a blocking factor for the affected users. See, for example, #2377, #403, #351 and #309.

The goal of this issue is to explore the use of lmdbjava as an alternative storage backend for Nextflow task metadata.

@stale stale bot added the stale label Sep 20, 2022
@stale stale bot closed this as completed Nov 23, 2022
@nextflow-io nextflow-io deleted a comment from stale bot Jan 16, 2023
@pditommaso pditommaso reopened this Jan 16, 2023
@stale stale bot removed the stale label Jan 16, 2023
@stale

stale bot commented Aug 12, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 12, 2023
@bentsherman
Member

I guess this one is resolved by the cloud cache (#4097).

I know we were also considering Parquet, but Parquet is a columnar storage format, so it wouldn't be a good fit for the task cache. Instead, any cache backend should be a true key-value store, or at least row-based.

@bentsherman
Member

Although I see you were looking into LMDB. That should be a good choice. I can look into it if you want.

@bentsherman bentsherman reopened this Aug 13, 2023
@bentsherman bentsherman removed the stale label Aug 13, 2023
@pditommaso
Member Author

pditommaso commented Aug 13, 2023

I gave LMDB a try in the past and was not really convinced: weird API, native OS dependencies, and it does not even seem to be really maintained any more.

Maybe we should try a thin wrapper over classic SQLite or DuckDB.
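
For illustration only, here is a rough sketch of what a thin key-value wrapper over SQLite could look like. It is written in Python with the standard `sqlite3` module purely to keep it short; it is not Nextflow code, and the class name `SqliteKeyValueStore`, the table name `task_cache`, and the `put`/`get`/`delete` methods are all hypothetical. It assumes each task is identified by an opaque hash key and that its metadata is already serialized to bytes.

```python
import sqlite3


class SqliteKeyValueStore:
    """Hypothetical thin key-value wrapper over SQLite (not Nextflow's API)."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the sketch self-contained; a real cache would use a file path.
        self.conn = sqlite3.connect(path)
        # One row per task: hash key -> opaque serialized metadata blob.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS task_cache ("
            "key BLOB PRIMARY KEY, value BLOB NOT NULL)"
        )
        self.conn.commit()

    def put(self, key: bytes, value: bytes) -> None:
        # The connection context manager commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO task_cache (key, value) VALUES (?, ?)",
                (key, value),
            )

    def get(self, key: bytes):
        # Returns the stored bytes, or None when the key is absent.
        row = self.conn.execute(
            "SELECT value FROM task_cache WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row is not None else None

    def delete(self, key: bytes) -> None:
        with self.conn:
            self.conn.execute("DELETE FROM task_cache WHERE key = ?", (key,))

    def close(self) -> None:
        self.conn.close()
```

Each entry is a single row looked up by primary key, which matches the point-read and point-write access pattern of a task cache. Whether this is fast or stable enough compared to LevelDB is not something the sketch can show.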

@bentsherman
Member

That's too bad, LMDB seems to have really good performance. SQLite could be a good option. Probably not DuckDB though, since it is designed for OLAP. It could be useful to export a cache DB to DuckDB for downstream analytics, but not during the pipeline execution.

@pditommaso
Member Author

Only a benchmark can really tell
