
Infinite RAM usage can exhaust system #100

Open · paulmillr opened this issue Jul 11, 2024 · 6 comments

Comments

@paulmillr

RAM usage seems to be unbounded with this module. RAM requirements need to be documented somewhere, because crashing apps is not OK.

@thecodrr
Owner

fdir just crawls the directories, and yes, if you have an insane number of items it may end up OOM-ing your app. What's the suggested outcome here?

@paulmillr
Author

readdirp has limits on RAM usage due to its architecture. It can't exceed X.

@43081j
Contributor

43081j commented Aug 21, 2024

readdirp will just emit files/dirs, right? So it won't really use much memory, since it isn't building up a list of found paths.

If the user turns the result into an array, they will hit the same issue fdir has.

You could possibly set maxFiles to limit the array size, so you won't hit any OOM issues.
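Something like this, assuming fdir's withMaxFiles builder method is the cap in question (a minimal sketch; the path and limit are placeholders):

```ts
// Sketch, assuming fdir's withMaxFiles builder caps how many paths the
// crawler collects, which bounds the result array's memory footprint.
import { fdir } from "fdir";

const files = await new fdir()
  .withMaxFiles(100_000) // stop collecting after 100k entries (placeholder limit)
  .withFullPaths()
  .crawl("/some/dir")    // placeholder path
  .withPromise();

console.log(files.length); // at most 100000
```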

Alternatively, maybe we could introduce an option to pass a callback that gets called each time we visit a path, and skip building up the internal set of paths (which means you can't use group etc.).
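A rough sketch of that callback mode; the walk helper and onPath name are hypothetical, not part of fdir's API:

```ts
// Hypothetical sketch of the proposed visitor-callback mode: invoke
// onPath for every entry instead of accumulating an internal array,
// so memory stays flat regardless of tree size. walk/onPath are made up.
import { readdir } from "node:fs/promises";
import { join } from "node:path";

async function walk(dir: string, onPath: (path: string) => void): Promise<void> {
  for (const entry of await readdir(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    onPath(full); // hand the path to the caller...
    if (entry.isDirectory()) await walk(full, onPath); // ...and recurse without storing it
  }
}

// Usage: count files without ever building a path array.
let count = 0;
await walk("/some/dir", () => count++); // placeholder path
console.log(count);
```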

@paulmillr
Author

Well, stream APIs exist for a reason. Users don't always want an array of 400K files.
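For example, readdirp's async-iterator interface processes each entry and then drops it, so memory stays roughly flat (a sketch; the path and options are placeholders):

```ts
// Sketch of the streaming style: each entry is handled as it arrives
// and nothing accumulates, so memory stays roughly constant.
import readdirp from "readdirp";

let count = 0;
for await (const entry of readdirp("/some/dir", { type: "files" })) {
  count++; // process entry.path here instead of pushing it to an array
}
console.log(`saw ${count} files`);
```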

@laurent22

Yes, I think I'm completely missing the point of this package. It's fast, so presumably you see the most benefit when crawling large directories... but then it returns millions of files without any way to process the data using a stream or callback API?

Again, not seeing the point at all. It's nice if you only have a few files, I guess, but then why bother with a third-party dependency? Just use readdir.
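(For reference, a minimal sketch of that baseline using Node's built-in recursive readdir, available since Node 18.17; the path is a placeholder.)

```ts
// Sketch of the plain-readdir baseline: Node's built-in recursive
// readdir also collects everything into one array, just more slowly.
import { readdir } from "node:fs/promises";

const files = await readdir("/some/dir", { recursive: true });
console.log(files.length); // relative paths, all held in memory
```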

@thecodrr
Owner

thecodrr commented Feb 27, 2025

Again, not seeing the point at all. It's nice if you only have a few files, I guess, but then why bother with a third-party dependency? Just use readdir.

readdir is pretty slow, so... if you don't care about performance, then sure.

fdir shouldn't give you memory problems unless you are crawling trillions of files (which, to be honest, is an edge case). Actually, try crawling the root directory on your system and see if you run out of memory.

any way to process the data using a stream or callback API?

Stream APIs are inherently slow. I tested yield performance and it was particularly wanting. The point of fdir is to crawl and give you all the files in a directory fast. What you do afterwards with the paths is up to you.

It's technically possible to run out of memory with fdir, but you'll have other concerns by the time that becomes an issue.
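For reference, the basic array-returning usage described above looks like this (a minimal sketch; the path is a placeholder):

```ts
// Sketch of fdir's array-returning usage: the crawl completes first,
// then every path is handed back in one array.
import { fdir } from "fdir";

const files = await new fdir()
  .withFullPaths()
  .crawl("/some/dir")
  .withPromise();

console.log(`${files.length} paths held in memory at once`);
```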
