reduce memory usage by reusing segmenter #37

Open
maia opened this issue Mar 12, 2018 · 0 comments


maia commented Mar 12, 2018

I just realised that for an array of 1000 strings, each 50 to 300 characters long (URL titles and descriptions generated by gottfrois/link_thumbnailer), the following causes a much higher memory load…
array.map {|string| PragmaticSegmenter::Segmenter.new(text: string, language: 'de').segment }
…than this here:
PragmaticSegmenter::Segmenter.new(text: array.join("\r"), language: 'de').segment
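When joining with a line break, the quote style matters: in Ruby, `'\r'` in single quotes is a two-character string (a backslash followed by `r`), not a carriage return, so the pieces end up glued together with a literal `\r`. A short check (the sample titles are made up):

```ruby
# In single-quoted Ruby strings, '\r' is a backslash followed by the letter r.
single = '\r'
# In double-quoted strings, "\r" is one carriage-return character.
double = "\r"

puts single.length # 2
puts double.length # 1

titles = ["Erster Titel.", "Zweiter Titel."]
puts titles.join('\r').inspect # "Erster Titel.\\rZweiter Titel."
puts titles.join("\r").inspect # "Erster Titel.\rZweiter Titel."
```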

In my tests the difference is 30-50 MB. I assume the objects created inside #map are not garbage collected one at a time but all at once, after the entire array has been mapped.
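To compare the two approaches without extra gems, one rough option is to count allocated objects with `GC.stat(:total_allocated_objects)`. The sketch below uses a hypothetical `ToySegmenter` with the same constructor shape as `PragmaticSegmenter::Segmenter` but a naive regex instead of the gem's rules, so the numbers only illustrate the pattern; a real comparison should run the gem itself. Allocation counts are a proxy for GC pressure, not for peak memory.

```ruby
# Hypothetical stand-in for PragmaticSegmenter::Segmenter: same constructor
# shape as the gem, but a naive regex instead of its segmentation rules.
class ToySegmenter
  def initialize(text:, language: 'en')
    @text = text
    @language = language # stored only to mirror the real constructor
  end

  # Split after ., ! or ? followed by whitespace (\r counts as whitespace).
  def segment
    @text.split(/(?<=[.!?])\s+/)
  end
end

# Number of objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# 1000 made-up titles, built before measuring so they are not counted.
array = Array.new(1000) { |i| "Titel Nummer #{i}. Eine kurze Beschreibung." }

per_string = allocations do
  array.map { |string| ToySegmenter.new(text: string, language: 'de').segment }
end

joined = allocations do
  ToySegmenter.new(text: array.join("\r"), language: 'de').segment
end

puts "one segmenter per string: #{per_string} objects"
puts "one joined string:        #{joined} objects"
```

Part of the difference is simply that the per-string version pays the fixed per-call cost (a new segmenter object, and in the real gem presumably the intermediate strings from its many substitutions) 1000 times, while `#map` keeps every result alive until it returns.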

@diasks2, would you consider extending the API to also support:
ps = PragmaticSegmenter::Segmenter.new(language: 'de'); array.map {|string| ps.segment(string) }
…which would allow the Segmenter object to be reused and would most likely reduce memory load? The old API could still be supported by also allowing initialisation without a text argument and adding an optional text argument to #segment.
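A minimal sketch of the backward-compatible API proposed above, using a hypothetical `ReusableSegmenter` class (not part of pragmatic_segmenter): `text:` becomes optional in the constructor, and `#segment` takes an optional text argument that falls back to the constructor's text. The splitting rule is a naive placeholder, not the gem's logic.

```ruby
# Hypothetical sketch of the proposed API: text is optional in both the
# constructor and #segment, so old call sites keep working.
class ReusableSegmenter
  def initialize(text: nil, language: 'en')
    @text = text
    @language = language # where language-specific rules would be set up once
  end

  # Segment the given text, falling back to the text passed to the constructor.
  def segment(text = nil)
    input = text || @text
    raise ArgumentError, 'no text given' if input.nil?

    # Naive placeholder rule; the real gem applies language-specific rules here.
    input.split(/(?<=[.!?])\s+/)
  end
end

# Old style: one object per text.
old_style = ReusableSegmenter.new(text: 'Hallo Welt. Wie geht es?', language: 'de').segment

# New style: one object reused for many texts.
ps = ReusableSegmenter.new(language: 'de')
new_style = ['Erster Satz. Zweiter Satz.', 'Nur einer.'].map { |string| ps.segment(string) }

p old_style # ["Hallo Welt.", "Wie geht es?"]
p new_style # [["Erster Satz.", "Zweiter Satz."], ["Nur einer."]]
```

Reusing the object only saves memory if construction does meaningful work, for example building language-specific rules once. The sketch also avoids storing the per-call text in an instance variable, so a shared instance carries no state from one call to the next.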

As a side note, I've noticed many #gsub calls that could probably be replaced with #gsub! to reduce the strain on the garbage collector. I'll submit a PR if I ever get to it; unfortunately, my current workload only allows me to report the issue.
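The trade-off between `#gsub` and `#gsub!` in a short example (the sample text is made up): `#gsub!` modifies the receiver instead of returning a copy, but it returns `nil` when nothing matched and raises `FrozenError` (Ruby 2.5+) on frozen strings, so call sites cannot be swapped mechanically.

```ruby
# #gsub returns a new String; the receiver is unchanged.
original = 'Dr. Smith arrived. He sat down.'
copy = original.gsub('Dr.', 'Dr') # a String pattern is matched literally
puts original # Dr. Smith arrived. He sat down.
puts copy     # Dr Smith arrived. He sat down.

# #gsub! rewrites the receiver in place and returns it...
text = +'Dr. Smith arrived. He sat down.' # unary + ensures an unfrozen string
result = text.gsub!('Dr.', 'Dr')
puts text                # Dr Smith arrived. He sat down.
puts result.equal?(text) # true

# ...but returns nil when nothing matched, so it must not be chained blindly.
puts text.gsub!('Prof.', 'Prof').inspect # nil

# It also raises when the receiver is frozen.
frozen = 'Dr. Smith'.freeze
begin
  frozen.gsub!('Dr.', 'Dr')
rescue FrozenError => e
  puts e.class # FrozenError
end
```

The actual GC saving is worth measuring before converting call sites.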

Thanks!
