forked from Alluxio/alluxio
Commit
Add string interning to user and group names
### What changes are proposed in this pull request?

Use `intern()` on frequently repeated strings such as user and group names. Equal strings then share a single instance from the JVM's string pool instead of each being allocated separately on the heap, so we allocate fewer objects, which is also friendlier to GC.

### Why are the changes needed?

In a heap analysis I see many duplicate Strings. For example, when I run a cluster on AWS with `ec2-user` as both user and group, the heap analysis shows many duplicate "ec2-user" strings:

![image](https://user-images.githubusercontent.com/14806853/147194157-ac1d7d6e-69b6-469f-a9e1-708ee1f72566.png)

My heap size is 15G, of which 1.7G is allocated to user/group name strings. I have around 25 million files. Note that the 50K String objects are only the live objects. Many more must have been allocated and already GC-ed from the heap.

![image](https://user-images.githubusercontent.com/14806853/147194363-8d206578-b192-4acc-bdf0-41bea12f9701.png)
![image](https://user-images.githubusercontent.com/14806853/147194392-b4ae9ce4-5115-4ec3-bdce-69578fb1a0fa.png)

If I'm reading this correctly, each string retains 56 bytes on the heap. Using `intern()` should avoid many new String allocations.

https://blog.codecentric.de/en/2012/03/save-memory-by-using-string-intern-in-java/

### Does this PR introduce any user facing changes?

The string intern table trades CPU time for memory, because every `intern()` call now incurs an explicit lookup. Using `intern()` is not a no-brainer.

https://blog.codecentric.de/en/2012/03/save-memory-by-using-string-intern-in-java/

This suggests that the default string intern table size has changed to 60013 in Java 8+, so we should be able to put our user/group names in the intern table without too much lookup overhead, **without extra JVM options**.

However, there are contradicting complaints about performance, such as https://stackoverflow.com/a/10628759/4933827. This is from 2017, so later Java versions may have improved significantly.

This means we need to benchmark the performance with a little more care, rather than just reasoning about whether interning makes sense.

pr-link: Alluxio#14743
change-id: cid-3cef909efbd39c6924c8a6abe5c920f81d248d64
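The effect of interning described in the commit message can be illustrated with a minimal sketch. This is not the code changed by this commit: the `InternSketch` class, its `Owner` holder, and the `freshCopy` helper are hypothetical names invented for illustration. It only shows that two equal strings built at runtime are distinct objects until `intern()` maps them to one shared instance.

```java
class InternSketch {
  /** Hypothetical stand-in for a metadata object that stores owner and group names. */
  static final class Owner {
    final String user;
    final String group;

    Owner(String user, String group) {
      // intern() returns the canonical pooled instance for equal content,
      // so many Owner objects can share one String per distinct name.
      this.user = user.intern();
      this.group = group.intern();
    }
  }

  /** Builds a new String object at runtime, much as parsing or deserialization would. */
  static String freshCopy(String s) {
    return new String(s.toCharArray());
  }

  public static void main(String[] args) {
    String a = freshCopy("ec2-user");
    String b = freshCopy("ec2-user");

    // Same content, but two separate objects on the heap.
    System.out.println("equal content:      " + a.equals(b));
    System.out.println("same object:        " + (a == b));

    Owner first = new Owner(a, b);
    Owner second = new Owner(freshCopy("ec2-user"), freshCopy("ec2-user"));

    // After interning, every field refers to the same pooled instance.
    System.out.println("user shared:        " + (first.user == second.user));
    System.out.println("user/group shared:  " + (first.user == first.group));
  }
}
```

Whether the saved allocations outweigh the lookup cost of each `intern()` call is the open question the commit message raises, and a sketch like this does not answer it. That still needs a benchmark on a realistic workload.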
1 parent fbe3e86, commit d3b0452
Showing 8 changed files with 26 additions and 26 deletions.