Skip to content

Commit

Permalink
Add string interning to user and group names
Browse files Browse the repository at this point in the history
### What changes are proposed in this pull request?

Use `intern()` on frequent strings like user/group names. This should
change the string memory allocation from the heap into the internal
string pool by an extent so we allocate fewer objects (and subsequently
friendly to GC).

### Why are the changes needed?

In a heap analysis I see many duplicate Strings. For example when I run
a cluster on AWS with `ec2-user` as both user and group, I see many
duplicate "ec2-user" strings from a heap analysis:


![image](https://user-images.githubusercontent.com/14806853/147194157-ac1d7d6e-69b6-469f-a9e1-708ee1f72566.png)
My heap size is 15G and 1.7G are allocated to user/group name strings. I
have around 25 million files. Note that the 50K String objects are the
live objects. There should have been many more and GC-ed from the heap.


![image](https://user-images.githubusercontent.com/14806853/147194363-8d206578-b192-4acc-bdf0-41bea12f9701.png)

![image](https://user-images.githubusercontent.com/14806853/147194392-b4ae9ce4-5115-4ec3-bdce-69578fb1a0fa.png)
If I'm reading correctly, each string retains 56 bytes in heap.

Using `intern()` should avoid many new String allocations.
https://blog.codecentric.de/en/2012/03/save-memory-by-using-string-intern-in-java/

### Does this PR introduce any user facing changes?

The string intern table brings in a trade off in CPU time because
there's now an explicit seek overhead. Using `intern()` is not a
no-brainer.


https://blog.codecentric.de/en/2012/03/save-memory-by-using-string-intern-in-java/
This suggests that the default string intern table size has changed to
60013 in Java 8+ so we should be able to put our user/group names in the
intern table without too much seek time overhead, **without extra JVM
options**.

However there are contradicting complaints on performance like
https://stackoverflow.com/a/10628759/4933827 This is from 2017 so maybe
later version java has improved significantly.

This means we need to benchmark the performance with a little care than
just reasoning if interning makes sense or not.

pr-link: Alluxio#14743
change-id: cid-3cef909efbd39c6924c8a6abe5c920f81d248d64
  • Loading branch information
jiacheliu3 authored Jan 5, 2022
1 parent fbe3e86 commit d3b0452
Show file tree
Hide file tree
Showing 8 changed files with 26 additions and 26 deletions.
4 changes: 2 additions & 2 deletions core/common/src/main/java/alluxio/grpc/GrpcUtils.java
Original file line number Diff line number Diff line change
Expand Up @@ -106,8 +106,8 @@ public static AccessControlList fromProto(PAcl pAcl) {
acl = new AccessControlList();
}

acl.setOwningUser(pAcl.getOwner());
acl.setOwningGroup(pAcl.getOwningGroup());
acl.setOwningUser(pAcl.getOwner().intern());
acl.setOwningGroup(pAcl.getOwningGroup().intern());
acl.setMode((short) pAcl.getMode());

if (pAcl.getEntriesCount() > 0) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -423,8 +423,8 @@ public static AccessControlList fromStringEntries(String owner, String owningGro
// when the stringEntries size is 0, it can only be a DefaultAccessControlList
acl = new DefaultAccessControlList();
}
acl.setOwningUser(owner);
acl.setOwningGroup(owningGroup);
acl.setOwningUser(owner.intern());
acl.setOwningGroup(owningGroup.intern());

for (String stringEntry : stringEntries) {
AclEntry aclEntry = AclEntry.fromCliString(stringEntry);
Expand Down Expand Up @@ -506,8 +506,8 @@ public AccessControlListDeserializer() {
public AccessControlList deserialize(JsonParser jsonParser,
DeserializationContext deserializationContext) throws IOException, JsonProcessingException {
JsonNode node = jsonParser.getCodec().readTree(jsonParser);
String owner = node.get(OWNER_FIELD).asText();
String owningGroup = node.get(OWNING_GROUP_FIELD).asText();
String owner = node.get(OWNER_FIELD).asText().intern();
String owningGroup = node.get(OWNING_GROUP_FIELD).asText().intern();
List<String> stringEntries = new ArrayList<>();
Iterator<JsonNode> nodeIterator = node.get(STRING_ENTRY_FIELD).elements();
while (nodeIterator.hasNext()) {
Expand Down
4 changes: 2 additions & 2 deletions core/common/src/main/java/alluxio/util/proto/ProtoUtils.java
Original file line number Diff line number Diff line change
Expand Up @@ -273,8 +273,8 @@ public static AccessControlList fromProto(Acl.AccessControlList acl) {
} else {
ret = new AccessControlList();
}
ret.setOwningUser(acl.getOwningUser());
ret.setOwningGroup(acl.getOwningGroup());
ret.setOwningUser(acl.getOwningUser().intern());
ret.setOwningGroup(acl.getOwningGroup().intern());

if (acl.getIsEmpty()) {
return ret;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -867,8 +867,8 @@ public List<Inode> createPath(RpcContext rpcContext, LockedInodePath inodePath,
if (context.isMetadataLoad()) {
// if we are creating the file as a result of loading metadata, the newDir is already
// persisted, and we got the permissions info from the ufs.
newDir.setOwner(context.getOwner())
.setGroup(context.getGroup())
newDir.setOwner(context.getOwner().intern())
.setGroup(context.getGroup().intern())
.setMode(context.getMode().toShort());

Long operationTimeMs = context.getOperationTimeMs();
Expand Down Expand Up @@ -931,8 +931,8 @@ private static void inheritOwnerAndGroupIfEmpty(MutableInode<?> newInode,
if (ServerConfiguration.getBoolean(PropertyKey.MASTER_METASTORE_INODE_INHERIT_OWNER_AND_GROUP)
&& newInode.getOwner().isEmpty() && newInode.getGroup().isEmpty()) {
// Inherit owner / group if empty
newInode.setOwner(ancestorInode.getOwner());
newInode.setGroup(ancestorInode.getGroup());
newInode.setOwner(ancestorInode.getOwner().intern());
newInode.setGroup(ancestorInode.getGroup().intern());
}
}

Expand Down Expand Up @@ -1233,8 +1233,8 @@ public void syncPersistNewDirectory(MutableInodeDirectory dir)
dir.setPersistenceState(PersistenceState.TO_BE_PERSISTED);
syncPersistDirectory(dir).ifPresent(status -> {
// If the directory already exists in the UFS, update our metadata to match the UFS.
dir.setOwner(status.getOwner())
.setGroup(status.getGroup())
dir.setOwner(status.getOwner().intern())
.setGroup(status.getGroup().intern())
.setMode(status.getMode())
.setXAttr(status.getXAttr());

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -564,7 +564,7 @@ public void updateFromEntry(UpdateInodeEntry entry) {
setCreationTimeMs(entry.getCreationTimeMs());
}
if (entry.hasGroup() && !entry.getGroup().isEmpty()) {
setGroup(entry.getGroup());
setGroup(entry.getGroup().intern());
}
if (entry.hasLastModificationTimeMs()) {
setLastModificationTimeMs(entry.getLastModificationTimeMs(),
Expand All @@ -584,7 +584,7 @@ public void updateFromEntry(UpdateInodeEntry entry) {
setName(entry.getName());
}
if (entry.hasOwner() && !entry.getOwner().isEmpty()) {
setOwner(entry.getOwner());
setOwner(entry.getOwner().intern());
}
if (entry.hasParentId()) {
setParentId(entry.getParentId());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -198,8 +198,8 @@ public static MutableInodeDirectory fromJournalEntry(InodeDirectoryEntry entry)
} else {
// Backward compatibility.
AccessControlList acl = new AccessControlList();
acl.setOwningUser(entry.getOwner());
acl.setOwningGroup(entry.getGroup());
acl.setOwningUser(entry.getOwner().intern());
acl.setOwningGroup(entry.getGroup().intern());
short mode = entry.hasMode() ? (short) entry.getMode() : Constants.DEFAULT_FILE_SYSTEM_MODE;
acl.setMode(mode);
ret.mAcl = acl;
Expand Down Expand Up @@ -233,8 +233,8 @@ public static MutableInodeDirectory create(long id, long parentId, String name,
.setName(name)
.setTtl(context.getTtl())
.setTtlAction(context.getTtlAction())
.setOwner(context.getOwner())
.setGroup(context.getGroup())
.setOwner(context.getOwner().intern())
.setGroup(context.getGroup().intern())
.setMode(context.getMode().toShort())
.setAcl(context.getAcl())
// SetAcl call is also setting default AclEntries
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -406,8 +406,8 @@ public static MutableInodeFile fromJournalEntry(InodeFileEntry entry) {
} else {
// Backward compatibility.
AccessControlList acl = new AccessControlList();
acl.setOwningUser(entry.getOwner());
acl.setOwningGroup(entry.getGroup());
acl.setOwningUser(entry.getOwner().intern());
acl.setOwningGroup(entry.getGroup().intern());
short mode = entry.hasMode() ? (short) entry.getMode() : Constants.DEFAULT_FILE_SYSTEM_MODE;
acl.setMode(mode);
ret.mAcl = acl;
Expand Down Expand Up @@ -449,8 +449,8 @@ public static MutableInodeFile create(long blockContainerId, long parentId, Stri
.setParentId(parentId)
.setLastModificationTimeMs(context.getOperationTimeMs(), true)
.setLastAccessTimeMs(context.getOperationTimeMs(), true)
.setOwner(context.getOwner())
.setGroup(context.getGroup())
.setOwner(context.getOwner().intern())
.setGroup(context.getGroup().intern())
.setMode(context.getMode().toShort())
.setAcl(context.getAcl())
.setPersistenceState(context.isPersisted() ? PersistenceState.PERSISTED
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -55,8 +55,8 @@ public Pair<AccessControlList, DefaultAccessControlList> getAcl(FileSystem hdfs,
AccessControlList acl = new AccessControlList();
DefaultAccessControlList defaultAcl = new DefaultAccessControlList();

acl.setOwningUser(hdfsAcl.getOwner());
acl.setOwningGroup(hdfsAcl.getGroup());
acl.setOwningUser(hdfsAcl.getOwner().intern());
acl.setOwningGroup(hdfsAcl.getGroup().intern());
defaultAcl.setOwningUser(hdfsAcl.getOwner());
defaultAcl.setOwningGroup(hdfsAcl.getGroup());
for (AclEntry entry : hdfsAcl.getEntries()) {
Expand Down

0 comments on commit d3b0452

Please sign in to comment.