Logging to help troubleshoot OOM #41

Closed
yeukhon opened this issue Aug 7, 2018 · 4 comments

@yeukhon

yeukhon commented Aug 7, 2018

Anonimatron version:
Operating system and version:
Java runtime (java -version):
Java 7

Executed commands or actions:

java -Xss512M -Xms58G -Xmx58G -XX:+UseG1GC -XX:MaxGCPauseMillis=350 \
-XX:InitiatingHeapOccupancyPercent=35 -XX:ReservedCodeCacheSize=512M \
-Djava.util.concurrent.ForkJoinPool.common.parallelism=32 \
-Dlog4j.configuration=file:/var/lib/anonimatron/releases/log4j.properties  \
-Djavax.net.ssl.keyStore=/etc/pki/certmaster/my_server.jks  \
-Djavax.net.ssl.keyStorePassword=$JKS_PASSWORD  \
-Djavax.net.ssl.trustStore=/etc/pki/certmaster/truststore.jks \
-Djavax.net.ssl.trustStorePassword=$JKS_PASSWORD -jar /var/lib/anonimatron/releases/anonimatron.jar \
-c /var/lib/anonimatron/releases/my_configuration.xml > /var/log/anonimatron/output.log

Expected outcome or behavior:

Does not OOM.

Actual outcome or behavior:

This is what we got...

Exception in thread "Thread-47" Exception in thread "Thread-30" java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
at java.nio.CharBuffer.wrap(CharBuffer.java:373)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:265)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129)
at java.io.PrintStream.write(PrintStream.java:526)
at java.io.PrintStream.print(PrintStream.java:583)
at com.rolfje.anonimatron.progress.ProgressPrinter.print(ProgressPrinter.java:55)
at com.rolfje.anonimatron.progress.ProgressPrinter.run(ProgressPrinter.java:40)
at java.lang.Thread.run(Thread.java:748)
Uncaught error from thread [AnonimatronSystem-scheduler-1] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[AnonimatronSystem]
java.lang.OutOfMemoryError: Java heap space
at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:409)
at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
at java.lang.Thread.run(Thread.java:748)

Is there a way in Anonimatron to log which table/database was being processed by the thread that went OOM?

Thanks.

@realrolfje
Owner

The Anonimatron code is not multi-threaded. Records are processed table by table and record by record; see JdbcAnonimizerService, line 61 onward.

The current table being processed is stored in a log4j NDC before processing of that table starts.

Maybe you can adjust your log4j configuration so that it outputs this NDC. The log4j configuration included with Anonimatron should provide a starting point.
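
As a rough sketch of that adjustment, assuming the log4j 1.x properties format already referenced by `-Dlog4j.configuration` in the command above: the `%x` conversion character in a `PatternLayout` prints the NDC of the logging thread. The appender name `stdout`, the `INFO` level, and the rest of the pattern are assumptions for illustration and may differ from the configuration shipped with Anonimatron.

```properties
# Hypothetical fragment: appender name, level, and pattern are illustrative only.
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
# %t prints the thread name, %x prints the NDC (here, the table being processed).
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p [%t] [%x] %c - %m%n
```

With a pattern like this, the last log lines written before the OutOfMemoryError would carry the NDC value. The uncaught-exception stack trace itself is printed by the JVM rather than by log4j, so it would not include it.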

I am curious what amount or kind of data you are anonymizing that is causing this problem. Usually this points to a configuration problem where the source data is always unique (like a record id) and large.
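
To illustrate why unique source values matter, here is a minimal sketch. It assumes an anonymizer that keeps an in-memory map from each original value to its replacement, so that the same input always yields the same output. The class `SynonymMapSketch` and its methods are hypothetical and are not Anonimatron's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an anonymizer's in-memory lookup table.
public class SynonymMapSketch {
    private final Map<String, String> synonyms = new HashMap<>();

    // Returns the same replacement for the same original value,
    // which requires remembering every distinct original seen so far.
    public String anonymize(String original) {
        String synonym = synonyms.get(original);
        if (synonym == null) {
            synonym = "anon-" + synonyms.size();
            synonyms.put(original, synonym);
        }
        return synonym;
    }

    public int size() {
        return synonyms.size();
    }

    public static void main(String[] args) {
        SynonymMapSketch lowCardinality = new SynonymMapSketch();
        SynonymMapSketch unique = new SynonymMapSketch();
        for (int i = 0; i < 100000; i++) {
            lowCardinality.anonymize("city-" + (i % 50)); // 50 distinct values
            unique.anonymize("record-id-" + i);           // every value distinct
        }
        System.out.println("low-cardinality entries: " + lowCardinality.size());
        System.out.println("unique-value entries: " + unique.size());
    }
}
```

Under this assumption the map for a repeating column stays at 50 entries, while the map for a unique column holds one entry per record. If the values are also large, heap use grows with the size of the table.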

@yeukhon
Author

yeukhon commented Aug 7, 2018

Thank you very much for the prompt response. Looking at htop, the heap seems to grow from the get-go and never comes back down. I am going to play around with the java command. Have there been any known memory leaks in version 1.7.? I am trying to upgrade to the latest version possible on my end, sorry.

@realrolfje
Owner

I am not aware of any current memory issues in Anonimatron. It is being used in large production systems where it completes long anonymization runs (multiple minutes) without a problem.

Of course, you may be running into a situation we have not encountered before, so I think we need to keep all options open. Can you share an anonymized version of your config file (please remove passwords and other data you don't want to share online)?

@yeukhon
Author

yeukhon commented Aug 27, 2018

Thank you @realrolfje. Sorry for the late response. We were finally able to produce a new anonymized backup yesterday, although we had to use an r5 instance: memory use was roughly 102G at peak and 87G for the remainder of the process. So there must be some tables that are really huge and eating up all our memory (I set the heap to 30G/102G min/max out of 120G).

But I still want to thank you for this amazing project. We are currently verifying the produced backup.

  1. Here is the config: https://gist.github.com/yeukhon/af6faf042d52690efaa6db8b835423a1

  2. Is it possible to use Anonimatron to truncate the number of rows in a given table?

Thanks.

@yeukhon yeukhon closed this as completed Dec 31, 2018