Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Multi release jar again (metabase#24366)
* Try multi-release true again in our manifest Problem statement: Luiz packs our partner jars (exasol, starburst, etc.) into our jar so they can be "first class" and in cloud. But with the 44 cycle we've run into some issues: ```shell /tmp/j via ☕ v17.30 on ☁️ metabase-query ❯ jar uf 0.44.0-RC1.jar modules/*.jar ❯ java --version openjdk 11.0.14.1 2022-02-08 OpenJDK Runtime Environment Temurin-11.0.14.1+1 (build 11.0.14.1+1) OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (build 11.0.14.1+1, mixed mode) /tmp/j via ☕ v11.0.14.1 on ☁️ metabase-query ❯ jar uf 0.44.0-RC1.jar modules/*.jar java.lang.module.InvalidModuleDescriptorException: Unsupported major.minor version 61.0 at java.base/jdk.internal.module.ModuleInfo.invalidModuleDescriptor(ModuleInfo.java:1091) at java.base/jdk.internal.module.ModuleInfo.doRead(ModuleInfo.java:195) at java.base/jdk.internal.module.ModuleInfo.read(ModuleInfo.java:147) at java.base/java.lang.module.ModuleDescriptor.read(ModuleDescriptor.java:2553) at jdk.jartool/sun.tools.jar.Main.addExtendedModuleAttributes(Main.java:2083) at jdk.jartool/sun.tools.jar.Main.update(Main.java:1017) at jdk.jartool/sun.tools.jar.Main.run(Main.java:366) at jdk.jartool/sun.tools.jar.Main.main(Main.java:1680) ``` Diogo tracked this down with some great sleuthing to an upgrade in our graal/js engine from “22.0.0.2” -> “22.1.0". This brought along the transitive truffle jar (which is the actual engine powering the js engine). The 22.0.0.2 was technically a multi-release jar but it only included java 11 sources. The 22.1.0 added java 17 sources in addition to the java 11. And this proves fatal to using the `jar` command. When `"Multi-Release"` is set to true, it knows to only look at versions it will need. Lacking this, it looks at all of the classes and the class version for 17 is 61.0 is higher than it knows how to understand and it breaks. Obvious Solution: Set Multi-Release to true. We have done this in the past. On startup we have a message logged: > WARNING: sun.reflect.Reflection.getCallerClass is not supported. This > will impact performance. And setting multi-release can remove this. But when we did that we ended up with: - metabase#16380 - metabase#17027 That issue describes slowdowns of queries on the order of 0.6 seconds -> 1.3 seconds. Almost doubling. People reported dashboards timing out. Jeff tracked this down to > Profiling revealed that the calls to Log4jLoggerFactory.getLogger > became much slower between the two versions. See attached screenshots. And this is a pernicious problem that we cannot easily test for. Lets try again: I've set multi-release to true and built a jar with `bin/build`. I immediately ran into problems: ```shell ❯ MB_DB_CONNECTION_URI="postgres://user:pass@localhost:5432/compare" MB_JETTY_PORT=3007 java "$(socket-repl 6007)" -jar multi-release-local.jar Warning: protocol #'java-time.core/Amount is overwriting function abs WARNING: abs already refers to: #'clojure.core/abs in namespace: java-time.core, being replaced by: #'java-time.core/abs WARNING: abs already refers to: #'clojure.core/abs in namespace: java-time, being replaced by: #'java-time/abs Warning: environ value /Users/dan/.sdkman/candidates/java/current for key :java-home has been overwritten with /Users/dan/.sdkman/candidates/java/17.0.1-zulu/zulu-17.jdk/Contents/Home Exception in thread "main" java.lang.Error: Circular loading of installed providers detected at java.base/java.nio.file.spi.FileSystemProvider.installedProviders(FileSystemProvider.java:198) at java.base/java.nio.file.Path.of(Path.java:206) at java.base/java.nio.file.Paths.get(Paths.java:98) at org.apache.logging.log4j.core.util.Source.toFile(Source.java:55) at org.apache.logging.log4j.core.util.Source.<init>(Source.java:142) at org.apache.logging.log4j.core.config.ConfigurationSource.<init>(ConfigurationSource.java:139) ``` So hazarded a guess that a bump in the log4j would solve this. And it does solve it. Then profiling some queries against bigquery (just viewing the table) in the RC2 and the locally built version with the multi-release: ```shell -- multi-release 2022-07-27 12:28:00,659 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 1.1 s 2022-07-27 12:28:02,609 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 897.9 ms 2022-07-27 12:28:03,950 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 778.1 ms -- RC non-multi-release 2022-07-27 12:28:57,633 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 1.0 s 2022-07-27 12:28:59,343 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 912.9 ms 2022-07-27 12:29:02,328 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 808.6 ms ``` So times seem very similar. ============ Proper benching: using criterium ```shell MB_JETTY_PORT=3008 java "$(socket-repl 6008)" -cp "/Users/dan/.m2/repository/criterium/criterium/0.4.6/criterium-0.4.6.jar":0.39.2.jar metabase.core ``` `(bench (log/warn "benching"))` Summary: 39.2: 21.109470 µs RC2: 4.975204 µs multi-release: 7.673965 µs These flood the consoles with logs ``` Older release: 39.2 user=> (bench (log/warn "benching")) Evaluation count : 2886240 in 60 samples of 48104 calls. Execution time mean : 21.109470 µs Execution time std-deviation : 567.271917 ns Execution time lower quantile : 20.171870 µs ( 2.5%) Execution time upper quantile : 22.429557 µs (97.5%) Overhead used : 6.835913 ns Found 5 outliers in 60 samples (8.3333 %) low-severe 4 (6.6667 %) low-mild 1 (1.6667 %) Variance from outliers : 14.1886 % Variance is moderately inflated by outliers ============================================= RC2: user=> (bench (log/warn "benching"))Evaluation count : 12396420 in 60 samples of 206607 calls. Execution time mean : 4.975204 µs Execution time std-deviation : 521.769687 ns Execution time lower quantile : 4.711607 µs ( 2.5%) Execution time upper quantile : 6.404317 µs (97.5%) Overhead used : 6.837290 ns Found 5 outliers in 60 samples (8.3333 %) low-severe 2 (3.3333 %) low-mild 3 (5.0000 %) Variance from outliers : 72.0600 % Variance is severely inflated by outliers ============================================= Proposed Multi-Release user=> (bench (log/warn "benching")) Evaluation count : 7551000 in 60 samples of 125850 calls. Execution time mean : 7.673965 µs Execution time std-deviation : 201.155749 ns Execution time lower quantile : 7.414837 µs ( 2.5%) Execution time upper quantile : 8.138010 µs (97.5%) Overhead used : 6.843981 ns Found 1 outliers in 60 samples (1.6667 %) low-severe 1 (1.6667 %) Variance from outliers : 14.1472 % Variance is moderately inflated by outliers ``` `(bench (log/info "benching info"))` This does not hit a console so is a no-op. Summary: 39.2: 11.534614 µs RC2: 98.408357 ns multi-release: 2.236756 µs ``` ============================================= 39.2: user=> (bench (log/info "benching info")) Evaluation count : 5223480 in 60 samples of 87058 calls. Execution time mean : 11.534614 µs Execution time std-deviation : 57.756163 ns Execution time lower quantile : 11.461502 µs ( 2.5%) Execution time upper quantile : 11.657644 µs (97.5%) Overhead used : 6.835913 ns Found 3 outliers in 60 samples (5.0000 %) low-severe 2 (3.3333 %) low-mild 1 (1.6667 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers ============================================= RC2: user=> (bench (log/info "benching info"))Evaluation count : 574427220 in 60 samples of 9573787 calls. Execution time mean : 98.408357 ns pExecution time std-deviation : 1.792214 ns Execution time lower quantile : 96.891477 ns ( 2.5%) Execution time upper quantile : 103.394664 ns (97.5%) Overhead used : 6.837290 ns Found 8 outliers in 60 samples (13.3333 %) low-severe 3 (5.0000 %) low-mild 5 (8.3333 %) Variance from outliers : 7.7881 % Variance is slightly inflated by outliers ============================================= Multi-release: user=> (bench (log/info "benching info"))Evaluation count : 26477700 in 60 samples of 441295 calls. Execution time mean : 2.236756 µs Execution time std-deviation : 15.412356 ns Execution time lower quantile : 2.212301 µs ( 2.5%) Execution time upper quantile : 2.275434 µs (97.5%) Overhead used : 6.843981 ns Found 3 outliers in 60 samples (5.0000 %) low-severe 3 (5.0000 %) Variance from outliers : 1.6389 % Variance is slightly inflated by outliers ``` * bump graal/js * Custom MB log factory (metabase#24369) * Custom MB log factory * Write stupid code to appease stupid Eastwood * `ns-name` already calls `the-ns` on its argument. * More code cleanup * Improved code * Remove NOCOMMIT * empty commit to trigger CI Co-authored-by: Cam Saul <[email protected]>
- Loading branch information