Skip to content

Commit

Permalink
Multi release jar again (metabase#24366)
Browse files Browse the repository at this point in the history
* Try multi-release true again in our manifest

Problem statement:
Luiz packs our partner jars (exasol, starburst, etc.) into our jar so
they can be "first class" and in cloud. But with the 44 cycle we've run
into some issues:

```shell
/tmp/j via ☕ v17.30 on ☁️  metabase-query
❯ jar uf 0.44.0-RC1.jar modules/*.jar

❯ java --version
openjdk 11.0.14.1 2022-02-08
OpenJDK Runtime Environment Temurin-11.0.14.1+1 (build 11.0.14.1+1)
OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (build 11.0.14.1+1, mixed mode)

/tmp/j via ☕ v11.0.14.1 on ☁️  metabase-query
❯ jar uf 0.44.0-RC1.jar modules/*.jar
java.lang.module.InvalidModuleDescriptorException: Unsupported major.minor version 61.0
	at java.base/jdk.internal.module.ModuleInfo.invalidModuleDescriptor(ModuleInfo.java:1091)
	at java.base/jdk.internal.module.ModuleInfo.doRead(ModuleInfo.java:195)
	at java.base/jdk.internal.module.ModuleInfo.read(ModuleInfo.java:147)
	at java.base/java.lang.module.ModuleDescriptor.read(ModuleDescriptor.java:2553)
	at jdk.jartool/sun.tools.jar.Main.addExtendedModuleAttributes(Main.java:2083)
	at jdk.jartool/sun.tools.jar.Main.update(Main.java:1017)
	at jdk.jartool/sun.tools.jar.Main.run(Main.java:366)
	at jdk.jartool/sun.tools.jar.Main.main(Main.java:1680)

```

Diogo tracked this down with some great sleuthing to an upgrade in our
graal/js engine from “22.0.0.2” -> “22.1.0". This brought along the
transitive truffle jar (which is the actual engine powering the js
engine). The 22.0.0.2 was technically a multi-release jar but it only
included java 11 sources. The 22.1.0 added java 17 sources in addition
to the java 11.

And this proves fatal to using the `jar` command. When `"Multi-Release"`
is set to true, it knows to only look at versions it will need. Lacking
this, it looks at all of the classes and the class version for 17 is
61.0 is higher than it knows how to understand and it breaks.

Obvious Solution:
Set Multi-Release to true. We have done this in the past. On startup we
have a message logged:

> WARNING: sun.reflect.Reflection.getCallerClass is not supported. This
> will impact performance.

And setting multi-release can remove this. But when we did that we ended
up with:
- metabase#16380
- metabase#17027

That issue describes slowdowns of queries on the order of 0.6 seconds ->
1.3 seconds. Almost doubling. People reported dashboards timing
out. Jeff tracked this down to

> Profiling revealed that the calls to Log4jLoggerFactory.getLogger
> became much slower between the two versions. See attached screenshots.

And this is a pernicious problem that we cannot easily test for.

Lets try again:
I've set multi-release to true and built a jar with `bin/build`. I
immediately ran into problems:

```shell
❯ MB_DB_CONNECTION_URI="postgres://user:pass@localhost:5432/compare" MB_JETTY_PORT=3007 java "$(socket-repl 6007)" -jar multi-release-local.jar
Warning: protocol #'java-time.core/Amount is overwriting function abs
WARNING: abs already refers to: #'clojure.core/abs in namespace: java-time.core, being replaced by: #'java-time.core/abs
WARNING: abs already refers to: #'clojure.core/abs in namespace: java-time, being replaced by: #'java-time/abs
Warning: environ value /Users/dan/.sdkman/candidates/java/current for key :java-home has been overwritten with /Users/dan/.sdkman/candidates/java/17.0.1-zulu/zulu-17.jdk/Contents/Home
Exception in thread "main" java.lang.Error: Circular loading of installed providers detected
	at java.base/java.nio.file.spi.FileSystemProvider.installedProviders(FileSystemProvider.java:198)
	at java.base/java.nio.file.Path.of(Path.java:206)
	at java.base/java.nio.file.Paths.get(Paths.java:98)
	at org.apache.logging.log4j.core.util.Source.toFile(Source.java:55)
	at org.apache.logging.log4j.core.util.Source.<init>(Source.java:142)
	at
        org.apache.logging.log4j.core.config.ConfigurationSource.<init>(ConfigurationSource.java:139)
```

So hazarded a guess that a bump in the log4j would solve this. And it
does solve it.

Then profiling some queries against bigquery (just viewing the table) in
the RC2 and the locally built version with the multi-release:

```shell
-- multi-release
2022-07-27 12:28:00,659 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 1.1 s
2022-07-27 12:28:02,609 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 897.9 ms
2022-07-27 12:28:03,950 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 778.1 ms

-- RC non-multi-release
2022-07-27 12:28:57,633 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 1.0 s
2022-07-27 12:28:59,343 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 912.9 ms
2022-07-27 12:29:02,328 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 808.6 ms
```
So times seem very similar.

============

Proper benching:

using criterium

```shell
MB_JETTY_PORT=3008 java "$(socket-repl 6008)" -cp "/Users/dan/.m2/repository/criterium/criterium/0.4.6/criterium-0.4.6.jar":0.39.2.jar metabase.core
```

`(bench (log/warn "benching"))`

Summary:
39.2:          21.109470 µs
RC2:           4.975204 µs
multi-release: 7.673965 µs

These flood the consoles with logs

```
Older release: 39.2

user=> (bench (log/warn "benching"))
Evaluation count : 2886240 in 60 samples of 48104 calls.
             Execution time mean : 21.109470 µs
    Execution time std-deviation : 567.271917 ns
   Execution time lower quantile : 20.171870 µs ( 2.5%)
   Execution time upper quantile : 22.429557 µs (97.5%)
                   Overhead used : 6.835913 ns

Found 5 outliers in 60 samples (8.3333 %)
	low-severe	 4 (6.6667 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 14.1886 % Variance is moderately inflated by outliers

=============================================

RC2:

user=> (bench (log/warn "benching"))Evaluation count : 12396420 in 60 samples of 206607 calls.
             Execution time mean : 4.975204 µs
    Execution time std-deviation : 521.769687 ns
   Execution time lower quantile : 4.711607 µs ( 2.5%)
   Execution time upper quantile : 6.404317 µs (97.5%)
                   Overhead used : 6.837290 ns

Found 5 outliers in 60 samples (8.3333 %)
	low-severe	 2 (3.3333 %)
	low-mild	 3 (5.0000 %)
 Variance from outliers : 72.0600 % Variance is severely inflated by outliers

=============================================

Proposed Multi-Release

user=> (bench (log/warn "benching"))
Evaluation count : 7551000 in 60 samples of 125850 calls.
             Execution time mean : 7.673965 µs
    Execution time std-deviation : 201.155749 ns
   Execution time lower quantile : 7.414837 µs ( 2.5%)
   Execution time upper quantile : 8.138010 µs (97.5%)
                   Overhead used : 6.843981 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 14.1472 % Variance is moderately inflated by outliers

```

`(bench (log/info "benching info"))`

This does not hit a console so is a no-op.

Summary:
39.2:          11.534614 µs
RC2:           98.408357 ns
multi-release: 2.236756 µs

```
=============================================
39.2:

user=> (bench (log/info "benching info"))
Evaluation count : 5223480 in 60 samples of 87058 calls.
             Execution time mean : 11.534614 µs
    Execution time std-deviation : 57.756163 ns
   Execution time lower quantile : 11.461502 µs ( 2.5%)
   Execution time upper quantile : 11.657644 µs (97.5%)
                   Overhead used : 6.835913 ns

Found 3 outliers in 60 samples (5.0000 %)
	low-severe	 2 (3.3333 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

=============================================

RC2:

user=> (bench (log/info "benching info"))Evaluation count : 574427220 in 60 samples of 9573787 calls.
             Execution time mean : 98.408357 ns
    pExecution time std-deviation : 1.792214 ns
   Execution time lower quantile : 96.891477 ns ( 2.5%)
   Execution time upper quantile : 103.394664 ns (97.5%)
                   Overhead used : 6.837290 ns

Found 8 outliers in 60 samples (13.3333 %)
	low-severe	 3 (5.0000 %)
	low-mild	 5 (8.3333 %)
 Variance from outliers : 7.7881 % Variance is slightly inflated by outliers

=============================================

Multi-release:

user=> (bench (log/info "benching info"))Evaluation count : 26477700 in 60 samples of 441295 calls.
             Execution time mean : 2.236756 µs
    Execution time std-deviation : 15.412356 ns
   Execution time lower quantile : 2.212301 µs ( 2.5%)
   Execution time upper quantile : 2.275434 µs (97.5%)
                   Overhead used : 6.843981 ns

Found 3 outliers in 60 samples (5.0000 %)
	low-severe	 3 (5.0000 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
 ```

* bump graal/js

* Custom MB log factory (metabase#24369)

* Custom MB log factory

* Write stupid code to appease stupid Eastwood

* `ns-name` already calls `the-ns` on its argument.

* More code cleanup

* Improved code

* Remove NOCOMMIT

* empty commit to trigger CI

Co-authored-by: Cam Saul <[email protected]>
  • Loading branch information
dpsutton and camsaul authored Jul 28, 2022
1 parent 2425447 commit 1f4bf25
Show file tree
Hide file tree
Showing 5 changed files with 84 additions and 22 deletions.
1 change: 1 addition & 0 deletions build.clj
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,7 @@

(def manifest-entries
{"Manifest-Version" "1.0"
"Multi-Release" "true"
"Created-By" "Metabase build.clj"
"Build-Jdk-Spec" (System/getProperty "java.specification.version")
"Main-Class" "metabase.core"})
Expand Down
14 changes: 7 additions & 7 deletions deps.edn
Original file line number Diff line number Diff line change
Expand Up @@ -86,12 +86,12 @@
:exclusions [org.slf4j/slf4j-api]}
org.apache.commons/commons-compress {:mvn/version "1.21"} ; compression utils
org.apache.commons/commons-lang3 {:mvn/version "3.12.0"} ; helper methods for working with java.lang stuff
org.apache.logging.log4j/log4j-1.2-api {:mvn/version "2.17.2"} ; apache logging framework
org.apache.logging.log4j/log4j-api {:mvn/version "2.17.2"} ; add compatibility with log4j 1.2
org.apache.logging.log4j/log4j-core {:mvn/version "2.17.2"} ; apache logging framework
org.apache.logging.log4j/log4j-jcl {:mvn/version "2.17.2"} ; allows the commons-logging API to work with log4j 2
org.apache.logging.log4j/log4j-jul {:mvn/version "2.17.2"} ; java.util.logging (JUL) -> Log4j2 adapter
org.apache.logging.log4j/log4j-slf4j-impl {:mvn/version "2.17.2"} ; allows the slf4j API to work with log4j 2
org.apache.logging.log4j/log4j-1.2-api {:mvn/version "2.18.0"} ; apache logging framework
org.apache.logging.log4j/log4j-api {:mvn/version "2.18.0"} ; add compatibility with log4j 1.2
org.apache.logging.log4j/log4j-core {:mvn/version "2.18.0"} ; apache logging framework
org.apache.logging.log4j/log4j-jcl {:mvn/version "2.18.0"} ; allows the commons-logging API to work with log4j 2
org.apache.logging.log4j/log4j-jul {:mvn/version "2.18.0"} ; java.util.logging (JUL) -> Log4j2 adapter
org.apache.logging.log4j/log4j-slf4j-impl {:mvn/version "2.18.0"} ; allows the slf4j API to work with log4j 2
org.apache.poi/poi {:mvn/version "5.2.2"} ; Work with Office documents (e.g. Excel spreadsheets) -- newer version than one specified by Docjure
org.apache.poi/poi-ooxml {:mvn/version "5.2.2"
:exclusions [org.bouncycastle/bcpkix-jdk15on
Expand Down Expand Up @@ -119,7 +119,7 @@
org.clojure/tools.trace {:mvn/version "0.7.11"} ; function tracing
org.eclipse.jetty/jetty-server {:mvn/version "9.4.44.v20210927"} ; web server
org.flatland/ordered {:mvn/version "1.15.10"} ; ordered maps & sets
org.graalvm.js/js {:mvn/version "22.0.0.2"} ; JavaScript engine
org.graalvm.js/js {:mvn/version "22.1.0"} ; JavaScript engine
org.liquibase/liquibase-core {:mvn/version "4.10.0" ; migration management (Java lib)
:exclusions [ch.qos.logback/logback-classic]}
org.mariadb.jdbc/mariadb-java-client {:mvn/version "2.7.5"} ; MySQL/MariaDB driver
Expand Down
10 changes: 7 additions & 3 deletions src/metabase/core.clj
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
metabase.driver.mysql
metabase.driver.postgres
[metabase.events :as events]
[metabase.logger :as mb.logger]
[metabase.models.user :refer [User]]
[metabase.plugins :as plugins]
[metabase.plugins.classloader :as classloader]
Expand All @@ -25,10 +26,13 @@
[metabase.util.i18n :refer [deferred-trs trs]]
[toucan.db :as db]))

(comment
;; Load up the drivers shipped as part of the main codebase, so they will show up in the list of available DB types
(comment metabase.driver.h2/keep-me
metabase.driver.mysql/keep-me
metabase.driver.postgres/keep-me)
metabase.driver.h2/keep-me
metabase.driver.mysql/keep-me
metabase.driver.postgres/keep-me
;; Make sure the custom Metabase logger code gets loaded up so we use our custom logger for performance reasons.
mb.logger/keep-me)

;; don't i18n this, it's legalese
(log/info
Expand Down
71 changes: 67 additions & 4 deletions src/metabase/logger.clj
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,14 @@
(:require [amalloy.ring-buffer :refer [ring-buffer]]
[clj-time.coerce :as time.coerce]
[clj-time.format :as time.format]
[metabase.config :refer [local-process-uuid]])
[clojure.core.memoize :as memoize]
[clojure.tools.logging :as log]
[clojure.tools.logging.impl :as log.impl]
[metabase.config :as config])
(:import org.apache.commons.lang3.exception.ExceptionUtils
[org.apache.logging.log4j Level Logger LogManager]
[org.apache.logging.log4j.core Appender LogEvent LoggerContext]
org.apache.logging.log4j.core.config.LoggerConfig
org.apache.logging.log4j.LogManager))
org.apache.logging.log4j.core.config.LoggerConfig))

(def ^:private ^:const max-log-entries 2500)

Expand All @@ -25,7 +28,7 @@
:msg (.getMessage event)
:exception (when-let [throwable (.getThrown event)]
(seq (ExceptionUtils/getStackFrames throwable)))
:process_uuid local-process-uuid})
:process_uuid config/local-process-uuid})

(defn- metabase-appender ^Appender []
(let [^org.apache.logging.log4j.core.Filter filter nil
Expand All @@ -52,3 +55,63 @@
(doseq [[_ ^LoggerConfig logger-config] (.getLoggers config)]
(.addAppender logger-config appender level filter))
(.updateLoggers ctx))))

;;; [[clojure.tools.logging]] should be using our custom Logging factory which performs NICELY and memoizes calls to
;;; `.getLogger` instead of fetching it every single time which is SLOW in Multi-Release JARs. See #16830

(defn- log-level-keyword->Level
"Return the Log4j2 [[Level]] that corresponds to a given [[clojure.tools.logging]] `level-keyword`."
^Level [level-keyword]
(case level-keyword
:off Level/OFF
:trace Level/TRACE
:debug Level/DEBUG
:info Level/INFO
:warn Level/WARN
:error Level/ERROR
:fatal Level/FATAL))

;;; Extend [[org.apache.logging.log4j.Logger]] so it works with [[clojure.tools.logging]].
;;;
;;; Yes, https://github.com/clojure/tools.logging/blob/master/src/main/clojure/clojure/tools/logging/impl.clj#L178-L208
;;; is presumably doing this already but it might not be a good idea to assume that code gets called. It seems like
;;; [[clojure.tools.logging] tries to use the SLF4J logging factory by default rather than the Log4j2 factory. If that
;;; happens then I think those lines don't get called at all. So better safe than sorry I guess.
(extend-protocol log.impl/Logger
Logger
(enabled? [^Logger this level-keyword]
(.isEnabled this (log-level-keyword->Level level-keyword)))

(write! [^Logger this level-keyword ^Throwable e ^Object message]
(let [level (log-level-keyword->Level level-keyword)]
(if e
(.log this level message e)
(.log this level message)))))

(def ^:private ^{:arglists '(^org.apache.logging.log4j.Logger [a-namespace])} ns-logger
"Get the appropriate [[Logger]] to use for a Clojure namespace. `a-namespace` can be either
a [[clojure.lang.Namespace]], a [[clojure.lang.Symbol]], or a [[String]]. This is memoized because `.getLogger` can
be pretty slow and it should never change in a prod run anyway.
When changing loggers at runtime e.g. in [[metabase.test.util.log]] you need to call [[clear-memoized-ns-loggers!]]
so the logger factory will pick them up."
(memoize/memo
(fn [a-namespace]
(.getLogger (LogManager/getContext false) (str a-namespace)))))

(defn clear-memoized-ns-loggers!
"Clear the memoization cache for [[ns-logger]]. This is used when [[metabase.test.util.log/set-ns-log-level!]]
installs new loggers at runtime."
[]
(memoize/memo-clear! ns-logger))

(defrecord MetabaseLoggerFactory []
log.impl/LoggerFactory
(name [_this]
(str `MetabaseLoggerFactory))
(get-logger [_this logger-ns]
(ns-logger logger-ns)))

;; switch out the [[clojure.tools.logging]] logger factory with our custom better-performing version.
(alter-var-root #'log/*logger-factory* (constantly (->MetabaseLoggerFactory)))
(log/infof "Setting clojure.tools.logging factory to %s" `MetabaseLoggerFactory)
10 changes: 2 additions & 8 deletions test/metabase/test/util/log.clj
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
(:require [clojure.test :refer :all]
[clojure.tools.logging :as log]
[clojure.tools.logging.impl :as log.impl]
[metabase.logger :as mb.logger]
[metabase.test-runner.parallel :as test-runner.parallel]
[potemkin :as p]
[schema.core :as s])
Expand All @@ -12,12 +13,6 @@
[org.apache.logging.log4j.core Appender LifeCycle LogEvent Logger LoggerContext]
[org.apache.logging.log4j.core.config Configuration LoggerConfig]))

;; make sure [[clojure.tools.logging]] is using the Log4j2 factory, otherwise the swaps we attempt to do here don't seem
;; to work.
(when-not (= (log.impl/name log/*logger-factory*) "org.apache.logging.log4j")
(alter-var-root #'log/*logger-factory* (constantly (log.impl/log4j2-factory)))
(log/infof "Setting clojure.tools.logging factory to %s" `log.impl/log4j2-factory))

(def ^:private ^:deprecated logger->original-level
(delay
(let [loggers (.getLoggers ^LoggerContext (LogManager/getContext false))]
Expand Down Expand Up @@ -107,7 +102,6 @@
(defn- effective-ns-logger
"Get the logger that will be used for the namespace named by `a-namespace`."
^LoggerConfig [a-namespace]
(assert (= (log.impl/name log/*logger-factory*) "org.apache.logging.log4j"))
(let [^Logger logger (log.impl/get-logger log/*logger-factory* (logger-name a-namespace))]
(.get logger)))

Expand Down Expand Up @@ -147,6 +141,7 @@
(.getFilter parent-logger))]
(.addLogger (configuration) (logger-name a-namespace) new-logger)
(.updateLoggers (logger-context))
(mb.logger/clear-memoized-ns-loggers!)
(println "Created a new logger for" (logger-name a-namespace)))))

(s/defn set-ns-log-level!
Expand Down Expand Up @@ -284,7 +279,6 @@
(logs#)))))))



;;;; tests

(deftest set-ns-log-level!-test
Expand Down

0 comments on commit 1f4bf25

Please sign in to comment.