CLOS effective method cache doesn't work on specializers which are subtypes of mop:specializer
#587
I have confirmed that there seems to be a slowdown:
Still not quite sure what is being tested, as there is a fair amount of I/O going on as well as in-memory string creation, not just regexp matching. Performing some basic profiling between 1.8.0 and 1.9.1 would be the next step to figure out what might be causing this.
Thanks for checking this. I will try to assist. Your slowdown is not as radical as mine. I added the following
and now the tests are like this, with (show-runtime) being like yesterday
All 3 cause slowdown. split certainly creates new strings; scan, I don't think so (though it is complex internally).
So I don't know what to check next; any suggestions, please? Also note that all 3 tests cons more in 1.9.1. How can I force creating thousands of unique strings? I tried this:
1.9.1 is slower again, but neither version conses.
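On the question of forcing thousands of unique string allocations, one minimal sketch (the count and format string here are arbitrary choices, not from the original tests) that should cons visibly under `time`:

```lisp
;; Allocate many distinct strings so the run measurably conses; compare
;; the reported cons-cell counts between 1.8.0 and 1.9.1.
(time
 (let ((strings (loop for i below 100000
                      collect (format nil "unique-string-~d" i))))
   (length strings)))
```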
Okay, it does not look like it is string allocation; this example does not allocate any strings, I think. The (time ...) body is only ppcre:do-scans, and it is 13 times slower in 1.9.1 vs 1.8.0. The performance-testing.lisp file
then when I (load "performance-testing"), I get these times
I tried (subseq ...) to split the contents of files myself from precalculated positions of newlines, but its performance is very similar between the two implementations. I tried to look inside the cl-ppcre library, and it looks like, for scanning, it first parses "\n" into #\newline and passes it as (list :group #\newline) to the next function in https://github.com/edicl/cl-ppcre/blob/master/api.lisp line 68; then line 111 (convert parse-tree) returns an object of type str as its primary value, so it will be bound to regex. It looks like line 144 (create-matcher-aux regex #'identity) then jumps to https://github.com/edicl/cl-ppcre/blob/master/closures.lisp line 192. It is a complex library; is there a way to analyse what is happening without dissecting cl-ppcre? Maybe a Java profiler can help? What should I check next, please?
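To isolate the hot loop described above from file I/O entirely, a minimal sketch (assuming cl-ppcre is loaded; the sample string and iteration count are arbitrary placeholders):

```lisp
;; Time only the scanning loop: no file reads, and, with a precompiled
;; scanner, no per-call regex parsing either.  do-scans accepts either
;; a regex string or a scanner closure as its regex argument.
(let ((scanner (cl-ppcre:create-scanner "\\n"))
      (content (format nil "line one~%line two~%line three~%")))
  (time
   (dotimes (i 100000)
     (cl-ppcre:do-scans (match-start match-end reg-starts reg-ends
                         scanner content)))))
```

If this stripped-down loop still shows the 13x gap, the regression is in generic dispatch or closure calls rather than in allocation.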
(Un)surprisingly, I am not particularly knowledgeable in profiling ABCL. Partly this stems from my general development approach, in the course of accompanying the Armed Bear on its journey to ANSI conformance, of preferring correctness over efficiency. Usually, unless something is really performing badly, I don't even attempt to measure or gauge performance differences: as long as the implementation gets "more correct", I take the patch. Oddly, even twelve years after reaching ANSI conformance, the majority of issues in ABCL involve fixing compatibility or correctness, in a (hopefully) ever asymptotically approaching manner.

It would be very useful to have more informational resources on profiling ABCL. Here is an initial stab at what I know. We should collect these and (hopefully) subsequent notes into some aspect of the ABCL developer resources (perhaps under https://github.com/armedbear/abcl/tree/master/tools).

**Bisect the offending change**

If one has a good test that shows the problem, then since ABCL doesn't actually get too many patches, it might be easiest to use a Git bisect to find the exact commit which causes the performance degradation. The code in https://github.com/armedbear/abcl/blob/master/tools/check.lisp was written for the Mercurial bisect command but should be trivially portable. ABCL currently has the problem of not having tags in git for the various branches (#370). I usually use the release dates listed at https://abcl.org/trac/ to find these ranges.

**Java Profiling with Netbeans**

Since ABCL runs on the JVM, the "actual" performance is of course how that JVM is performing with respect to its hosting operating system under the given task. There are many JVM profiling tools, but I only really know anything about the Netbeans IDE profiler: https://netbeans.apache.org/kb/docs/java/profiler-intro.html. Netbeans may well be a minority choice, as industry-standard practice in 2023 is probably to use either Oracle Java Flight Recorder or plugins for IntelliJ's IDE. Netbeans has the virtues of a) being free as in libre and money and b) being able to use the ABCL source tree as a Netbeans project, due to the artifacts in https://github.com/armedbear/abcl/blob/master/nbproject/. Since this instrumentation happens in the JVM hosting ABCL, such profiling is the least likely to "disturb" whatever performance one is trying to measure.

**Common Lisp Profiling**

https://github.com/armedbear/abcl/blob/master/src/org/armedbear/lisp/profiler.lisp contains a simple Common Lisp based profiler that measures call and hot counts for method executions. There also seems to be support for recording call times, but I have never used this machinery.

Hope that helps. I will try to find time to look at this in Netbeans, but all my current ABCL time is being spent fixing the ANSI/Gray streams problems with the pretty printer to enable SLIME to work again (#586).
Thanks! All 3 suggestions are helpful. I will probably start with bisecting the commit history and slowly make my way through it. My current hunch is that there are forms that incorrectly fall back to interpreted mode; I remember seeing an improvements announcement in this area (possibly to overcome the JVM opcode limit per class). I don't get why Windows seems to be affected most, though. It will take some time, so I will not be spamming this thread until I find something useful. I totally agree with your priorities putting correctness and spec conformance first. ABCL is incredibly useful for navigating corporate applications that are mostly written in Java. The fact that I can download libraries from Quicklisp and they almost always work is a testament to how compliant ABCL is, and precisely this hassle-free experience makes ABCL so useful. However, because I reverted to 1.8.0 when I discovered real-world tasks take over 400 seconds in 1.9.1 where they took, say, 120 seconds in 1.8.0, and to avoid being stuck on 1.8.0 forever, I will keep looking into this issue. So thanks again for looking after the Bear :)
I believe this is the offending commit: d0a8264 2023-02-17 "Use ATOMIC-DEFGENERIC for MOP:SPECIALIZER discrimination." I don't understand it, but maybe your declaim has a typo?
I generated commit.txt by executing
This file is for the ABCL being tested. I wrote bisect.lisp, inspired by your check.lisp; it has to be run under something like SBCL, because it clears the cache by removing every directory that starts with abcl-. It has a bug so it never stops, but it generated enough to pinpoint the commit. In SBCL you execute:
It creates the new slow run and the old fast run, then starts bisecting. These files are autogenerated: performance-log.txt shows performance per commit; performance-good-bad.lisp shows pairs of the currently discovered newest "good" and oldest "bad" commits. You can see it started looping at commit positions 39 and 40, so it should have stopped, but we have probably pinpointed the issue.
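For reference, roughly the same workflow driven by native git bisect might look like the following sketch. The good ref, build command, paths, and timeout threshold are all assumptions (ABCL's git tags for old releases are incomplete, see #370), so adjust to taste:

```shell
# Sketch: let git drive the bisection.  Exit code 125 tells bisect to
# skip commits that do not build; the timeout turns "too slow" into a
# failing (bad) exit status for the test script.
git bisect start
git bisect bad  HEAD                  # known slow (1.9.1-era)
git bisect good <known-fast-commit>   # known fast (1.8.0-era, assumed ref)
git bisect run sh -c '
  ant abcl.jar || exit 125
  timeout 60 java -jar dist/abcl.jar --batch \
      --load performance-testing.lisp
'
```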
This is definitely a typo. Working on trying to fix this for abcl-1.9.2 <#594>.
Confirmed that the changes for <#539>, in allowing subtypes of mop:specializer, are the cause. The performance degradation is due to disabling the use of the effective method function ("emf") cache <https://github.com/armedbear/abcl/blob/master/src/org/armedbear/lisp/clos.lisp#L2576>. To fix this we need to overhaul the cache <https://github.com/armedbear/abcl/blob/master/src/org/armedbear/lisp/EMFCache.java#L161>. -- Frankly, I need to review the EMF cache structure and our MOP implementation to get a handle on this, so I need a little time here. @alanruttenberg From what you understand, with a suitable amount of elbow grease, would it be possible to extend the Java-side caching here to cover mop:specializer?
I'm going to have to dig deeper to understand what is going on. Not sure I'll be able to until the week after next.
After re-reading Rhodes and Newton 2008, it seems that it would greatly improve our user experience to implement discrimination based on subtypes of mop:specializer. Their implementation obviates the need for the reader macro, as tested in @bohonghuang's motivating code <file:t/mop-specializer>.
Could you try your tests with the following change?

```diff
diff --git a/src/org/armedbear/lisp/clos.lisp b/src/org/armedbear/lisp/clos.lisp
index 9b33cc08..b0fc9944 100644
--- a/src/org/armedbear/lisp/clos.lisp
+++ b/src/org/armedbear/lisp/clos.lisp
@@ -2573,7 +2573,7 @@ to ~S with argument list ~S."
               (wrap-emfun-for-keyword-args-check gf emfun non-keyword-args
                                                  applicable-keywords)))
           ;;; Cache only understand classes and eql specializers
-          (when (eq (class-of gf) 'standard-generic-function)
+          (when (typep gf 'standard-generic-function)
             (cache-emf gf args emfun))
           (funcall emfun args))
         (apply #'no-applicable-method gf args))))
```

I believe that my previous check was wrong, causing there to be nothing added to the emf-cache.
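For context on why the original check was a typo: class-of returns a class metaobject, never a symbol, so comparing its result with eq against the symbol standard-generic-function is always false, which silently disabled caching for every generic function. A quick REPL illustration (printed representations will vary by implementation):

```lisp
;; class-of yields a class metaobject, not a symbol.
(class-of #'print-object)
;; => #<STANDARD-CLASS STANDARD-GENERIC-FUNCTION>

;; A metaobject is never EQ to a symbol, so the guarded
;; cache-emf call was never reached.
(eq (class-of #'print-object) 'standard-generic-function)
;; => NIL

;; Comparing a class to a class (or using typep) works.
(eq (class-of #'print-object) (find-class 'standard-generic-function))
;; => T
```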
This test, however, is wrong, I realize. Bleh. I don't think I want to mess with the Java EMF code, so I want to exclude caching for methods whose arguments are specialized on anything other than eql specializers or classes. That test won't do it. To be continued. At least the test will verify that that line is the bottleneck.
Ok, this is a better check. It checks whether any of the applicable methods have a specializer other than an eql-specializer or a class, and if so doesn't use the cache.

```diff
diff --git a/src/org/armedbear/lisp/clos.lisp b/src/org/armedbear/lisp/clos.lisp
index 9b33cc08..1af50659 100644
--- a/src/org/armedbear/lisp/clos.lisp
+++ b/src/org/armedbear/lisp/clos.lisp
@@ -2572,8 +2572,15 @@ to ~S with argument list ~S."
         (setf emfun
               (wrap-emfun-for-keyword-args-check gf emfun non-keyword-args
                                                  applicable-keywords)))
-          ;;; Cache only understand classes and eql specializers
-          (when (eq (class-of gf) 'standard-generic-function)
+          ;; Cache only understand classes and eql specializers. Check
+          ;; the applicable methods and if any have a specializer that
+          ;; isn't an eql-specializer or class, don't cache.
+          (unless (some (lambda (m)
+                          (some
+                           (lambda (x)
+                             (and (typep x 'specializer) (not (typep x 'eql-specializer)) (not (typep x 'class))))
+                           (method-specializers m)))
+                        applicable-methods)
             (cache-emf gf args emfun))
           (funcall emfun args))
         (apply #'no-applicable-method gf args))))
```
Ok, I've read the Java. My understanding is that it cannot handle arbitrary specializers. The cache key is a list of the types of the arguments, with a special EqlSpecialization object where applicable. To compute the key when caching, the args are traversed: first each arg is checked to see if it has ever been used in an eql specializer. If so, it's treated as an eql specializer; otherwise its class is used. The resulting list of specializers/classes is the key that will be looked up. This leads to a surprising situation in which a given method might have several cache entries, some with an eql specializer in a position that was not eql specialized. The comment from the code is below:

```java
 * <p>Consider:
 * <pre><tt>
 * (defgeneric f (a b))
 *
 * (defmethod f (a (b (eql 'symbol)))
 *   "T (EQL 'SYMBOL)")
 *
 * (defmethod f ((a symbol) (b (eql 'symbol)))
 *   "SYMBOL (EQL 'SYMBOL)")
 *
 * (f 12 'symbol)
 * => "T (EQL 'SYMBOL)"
 *
 * (f 'twelve 'symbol)
 * => "SYMBOL (EQL 'SYMBOL)"
 *
 * (f 'symbol 'symbol)
 * => "SYMBOL (EQL 'SYMBOL)"
 *
 * </tt></pre>
 *
 * After the two above calls <tt>cache</tt> will contain three keys:
 * <pre>
 * { class FIXNUM, EqlSpecialization('SYMBOL) }
 * { class SYMBOL, EqlSpecialization('SYMBOL) }
 * { EqlSpecialization('SYMBOL), EqlSpecialization('SYMBOL) }.
 * </pre>
```

With some other kind of specializer, you would need to figure out what to use as a hash key for arguments in that position. Whether there is any useful hash to compute really depends on the type of specializer, and that knowledge would have to be exposed somehow to the cache functions. So my current thinking is that the best we can do is the check in #587 (comment). Everything I'm saying should be checked, of course...
As another example, suppose you have (defmethod foo ((a number)) (print 'foo))
The multiple entries are there so that, on retrieval, the cache lookup just has to look at the list of classes of the arguments to get a key used to look up the method. So, bottom line: the EMF cache tries to do as little work as possible on lookup, trading for that extra space in the cache, potentially multiple calls (one for each unique list of argument classes) to the slow dispatch routine, and correspondingly multiple entries in the cache. A custom specializer can do arbitrary work to determine whether an argument falls under the specializer, work that has no place in the cache code, and even if you did it there, it's not clear that one could come up with something that could be part of a hash key.
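The key-computation scheme described above can be sketched roughly as follows. This is a hypothetical simplification, not the actual EMFCache.java logic: eql-specialized-values stands in for the cache's per-generic-function record of values that have ever appeared in an eql specializer.

```lisp
;; Hypothetical sketch of the EMF cache-key idea: for each argument,
;; use an EQL key if that value has ever appeared in an eql specializer
;; for this generic function; otherwise fall back to the argument's
;; class.  (list 'eql arg) stands in for the EqlSpecialization object.
(defun sketch-cache-key (args eql-specialized-values)
  (mapcar (lambda (arg)
            (if (member arg eql-specialized-values)
                (list 'eql arg)
                (class-of arg)))
          args))

;; Mirroring the javadoc example: (f 12 'symbol) keys on
;; { class FIXNUM, EqlSpecialization('SYMBOL) }.
;; (sketch-cache-key (list 12 'symbol) '(symbol))
;; => (#<BUILT-IN-CLASS FIXNUM> (EQL SYMBOL))
```

An arbitrary mop:specializer subtype has no analogous per-argument key, which is why those methods are excluded from caching instead.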
I don't think space is much of an issue in our EMFCache implementation at this point: avoiding the slow dispatch routines seems much more important.
I feared that the arbitrary behavior of custom specializers precludes a caching strategy, so thanks for providing more confirmation of that hunch. Your refined test in <#587 (comment)> makes more sense as well now. I will try applying that to see if the speed loss in
With the refined test for using the cache, the speed loss drops to well under an order of magnitude: we are still somewhat slower than abcl-1.8.0, but within an acceptable range to consider this issue addressed, at least for the time being. If anyone has an idea for a better caching strategy we could implement, please pipe up.
Addresses <armedbear#587>. From <armedbear#587 (comment)>. -- <https://github.com/PiotrWasik> found that abcl-1.9.1 had around an order of magnitude slowdown from abcl-1.8.0 with the subsequent code, which is included here for completeness.

```lisp
#|
(load "c:/Users/piotr/quicklisp/setup.lisp")
(funcall (read-from-string "ql:quickload") :cl-fad)
(funcall (read-from-string "ql:quickload") :cl-ppcre)
(funcall (read-from-string "ql:quickload") :arnesi)
|#
(asdf:load-system :cl-fad)
(asdf:load-system :cl-ppcre)
(asdf:load-system :arnesi)

(defun load-all-source-files ()
  (let (content-of-files)
    (cl-fad:walk-directory
     #+nil "c:/dev/usr/abcl-bin-1.9.1/abcl-1.9.1-sources/"
     #p"~/work/abcl/src"
     (lambda (filename)
       (push (arnesi:read-string-from-file filename) content-of-files)))
    content-of-files))

(defun show-runtime-10 ()
  (let ((content-of-files (load-all-source-files))
        (start-time (get-internal-real-time)))
    (dolist (content content-of-files)
      (cl-ppcre:do-scans (match-start match-end rs re "\\n" content)))
    (float (/ (- (get-internal-real-time) start-time)
              internal-time-units-per-second))))

;; (princ "test Version 10")
;; (princ (multiple-value-list (lisp-implementation-version)))

(defun report ()
  (arnesi:write-string-to-file
   (format nil "(~s ~s ~s)~%"
           (get-universal-time)
           (first ext:*command-line-argument-list*)
           (show-runtime-10))
   "c:/dev/home/performance-log.txt"
   :if-exists :append :if-does-not-exist :create))

(report)

#|
Without patching declaim 1.426
Patched 1.628
|#
```
abcl 1.9.1 performance - regexp operations 8 times slower than in abcl 1.8.0
To replicate (on Windows), create these directories:
- C:\dev\home (where you will run cmd.exe)
- C:\dev\usr\abcl-bin-1.8.0 (with abcl.jar in it)
- C:\dev\usr\abcl-bin-1.9.1 (with abcl.jar in it)

Now open cmd.exe and cd c:\dev\home:
run-abcl-1.9.1.bat
(load "c:/dev/home/performance-testing")
(quit) ;; this was just to get all dependencies downloaded; not sure why, but I had the impression I could not replicate the slowdown on the first run, while dependencies were being downloaded from quicklisp
Clear your quicklisp compilation cache, in my case C:\Users\piotr\AppData\Local\cache\common-lisp.
then again
run-abcl-1.9.1.bat
(load "c:/dev/home/performance-testing")
(show-runtime)
164.31 seconds real time
245270175 cons cells
NIL
(quit)
run-abcl-1.8.0.bat
(load "c:/dev/home/performance-testing")
(show-runtime)
18.858 seconds real time
37650729 cons cells
NIL
(quit)
The content of the performance-testing.lisp file:
testing-log.txt
my-abclrc.txt
run-abcl-1.8.0.bat.txt
run-abcl-1.9.1.bat.txt
performance-testing.lisp.txt