Issue python#13165: stringbench is now available in the Tools/stringbench folder. It used to live in its own SVN project.
elprans · Apr 9, 2012 · 1584ae3
1 parent 75d9aca
Showing 4 changed files with 1,560 additions and 0 deletions.
diff --git a/Misc/NEWS b/Misc/NEWS
@@ -57,6 +57,12 @@ Tests
 - Issue #14355: Regrtest now supports the standard unittest test loading, and
   will use it if a test file contains no `test_main` method.
 
+Tools / Demos
+-------------
+
+- Issue #13165: stringbench is now available in the Tools/stringbench folder.
+  It used to live in its own SVN project.
+
 
 What's New in Python 3.3.0 Alpha 2?
 ===================================

diff --git a/Tools/README b/Tools/README
@@ -32,6 +32,9 @@ scripts         A number of useful single-file programs, e.g. tabnanny.py
                 tabs and spaces, and 2to3, which converts Python 2 code
                 to Python 3 code.
 
+stringbench     A suite of micro-benchmarks for various operations on
+                strings (both 8-bit and unicode).
+
 test2to3        A demonstration of how to use 2to3 transparently in setup.py.
 
 unicode         Tools for generating unicodedata and codecs from unicode.org

diff --git a/Tools/stringbench/README b/Tools/stringbench/README
@@ -0,0 +1,68 @@
+stringbench is a set of performance tests comparing byte string
+operations with unicode operations.  The two string implementations
+are loosely based on each other and sometimes the algorithm for one is
+faster than the other.
+
+This test set was started at the Need For Speed sprint in Reykjavik
+to identify which string methods could be sped up quickly and to
+identify obvious places for improvement.
+
+Here is an example of a benchmark
+
+
+@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
+def startswith_single(STR):
+    s1 = STR("Andrew")
+    s2 = STR("A")
+    s1_startswith = s1.startswith
+    for x in _RANGE_1000:
+        s1_startswith(s2)
+
+The bench decorator takes three parameters.  The first is a short
+description of how the code works.  In most cases this is a Python
+code snippet.  It is not the code which is actually run because the
+real code is hand-optimized to focus on the method being tested.
+
+The second parameter is a group title.  All benchmarks with the same
+group title are listed together.  This lets you compare different
+implementations of the same algorithm, such as "t in s"
+vs. "s.find(t)".
+
+The last is a count.  Each benchmark loops over the algorithm either
+100 or 1000 times, depending on the algorithm performance.  The output
+time is the time per benchmark call so the reader needs a way to know
+how to scale the performance.
+
+These parameters become function attributes.
+
+
+Here is an example of the output
+
+
+========== count newlines
+38.54   41.60   92.7    ...text.with.2000.newlines.count("\n") (*100)
+========== early match, single character
+1.14    1.18    96.8    ("A"*1000).find("A") (*1000)
+0.44    0.41    105.6   "A" in "A"*1000 (*1000)
+1.15    1.17    98.1    ("A"*1000).index("A") (*1000)
+
+The first column is the run time in milliseconds for byte strings.
+The second is the run time for unicode strings.  The third is the
+ratio byte time / unicode time, expressed as a percentage.  Values
+above 100 mean unicode is faster than byte strings.
+
+The last column contains the code snippet and the repeat count for the
+internal benchmark loop.
+
+The times are computed with 'timeit.py' which repeats the test more
+and more times until the total time takes over 0.2 seconds, returning
+the best time for a single iteration.
+
+The final line of the output is the cumulative time for byte and
+unicode strings, and the overall performance of unicode relative to
+bytes.  For example
+
+4079.83 5432.25 75.1    TOTAL
+
+However, this total has little meaning as it weights every test equally.
+
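
The README above says the three `bench` parameters become function attributes and that the third output column is byte time / unicode time as a percentage. A minimal sketch of how that could look follows. The attribute names (`comment`, `group`, `repeat_count`) and the helpers `as_bytes`, `best_time_ms` and `ratio_percent` are hypothetical and not taken from this commit. The timing helper uses a fixed call count rather than the grow-until-0.2-seconds strategy the README describes.

```python
import timeit

_RANGE_1000 = range(1000)


def bench(comment, group, repeat_count):
    """Store the three decorator parameters on the benchmark function.

    The attribute names used here are assumptions for illustration.
    """
    def decorate(func):
        func.comment = comment            # short description / code snippet
        func.group = group                # benchmarks are listed by group title
        func.repeat_count = repeat_count  # size of the internal loop
        return func
    return decorate


@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
def startswith_single(STR):
    s1 = STR("Andrew")
    s2 = STR("A")
    s1_startswith = s1.startswith
    for x in _RANGE_1000:
        s1_startswith(s2)


def as_bytes(s):
    """Hypothetical STR factory for the byte-string run."""
    return s.encode("latin-1")


def best_time_ms(func, STR, number=10, repeat=3):
    """Best time per benchmark call in milliseconds (fixed call count)."""
    timer = timeit.Timer(lambda: func(STR))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1000.0


def ratio_percent(bytes_ms, unicode_ms):
    """Third output column: byte time / unicode time, as a percentage."""
    return 100.0 * bytes_ms / unicode_ms


if __name__ == "__main__":
    b = best_time_ms(startswith_single, as_bytes)
    u = best_time_ms(startswith_single, str)
    print("========== %s" % startswith_single.group)
    print("%.2f\t%.2f\t%.1f\t%s (*%d)" % (
        b, u, ratio_percent(b, u),
        startswith_single.comment, startswith_single.repeat_count))
```

Passing `str` or a bytes factory as `STR` lets the same benchmark body run against both string types, which is what makes the two time columns comparable.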