Bug fixes
dabeaz committed May 28, 2008
1 parent 241e3c7 commit d2151e3
Showing 17 changed files with 58 additions and 71 deletions.
13 changes: 3 additions & 10 deletions ANNOUNCE
@@ -1,11 +1,11 @@
May 17, 2008
May 28, 2008

Announcing : PLY-2.4 (Python Lex-Yacc)
Announcing : PLY-2.5 (Python Lex-Yacc)

http://www.dabeaz.com/ply

I'm pleased to announce a significant new update to PLY---a 100% Python
implementation of the common parsing tools lex and yacc. PLY-2.4 fixes
implementation of the common parsing tools lex and yacc. PLY-2.5 fixes
some bugs in error handling and provides some performance improvements.

If you are new to PLY, here are a few highlights:
@@ -29,13 +29,6 @@ If you are new to PLY, here are a few highlights:
problems. Currently, PLY can build its parsing tables using
either SLR or LALR(1) algorithms.

- PLY can be used to build parsers for large programming languages.
Although it is not ultra-fast due to its Python implementation,
PLY can be used to parse grammars consisting of several hundred
rules (as might be found for a language like C). The lexer and LR
parser are also reasonably efficient when parsing normal
sized programs.

More information about PLY can be obtained on the PLY webpage at:

http://www.dabeaz.com/ply
4 changes: 2 additions & 2 deletions README
@@ -1,8 +1,8 @@
PLY (Python Lex-Yacc) Version 2.4 (May, 2008)
PLY (Python Lex-Yacc) Version 2.5 (May 28, 2008)

David M. Beazley ([email protected])

Copyright (C) 2001-2007 David M. Beazley
Copyright (C) 2001-2008 David M. Beazley

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
27 changes: 16 additions & 11 deletions doc/ply.html
@@ -12,7 +12,7 @@ <h1>PLY (Python Lex-Yacc)</h1>
</b>

<p>
<b>PLY Version: 2.4</b>
<b>PLY Version: 2.5</b>
<p>

<!-- INDEX -->
@@ -472,7 +472,8 @@ <H3><a name="ply_nn7"></a>3.4 Token values</H3>
</blockquote>

It is important to note that storing data in other attribute names is <em>not</em> recommended. The <tt>yacc.py</tt> module only exposes the
contents of the <tt>value</tt> attribute. Thus, accessing other attributes may be unnecessarily awkward.
contents of the <tt>value</tt> attribute. Thus, accessing other attributes may be unnecessarily awkward. If you
need to store multiple values on a token, assign a tuple, dictionary, or instance to <tt>value</tt>.
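
A minimal sketch of the suggestion above, using a hypothetical identifier rule that is not part of the PLY documentation, might bundle several pieces of data into the value attribute:

def t_ID(t):
    r'[A-Za-z_][A-Za-z0-9_]*'
    # yacc.py only exposes t.value, so pack the extra details into it
    t.value = {'name': t.value, 'pos': t.lexpos}
    return t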

<H3><a name="ply_nn8"></a>3.5 Discarded tokens</H3>

@@ -894,14 +895,18 @@ <H3><a name="ply_nn17"></a>3.14 Alternative specification of lexers</H3>
</pre>
</blockquote>

For reasons that are subtle, you should <em>NOT</em> invoke <tt>lex.lex()</tt> inside the <tt>__init__()</tt> method of your class. If you
do, it may cause bizarre behavior if someone tries to duplicate a lexer object. Keep reading.
When building a lexer from class, you should construct the lexer from
an instance of the class, not the class object itself. Also, for
reasons that are subtle, you should <em>NOT</em>
invoke <tt>lex.lex()</tt> inside the <tt>__init__()</tt> method of
your class. If you do, it may cause bizarre behavior if someone tries
to duplicate a lexer object.
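
A minimal sketch of the pattern described above, assuming an illustrative two-token lexer (the class and token names are not from the commit):

import ply.lex as lex

class MyLexer(object):
    tokens = ('NUMBER', 'PLUS')
    t_PLUS   = r'\+'
    t_ignore = ' \t'

    def t_NUMBER(self, t):
        r'\d+'
        t.value = int(t.value)
        return t

    def t_error(self, t):
        t.lexer.skip(1)

    def build(self, **kwargs):
        # Build from the instance, outside __init__, as advised above
        self.lexer = lex.lex(object=self, **kwargs)

m = MyLexer()
m.build()                # construct from an instance, not the class object
m.lexer.input("3 + 4")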

<H3><a name="ply_nn18"></a>3.15 Maintaining state</H3>


In your lexer, you may want to maintain a variety of state information. This might include mode settings, symbol tables, and other details. There are a few
different ways to handle this situation. First, you could just keep some global variables:
different ways to handle this situation. One way to do this is to keep a set of global variables in the module
where you created the lexer. For example:

<blockquote>
<pre>
@@ -940,9 +945,9 @@ <H3><a name="ply_nn18"></a>3.15 Maintaining state</H3>
</blockquote>

This latter approach has the advantage of storing information inside
the lexer itself---something that may be useful if multiple instances
the lexer object itself---something that may be useful if multiple instances
of the same lexer have been created. However, it may also feel kind
of "hacky" to the purists. Just to put their mind at some ease, all
of "hacky" to the OO purists. Just to put their mind at some ease, all
internal attributes of the lexer (with the exception of <tt>lineno</tt>) have names that are prefixed
by <tt>lex</tt> (e.g., <tt>lexdata</tt>,<tt>lexpos</tt>, etc.). Thus,
it should be perfectly safe to store attributes in the lexer that
@@ -977,12 +982,12 @@ <H3><a name="ply_nn18"></a>3.15 Maintaining state</H3>
</pre>
</blockquote>

The class approach may be the easiest to manage if your application is going to be creating multiple instances of the same lexer and
you need to manage a lot of state.
The class approach may be the easiest to manage if your application is
going to be creating multiple instances of the same lexer and you need
to manage a lot of state.
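
As a rough illustration of the multiple-instance case (the class below is hypothetical, not part of this commit), per-lexer state can live on each instance:

import ply.lex as lex

class CountingLexer(object):
    tokens = ('NUMBER',)
    t_ignore = ' \t'

    def __init__(self):
        self.num_count = 0          # per-instance state

    def t_NUMBER(self, t):
        r'\d+'
        self.num_count += 1         # state stored on the instance, not a global
        t.value = int(t.value)
        return t

    def t_error(self, t):
        t.lexer.skip(1)

    def build(self, **kwargs):
        self.lexer = lex.lex(object=self, **kwargs)

a = CountingLexer()
a.build()
b = CountingLexer()
b.build()                           # a.num_count and b.num_count are independent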

<H3><a name="ply_nn19"></a>3.16 Lexer cloning</H3>


<p>
If necessary, a lexer object can be quickly duplicated by invoking its <tt>clone()</tt> method. For example:

4 changes: 2 additions & 2 deletions example/ansic/clex.py
@@ -143,12 +143,12 @@ def t_ID(t):
# Comments
def t_comment(t):
r'/\*(.|\n)*?\*/'
t.lineno += t.value.count('\n')
t.lexer.lineno += t.value.count('\n')

# Preprocessor directive (ignored)
def t_preprocessor(t):
r'\#(.)*?\n'
t.lineno += 1
t.lexer.lineno += 1

def t_error(t):
print "Illegal character %s" % repr(t.value[0])
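
The hunks in this file, and the matching ones in example/yply/ylex.py below, move line tracking from t.lineno to t.lexer.lineno. A minimal newline rule written against the updated attribute might look like this (the rule itself is illustrative, not part of the commit):

def t_newline(t):
    r'\n+'
    # line numbers now live on the lexer object rather than on the token
    t.lexer.lineno += len(t.value)
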
4 changes: 2 additions & 2 deletions example/yply/ylex.py
@@ -42,7 +42,7 @@ def t_SECTION(t):
# Comments
def t_ccomment(t):
r'/\*(.|\n)*?\*/'
t.lineno += t.value.count('\n')
t.lexer.lineno += t.value.count('\n')

t_ignore_cppcomment = r'//.*'

@@ -95,7 +95,7 @@ def t_code_error(t):
raise RuntimeError

def t_error(t):
print "%d: Illegal character '%s'" % (t.lineno, t.value[0])
print "%d: Illegal character '%s'" % (t.lexer.lineno, t.value[0])
print t.value
t.lexer.skip(1)

45 changes: 22 additions & 23 deletions ply/lex.py
@@ -22,7 +22,7 @@
# See the file COPYING for a complete copy of the LGPL.
# -----------------------------------------------------------------------------

__version__ = "2.4"
__version__ = "2.5"
__tabversion__ = "2.4" # Version of table file used

import re, sys, types, copy, os
@@ -89,6 +89,7 @@ def __init__(self):
self.lexretext = None # Current regular expression strings
self.lexstatere = {} # Dictionary mapping lexer states to master regexs
self.lexstateretext = {} # Dictionary mapping lexer states to regex strings
self.lexstaterenames = {} # Dictionary mapping lexer states to symbol names
self.lexstate = "INITIAL" # Current lexer state
self.lexstatestack = [] # Stack of lexer states
self.lexstateinfo = None # State information
@@ -161,7 +162,7 @@ def writetab(self,tabfile,outputdir=""):
for key, lre in self.lexstatere.items():
titem = []
for i in range(len(lre)):
titem.append((self.lexstateretext[key][i],_funcs_to_names(lre[i][1],key,initialfuncs)))
titem.append((self.lexstateretext[key][i],_funcs_to_names(lre[i][1],self.lexstaterenames[key][i])))
tabre[key] = titem

tf.write("_lexstatere = %s\n" % repr(tabre))
@@ -409,20 +410,11 @@ def _validate_file(filename):
# suitable for output to a table file
# -----------------------------------------------------------------------------

def _funcs_to_names(funclist,state,initial):
# If this is the initial state, we clear the state and initial list
if state == 'INITIAL':
state = ""
initial = []
def _funcs_to_names(funclist,namelist):
result = []
for f in funclist:
for f,name in zip(funclist,namelist):
if f and f[0]:
# If a function is defined, make sure it's name corresponds to the correct state
if not initial or f in initial:
statestr = "t_"
else:
statestr = "t_"+state+"_"
result.append((statestr+ f[1],f[1]))
result.append((name, f[1]))
else:
result.append(f)
return result
@@ -459,25 +451,27 @@ def _form_master_re(relist,reflags,ldict,toknames):

# Build the index to function map for the matching engine
lexindexfunc = [ None ] * (max(lexre.groupindex.values())+1)
lexindexnames = lexindexfunc[:]

for f,i in lexre.groupindex.items():
handle = ldict.get(f,None)
if type(handle) in (types.FunctionType, types.MethodType):
lexindexfunc[i] = (handle,toknames[f])
lexindexnames[i] = f
elif handle is not None:
# If rule was specified as a string, we build an anonymous
# callback function to carry out the action
lexindexnames[i] = f
if f.find("ignore_") > 0:
lexindexfunc[i] = (None,None)
else:
lexindexfunc[i] = (None, toknames[f])

return [(lexre,lexindexfunc)],[regex]
return [(lexre,lexindexfunc)],[regex],[lexindexnames]
except Exception,e:
m = int(len(relist)/2)
if m == 0: m = 1
llist, lre = _form_master_re(relist[:m],reflags,ldict,toknames)
rlist, rre = _form_master_re(relist[m:],reflags,ldict,toknames)
return llist+rlist, lre+rre
llist, lre, lnames = _form_master_re(relist[:m],reflags,ldict,toknames)
rlist, rre, rnames = _form_master_re(relist[m:],reflags,ldict,toknames)
return llist+rlist, lre+rre, lnames+rnames

# -----------------------------------------------------------------------------
# def _statetoken(s,names)
@@ -794,9 +788,10 @@ def lex(module=None,object=None,debug=0,optimize=0,lextab="lextab",reflags=0,now
# Build the master regular expressions

for state in regexs.keys():
lexre, re_text = _form_master_re(regexs[state],reflags,ldict,toknames)
lexre, re_text, re_names = _form_master_re(regexs[state],reflags,ldict,toknames)
lexobj.lexstatere[state] = lexre
lexobj.lexstateretext[state] = re_text
lexobj.lexstaterenames[state] = re_names
if debug:
for i in range(len(re_text)):
print "lex: state '%s'. regex[%d] = '%s'" % (state, i, re_text[i])
@@ -806,6 +801,7 @@
if state != "INITIAL" and type == 'inclusive':
lexobj.lexstatere[state].extend(lexobj.lexstatere['INITIAL'])
lexobj.lexstateretext[state].extend(lexobj.lexstateretext['INITIAL'])
lexobj.lexstaterenames[state].extend(lexobj.lexstaterenames['INITIAL'])

lexobj.lexstateinfo = stateinfo
lexobj.lexre = lexobj.lexstatere["INITIAL"]
@@ -888,7 +884,10 @@ def runmain(lexer=None,data=None):

def TOKEN(r):
def set_doc(f):
f.__doc__ = r
if callable(r):
f.__doc__ = r.__doc__
else:
f.__doc__ = r
return f
return set_doc

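
The last hunk above lets the TOKEN decorator accept a callable as well as a string, copying the callable's docstring onto the decorated rule. A rough sketch of how that might be used (the identifier helper and t_ID rule are illustrative, not from this commit):

from ply.lex import TOKEN

def identifier(t):
    r'[A-Za-z_][A-Za-z0-9_]*'

@TOKEN(identifier)        # a callable now works: its __doc__ supplies the regex
def t_ID(t):
    return t
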
2 changes: 1 addition & 1 deletion ply/yacc.py
@@ -50,7 +50,7 @@
# own risk!
# ----------------------------------------------------------------------------

__version__ = "2.4"
__version__ = "2.5"
__tabversion__ = "2.4" # Table version

#-----------------------------------------------------------------------------
3 changes: 1 addition & 2 deletions test/lex_ignore.exp
@@ -2,6 +2,5 @@
Traceback (most recent call last):
File "./lex_ignore.py", line 29, in <module>
lex.lex()
File "../ply/lex.py", line 772, in lex
raise SyntaxError,"lex: Unable to build lexer."
File "../../ply/lex.py", line 783, in lex
SyntaxError: lex: Unable to build lexer.
3 changes: 1 addition & 2 deletions test/lex_re1.exp
@@ -2,6 +2,5 @@ lex: Invalid regular expression for rule 't_NUMBER'. unbalanced parenthesis
Traceback (most recent call last):
File "./lex_re1.py", line 25, in <module>
lex.lex()
File "../ply/lex.py", line 772, in lex
raise SyntaxError,"lex: Unable to build lexer."
File "../../ply/lex.py", line 783, in lex
SyntaxError: lex: Unable to build lexer.
3 changes: 1 addition & 2 deletions test/lex_re2.exp
@@ -2,6 +2,5 @@ lex: Regular expression for rule 't_PLUS' matches empty string.
Traceback (most recent call last):
File "./lex_re2.py", line 25, in <module>
lex.lex()
File "../ply/lex.py", line 772, in lex
raise SyntaxError,"lex: Unable to build lexer."
File "../../ply/lex.py", line 783, in lex
SyntaxError: lex: Unable to build lexer.
3 changes: 1 addition & 2 deletions test/lex_re3.exp
@@ -3,6 +3,5 @@ lex: Make sure '#' in rule 't_POUND' is escaped with '\#'.
Traceback (most recent call last):
File "./lex_re3.py", line 27, in <module>
lex.lex()
File "../ply/lex.py", line 772, in lex
raise SyntaxError,"lex: Unable to build lexer."
File "../../ply/lex.py", line 783, in lex
SyntaxError: lex: Unable to build lexer.
3 changes: 1 addition & 2 deletions test/lex_state1.exp
@@ -2,6 +2,5 @@ lex: states must be defined as a tuple or list.
Traceback (most recent call last):
File "./lex_state1.py", line 38, in <module>
lex.lex()
File "../ply/lex.py", line 772, in lex
raise SyntaxError,"lex: Unable to build lexer."
File "../../ply/lex.py", line 783, in lex
SyntaxError: lex: Unable to build lexer.
3 changes: 1 addition & 2 deletions test/lex_state2.exp
@@ -3,6 +3,5 @@ lex: invalid state specifier 'example'. Must be a tuple (statename,'exclusive|in
Traceback (most recent call last):
File "./lex_state2.py", line 38, in <module>
lex.lex()
File "../ply/lex.py", line 772, in lex
raise SyntaxError,"lex: Unable to build lexer."
File "../../ply/lex.py", line 783, in lex
SyntaxError: lex: Unable to build lexer.
3 changes: 1 addition & 2 deletions test/lex_state3.exp
@@ -3,6 +3,5 @@ lex: No rules defined for state 'example'
Traceback (most recent call last):
File "./lex_state3.py", line 40, in <module>
lex.lex()
File "../ply/lex.py", line 772, in lex
raise SyntaxError,"lex: Unable to build lexer."
File "../../ply/lex.py", line 783, in lex
SyntaxError: lex: Unable to build lexer.
3 changes: 1 addition & 2 deletions test/lex_state4.exp
@@ -2,6 +2,5 @@ lex: state type for state comment must be 'inclusive' or 'exclusive'
Traceback (most recent call last):
File "./lex_state4.py", line 39, in <module>
lex.lex()
File "../ply/lex.py", line 772, in lex
raise SyntaxError,"lex: Unable to build lexer."
File "../../ply/lex.py", line 783, in lex
SyntaxError: lex: Unable to build lexer.
3 changes: 1 addition & 2 deletions test/lex_state5.exp
@@ -2,6 +2,5 @@ lex: state 'comment' already defined.
Traceback (most recent call last):
File "./lex_state5.py", line 40, in <module>
lex.lex()
File "../ply/lex.py", line 772, in lex
raise SyntaxError,"lex: Unable to build lexer."
File "../../ply/lex.py", line 783, in lex
SyntaxError: lex: Unable to build lexer.
3 changes: 1 addition & 2 deletions test/lex_state_norule.exp
@@ -2,6 +2,5 @@ lex: No rules defined for state 'example'
Traceback (most recent call last):
File "./lex_state_norule.py", line 40, in <module>
lex.lex()
File "../ply/lex.py", line 772, in lex
raise SyntaxError,"lex: Unable to build lexer."
File "../../ply/lex.py", line 783, in lex
SyntaxError: lex: Unable to build lexer.
