Flexible inline (Python-Markdown#629)
Add new InlineProcessor class that handles inline processing much better and allows for more flexibility. This adds new InlineProcessors that no longer utilize unnecessary pretext and posttext captures. New class can accept the buffer that is being worked on and manually process the text without regex and return new replacement bounds. This helps us to handle links in a better way and handle nested brackets and logic that is too much for regular expression. The refactor also allows image links to have links/paths with spaces like links. Ref Python-Markdown#551, Python-Markdown#613, Python-Markdown#590, Python-Markdown#161.
facelessuser authored and waylan committed Jan 18, 2018
1 parent de9cc42 commit d18c3d0
Showing 16 changed files with 785 additions and 184 deletions.
109 changes: 108 additions & 1 deletion docs/extensions/api.md
@@ -48,6 +48,8 @@ class MyPreprocessor(Preprocessor):

## Inline Patterns {: #inlinepatterns }

### Legacy

Inline Patterns implement the inline HTML element syntax for Markdown such as
`*emphasis*` or `[links](http://example.com)`. Pattern objects should be
instances of classes that inherit from `markdown.inlinepatterns.Pattern` or
@@ -85,7 +87,7 @@ from markdown.util import etree
class EmphasisPattern(Pattern):
def handleMatch(self, m):
el = etree.Element('em')
el.text = m.group(3)
el.text = m.group(2)
return el
```

@@ -110,8 +112,113 @@ implemented with separate instances of the `SimpleTagPattern` listed below.
Feel free to use or extend any of the Pattern classes found at
`markdown.inlinepatterns`.

### Future

While users can still create plugins with the existing
`markdown.inlinepatterns.Pattern`, a new, more flexible inline processor has
been added which users are encouraged to migrate to. The new inline processor
is found at `markdown.inlinepatterns.InlineProcessor`.

The new processor is very similar to the legacy one, with two major distinctions.

1. Patterns no longer need to match the entire block, so patterns no longer
start with `r'^(.*?)'` and end with `r'(.*?)!'`. This was a huge
performance sink and this requirement has been removed. The returned match
object will only contain what is explicitly matched in the pattern, and
extension pattern groups now start with `m.group(1)`.

2. The `handleMatch` method now takes an additional input called `data`,
which is the entire block under analysis, not just what is matched with
the specified pattern. The method also returns the element *and* the index
boundaries relative to `data` that the return element is replacing
(usually `m.start(0)` and `m.end(0)`). If the boundaries are returned as
`None`, it is assumed that the match did not take place, and nothing will
be altered in `data`.
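
The shift in group numbering described in the first point can be illustrated
with the standard `re` module alone. This is a minimal sketch, not the
library's code: it assumes the legacy class wrapped each pattern in a leading
and a trailing capture, roughly of the form `^(.*?)PATTERN(.*)$`, and the names
`EMPHASIS`, `legacy_re` and `new_re` are made up for the illustration.

```python
import re

# The extension author's own pattern (one capture group).
EMPHASIS = r'\*([^*]+)\*'
text = 'some *emphasized* text'

# Assumed legacy behavior: the pattern is wrapped so that it matches the
# whole block, which pushes the author's first group to group(2).
legacy_re = re.compile(r'^(.*?)%s(.*)$' % EMPHASIS, re.DOTALL)
lm = legacy_re.match(text)
legacy_groups = (lm.group(1), lm.group(2), lm.group(3))

# New behavior: the pattern is used as written, so the author's first
# group is group(1) and the match carries its own boundaries.
new_re = re.compile(EMPHASIS, re.DOTALL)
nm = new_re.search(text)
new_result = (nm.group(1), nm.start(0), nm.end(0))

print(legacy_groups)  # ('some ', 'emphasized', ' text')
print(new_result)     # ('emphasized', 5, 17)
```

In the wrapped form the text before and after the match is captured on every
attempt, which is the overhead the new processor avoids.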

If all you need is the same functionality as the legacy processor, you can do
as shown below. Most of the time, simple regular expression processing is all
you'll need.

```python
from markdown.inlinepatterns import InlineProcessor
from markdown.util import etree

# an oversimplified regex
MYPATTERN = r'\*([^*]+)\*'

class EmphasisPattern(InlineProcessor):
    def handleMatch(self, m, data):
        el = etree.Element('em')
        el.text = m.group(1)
        return el, m.start(0), m.end(0)

# pass in pattern and create instance
emphasis = EmphasisPattern(MYPATTERN)
```
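
To make the meaning of the returned boundaries concrete, here is a rough sketch
of how a caller could consume the `(element, start, end)` triple. The
`apply_once` driver, the `handle_match` function and the `PLACEHOLDER` string
are hypothetical stand-ins written for this illustration; the real parser's
internals are not shown here, and `xml.etree.ElementTree` is used in place of
`markdown.util.etree` so that the snippet runs on its own.

```python
import re
import xml.etree.ElementTree as etree

MYPATTERN = r'\*([^*]+)\*'
PLACEHOLDER = '<stash>'  # hypothetical marker standing in for the element


def handle_match(m, data):
    """Stand-in for a handleMatch method: returns (element, start, end)."""
    el = etree.Element('em')
    el.text = m.group(1)
    return el, m.start(0), m.end(0)


def apply_once(pattern, data):
    """Hypothetical driver: search, call the handler, splice the result.

    Returns the (possibly unchanged) text and the element, or None as the
    element when nothing was replaced.
    """
    m = re.search(pattern, data)
    if m is None:
        return data, None
    el, start, end = handle_match(m, data)
    if start is None or end is None:
        # The handler rejected the match, so data is left untouched.
        return data, None
    # Only data[start:end] is replaced; the rest of the block is kept.
    return data[:start] + PLACEHOLDER + data[end:], el


new_data, el = apply_once(MYPATTERN, 'some *emphasized* text')
untouched, nothing = apply_once(MYPATTERN, 'no emphasis here')
```

Because the handler chooses the boundaries, it is free to return an `end` that
differs from `m.end(0)`, which is what makes hand-written scanning possible.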

The new processor also allows you to handle much more complex patterns that
are too much for Python's `re` module to handle alone. For instance, to handle
nested brackets in link patterns, the built-in link inline processor uses the
following pattern to find where a link *might* start:

```python
LINK_RE = NOIMG + r'\['
link = LinkInlineProcessor(LINK_RE, md_instance)
```

It then uses programmed logic to walk the string (`data`) from the point where
the pattern matched. If the text turns out not to be a link, it returns `None`
for the start and end boundaries to tell the parser that no match was found.

```python
# Just a snippet of the link's handleMatch
# method to illustrate new logic
def handleMatch(self, m, data):
    text, index, handled = self.getText(data, m.end(0))

    if not handled:
        return None, None, None

    href, title, index, handled = self.getLink(data, index)
    if not handled:
        return None, None, None

    el = util.etree.Element("a")
    el.text = text

    el.set("href", href)

    if title is not None:
        el.set("title", title)

    return el, m.start(0), index
```
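
The body of `getText` is not shown in the snippet above. As an illustration of
the kind of scanning that a regular expression struggles with, the following
sketch walks a string to find the bracket that closes an already opened `[`,
counting nested pairs along the way. The helper name `find_bracket_text` is
hypothetical and this is not the library's implementation; among other
simplifications, it ignores backslash-escaped brackets.

```python
import re


def find_bracket_text(data, index):
    """Scan data from index, which is just past an opening '['.

    Returns (text, next_index, handled), mirroring the shape of the
    values unpacked in the snippet above.
    """
    depth = 1
    for pos in range(index, len(data)):
        ch = data[pos]
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
            if depth == 0:
                # Text between the brackets, and the index after ']'.
                return data[index:pos], pos + 1, True
    # Ran out of text before the brackets balanced: not handled.
    return None, index, False


data = 'see [a [nested] label](url)'
m = re.search(r'\[', data)
balanced = find_bracket_text(data, m.end(0))
unbalanced = find_bracket_text('see [oops', 5)

print(balanced)    # ('a [nested] label', 22, True)
print(unbalanced)  # (None, 5, False)
```

A caller in the style of the snippet above would return `None, None, None`
whenever the third value is `False`.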

### Generic Pattern Classes

Some of the available processors are listed below.

* **`SimpleTextInlineProcessor(pattern)`**:

Returns simple text of `group(1)` of a `pattern` and the start and end
position of the match.

* **`SimpleTagInlineProcessor(pattern, tag)`**:

Returns an element of type "`tag`" with a text attribute of `group(2)`
of a `pattern`. `tag` should be a string of a HTML element (e.g.: 'em').
It also returns the start and end position of the match.

* **`SubstituteTagInlineProcessor(pattern, tag)`**:

Returns an element of type "`tag`" with no children or text (e.g.: `br`)
and the start and end position of the match.

A very small number of the basic legacy processors are still available to
prevent breakage of 3rd party extensions during the transition period to the
new processors. Three of the available processors are listed below.

* **`SimpleTextPattern(pattern)`**:

Returns simple text of `group(2)` of a `pattern`.
12 changes: 6 additions & 6 deletions markdown/extensions/abbr.py
@@ -20,7 +20,7 @@
from __future__ import unicode_literals
from . import Extension
from ..preprocessors import Preprocessor
from ..inlinepatterns import Pattern
from ..inlinepatterns import InlineProcessor
from ..util import etree, AtomicString
import re

@@ -52,7 +52,7 @@ def run(self, lines):
abbr = m.group('abbr').strip()
title = m.group('title').strip()
self.markdown.inlinePatterns['abbr-%s' % abbr] = \
AbbrPattern(self._generate_pattern(abbr), title)
AbbrInlineProcessor(self._generate_pattern(abbr), title)
# Preserve the line to prevent raw HTML indexing issue.
# https://github.com/Python-Markdown/markdown/issues/584
new_text.append('')
@@ -76,18 +76,18 @@ def _generate_pattern(self, text):
return r'(?P<abbr>\b%s\b)' % (r''.join(chars))


class AbbrPattern(Pattern):
class AbbrInlineProcessor(InlineProcessor):
""" Abbreviation inline pattern. """

def __init__(self, pattern, title):
super(AbbrPattern, self).__init__(pattern)
super(AbbrInlineProcessor, self).__init__(pattern)
self.title = title

def handleMatch(self, m):
def handleMatch(self, m, data):
abbr = etree.Element('abbr')
abbr.text = AtomicString(m.group('abbr'))
abbr.set('title', self.title)
return abbr
return abbr, m.start(0), m.end(0)


def makeExtension(**kwargs): # pragma: no cover
16 changes: 8 additions & 8 deletions markdown/extensions/footnotes.py
@@ -17,7 +17,7 @@
from __future__ import unicode_literals
from . import Extension
from ..preprocessors import Preprocessor
from ..inlinepatterns import Pattern
from ..inlinepatterns import InlineProcessor
from ..treeprocessors import Treeprocessor
from ..postprocessors import Postprocessor
from .. import util
@@ -77,7 +77,7 @@ def extendMarkdown(self, md, md_globals):
# Insert an inline pattern before ImageReferencePattern
FOOTNOTE_RE = r'\[\^([^\]]*)\]' # blah blah [^1] blah
md.inlinePatterns.add(
"footnote", FootnotePattern(FOOTNOTE_RE, self), "<reference"
"footnote", FootnoteInlineProcessor(FOOTNOTE_RE, self), "<reference"
)
# Insert a tree-processor that would actually add the footnote div
# This must be before all other treeprocessors (i.e., inline and
@@ -315,15 +315,15 @@ def detab(line):
return items, i


class FootnotePattern(Pattern):
class FootnoteInlineProcessor(InlineProcessor):
""" InlinePattern for footnote markers in a document's body text. """

def __init__(self, pattern, footnotes):
super(FootnotePattern, self).__init__(pattern)
super(FootnoteInlineProcessor, self).__init__(pattern)
self.footnotes = footnotes

def handleMatch(self, m):
id = m.group(2)
def handleMatch(self, m, data):
id = m.group(1)
if id in self.footnotes.footnotes.keys():
sup = util.etree.Element("sup")
a = util.etree.SubElement(sup, "a")
@@ -333,9 +333,9 @@ def handleMatch(self, m):
a.set('rel', 'footnote') # invalid in HTML5
a.set('class', 'footnote-ref')
a.text = util.text_type(self.footnotes.footnotes.index(id) + 1)
return sup
return sup, m.start(0), m.end(0)
else:
return None
return None, None, None


class FootnotePostTreeprocessor(Treeprocessor):
4 changes: 2 additions & 2 deletions markdown/extensions/nl2br.py
@@ -19,15 +19,15 @@
from __future__ import absolute_import
from __future__ import unicode_literals
from . import Extension
from ..inlinepatterns import SubstituteTagPattern
from ..inlinepatterns import SubstituteTagInlineProcessor

BR_RE = r'\n'


class Nl2BrExtension(Extension):

def extendMarkdown(self, md, md_globals):
br_tag = SubstituteTagPattern(BR_RE, 'br')
br_tag = SubstituteTagInlineProcessor(BR_RE, 'br')
md.inlinePatterns.add('nl', br_tag, '_end')


10 changes: 5 additions & 5 deletions markdown/extensions/smart_strong.py
@@ -18,21 +18,21 @@
from __future__ import absolute_import
from __future__ import unicode_literals
from . import Extension
from ..inlinepatterns import SimpleTagPattern
from ..inlinepatterns import SimpleTagInlineProcessor

SMART_STRONG_RE = r'(?<!\w)(_{2})(?!_)(.+?)(?<!_)\2(?!\w)'
STRONG_RE = r'(\*{2})(.+?)\2'
SMART_STRONG_RE = r'(?<!\w)(_{2})(?!_)(.+?)(?<!_)\1(?!\w)'
STRONG_RE = r'(\*{2})(.+?)\1'


class SmartEmphasisExtension(Extension):
""" Add smart_emphasis extension to Markdown class."""

def extendMarkdown(self, md, md_globals):
""" Modify inline patterns. """
md.inlinePatterns['strong'] = SimpleTagPattern(STRONG_RE, 'strong')
md.inlinePatterns['strong'] = SimpleTagInlineProcessor(STRONG_RE, 'strong')
md.inlinePatterns.add(
'strong2',
SimpleTagPattern(SMART_STRONG_RE, 'strong'),
SimpleTagInlineProcessor(SMART_STRONG_RE, 'strong'),
'>emphasis2'
)

18 changes: 9 additions & 9 deletions markdown/extensions/smarty.py
@@ -83,7 +83,7 @@

from __future__ import unicode_literals
from . import Extension
from ..inlinepatterns import HtmlPattern, HTML_RE
from ..inlinepatterns import HtmlInlineProcessor, HTML_RE
from ..odict import OrderedDict
from ..treeprocessors import InlineProcessor

@@ -150,21 +150,21 @@
HTML_STRICT_RE = HTML_RE + r'(?!\>)'


class SubstituteTextPattern(HtmlPattern):
class SubstituteTextPattern(HtmlInlineProcessor):
def __init__(self, pattern, replace, markdown_instance):
""" Replaces matches with some text. """
HtmlPattern.__init__(self, pattern)
HtmlInlineProcessor.__init__(self, pattern)
self.replace = replace
self.markdown = markdown_instance

def handleMatch(self, m):
def handleMatch(self, m, data):
result = ''
for part in self.replace:
if isinstance(part, int):
result += m.group(part)
else:
result += self.markdown.htmlStash.store(part)
return result
return result, m.start(0), m.end(0)


class SmartyExtension(Extension):
@@ -233,11 +233,11 @@ def educateQuotes(self, md):
(doubleQuoteSetsRe, (ldquo + lsquo,)),
(singleQuoteSetsRe, (lsquo + ldquo,)),
(decadeAbbrRe, (rsquo,)),
(openingSingleQuotesRegex, (2, lsquo)),
(openingSingleQuotesRegex, (1, lsquo)),
(closingSingleQuotesRegex, (rsquo,)),
(closingSingleQuotesRegex2, (rsquo, 2)),
(closingSingleQuotesRegex2, (rsquo, 1)),
(remainingSingleQuotesRegex, (lsquo,)),
(openingDoubleQuotesRegex, (2, ldquo)),
(openingDoubleQuotesRegex, (1, ldquo)),
(closingDoubleQuotesRegex, (rdquo,)),
(closingDoubleQuotesRegex2, (rdquo,)),
(remainingDoubleQuotesRegex, (ldquo,))
@@ -255,7 +255,7 @@ def extendMarkdown(self, md, md_globals):
self.educateAngledQuotes(md)
# Override HTML_RE from inlinepatterns.py so that it does not
# process tags with duplicate closing quotes.
md.inlinePatterns["html"] = HtmlPattern(HTML_STRICT_RE, md)
md.inlinePatterns["html"] = HtmlInlineProcessor(HTML_STRICT_RE, md)
if configs['smart_dashes']:
self.educateDashes(md)
inlineProcessor = InlineProcessor(md)
16 changes: 8 additions & 8 deletions markdown/extensions/wikilinks.py
@@ -18,7 +18,7 @@
from __future__ import absolute_import
from __future__ import unicode_literals
from . import Extension
from ..inlinepatterns import Pattern
from ..inlinepatterns import InlineProcessor
from ..util import etree
import re

@@ -46,20 +46,20 @@ def extendMarkdown(self, md, md_globals):

# append to end of inline patterns
WIKILINK_RE = r'\[\[([\w0-9_ -]+)\]\]'
wikilinkPattern = WikiLinks(WIKILINK_RE, self.getConfigs())
wikilinkPattern = WikiLinksInlineProcessor(WIKILINK_RE, self.getConfigs())
wikilinkPattern.md = md
md.inlinePatterns.add('wikilink', wikilinkPattern, "<not_strong")


class WikiLinks(Pattern):
class WikiLinksInlineProcessor(InlineProcessor):
def __init__(self, pattern, config):
super(WikiLinks, self).__init__(pattern)
super(WikiLinksInlineProcessor, self).__init__(pattern)
self.config = config

def handleMatch(self, m):
if m.group(2).strip():
def handleMatch(self, m, data):
if m.group(1).strip():
base_url, end_url, html_class = self._getMeta()
label = m.group(2).strip()
label = m.group(1).strip()
url = self.config['build_url'](label, base_url, end_url)
a = etree.Element('a')
a.text = label
@@ -68,7 +68,7 @@ def handleMatch(self, m):
a.set('class', html_class)
else:
a = ''
return a
return a, m.start(0), m.end(0)

def _getMeta(self):
""" Return meta data or config data. """