Tune attr list regex
Ignore empty braces. Braces must contain at least one non-whitespace
character to be recognized as an attr list.

Attr lists for table cells must be at the end of the cell content and must
be separated from the content by at least one space. This is potentially
a breaking change. However, it is consistent with the behavior elsewhere.

Fixes Python-Markdown#898.
waylan committed Jun 30, 2020
1 parent 071c4f1 commit 706d1fd
Showing 8 changed files with 134 additions and 34 deletions.
16 changes: 15 additions & 1 deletion docs/change_log/release-3.3.md
@@ -31,6 +31,20 @@ markdown.markdown(src, extensions=[FencedCodeExtension(lang_prefix='')])
provide an option to include the language class in the output, let alone prefix it. Therefore, any language prefix
is only applied when syntax highlighting is disabled.

### Attribute Lists are more strict (#898).

Empty curly braces are now completely ignored by the [Attribute List] extension. Previously, the extension would
recognize them as attribute lists and remove them from the document. Therefore, it is no longer necessary to backslash
escape a set of curly braces which are empty or only contain whitespace.
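As a minimal sketch using only Python's `re` module, the snippet below compares the `BASE_RE` pattern before and after this change (both copied from `markdown/extensions/attr_list.py` in this commit). It exercises only the bare brace pattern; the extension wraps it in `HEADER_RE`, `BLOCK_RE`, or `INLINE_RE` depending on where the list appears. `attr_text` is a hypothetical helper for illustration only.

```python
import re

# BASE_RE before and after this commit, copied from the attr_list.py diff.
OLD_BASE_RE = r'\{\:?([^\}\n]*)\}'
NEW_BASE_RE = r'\{\:?[ ]*([^\}\n ][^\}\n]*)[ ]*\}'


def attr_text(pattern, text):
    """Hypothetical helper: return the captured attribute text, or None if no match."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


for sample in ['{}', '{ }', '{ .foo }', '{: #bar }']:
    old = attr_text(OLD_BASE_RE, sample)
    new = attr_text(NEW_BASE_RE, sample)
    print(f'{sample!r:12} old={old!r:10} new={new!r}')
```

The old pattern captures an empty or blank string for `{}` and `{ }`, which is why such braces used to be consumed; the new pattern rejects them because the group must start with a character that is not a space or closing brace. Non-empty lists still match, with leading whitespace now kept outside the captured group.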

Although it was never documented, an attribute list could previously be defined anywhere within a table cell and would
be applied to the cell (`<td>` element). Now the attribute list must be defined at the end of the cell content and must
be separated from the rest of the content by at least one space. This makes it easy to differentiate between attribute
lists defined on inline elements within a cell and the attribute list for the cell itself. It is also more consistent
with how attribute lists are defined on other types of elements.
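As a rough illustration (not the extension's actual code path), the sketch below applies the `HEADER_RE` pattern, which `<td>` elements are now matched against, to plain cell strings taken from the new `test_table_td` test. `split_cell` is a hypothetical helper; in the extension itself the match runs on the cell's text inside the element tree.

```python
import re

# BASE_RE and HEADER_RE as defined after this change in markdown/extensions/attr_list.py.
BASE_RE = r'\{\:?[ ]*([^\}\n ][^\}\n]*)[ ]*\}'
HEADER_RE = re.compile(r'[ ]+{}[ ]*$'.format(BASE_RE))


def split_cell(text):
    """Hypothetical helper: return (remaining cell text, attribute text or None)."""
    m = HEADER_RE.search(text)
    if m is None:
        return text, None
    # Like the treeprocessor, cut the text at the start of the match
    # (which includes the required leading space).
    return text[:m.start()], m.group(1)


for cell in ['a { .foo }', 'c { }', 'd{ .foo }', 'e { .foo } f']:
    print(repr(cell), '->', split_cell(cell))
```

Only `a { .foo }` yields an attribute list; the empty braces, the list with no preceding space, and the list followed by more content are all left as plain text, matching the `<td>` output the new test expects.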

In addition, the documentation for the extension received an overhaul. The features (#987) and limitations (#965) of the extension are now fully documented.

## New features

The following new features have been included in the 3.3 release:
@@ -55,10 +69,10 @@ The following bug fixes are included in the 3.3 release:

* Avoid a `RecursionError` from deeply nested blockquotes (#799).
* Fix issues with complex emphasis (#979).
* Limitations of the `attr_list` extension are documented (#965).
* Fix unescaping of HTML characters `<>` in CodeHilite (#990).

[spec]: https://www.w3.org/TR/html5/text-level-semantics.html#the-code-element
[fenced_code]: ../extensions/fenced_code_blocks.md
[codehilite]: ../extensions/code_hilite.md
[enabled]: ../extensions/fenced_code_blocks.md#enabling-syntax-highlighting
[Attribute List]: ../extensions/attr_list.md
39 changes: 38 additions & 1 deletion docs/extensions/attr_list.md
@@ -53,6 +53,9 @@ Curly braces can be backslash escaped to avoid being identified as an attribute
\{ not an attribute list }
```

Opening and closing curly braces which are empty or only contain whitespace are ignored whether they are escaped or
not. Additionally, any attribute lists which are not in one of the specific locations documented below are ignored.

The colon after the opening brace is optional, but is supported to maintain consistency with other implementations.
Therefore, the following is also a valid attribute list:

@@ -119,6 +122,40 @@ The above results in the following output:
<p><a href="http://example.com" class="foo bar" title="Some title!">link</a></p>
```

If the [tables](./tables.md) extension is enabled, attribute lists can be defined on table cells. To differentiate
attributes for an inline element from attributes for the containing cell, the attribute list must be separated from
the content by at least one space and be defined at the end of the cell content. As table cells can only ever be on
a single line, the attribute list must remain on the same line as the content of the cell.

```text
| set on td    | set on em   |
|--------------|-------------|
| *a* { .foo } | *b*{ .foo } |
```

The above example results in the following output:

```html
<table>
<thead>
<tr>
<th>set on td</th>
<th>set on em</th>
</tr>
</thead>
<tbody>
<tr>
<td class="foo"><em>a</em></td>
<td><em class="foo">b</em></td>
</tr>
</tbody>
</table>
```

Note that in the first column, the attribute list is preceded by a space; therefore, it is assigned to the table cell
(`<td>` element). However, in the second column, the attribute list is not preceded by a space; therefore, it is
assigned to the inline element (`<em>`) which immediately precedes it.

### Limitations

There are a few types of elements which attribute lists do not work with. As a reminder, Markdown is a subset of HTML
@@ -147,7 +184,7 @@ __Implied Elements:__
: There are various HTML elements which are not represented in Markdown text, but only implied. For example, the
`ul` and `ol` elements do not exist in Markdown. They are only implied by the presence of list items (`li`). There
is no way to use an attribute list to define attributes on implied elements, including but not limited to the
-    following: `ul`, `ol`, `dl`, `table`, `thead`, `tbody`, `tr`, and `th`.
+    following: `ul`, `ol`, `dl`, `table`, `thead`, `tbody`, and `tr`.

## Usage

3 changes: 3 additions & 0 deletions docs/extensions/tables.md
@@ -49,6 +49,9 @@ will be rendered as:
</table>
```

!!! seealso "See Also"
    The [Attribute Lists](./attr_list.md) extension includes support for defining attributes on table cells.

Usage
-----

14 changes: 6 additions & 8 deletions markdown/extensions/attr_list.py
@@ -64,10 +64,10 @@ def isheader(elem):

 class AttrListTreeprocessor(Treeprocessor):

-    BASE_RE = r'\{\:?([^\}\n]*)\}'
-    HEADER_RE = re.compile(r'[ ]+%s[ ]*$' % BASE_RE)
-    BLOCK_RE = re.compile(r'\n[ ]*%s[ ]*$' % BASE_RE)
-    INLINE_RE = re.compile(r'^%s' % BASE_RE)
+    BASE_RE = r'\{\:?[ ]*([^\}\n ][^\}\n]*)[ ]*\}'
+    HEADER_RE = re.compile(r'[ ]+{}[ ]*$'.format(BASE_RE))
+    BLOCK_RE = re.compile(r'\n[ ]*{}[ ]*$'.format(BASE_RE))
+    INLINE_RE = re.compile(r'^{}'.format(BASE_RE))
     NAME_RE = re.compile(r'[^A-Z_a-z\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u02ff'
                          r'\u0370-\u037d\u037f-\u1fff\u200c-\u200d'
                          r'\u2070-\u218f\u2c00-\u2fef\u3001-\ud7ff'
@@ -79,8 +79,8 @@ def run(self, doc):
             if self.md.is_block_level(elem.tag):
                 # Block level: check for attrs on last line of text
                 RE = self.BLOCK_RE
-                if isheader(elem) or elem.tag == 'dt':
-                    # header or def-term: check for attrs at end of line
+                if isheader(elem) or elem.tag in ['dt', 'td']:
+                    # header, def-term, or table cell: check for attrs at end of element
                     RE = self.HEADER_RE
                 if len(elem) and elem.tag == 'li':
                     # special case list items. children may include a ul or ol.
@@ -120,8 +120,6 @@ def run(self, doc):
                 elif elem.text:
                     # no children. Get from text.
                     m = RE.search(elem.text)
-                    if not m and elem.tag == 'td':
-                        m = re.search(self.BASE_RE, elem.text)
                     if m:
                         self.assign_attrs(elem, m.group(1))
                         elem.text = elem.text[:m.start()]
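The fallback deleted in this hunk (`if not m and elem.tag == 'td': m = re.search(self.BASE_RE, elem.text)`) let a table cell pick up an attribute list from anywhere in its text. Below is a simplified model, not the real treeprocessor, of that pre-commit code path for a `<td>` with no child elements, using only `re` and the old patterns from the diff. `old_td` is hypothetical and ignores how `assign_attrs` parses the captured text.

```python
import re

# Patterns as they were *before* this commit, copied from the diff.
OLD_BASE_RE = r'\{\:?([^\}\n]*)\}'
OLD_BLOCK_RE = re.compile(r'\n[ ]*%s[ ]*$' % OLD_BASE_RE)


def old_td(text):
    """Hypothetical model of the removed <td> path: returns (cell text, attrs or None)."""
    # A <td> was neither a header nor a dt, so it got BLOCK_RE, which needs a
    # newline and therefore never matches single-line cell text.
    m = OLD_BLOCK_RE.search(text)
    if not m:
        # The removed fallback: accept an attribute list anywhere in the text.
        m = re.search(OLD_BASE_RE, text)
    if m:
        # As in the real code, everything from the start of the match is dropped.
        return text[:m.start()], m.group(1)
    return text, None


for cell in ['e { .foo } f', 'c { }', 'd{ .foo }']:
    print(repr(cell), '->', old_td(cell))
```

In this model every sample loses its braces, and `e { .foo } f` also loses the trailing ` f`, because the text is cut at `m.start()`. The new `test_table_td` expects all three cells to render unchanged (`<td>e { .foo } f</td>`, `<td>c { }</td>`, `<td>d{ .foo }</td>`).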
18 changes: 0 additions & 18 deletions tests/extensions/extra/tables_and_attr_list.html

This file was deleted.

4 changes: 0 additions & 4 deletions tests/extensions/extra/tables_and_attr_list.txt

This file was deleted.

2 changes: 0 additions & 2 deletions tests/test_legacy.py
@@ -184,8 +184,6 @@ class TestExtensionsExtra(LegacyTestCase):

     tables = Kwargs(extensions=['tables'])

-    tables_and_attr_list = Kwargs(extensions=['tables', 'attr_list'])
-
     extra_config = Kwargs(
         extensions=['extra'],
         extension_configs={
72 changes: 72 additions & 0 deletions tests/test_syntax/extensions/test_attr_list.py
@@ -0,0 +1,72 @@
"""
Python Markdown
A Python implementation of John Gruber's Markdown.
Documentation: https://python-markdown.github.io/
GitHub: https://github.com/Python-Markdown/markdown/
PyPI: https://pypi.org/project/Markdown/
Started by Manfred Stienstra (http://www.dwerg.net/).
Maintained for a few years by Yuri Takhteyev (http://www.freewisdom.org).
Currently maintained by Waylan Limberg (https://github.com/waylan),
Dmitry Shachnev (https://github.com/mitya57) and Isaac Muse (https://github.com/facelessuser).
Copyright 2007-2020 The Python Markdown Project (v. 1.7 and later)
Copyright 2004, 2005, 2006 Yuri Takhteyev (v. 0.2-1.6b)
Copyright 2004 Manfred Stienstra (the original version)
License: BSD (see LICENSE.md for details).
"""

from markdown.test_tools import TestCase


class TestAttrList(TestCase):

    maxDiff = None

    # TODO: Move the rest of the attr_list tests here.

    def test_empty_list(self):
        self.assertMarkdownRenders(
            '*foo*{ }',
            '<p><em>foo</em>{ }</p>',
            extensions=['attr_list']
        )

    def test_table_td(self):
        self.assertMarkdownRenders(
            self.dedent(
                """
                | valid on td | inline      | empty | missing space | not at end   |
                |-------------|-------------|-------|---------------|--------------|
                | a { .foo }  | *b*{ .foo } | c { } | d{ .foo }     | e { .foo } f |
                """
            ),
            self.dedent(
                """
                <table>
                <thead>
                <tr>
                <th>valid on td</th>
                <th>inline</th>
                <th>empty</th>
                <th>missing space</th>
                <th>not at end</th>
                </tr>
                </thead>
                <tbody>
                <tr>
                <td class="foo">a</td>
                <td><em class="foo">b</em></td>
                <td>c { }</td>
                <td>d{ .foo }</td>
                <td>e { .foo } f</td>
                </tr>
                </tbody>
                </table>
                """
            ),
            extensions=['attr_list', 'tables']
        )
