Skip to content

Commit

Permalink
bpo-17239: Disable external entities in SAX parser (pythonGH-9217)
Browse files Browse the repository at this point in the history
The SAX parser no longer processes general external entities by default
to increase security. Before, the parser created network connections
to fetch remote files or loaded local files from the file system for DTD
and entities.

Signed-off-by: Christian Heimes <[email protected]>



https://bugs.python.org/issue17239
  • Loading branch information
tiran authored and miss-islington committed Sep 23, 2018
1 parent 9fb051f commit 17b1d5d
Show file tree
Hide file tree
Showing 9 changed files with 120 additions and 5 deletions.
14 changes: 14 additions & 0 deletions Doc/library/xml.dom.pulldom.rst
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,20 @@ events until either processing is finished or an error condition occurs.
maliciously constructed data. If you need to parse untrusted or
unauthenticated data see :ref:`xml-vulnerabilities`.

.. versionchanged:: 3.8

The SAX parser no longer processes general external entities by default to
increase security by default. To enable processing of external entities,
pass a custom parser instance in::

from xml.dom.pulldom import parse
from xml.sax import make_parser
from xml.sax.handler import feature_external_ges

parser = make_parser()
parser.setFeature(feature_external_ges, True)
parse(filename, parser=parser)


Example::

Expand Down
6 changes: 4 additions & 2 deletions Doc/library/xml.rst
Original file line number Diff line number Diff line change
Expand Up @@ -65,8 +65,8 @@ kind sax etree minidom p
========================= ============== =============== ============== ============== ==============
billion laughs **Vulnerable** **Vulnerable** **Vulnerable** **Vulnerable** **Vulnerable**
quadratic blowup **Vulnerable** **Vulnerable** **Vulnerable** **Vulnerable** **Vulnerable**
external entity expansion **Vulnerable** Safe (1) Safe (2) **Vulnerable** Safe (3)
`DTD`_ retrieval **Vulnerable** Safe Safe **Vulnerable** Safe
external entity expansion Safe (4) Safe (1) Safe (2) Safe (4) Safe (3)
`DTD`_ retrieval Safe (4) Safe Safe Safe (4) Safe
decompression bomb Safe Safe Safe Safe **Vulnerable**
========================= ============== =============== ============== ============== ==============

Expand All @@ -75,6 +75,8 @@ decompression bomb Safe Safe Safe S
2. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns
the unexpanded entity verbatim.
3. :mod:`xmlrpclib` doesn't expand external entities and omits them.
4. Since Python 3.8.0, external general entities are no longer processed by
default since Python.


billion laughs / exponential entity expansion
Expand Down
8 changes: 8 additions & 0 deletions Doc/library/xml.sax.rst
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,14 @@ the SAX API.
constructed data. If you need to parse untrusted or unauthenticated data see
:ref:`xml-vulnerabilities`.

.. versionchanged:: 3.8

The SAX parser no longer processes general external entities by default
to increase security. Before, the parser created network connections
to fetch remote files or loaded local files from the file
system for DTD and entities. The feature can be enabled again with method
:meth:`~xml.sax.xmlreader.XMLReader.setFeature` on the parser object
and argument :data:`~xml.sax.handler.feature_external_ges`.

The convenience functions are:

Expand Down
12 changes: 12 additions & 0 deletions Doc/whatsnew/3.8.rst
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,15 @@ venv
activating virtual environments under PowerShell Core 6.1.
(Contributed by Brett Cannon in :issue:`32718`.)

xml
---

* As mitigation against DTD and external entity retrieval, the
:mod:`xml.dom.minidom` and mod:`xml.sax` modules no longer process
external entities by default.
(Contributed by Christian Heimes in :issue:`17239`.)


Optimizations
=============

Expand Down Expand Up @@ -333,6 +342,9 @@ Changes in the Python API
* :class:`uuid.UUID` now uses ``__slots__``, therefore instances can no longer
be weak-referenced and attributes can no longer be added.

* :mod:`xml.dom.minidom` and mod:`xml.sax` modules no longer process
external entities by default.
(Contributed by Christian Heimes in :issue:`17239`.)

CPython bytecode changes
------------------------
Expand Down
7 changes: 7 additions & 0 deletions Lib/test/test_pulldom.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
import xml.sax

from xml.sax.xmlreader import AttributesImpl
from xml.sax.handler import feature_external_ges
from xml.dom import pulldom

from test.support import findfile
Expand Down Expand Up @@ -166,6 +167,12 @@ def test_getitem_deprecation(self):
# This should have returned 'END_ELEMENT'.
self.assertEqual(parser[-1][0], pulldom.START_DOCUMENT)

def test_external_ges_default(self):
parser = pulldom.parseString(SMALL_SAMPLE)
saxparser = parser.parser
ges = saxparser.getFeature(feature_external_ges)
self.assertEqual(ges, False)


class ThoroughTestCase(unittest.TestCase):
"""Test the hard-to-reach parts of pulldom."""
Expand Down
60 changes: 58 additions & 2 deletions Lib/test/test_sax.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,13 +13,14 @@
from xml.sax.saxutils import XMLGenerator, escape, unescape, quoteattr, \
XMLFilterBase, prepare_input_source
from xml.sax.expatreader import create_parser
from xml.sax.handler import feature_namespaces
from xml.sax.handler import feature_namespaces, feature_external_ges
from xml.sax.xmlreader import InputSource, AttributesImpl, AttributesNSImpl
from io import BytesIO, StringIO
import codecs
import gc
import os.path
import shutil
from urllib.error import URLError
from test import support
from test.support import findfile, run_unittest, TESTFN

Expand Down Expand Up @@ -911,6 +912,18 @@ def notationDecl(self, name, publicId, systemId):
def unparsedEntityDecl(self, name, publicId, systemId, ndata):
self._entities.append((name, publicId, systemId, ndata))


class TestEntityRecorder:
def __init__(self):
self.entities = []

def resolveEntity(self, publicId, systemId):
self.entities.append((publicId, systemId))
source = InputSource()
source.setPublicId(publicId)
source.setSystemId(systemId)
return source

def test_expat_dtdhandler(self):
parser = create_parser()
handler = self.TestDTDHandler()
Expand All @@ -927,6 +940,32 @@ def test_expat_dtdhandler(self):
[("GIF", "-//CompuServe//NOTATION Graphics Interchange Format 89a//EN", None)])
self.assertEqual(handler._entities, [("img", None, "expat.gif", "GIF")])

def test_expat_external_dtd_enabled(self):
parser = create_parser()
parser.setFeature(feature_external_ges, True)
resolver = self.TestEntityRecorder()
parser.setEntityResolver(resolver)

with self.assertRaises(URLError):
parser.feed(
'<!DOCTYPE external SYSTEM "unsupported://non-existing">\n'
)
self.assertEqual(
resolver.entities, [(None, 'unsupported://non-existing')]
)

def test_expat_external_dtd_default(self):
parser = create_parser()
resolver = self.TestEntityRecorder()
parser.setEntityResolver(resolver)

parser.feed(
'<!DOCTYPE external SYSTEM "unsupported://non-existing">\n'
)
parser.feed('<doc />')
parser.close()
self.assertEqual(resolver.entities, [])

# ===== EntityResolver support

class TestEntityResolver:
Expand All @@ -936,8 +975,9 @@ def resolveEntity(self, publicId, systemId):
inpsrc.setByteStream(BytesIO(b"<entity/>"))
return inpsrc

def test_expat_entityresolver(self):
def test_expat_entityresolver_enabled(self):
parser = create_parser()
parser.setFeature(feature_external_ges, True)
parser.setEntityResolver(self.TestEntityResolver())
result = BytesIO()
parser.setContentHandler(XMLGenerator(result))
Expand All @@ -951,6 +991,22 @@ def test_expat_entityresolver(self):
self.assertEqual(result.getvalue(), start +
b"<doc><entity></entity></doc>")

def test_expat_entityresolver_default(self):
parser = create_parser()
self.assertEqual(parser.getFeature(feature_external_ges), False)
parser.setEntityResolver(self.TestEntityResolver())
result = BytesIO()
parser.setContentHandler(XMLGenerator(result))

parser.feed('<!DOCTYPE doc [\n')
parser.feed(' <!ENTITY test SYSTEM "whatever">\n')
parser.feed(']>\n')
parser.feed('<doc>&test;</doc>')
parser.close()

self.assertEqual(result.getvalue(), start +
b"<doc></doc>")

# ===== Attributes support

class AttrGatherer(ContentHandler):
Expand Down
13 changes: 13 additions & 0 deletions Lib/test/test_xml_etree.py
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,12 @@
<document>&entity;</document>
"""

EXTERNAL_ENTITY_XML = """\
<!DOCTYPE points [
<!ENTITY entity SYSTEM "file:///non-existing-file.xml">
]>
<document>&entity;</document>
"""

def checkwarnings(*filters, quiet=False):
def decorator(test):
Expand Down Expand Up @@ -861,6 +867,13 @@ def test_entity(self):
root = parser.close()
self.serialize_check(root, '<document>text</document>')

# 4) external (SYSTEM) entity

with self.assertRaises(ET.ParseError) as cm:
ET.XML(EXTERNAL_ENTITY_XML)
self.assertEqual(str(cm.exception),
'undefined entity &entity;: line 4, column 10')

def test_namespace(self):
# Test namespace issues.

Expand Down
2 changes: 1 addition & 1 deletion Lib/xml/sax/expatreader.py
Original file line number Diff line number Diff line change
Expand Up @@ -95,7 +95,7 @@ def __init__(self, namespaceHandling=0, bufsize=2**16-20):
self._lex_handler_prop = None
self._parsing = 0
self._entity_stack = []
self._external_ges = 1
self._external_ges = 0
self._interning = None

# XMLReader methods
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
The xml.sax and xml.dom.minidom parsers no longer processes external
entities by default. External DTD and ENTITY declarations no longer
load files or create network connections.

0 comments on commit 17b1d5d

Please sign in to comment.