pypandoc provides a thin wrapper for pandoc, a universal document converter.
The basic invocation looks like this: pypandoc.convert('input', 'output format'). pypandoc tries to infer the type of the input automatically. If the input is a file, pypandoc loads it. If you pass a string instead, you can define its format using the format parameter. The example below should clarify the usage:
import pypandoc
output = pypandoc.convert('somefile.md', 'rst')
# alternatively you could just pass some string to it and define its format
output = pypandoc.convert('# some title', 'rst', format='md')
# output == 'some title\r\n==========\r\n\r\n'
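The file-versus-string decision described above can be sketched in plain Python. This is only an illustration under stated assumptions, not pypandoc's actual code: the helper identify_input and the EXTENSION_FORMATS table are hypothetical names invented for this example, and the sketch simply treats any argument that names an existing path as a file.

```python
import os

# Hypothetical extension table, for illustration only; the real set of
# formats is whatever the installed pandoc reports.
EXTENSION_FORMATS = {".md": "markdown", ".rst": "rst", ".html": "html"}


def identify_input(source, fmt=None):
    """Classify `source` as a file path or a literal string.

    Returns a (kind, format) tuple where kind is 'path' or 'string'.
    """
    if os.path.exists(source):
        # An existing file: guess the format from the extension unless
        # the caller supplied one explicitly.
        if fmt is None:
            ext = os.path.splitext(source)[1].lower()
            fmt = EXTENSION_FORMATS.get(ext)
        if fmt is None:
            raise ValueError("cannot infer the format of %r" % source)
        return "path", fmt
    # Not a file, so treat the argument as document text. Text carries
    # no extension, so the format has to be given explicitly.
    if fmt is None:
        raise ValueError("format is required when passing a string")
    return "string", fmt
```

A design consequence of this approach is that a literal string which happens to match an existing file name is read as a file, so passing the format explicitly for strings is the safer habit.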
In addition to format, it is possible to pass extra_args. That makes it possible to access various pandoc options easily.
output = pypandoc.convert('<h1>Primary Heading</h1>', 'md', format='html', extra_args=['--atx-headers'])
# output == '# Primary Heading\r\n'
output = pypandoc.convert('# Primary Heading', 'html', format='md', extra_args=['--base-header-level=2'])
# output == '<h2 id="primary-heading">Primary Heading</h2>\r\n'
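To make the role of extra_args concrete, here is a minimal sketch of how a wrapper might assemble the pandoc command line. The helper build_pandoc_command is a hypothetical name for this example and is not part of pypandoc. The sketch assumes the extra arguments are appended unchanged after pandoc's --from and --to options, and that the caller passes pandoc's own format names (for example markdown rather than the md shorthand).

```python
def build_pandoc_command(to, fmt, extra_args=()):
    """Assemble an argv list for a pandoc call (hypothetical helper).

    `fmt` and `to` are pandoc format names; `extra_args` is passed
    through to pandoc without modification.
    """
    return ["pandoc", "--from=" + fmt, "--to=" + to] + list(extra_args)


cmd = build_pandoc_command("html", "markdown", ["--base-header-level=2"])
# cmd == ['pandoc', '--from=markdown', '--to=html', '--base-header-level=2']
```

Because the list is passed through as-is, any option accepted by the installed pandoc version can be used, and any option it rejects will surface as a pandoc error rather than a Python one.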
Please refer to pandoc -h and the official documentation for further details.
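The help text mentioned above can also be read from Python, which is handy for checking which options the installed pandoc supports before passing them through extra_args. The following is a small sketch using only the standard library; pandoc_help is a hypothetical helper name, and the function assumes pandoc is found through PATH.

```python
import shutil
import subprocess


def pandoc_help():
    """Return the output of `pandoc -h`, or None if pandoc is not on PATH."""
    exe = shutil.which("pandoc")
    if exe is None:
        # pandoc is not installed or not reachable through PATH.
        return None
    result = subprocess.run([exe, "-h"], capture_output=True, text=True)
    return result.stdout


help_text = pandoc_help()
if help_text is None:
    print("pandoc was not found on PATH")
else:
    # Show only the first line (the usage summary) to keep output short.
    print(help_text.splitlines()[0] if help_text else "")
```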
See also pyandoc for an alternative implementation.
See services.py at the project root for the implementation. Use it like this:
from .services import PandocDocxService
service = PandocDocxService()
doc_file = service.generate(html='<html><body><h1>Heading 1</h1><p>testing testing 123</p></body></html>')
- Valentin Haenel - String conversion fix
- Daniel Sanchez - Automatic parsing of input/output formats
- Thomas G. - Python 3 support
- Ben Jao Ming - Fail gracefully if pandoc is missing
- Ross Crawford-d'Heureuse - Encode input in UTF-8 and add Django example
- Michael Chow - Decode output in UTF-8
- Janusz Skonieczny - Support Windows newlines and allow encoding to be specified
- gabeos - Fix help parsing
- Marc Abramowitz - Make setup.py fail hard if pandoc is missing
- Daniel L. - Add extra_args example to README
pypandoc is available under the MIT license. See LICENSE for more details.