pypandoc provides a thin wrapper for pandoc, a universal document converter.
- Install pandoc
  - Ubuntu/Debian: sudo apt-get install pandoc
  - Fedora/Red Hat: sudo yum install pandoc
  - Mac OS X with Homebrew: brew install pandoc
  - Machine with Haskell: cabal install pandoc
  - Windows: an installer is available (see the installation page linked below)
  - FreeBSD port
  - Or see http://johnmacfarlane.net/pandoc/installing.html
- Install pypandoc: pip install pypandoc
- To use pandoc filters, you must have the relevant filter installed on your machine
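Before calling pypandoc, it can help to confirm that pandoc (and any filter you plan to use) is actually on your PATH. The following is a minimal sketch using only the standard library. Note that executable_available is a hypothetical helper written for this README, not part of pypandoc.

```python
import shutil


def executable_available(name="pandoc"):
    """Return True if an executable called `name` can be found on PATH."""
    # shutil.which returns the full path to the executable, or None if absent.
    return shutil.which(name) is not None


def missing_executables(names):
    """Return the subset of `names` that cannot be found on PATH, in order."""
    return [name for name in names if not executable_available(name)]
```

For example, missing_executables(['pandoc', 'pandoc-citeproc']) returns an empty list when both the converter and the filter are installed.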
The basic invocation looks like this: pypandoc.convert('input', 'output format'). pypandoc
tries to infer the type of the input automatically. If the input is a file name, pypandoc loads
the file. If you pass a string instead, you can define its format using the format parameter.
The example below should clarify the usage:
import pypandoc
output = pypandoc.convert('somefile.md', 'rst')
# alternatively you could just pass some string to it and define its format
output = pypandoc.convert('#some title', 'rst', format='md')
# output == 'some title\r\n==========\r\n\r\n'
If you pass in a string (and not a filename), convert expects this string to be unicode or
utf-8 encoded bytes. convert will always return a unicode string.
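If your input bytes are not utf-8 (for example, text read from a legacy file), one option is to decode them yourself before handing the string to convert. This is a small sketch under that assumption. The helper to_text is hypothetical and not part of pypandoc.

```python
def to_text(source, encoding="utf-8"):
    """Return `source` as a unicode string.

    Bytes are decoded with `encoding`; strings are returned unchanged.
    """
    if isinstance(source, bytes):
        return source.decode(encoding)
    return source


# Bytes in a non-utf-8 encoding are decoded explicitly.
latin1_bytes = "# caf\u00e9".encode("latin-1")
markdown_text = to_text(latin1_bytes, encoding="latin-1")
```

The resulting markdown_text can then be passed to convert together with format='md'.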
It's also possible to directly let pandoc write the output to a file. This is the only way to
convert to some output formats (e.g. odt, docx, epub, epub3). In that case convert() will
return an empty string.
import pypandoc
output = pypandoc.convert('somefile.md', 'docx', outputfile="somefile.docx")
assert output == ""
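When converting many files, you may want to decide up front whether an output file is needed. The sketch below hard-codes only the formats named above, so treat the set as an illustrative assumption and not a complete list. The helper default_outputfile is hypothetical and not part of pypandoc.

```python
import os

# Formats this README names as requiring an output file (not exhaustive).
FILE_ONLY_FORMATS = {"odt", "docx", "epub", "epub3"}


def default_outputfile(source_path, to):
    """Suggest an output path for file-only formats, or None otherwise."""
    if to not in FILE_ONLY_FORMATS:
        return None
    base, _ext = os.path.splitext(source_path)
    # epub3 output conventionally still uses the .epub extension.
    extension = "epub" if to == "epub3" else to
    return base + "." + extension
```

A non-None result can be passed as the outputfile argument of convert; a None result means the converted text can simply be taken from the return value.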
In addition to format, it is possible to pass extra_args.
That makes it possible to access various pandoc options easily.
output = pypandoc.convert(
    '<h1>Primary Heading</h1>',
    'md', format='html',
    extra_args=['--atx-headers'])
# output == '# Primary Heading\r\n'
output = pypandoc.convert(
    '# Primary Heading',
    'html', format='md',
    extra_args=['--base-header-level=2'])
# output == '<h2 id="primary-heading">Primary Heading</h2>\r\n'
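If you keep pandoc options in a configuration dictionary, a small helper can turn them into the list that extra_args expects. This is only a sketch: build_extra_args is a hypothetical helper, not part of pypandoc, and it assumes options are written with underscores in place of hyphens.

```python
def build_extra_args(options):
    """Convert a dict of pandoc options into a list of command-line flags.

    True produces a bare flag, False or None skips the option, and any
    other value produces a --name=value flag.
    """
    args = []
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            args.append(flag)
        elif value is False or value is None:
            continue
        else:
            args.append("{}={}".format(flag, value))
    return args


extra_args = build_extra_args({"atx_headers": True, "base_header_level": 2})
# extra_args == ['--atx-headers', '--base-header-level=2']
```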
pypandoc now supports easy addition of pandoc filters.
import pypandoc

filename = 'somefile.md'  # path to the Markdown source
filters = ['pandoc-citeproc']
pdoc_args = ['--mathjax',
             '--smart']
output = pypandoc.convert(source=filename,
                          to='html5',
                          format='md',
                          extra_args=pdoc_args,
                          filters=filters)
Please pass any filters in as a list and not a string.
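If filter names reach your code from user input or a config file, they may arrive as a single string. One way to guard against that is to normalize the value before passing it on. The helper normalize_filters below is a hypothetical sketch and not part of pypandoc.

```python
def normalize_filters(filters):
    """Return `filters` as a list of filter names.

    None becomes an empty list, a single string becomes a one-item list,
    and any other iterable is copied into a new list.
    """
    if filters is None:
        return []
    if isinstance(filters, str):
        return [filters]
    return list(filters)
```

The result can then be passed as the filters argument, whatever shape the original value had.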
Please refer to pandoc -h and the official documentation for further details.
It can sometimes be useful to check which pandoc version is available on your system, so
pypandoc provides a utility for this. Example:
version = pypandoc.get_pandoc_version()
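To act on the version, for instance to skip an option that an older pandoc does not support, you can compare it against a minimum. The sketch below assumes the version is a dotted string of integers such as '1.13.2'. The helpers parse_version and at_least are hypothetical and not part of pypandoc.

```python
def parse_version(version):
    """Turn a dotted version string such as '1.13.2' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def at_least(version, minimum):
    """Return True if `version` is greater than or equal to `minimum`."""
    # Tuples compare element by element, so (1, 13, 2) >= (1, 9) is True.
    return parse_version(version) >= parse_version(minimum)
```

With these, at_least(version, '1.13') tells you whether the installed pandoc is at least 1.13.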
pydocverter is a client for a service called Docverter, which offers pandoc as a service (plus some extra goodies). It has the same API as pypandoc, so you can easily write code that uses one and falls back to the other. E.g.:
try:
    import pypandoc as converter
except ImportError:
    import pydocverter as converter
converter.convert('somefile.md', 'rst')
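The same fallback idea can be written as a loop, which is convenient if you want to report a clear error when neither package is installed. This is a sketch using the standard importlib module. The helper first_available is hypothetical and not part of either library.

```python
import importlib


def first_available(module_names):
    """Import and return the first module in `module_names` that is installed.

    Returns None if none of the modules can be imported.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


converter = first_available(["pypandoc", "pydocverter"])
if converter is None:
    print("Neither pypandoc nor pydocverter is installed")
```

When converter is not None, it can be used exactly as in the try/except version above.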
See pyandoc for an alternative implementation of a pandoc wrapper from Kenneth Reitz, although that project hasn't been active in a while.
Contributions are welcome. When opening a PR, please keep the following guidelines in mind:
- Before implementing, please open an issue for discussion.
- Make sure you have tests for the new logic.
- Make sure your code passes flake8 pypandoc.py tests.py
- Add yourself to the contributors in README.md unless you are already there. In that case, update your list of contributions.
- Valentin Haenel - String conversion fix
- Daniel Sanchez - Automatic parsing of input/output formats
- Thomas G. - Python 3 support
- Ben Jao Ming - Fail gracefully if pandoc is missing
- Ross Crawford-d'Heureuse - Encode input in UTF-8 and add Django example
- Michael Chow - Decode output in UTF-8
- Janusz Skonieczny - Support Windows newlines and allow encoding to be specified.
- gabeos - Fix help parsing
- Marc Abramowitz - Make setup.py fail hard if pandoc is missing, Travis, Dockerfile, PyPI badge, Tox, PEP-8, improved documentation
- Daniel L. - Add extra_args example to README
- Amy Guy - Exception handling for unicode errors
- Florian Eßer - Allow Markdown extensions in output format
- Philipp Wendler - Allow Markdown extensions in input format
- Jan Schulz - Handling output to a file, Travis to work on newer version of Pandoc, return code checking, get_pandoc_version
- Aaron Gonzales - Added better filter handling
pypandoc is available under the MIT license. See LICENSE for more details.