thatcher/pypandoc


pypandoc

pypandoc provides a thin wrapper for pandoc, a universal document converter.

Installation

  • Install pandoc
  • pip install pypandoc
  • To use pandoc filters, you must have the relevant filter installed on your machine
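
Before calling pypandoc, it can help to confirm that pandoc and any filters are actually reachable. The following is a minimal sketch using only the standard library; the helper names `pandoc_version` and `missing_filters` are hypothetical and not part of pypandoc. It assumes pandoc and its filters are looked up on PATH.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def pandoc_version() -> Optional[str]:
    """Return the first line of `pandoc --version`, or None if pandoc is not on PATH."""
    executable = shutil.which("pandoc")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


def missing_filters(filters: Sequence[str]) -> List[str]:
    """Return the filter executables from `filters` that cannot be found on PATH."""
    return [name for name in filters if shutil.which(name) is None]


if __name__ == "__main__":
    print(pandoc_version() or "pandoc not found on PATH")
    print(missing_filters(["pandoc-citeproc"]))
```

pandoc may be able to locate a filter by other means (for example an explicit path), so treat the PATH check as a rough sanity test rather than a guarantee.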

Usage

The basic invocation looks like this: pypandoc.convert('input', 'output format'). pypandoc tries to infer the type of the input automatically. If the input is a file, pypandoc loads it. If you pass a string instead, you can specify its format with the format parameter. The example below should clarify the usage:

import pypandoc

output = pypandoc.convert('somefile.md', 'rst')

# alternatively you could just pass some string to it and define its format
output = pypandoc.convert('#some title', 'rst', format='md')
# output == 'some title\r\n==========\r\n\r\n'
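
The file-or-string inference described above could work roughly like the sketch below. This is an illustration under the assumption that "is a file" means "names an existing file on disk"; `read_source` is a hypothetical helper, not pypandoc's actual code.

```python
import os


def read_source(source: str) -> str:
    """Return file contents if `source` names an existing file, else `source` unchanged.

    Hypothetical sketch of input inference; not pypandoc's actual implementation.
    """
    if os.path.isfile(source):
        with open(source, encoding="utf-8") as handle:
            return handle.read()
    # Not an existing file: treat the argument itself as document text.
    return source
```

One consequence of any heuristic like this is that a mistyped filename (say 'somfile.md') would be treated as literal document text rather than raising an error, so it is worth checking paths before converting.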

If you pass in a string (and not a filename), convert expects this string to be unicode or utf-8 encoded bytes. convert will always return a unicode string.
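
The input rule above (unicode, or bytes that are valid UTF-8) amounts to a normalization step like the following. `ensure_text` is a hypothetical helper shown only to make the rule concrete.

```python
from typing import Union


def ensure_text(source: Union[str, bytes]) -> str:
    """Hypothetical helper: accept str or UTF-8 encoded bytes and return str."""
    if isinstance(source, bytes):
        # Raises UnicodeDecodeError if the bytes are not valid UTF-8.
        return source.decode("utf-8")
    return source
```

If your data is in another encoding (for example Latin-1), decode it to a unicode string yourself before passing it to convert.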

You can also let pandoc write the output directly to a file by passing outputfile. This is the only way to convert to some output formats (e.g. odt, docx, epub, epub3). In that case, convert() returns an empty string.

import pypandoc

output = pypandoc.convert('somefile.md', 'docx', outputfile="somefile.docx")
assert output == ""
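
Because some formats can only be written to a file, a caller might guard against forgetting outputfile. The sketch below is hypothetical (`check_output_target` is not a pypandoc function), and its set of formats is copied from the examples listed above, so it is likely incomplete.

```python
from typing import Optional

# Examples of file-only formats taken from the text above; not an exhaustive list.
FILE_ONLY_FORMATS = {"odt", "docx", "epub", "epub3"}


def check_output_target(to: str, outputfile: Optional[str]) -> None:
    """Hypothetical guard: raise if a file-only format is requested without an outputfile."""
    if to in FILE_ONLY_FORMATS and outputfile is None:
        raise ValueError(f"Output format {to!r} requires outputfile=...")
```

Calling check_output_target('docx', None) before convert() surfaces the mistake early instead of leaving you with an unexpected result.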

In addition to format, you can pass extra_args, a list of additional command-line options for pandoc. This gives easy access to the various pandoc options.

output = pypandoc.convert(
    '<h1>Primary Heading</h1>',
    'md', format='html',
    extra_args=['--atx-headers'])
# output == '# Primary Heading\r\n'
output = pypandoc.convert(
    '# Primary Heading',
    'html', format='md',
    extra_args=['--base-header-level=2'])
# output == '<h2 id="primary-heading">Primary Heading</h2>\r\n'
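
Conceptually, the parameters map onto a pandoc command line, with extra_args appended unchanged. The sketch below assumes that mapping; `build_pandoc_command` is hypothetical and pypandoc may order or spell the options differently. --to, --from and --output are standard pandoc options.

```python
from typing import List, Optional, Sequence


def build_pandoc_command(
    to: str,
    format: Optional[str] = None,  # mirrors the `format` parameter described above
    extra_args: Sequence[str] = (),
    outputfile: Optional[str] = None,
) -> List[str]:
    """Hypothetical sketch of how convert() options might map onto a pandoc command line."""
    cmd = ["pandoc", f"--to={to}"]
    if format is not None:
        cmd.append(f"--from={format}")
    if outputfile is not None:
        cmd.append(f"--output={outputfile}")
    # extra_args are passed through verbatim, in order.
    cmd.extend(extra_args)
    return cmd
```

Since pandoc reads from standard input when no input file is given, such a list could be run with subprocess.run(cmd, input=text, capture_output=True, text=True). Option names have changed across pandoc releases, so check pandoc -h for your installed version.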

pypandoc also supports pandoc filters through the filters parameter:

import pypandoc

filename = 'somefile.md'  # path to the Markdown source
filters = ['pandoc-citeproc']
pdoc_args = ['--mathjax',
             '--smart']
output = pypandoc.convert(source=filename,
                          to='html5',
                          format='md',
                          extra_args=pdoc_args,
                          filters=filters)

Please pass any filters in as a list and not a string.
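
A plausible reason for the list requirement: a Python string is itself a sequence of characters, so iterating over 'pandoc-citeproc' yields one "filter" per letter. The hypothetical helper below (not pypandoc's code) rejects a bare string and turns each name into a pandoc --filter option.

```python
from typing import List, Sequence


def filter_args(filters: Sequence[str]) -> List[str]:
    """Hypothetical helper: turn a list of filter names into pandoc --filter options."""
    if isinstance(filters, str):
        # A bare string would be iterated character by character; reject it.
        raise TypeError("filters must be a list of strings, not a single string")
    return [f"--filter={name}" for name in filters]
```

For example, filter_args(['pandoc-citeproc']) gives ['--filter=pandoc-citeproc'], while filter_args('pandoc-citeproc') raises TypeError.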

Please refer to pandoc -h and the official documentation for further details.

Related

pydocverter is a client for a service called Docverter, which offers pandoc as a service (plus some extra goodies). It has the same API as pypandoc, so you can easily write code that uses one and falls back to the other. For example:

try:
    import pypandoc as converter
except ImportError:
    import pydocverter as converter

converter.convert('somefile.md', 'rst')

See pyandoc, by Kenneth Reitz, for an alternative pandoc wrapper. Note that it has not been actively maintained for a while.

Contributing

Contributions are welcome. When opening a PR, please keep the following guidelines in mind:

  1. Before implementing, please open an issue for discussion.
  2. Make sure you have tests for the new logic.
  3. Make sure flake8 pypandoc.py tests.py reports no errors.
  4. Add yourself to the contributors list in README.md unless you are already listed. In that case, update the description of your contributions.

Contributors

License

pypandoc is available under the MIT license. See LICENSE for more details.