Skip to content
forked from podoc/podoc

Minimalistic pure Python pandoc companion (not ready for public use yet)

License

Notifications You must be signed in to change notification settings

sundeepteki/podoc

 
 

Repository files navigation

podoc

Build Status codecov.io Documentation Status PyPI release GitHub release Join the chat at https://gitter.im/podoc/podoc

This library is in a very early development stage and it's not ready for public use yet.

podoc will be a pure Python library for converting markup documents in a way that is 100% compatible with pandoc. You'll be able to convert documents bidirectionally between the following formats:

  • Without pandoc installed:
    • CommonMark/Markdown
    • Jupyter Notebook
    • OpenDocument
    • O'Reilly Atlas
    • ReST (later?)
    • LaTeX (later?)
    • HTML (later?)
  • With pandoc installed:
    • all formats above
    • the dozens of formats supported by pandoc

podoc implements no parser. Instead, it uses other parsing libraries like CommonMark-py, Jupyter, odfpy, etc.

Conversion will be entirely customizable and will allow many use-cases (see the Plugin ideas section below).

Plugins

podoc implements a very light core. Most functionality is provided by built-in plugins, and you can implement your own plugins easily. Examples of included plugins are support for all natively supported formats like CommonMark, Notebook, AST, etc.

AST

podoc features a language-independent representation for documents, also known as Abstract Syntax Tree (AST). This structure is very close the the internal AST used by pandoc, and podoc provides 100% compatible import/export facilities to the pandoc AST JSON format.

Custom AST nodes

The AST supports a small set of built-in node types, like Header, Para, or CodeBlock (the same names as in pandoc). You can also implement your own custom node types which allow for a rich set of possibilities.

For example, the Notebook plugin implements the CodeCell node. Its children are the input cell as a CodeBlock and output cells as CodeBlocks.

When you define a custom node type, make sure that its children are native, such that writers that don't support the node type can still process sensible contents. This is because the default behavior for writers is to just skip unknown nodes and proceed with the children as usual, recursively. In the CodeCell example, you can see that the children are native CodeBlocks so that writers that don't support CodeCell will still render a list of code blocks. Writers that do support Notebook CodeCell will have a chance to render them in a specific way.

Formats

In podoc, there is a list of formats which are nodes in a conversion (directed) graph, and a list of conversion functions which are edges in that graph. To go from one format to another, the shortest path is found and the conversion is performed. In practice, the conversion path is almost always source -> AST -> target, and most formats implement both readers (source -> AST) and writers (AST -> target).

Some conversion paths don't require a full AST parsing, for example notebook -> CommonMark (since notebooks already contain Markdown cells), which is significantly faster than notebook -> AST -> CommonMark.

Filters

You can register filters that transform a document without changing the format. For example, prefilters that transform the source document or AST filters that implement custom features.

See a list of possible filters below.

Plugin ideas

  • ASCIIImage: replace images by ASCII art to display documents with images in the console.
  • Atlas: filter replacing code blocks in a given language by executable <pre> HTML code blocks, and LaTeX equations by <span> HTML blocks. This is used by the O'Reilly Atlas platform.
  • CodeEval: evaluate some code elements and replace them by their output. This allows for literate programming using Python.
  • Graph: describe a graph or a tree in a human-readable format and have it converted automatically to an image (e.g. mermaid)
  • Include: just include several documents in a single document.
  • Macros: perform regex substitutions. The macro substitutions can be listed in the macros metadata array in the document, or in c.Macros.substitutions = [(regex, repl), ...] in your .podoc/config.py.
  • Prompt: parse and write prompt prefix in an input code cell.
  • SlideShow: convert Markdown documents or Jupyter notebooks to slideshows
  • UrlChecker: find all broken hypertext links and generate a report.

About

Minimalistic pure Python pandoc companion (not ready for public use yet)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.4%
  • Makefile 0.6%