TenorSax is a tool to turn plain text formats such as troff and AsciiDoc into an XML stream and optionally transform the output.
This code requires Python 3. lxml is required for XSLT transformations, but is otherwise optional. Note that since many output formats implicitly make use of XSLT transformations, lxml isn’t as optional as you might like.
The code is ready to run out-of-the-box. To run the testsuite, type make
test
. To generate the documentation, which requires xsltproc and fop, type
make doc
. The makefile uses GNU syntax.
Simply run ./troff
. Output devices are selected with the -T
option; they
include troff-xml (an XSL-FO like representation of the troff input), xml
(generic XML output), fo (XSL-FO), and test (a UTF-8 representation identical to
that used by the testsuite). The default is troff-xml. Macro sets are selected
with the -m
option.
Many of the troff requests are not implemented, so the code is not at the moment
usable as a full troff engine. An example set of macros is available in
tmac/xd.tmac
and an example document is available as doc/quick-test.mxd
.
In general, XML output formats other than the troff-xml format and XSL-FO should
use the xml output device. This is true for the DocBook 5.0 output produced by
the xd macro set. Other macro sets generating generic XML should probably
include tmac/roff-cleanup.tmac
to remove requests that would generate bizarre
and probably corrupt XML output.
Run ./textproc
. Output devices are selected with the -T
option; they
include xml (the default), fo (for XSL-FO), and db5 (for DocBook 5). The xml
output produces XML in the http://ns.crustytoothpaste.net/text-markup
namespace. Other output devices consist of an XSLT stylesheet that converts XML
in this namespace to the appropriate format.
Output options depend on the input tool. The troff input tool generally uses macro sets which produce some sort of XML output. Formats that use the textproc input tool can support any type of output that can be produced using an XSLT stylesheet. Any input format for the textproc input tool will generate output in the text-markup namespace and therefore any stylesheet which can process this format will work with any of those input formats.
Copyright © 2010–2011 brian m. carlson
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation: version 2 of the License, dated June 1991.
This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details.