Pandoc Plug-in

Background

Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can convert from the following formats:

Markdown: commonmark (CommonMark Markdown), gfm (GitHub-Flavored Markdown), markdown (Pandoc’s Markdown), markdown_mmd (MultiMarkdown), markdown_phpextra (PHP Markdown Extra), markdown_strict (original unextended Markdown)
Wiki Formats: dokuwiki (DokuWiki markup), mediawiki (MediaWiki markup), muse (Muse), tikiwiki (TikiWiki markup), twiki (TWiki markup), vimwiki (Vimwiki)
Other Formats: creole (Creole 1.0), docbook (DocBook), docx (Word docx), epub (EPUB), fb2 (FictionBook2 e-book), haddock (Haddock markup), html (HTML), ipynb (Jupyter notebook), jats (JATS XML), json (JSON version of native AST), latex (LaTeX), man (roff man), native (native Haskell), odt (ODT), opml (OPML), org (Emacs Org mode), rst (reStructuredText), t2t (txt2tags), textile (Textile)

More information about Pandoc can be found at pandoc.org.

This plug-in contains a Lua template which extends the output formats supported by Pandoc to include DITA. The output consists of a single DITA topic for each input file added to the ditamap.

Unlike the standard Markdown Plug-in, this plug-in does not fail if the <h1> to <h6> headers are incorrectly incremented. This is because the Lua template has been designed to ensure that headers increment by at most one level at a time. The downside is that the output may be unexpected.
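The effect of limiting headers to one level of increment at a time can be sketched as follows. This is only an illustration in Python, not the plug-in's Lua code; the function name `clamp_header_levels` and the exact clamping rule are assumptions made for the example.

```python
def clamp_header_levels(levels):
    """Return header levels adjusted so that no header is more than
    one level deeper than the header before it (hypothetical rule)."""
    adjusted = []
    previous = 0  # no header seen yet, so the first header can be at most level 1
    for level in levels:
        # Deeper headers are pulled up to previous + 1;
        # equal or shallower headers pass through unchanged.
        current = min(level, previous + 1)
        adjusted.append(current)
        previous = current
    return adjusted


# An input that jumps from <h1> straight to <h3>:
print(clamp_header_levels([1, 3, 4, 2, 4]))  # [1, 2, 3, 2, 3]
```

Under this rule the skipped levels are silently closed up, which is why a document with irregular headers still builds but may nest differently from what the author intended.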

Note: Because Pandoc’s intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. Pandoc attempts to preserve the structural elements of a document, but not formatting details such as margin size. Some document elements, such as complex tables, may not fit into Pandoc’s simple document model. While conversions from Pandoc’s Markdown to all formats aspire to be perfect, conversions from formats more expressive than Pandoc’s Markdown can be expected to be lossy.

Install

This plug-in requires Pandoc to be installed on the user's machine. It also requires the xmltask JAR to edit XML files as part of the Ant build. Installation therefore involves a series of commands to install the plug-in dependencies and configure Pandoc.

Run the plug-in installation commands:

dita install https://github.com/doctales/org.doctales.xmltask/archive/master.zip
dita install https://github.com/jason-fox/fox.jason.passthrough/archive/master.zip
dita install https://github.com/jason-fox/fox.jason.passthrough.pandoc/archive/master.zip

The dita command line tool requires no additional configuration.

Installing Pandoc

To download a copy of Pandoc, follow the instructions on the Install page.
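After installing, it can help to confirm that Pandoc is reachable before running a build. The sketch below assumes that Pandoc is expected on the system PATH, which this document does not state. The helper name `pandoc_version` is hypothetical; `--version` is a standard Pandoc command-line option.

```python
import shutil
import subprocess


def pandoc_version():
    """Return the first line of `pandoc --version`, or None if Pandoc
    is not found on the PATH (assumed lookup location)."""
    executable = shutil.which("pandoc")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


version = pandoc_version()
print(version if version is not None else "Pandoc was not found on the PATH")
```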

Usage

To mark a file to be passed through for Pandoc processing, label the <topicref> with @format="pandoc" within the <map> as shown:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE bookmap PUBLIC "-//OASIS//DTD DITA BookMap//EN" "bookmap.dtd">
<bookmap>
    ...etc
    <chapter format="pandoc" href="sample.docx"/>
</bookmap>

Figure 1. Usage

The additional file is run through the PandocXXX-to-DITA Lua filter, converted to a *.dita file, and added to the build job without further processing. The @navtitle of the included <topic> is the same as the root name of the file, with any underscores in the filename replaced by spaces.
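The naming rule for the @navtitle can be illustrated with a short Python sketch. The function name `navtitle_from_href` and the sample path are hypothetical; the plug-in performs the equivalent step itself during the build.

```python
from pathlib import PurePosixPath


def navtitle_from_href(href):
    """Derive a title from a file reference: take the root name of the
    file (no directory, no extension) and replace underscores with spaces."""
    root_name = PurePosixPath(href).stem
    return root_name.replace("_", " ")


print(navtitle_from_href("sample.docx"))                     # sample
print(navtitle_from_href("topics/release_notes_2020.docx"))  # release notes 2020
```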

Annotating files

The examples below use Markdown as the passthrough format; other formats need to provide equivalent annotations to obtain full functionality. Where possible, annotation aligns with the Markdown DITA syntax reference, which is based on CommonMark. The chapter <title> is taken from the first header found. Thereafter, the document is processed as expected:

# Chapter title

The abstract (if any) goes here...

## Topic 1

Body of topic 1 goes here.

## Topic 2

Body of topic 2 goes here.
...etc

Figure 2. Markdown

Ideally, input files should contain only a single # header.

Pandoc header_attributes can be used to define @id or @outputclass attributes:

# Topic title {#carrot .juice}

Figure 3. ID and outputclass
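To make the mapping concrete, the following Python sketch splits a header line of this form into its level, title, @id and @outputclass. It is a simplified illustration, not Pandoc's own parser: the regular expression and the function name `parse_header` are assumptions, and only the `#id` and `.class` forms shown above are handled.

```python
import re

# Leading hashes, the title text, then an optional {...} attribute block.
HEADER = re.compile(r"^(#+)\s+(.*?)\s*(?:\{([^}]*)\})?\s*$")


def parse_header(line):
    """Return level, title, id and outputclass for a Markdown header line,
    or None if the line is not a header."""
    match = HEADER.match(line)
    if match is None:
        return None
    hashes, title, attributes = match.groups()
    tokens = (attributes or "").split()
    ids = [token[1:] for token in tokens if token.startswith("#")]
    classes = [token[1:] for token in tokens if token.startswith(".")]
    return {
        "level": len(hashes),
        "title": title,
        "id": ids[0] if ids else None,
        "outputclass": " ".join(classes) or None,
    }


print(parse_header("# Topic title {#carrot .juice}"))
```

For the header in Figure 3 this yields the id `carrot` and the outputclass `juice`.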

The following class values in header_attributes have a special meaning on header levels.

section
example

They are used to generate <section> and <example> elements:

# Topic title

## Section title {.section}
## Example title {.example}

Figure 4. Sections
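The choice of output element can be pictured as a lookup on the header's classes. The sketch below is hypothetical: the function name `element_for_header` is invented for illustration, and giving `section` precedence when both classes are present is an assumption, since this document does not cover that case.

```python
# Class values with a special meaning, in assumed order of precedence.
SPECIAL_CLASSES = ("section", "example")


def element_for_header(classes):
    """Return the DITA element name suggested by a header's class list.
    Headers without a special class are treated as nested topics."""
    for name in SPECIAL_CLASSES:
        if name in classes:
            return name
    return "topic"


print(element_for_header(["section"]))  # section
print(element_for_header(["example"]))  # example
print(element_for_header(["juice"]))    # topic
```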

Metadata

A YAML metadata block, as defined in Pandoc pandoc_metadata_block, can be used to specify different metadata elements. The supported elements are:

author
source
publisher
permissions
audience
category
keyword
resourceid
shortdesc

Unrecognized keys are output using the <data> element.

---
author:
    - Author One
    - Author Two
source: Source
publisher: Publisher
permissions: Permissions
audience: Audience
category: Category
keyword:
    - Keyword1
    - Keyword2
resourceid:
    - Resourceid1
    - Resourceid2
workflow: review
---

Figure 5. Markdown with Metadata

<title>Sample with YAML header</title>
<prolog>
  <author>Author One</author>
  <author>Author Two</author>
  <source>Source</source>
  <publisher>Publisher</publisher>
  <permissions view="Permissions"/>
  <metadata>
    <audience audience="Audience"/>
    <category>Category</category>
    <keywords>
      <keyword>Keyword1</keyword>
      <keyword>Keyword2</keyword>
    </keywords>
  </metadata>
  <resourceid appid="Resourceid1"/>
  <resourceid appid="Resourceid2"/>
  <data name="workflow" value="review"/>
</prolog>

Figure 6. Sample output with YAML header
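The relationship between Figure 5 and Figure 6 can be sketched in Python as a mapping from metadata keys to <prolog> children. This is an illustration inferred from the two figures, not the plug-in's implementation. The function names are hypothetical, the metadata is assumed to be already parsed into a dictionary, and `shortdesc` is recognized but not emitted because Figure 6 does not show where it is placed.

```python
import xml.etree.ElementTree as ET

KNOWN_KEYS = {"author", "source", "publisher", "permissions", "audience",
              "category", "keyword", "resourceid", "shortdesc"}


def as_list(value):
    """Treat a scalar metadata value as a one-item list."""
    return value if isinstance(value, list) else [value]


def build_prolog(meta):
    """Build a <prolog> element from a metadata dictionary, following
    the element layout shown in Figure 6."""
    prolog = ET.Element("prolog")
    # Keys whose values become element text.
    for key in ("author", "source", "publisher"):
        for value in as_list(meta.get(key, [])):
            ET.SubElement(prolog, key).text = str(value)
    for value in as_list(meta.get("permissions", [])):
        ET.SubElement(prolog, "permissions", {"view": str(value)})
    # audience, category and keyword are grouped under <metadata>.
    if any(key in meta for key in ("audience", "category", "keyword")):
        metadata = ET.SubElement(prolog, "metadata")
        for value in as_list(meta.get("audience", [])):
            ET.SubElement(metadata, "audience", {"audience": str(value)})
        for value in as_list(meta.get("category", [])):
            ET.SubElement(metadata, "category").text = str(value)
        if "keyword" in meta:
            keywords = ET.SubElement(metadata, "keywords")
            for value in as_list(meta["keyword"]):
                ET.SubElement(keywords, "keyword").text = str(value)
    for value in as_list(meta.get("resourceid", [])):
        ET.SubElement(prolog, "resourceid", {"appid": str(value)})
    # Unrecognized keys fall back to <data name="..." value="..."/>.
    for key, value in meta.items():
        if key not in KNOWN_KEYS:
            ET.SubElement(prolog, "data", {"name": key, "value": str(value)})
    return prolog


sample = {
    "author": ["Author One", "Author Two"],
    "permissions": "Permissions",
    "keyword": ["Keyword1", "Keyword2"],
    "workflow": "review",
}
print(ET.tostring(build_prolog(sample), encoding="unicode"))
```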

Ditamap <topicmeta> processing is also supported.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE bookmap PUBLIC "-//OASIS//DTD DITA BookMap//EN" "bookmap.dtd">
<bookmap>
    <chapter format="pandoc" processing-role="normal" type="topic" href="markdown.md">
        <topicmeta>
            <shortdesc>This is where the shortdesc goes</shortdesc>
            <metadata>
                <keywords>
                    <keyword>Keyword1</keyword>
                    <keyword>Keyword2</keyword>
                </keywords>
            </metadata>
        </topicmeta>
    </chapter>
</bookmap>

Figure 7. Ditamap TopicMeta for Pandoc Files

This allows for topic metadata to be added to files for formats other than Markdown.
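As a rough illustration of what is available to the build, the Python sketch below reads a map like the one in Figure 7 and collects the <topicmeta> content of every reference marked with @format="pandoc". The function name `pandoc_topicmeta`, the second <chapter> entry and the omission of the DOCTYPE are assumptions made to keep the example self-contained; how the plug-in merges this metadata into the generated topic is not shown here.

```python
import xml.etree.ElementTree as ET

DITAMAP = """<bookmap>
    <chapter format="pandoc" processing-role="normal" type="topic" href="markdown.md">
        <topicmeta>
            <shortdesc>This is where the shortdesc goes</shortdesc>
            <metadata>
                <keywords>
                    <keyword>Keyword1</keyword>
                    <keyword>Keyword2</keyword>
                </keywords>
            </metadata>
        </topicmeta>
    </chapter>
    <chapter href="other.dita"/>
</bookmap>"""


def pandoc_topicmeta(map_xml):
    """Map each href marked format="pandoc" to its shortdesc and keywords."""
    root = ET.fromstring(map_xml)
    collected = {}
    for ref in root.iter():
        if ref.get("format") != "pandoc":
            continue  # only references passed through to Pandoc are of interest
        meta = ref.find("topicmeta")
        collected[ref.get("href")] = {
            "shortdesc": meta.findtext("shortdesc") if meta is not None else None,
            "keywords": [k.text for k in meta.iter("keyword")] if meta is not None else [],
        }
    return collected


print(pandoc_topicmeta(DITAMAP))
```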