DITA Prettifier Plug-in

This DITA-OT Plug-in is a DITA prettifier which formats DITA XML in an aesthetically pleasing manner. Elements such as <topic>, <section> and <p> are indented consistently so that the raw DITA XML files can be scanned easily by humans.

Example

A typical DITA file can contain long lines, missing carriage returns and unaligned elements:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="basic-usage"><title>Basic usage</title><body outputclass="language-markup">
<p>You will need to include the <codeph>prism.css</codeph> and <codeph>prism.js</codeph> files you downloaded in your page. Example:
</p>
<codeblock>&amp;lt;!DOCTYPE html&amp;gt;
&amp;lt;html&amp;gt;
&amp;lt;head&amp;gt;
  ...
  &amp;lt;link href="themes/prism.css" rel="stylesheet" /&amp;gt;
  &amp;gt;&amp;lt;/head&amp;gt;
&amp;lt;body&amp;gt;
  ...
  &amp;lt;script src="prism.js"&amp;gt;&amp;lt;/script&amp;gt;
  &amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;</codeblock>
   <p>Prism does its best to encourage good authoring practices.  Therefore,it only works with <xmlelement>code</xmlelement> elements,  since marking upcode without a <xmlelement>code</xmlelement> element is semantically invalid.<xref format="html" scope="external" href="https://www.w3.org/TR/html52/textlevel-semantics.html#the-code-element">According to the HTML5 spec</xref>, the recommended way to define a code language is a <codeph>language-xxxx</codeph> class, which is what Prism uses. Alternatively, Prism also supports a shorter version: <codeph>lang-xxxx</codeph>.
</p>
    <p> To make things easier however, Prism assumes that this language definition is inherited. Therefore, if multiple <xmlelement>code</xmlelement> elements have the same language, you can add the <codeph>language-xxxx</codeph> class on one of their common ancestors. This way, you can also define a document-wide default language, by adding a <codeph>language-xxxx</codeph> class on the <xmlelement>body</xmlelement> or <xmlelement>html</xmlelement> element.</p>
   <p> If you want to opt-out of highlighting for a <xmlelement>code</xmlelement> element that is a descendant of an element with a declared code language, you can add the class <codeph>language-none</codeph> to it (or any non-existing language, really).
  </p>
<p> The <xref format="html" scope="external" href="https://www.w3.org/TR/html5/grouping-content.html#the-pre-element">recommended way to mark up a code block</xref> (both for semantics and for Prism) is a <xmlelement>pre</xmlelement> element with a <xmlelement>code</xmlelement> element inside, like so:
</p>
<codeblock>&amp;lt;pre&amp;gt;&amp;lt;code class="language-css"&amp;gt;p { color: red }&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;</codeblock>
<p> If you use that pattern, the <xmlelement>pre</xmlelement> will automatically get the <codeph>language-xxxx</codeph> class (if it doesn’t already have it) and will be styled as a code block.
</p>
  <p> If you want to prevent any elements from being automatically highlighted, you can use the attribute <codeph>data-manual</codeph> on the <xmlelement>script</xmlelement> element you used for prism and use the <xref format="html" scope="external" href="https://prismjs.com/extending.html#api">API</xref>. Example:
    </p>
<section id="usage-with-webpack"><title>Usage with Webpack, Browserify, &amp; Other Bundlers</title><p>If you want to use Prism with a bundler, install Prism with <codeph>npm</codeph>:</p><codeblock>$ npm install prismjs</codeblock><p>You can then <codeph outputclass="language-js">import</codeph> into your bundle</p><codeblock outputclass="language-js">import Prism from 'prismjs';</codeblock><p>To make it easy to configure your Prism instance with only thelanguages and plugins you need, use the babel plugin, <xref format="html" scope="external" href="https://github.com/mAAdhaTTah/babel-plugin-prismjs">babel-plugin-prismjs</xref>. This will allow you to load the minimum number of languages and plugins to satisfy your needs. See that plugin's documentation for configuration details</p>
</section></body>
</topic>
Figure 1. Unformatted DITA

After running the pretty-dita transform, the same file has all its elements aligned, each block element starts on a new line, and text does not overrun the side of a typical view screen (approx 120 characters).

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="basic-usage">
    <title>Basic usage</title>
    <body outputclass="language-markup">
        <p>
          You will need to include the <codeph>prism.css</codeph> and <codeph>prism.js</codeph>
          files you downloaded in your page. Example:
        </p>
        <codeblock>&amp;lt;!DOCTYPE html&amp;gt;
&amp;lt;html&amp;gt;
&amp;lt;head&amp;gt;
  ...
  &amp;lt;link href="themes/prism.css" rel="stylesheet" /&amp;gt;
  &amp;gt;&amp;lt;/head&amp;gt;
&amp;lt;body&amp;gt;
  ...
  &amp;lt;script src="prism.js"&amp;gt;&amp;lt;/script&amp;gt;
  &amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;</codeblock>
        <p>
          Prism does its best to encourage good authoring practices.  Therefore,it only
          works with <xmlelement>code</xmlelement> elements,  since marking upcode
          without a <xmlelement>code</xmlelement> element is semantically invalid.<xref
          format="html" scope="external"
          href="https://www.w3.org/TR/html52/textlevel-semantics.html#the-code-element">According
          to the HTML5 spec</xref>, the recommended way to define a code language is a <codeph>language-xxxx</codeph>
          class, which is what Prism uses. Alternatively, Prism also supports a shorter version: <codeph>lang-xxxx</codeph>.
        </p>
        <p>
          To make things easier however, Prism assumes that this language definition is
          inherited. Therefore, if multiple <xmlelement>code</xmlelement> elements have
          the same language, you can add the <codeph>language-xxxx</codeph> class on one
          of their common ancestors. This way, you can also define a document-wide
          default language, by adding a <codeph>language-xxxx</codeph> class on the <xmlelement>body</xmlelement>
          or <xmlelement>html</xmlelement> element.
        </p>
        <p>
          If you want to opt-out of highlighting for a <xmlelement>code</xmlelement>
          element that is a descendant of an element with a declared code language, you
          can add the class <codeph>language-none</codeph> to it (or any non-existing
          language, really).
        </p>
        <p>
          The <xref format="html" scope="external"
          href="https://www.w3.org/TR/html5/grouping-content.html#the-pre-element">recommended
          way to mark up a code block</xref> (both for semantics and for Prism) is a <xmlelement>pre</xmlelement>
          element with a <xmlelement>code</xmlelement> element inside, like so:
        </p>
        <codeblock>&amp;lt;pre&amp;gt;&amp;lt;code class="language-css"&amp;gt;p { color: red }&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;</codeblock>
        <p>
          If you use that pattern, the <xmlelement>pre</xmlelement> will automatically get
          the <codeph>language-xxxx</codeph> class (if it doesn’t already have it) and
          will be styled as a code block.
        </p>
        <p>
          If you want to prevent any elements from being automatically highlighted, you
          can use the attribute <codeph>data-manual</codeph> on the <xmlelement>script</xmlelement>
          element you used for prism and use the <xref format="html" scope="external"
          href="https://prismjs.com/extending.html#api">API</xref>. Example:
        </p>
        <section id="usage-with-webpack">
            <title>Usage with Webpack, Browserify, &amp; Other Bundlers</title>
            <p>
              If you want to use Prism with a bundler, install Prism with <codeph>npm</codeph>:
            </p>
            <codeblock>$ npm install prismjs</codeblock>
            <p>
              You can then <codeph outputclass="language-js">import</codeph> into your bundle
            </p>
            <codeblock outputclass="language-js">import Prism from 'prismjs';</codeblock>
            <p>
              To make it easy to configure your Prism instance with only thelanguages and
              plugins you need, use the babel plugin, <xref format="html" scope="external"
              href="https://github.com/mAAdhaTTah/babel-plugin-prismjs">babel-plugin-prismjs</xref>.
              This will allow you to load the minimum number of languages and plugins to
              satisfy your needs. See that plugin's documentation for configuration details
            </p>
        </section>
    </body>
</topic>
Figure 2. Formatted DITA

Install

This is a standalone plug-in without dependencies which can be installed from the command line.

Run the plug-in installation command:

dita install https://github.com/jason-fox/fox.jason.pretty-dita/archive/master.zip

The dita command line tool requires no additional configuration.

Usage

Like any other transform, the prettifier requires an input document when invoked directly.

Prettifying DITA files for a document

To prettify DITA files for a document, set the transtype parameter to pretty-dita, or pass the --format=pretty-dita option to the dita command line.

dita --format pretty-dita \
    --input document.ditamap

All *.dita (topic) and *.ditamap (map) files under the directory of the input document will be updated in place.

Prettifying a single DITA file

Alternatively, to prettify a single DITA file, set the --input parameter to point to a *.dita file:

dita --format pretty-dita \
    --input topic.dita

The specified file will be updated in place.

Parameter Reference

  • args.indent - How many characters to indent (default 4)
  • args.style - Whether to indent using tabs or spaces (default spaces)
  • args.print-width - Specify the line length that the printer will wrap on (default 80)
  • args.require-pragma - Restrict the plug-in to only format files that contain a special comment, called a pragma, at the top of the file (default false).

    This is very useful when gradually transitioning large, unformatted codebases to pretty-dita.

    For example, when args.require-pragma is supplied, a file will be formatted only if it contains either of the following comments:

    <!-- @prettier -->
    <!-- @format -->

  • args.insert-pragma - Insert a special @format marker at the top of files specifying that the file has been formatted with the plugin (default false)
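As a minimal sketch of how these parameters might be supplied when scripting a run, the Python helper below builds a dita command line. It assumes the parameters are passed using DITA-OT's general -Dname=value option syntax. The helper name build_pretty_dita_command and the values chosen are hypothetical and for illustration only.

```python
import shlex


def build_pretty_dita_command(input_path, params=None):
    """Return the argument list for a pretty-dita run (hypothetical helper).

    Assumption: plug-in parameters are passed as -Dname=value options.
    """
    command = ["dita", "--format", "pretty-dita", "--input", input_path]
    # Sort the parameters so the generated command is reproducible.
    for name, value in sorted((params or {}).items()):
        command.append(f"-D{name}={value}")
    return command


command = build_pretty_dita_command(
    "document.ditamap",
    {"args.indent": 2, "args.print-width": 100},  # illustrative values
)
# Print the command in a form that can be pasted into a shell.
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) on a machine where the dita tool and the plug-in are installed.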

Ignoring DITA files

The prettifier will ignore any DITA file that contains a comment starting with prettier-ignore. Such a file will not be updated.

...
<topic id="basic-usage">
    <!-- prettier-ignore -->
    <title>Basic usage</title>
    <body outputclass="language-markup">
<lines>
This file really doesn't need formatting
    We want it to look this way.
</lines>
        <p>This will also be left alone.</p>
            <p>This will be left as well
        </p>
    </body>
</topic>
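The interaction between the prettier-ignore comment and the args.require-pragma parameter can be sketched in Python as follows. This is an illustration of the documented behaviour, not the plug-in's own code: the function name should_format and both regular expressions are assumptions, and for simplicity the pragma is accepted anywhere in the file rather than only at the top.

```python
import re

# Assumed comment patterns, derived from the examples in this document.
PRAGMA = re.compile(r"<!--\s*@(?:prettier|format)\s*-->")
IGNORE = re.compile(r"<!--\s*prettier-ignore\b")


def should_format(text, require_pragma=False):
    """Decide whether a DITA file's content would be formatted."""
    # A prettier-ignore comment always wins: the file is left untouched.
    if IGNORE.search(text):
        return False
    # With require_pragma set, only files carrying a pragma are formatted.
    if require_pragma:
        return PRAGMA.search(text) is not None
    return True
```

For example, should_format('<!-- @format -->\n<topic id="a"/>', require_pragma=True) accepts the file, while the same call without the pragma comment rejects it.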

Formatting Rules

The pretty-dita DITA-OT Plug-in is an opinionated code formatter: DITA files are formatted according to a well-defined set of rules.

Basic Block Elements

By default all DITA elements (not listed in the categories below) are indented one level further than the containing DITA element.

<topic id="basic-usage">
    <title>Basic usage</title>
    <body outputclass="language-markup">
        ...etc
    </body>
</topic>
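The basic rule of one extra indentation level per nesting level can be approximated with Python's standard library. The sketch below uses xml.etree.ElementTree.indent (Python 3.9 or later) as a stand-in for the plug-in, which is implemented differently. The function name indent_blocks is hypothetical, and it assumes that args.indent counts characters of the chosen args.style.

```python
import xml.etree.ElementTree as ET


def indent_blocks(xml_text, indent=4, style="spaces"):
    """Re-indent nested elements, one indentation unit per level."""
    # Assumption: args.indent is the number of tab or space characters.
    unit = ("\t" if style == "tabs" else " ") * indent
    root = ET.fromstring(xml_text)
    ET.indent(root, space=unit)  # adjusts whitespace between elements in place
    return ET.tostring(root, encoding="unicode")


sample = (
    '<topic id="basic-usage"><title>Basic usage</title>'
    '<body outputclass="language-markup"><p>Hello</p></body></topic>'
)
print(indent_blocks(sample))
```

Unlike the plug-in, this stand-in does not drop the XML declaration and DOCTYPE back in, does not wrap long text, and does not apply the special rules for the element categories described in the following sections.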

Indented Block Elements

The following elements frequently contain a large body of text within them. The opening and closing tags are therefore always placed on a separate line before displaying the text found within them:

  • Topic elements: <abstract>, <shortdesc>
  • Body elements: <p>, <li>, <note>, <lq>

<ul>
    <li>
      This is an item in an unordered list.
    </li>
    <li>
      To separate it from other items in the list, the formatter puts a bullet beside it.
    </li>
    <li>
      The following paragraph, contained in the list item element, is part of the list
      item which contains it.
        <p>
          This is the contained paragraph.
        </p>
    </li>
    <li>
      This is the last list item in our unordered list.
    </li>
</ul>

Inline Elements

The following elements are treated as inline elements. They do not warrant an additional line and are kept within the surrounding text.

  • Body elements: <ph>, <codeph>, <synph>, <term>, <xref>, <cite>, <q>, <boolean>, <state>, <keyword>, <option>, <tm>, <fn>
  • Programming elements: <parmname>, <apiname>
  • Typographic elements: <b>, <i>, <sup>, <sub>, <tt>, <u>
  • Software elements: <filepath>, <msgph>, <userinput>, <systemoutput>, <cmdname>, <msgnum>, <varname>
  • User interface elements: <uicontrol>, <menucascade>, <wintitle>
  • XML Mention Domain: <numcharref>, <parameterentity>, <textentity>, <xmlatt>, <xmlelement>, <xmlnsname>, <xmlpi>

<p>
  <b>STOP!</b> This is <b>very</b> important! Unplug the unit <i>before</i> placing the
  metal screwdriver against the terminal screw.
</p>

Text Elements

Text elements on a single line are kept within the containing element. Text elements on multiple lines are indented one level further than the surrounding text. Long lines of text are wrapped at approximately 80 characters by default. Line breaks are usually placed so as not to split inline elements, but this is sometimes not feasible within the line limit, so a line break may occur before an attribute of an inline element.

<p>
  The <xref format="html" scope="external"
  href="https://www.w3.org/TR/html5/grouping-content.html#the-pre-element">recommended
  way to mark up a code block</xref> (both for semantics and for Prism) is a <xmlelement>pre</xmlelement>
  element with a <xmlelement>code</xmlelement> element inside, like so:
</p>
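The wrapping behaviour can be illustrated with Python's textwrap module. This is only a sketch of the idea under simplifying assumptions: the function name wrap_text is hypothetical, inline markup is treated as ordinary words, and the plug-in's own line-breaking logic may differ in detail.

```python
import textwrap


def wrap_text(text, level_indent, print_width=80):
    """Collapse whitespace, then wrap text at print_width with a fixed indent."""
    collapsed = " ".join(text.split())
    return textwrap.fill(
        collapsed,
        width=print_width,
        initial_indent=level_indent,
        subsequent_indent=level_indent,
        break_long_words=False,   # never cut a long word such as a URL in two
        break_on_hyphens=False,   # keep hyphenated words like "opt-out" whole
    )


paragraph = (
    "If you want to opt-out of highlighting for a code element that is a "
    "descendant of an element with a declared code language, you can add "
    "a class to it."
)
print(wrap_text(paragraph, "  ", print_width=40))
```

Because words are never split, a single token longer than the width (a long href attribute, for instance) will still overrun the limit, which mirrors the caveat described above.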

Whitespace sensitive elements

The following elements are whitespace sensitive and require special processing:

  • <codeblock>, <lines>, <msgblock>, <pre>, <foreign>

The opening tag of a <codeblock> is indented normally, but the text within a <codeblock> (if any) is not offset by any additional indentation:

<topic id="basic-usage">
    <title>Basic usage</title>
    <body outputclass="language-java">
        <p>
          Hello World in Java
        </p>
        <codeblock>public class java {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}</codeblock>
        ...etc

<codeblock> elements containing <coderef> elements are indented as shown:

<codeblock outputclass="language-markup"><coderef href="../src/logo.svg"/>
</codeblock>

Other whitespace-sensitive elements such as <lines> are supported in a similar manner. If processing is found to be incorrect due to embedded elements, authors are advised to use the prettier-ignore directive to maintain whitespace.
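The four categories described in this section can be summarised as a simple lookup. The Python sketch below copies the element lists from this document. The function name classify and the category labels are hypothetical and are not part of the plug-in.

```python
# Element lists copied from the Formatting Rules sections of this document.
INDENTED_BLOCK = {"abstract", "shortdesc", "p", "li", "note", "lq"}
INLINE = {
    "ph", "codeph", "synph", "term", "xref", "cite", "q", "boolean",
    "state", "keyword", "option", "tm", "fn",
    "parmname", "apiname",
    "b", "i", "sup", "sub", "tt", "u",
    "filepath", "msgph", "userinput", "systemoutput", "cmdname",
    "msgnum", "varname",
    "uicontrol", "menucascade", "wintitle",
    "numcharref", "parameterentity", "textentity", "xmlatt",
    "xmlelement", "xmlnsname", "xmlpi",
}
WHITESPACE_SENSITIVE = {"codeblock", "lines", "msgblock", "pre", "foreign"}


def classify(tag):
    """Return the formatting category that applies to a DITA element name."""
    if tag in WHITESPACE_SENSITIVE:
        return "whitespace-sensitive"
    if tag in INLINE:
        return "inline"
    if tag in INDENTED_BLOCK:
        return "indented-block"
    # Everything else follows the default rule: indent one level per nesting.
    return "basic-block"
```

For instance, classify("codeblock") yields "whitespace-sensitive" and classify("section") falls through to "basic-block".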