DITA Prettifier Plug-in
This DITA-OT plug-in is a DITA prettifier: it formats DITA XML in an aesthetically pleasing manner. <topic> elements, <section> elements, <p> elements and so on are indented consistently, so that the raw DITA XML files can be scanned easily by humans.
Example
A typical DITA file can contain long lines, missing carriage returns and unaligned elements.
After running the pretty-dita transform, the same file has all its elements aligned, each block element on a new line, and text that does not overrun the side of a typical view screen (approx 120 characters).
Install
This is a standalone plug-in without dependencies which can be installed from the command line.
Run the plug-in installation command:
dita install https://github.com/jason-fox/fox.jason.pretty-dita/archive/master.zip
The dita command line tool requires no additional configuration.
Usage
Like any other transform, the prettifier requires an input document when invoked directly.
Prettifying DITA files for a document
To prettify DITA files for a document, set the transtype parameter to pretty-dita, or pass the --format=pretty-dita option to the dita command line.
dita --format pretty-dita \
--input document.ditamap
All *.dita (<topic>) and *.ditamap (<map>) files under the input document's directory will be updated in place.
Prettifying a single DITA file
Alternatively, to prettify a single DITA file, set the --input parameter to point to a *.dita file:
dita --format pretty-dita \
--input topic.dita
The specified file will be updated in place.
Parameter Reference
- args.indent - How many characters to indent (default 4)
- args.style - Whether to indent using tabs or spaces (default spaces)
- args.print-width - The line length at which the printer wraps (default 80)
- args.require-pragma - Restrict the plug-in to formatting only files that contain a special comment, called a pragma, at the top of the file (default false). This is very useful when gradually transitioning large, unformatted codebases to pretty-dita. For example, when args.require-pragma is supplied, a file is formatted only if it contains either of the following comments:
  <!-- @prettier -->
  <!-- @format -->
- args.insert-pragma - Insert a special @format marker at the top of each file, indicating that the file has been formatted with the plug-in (default false)
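The parameters above can be combined in a single invocation. The following is a minimal sketch, assuming the plug-in's parameters are passed like any other DITA-OT parameter using the dita command's --name=value syntax. The PRETTY_OPTS variable name, the chosen values and the document.ditamap file name are illustrative only.

```shell
# Formatter settings, passed as ordinary DITA-OT parameters (--name=value).
# PRETTY_OPTS is a hypothetical variable name; the values are examples only.
PRETTY_OPTS="--args.indent=2 --args.style=spaces --args.print-width=100"

# Run the transform only if the dita command and the input map are both
# available, so the sketch is harmless on a machine without DITA-OT.
if command -v dita >/dev/null 2>&1 && [ -f document.ditamap ]; then
    # $PRETTY_OPTS is deliberately unquoted so each option becomes its own argument.
    dita --format pretty-dita --input document.ditamap $PRETTY_OPTS
fi
```

Because the files are updated in place, it may be worth committing or backing up the sources before experimenting with different values.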
Ignoring DITA files
The prettifier will ignore any DITA file containing a comment starting with prettier-ignore; the file will not be updated.
...
<topic id="basic-usage">
<!-- prettier-ignore -->
<title>Basic usage</title>
<body outputclass="language-markup">
<lines>
This file really doesn't need formatting
We want it to look this way.
</lines>
<p>This will also be left alone.</p>
<p>This will be left as well
</p>
</body>
</topic>
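To see in advance which files would be skipped, the source tree can be searched for the ignore comment. The sketch below is not part of the plug-in: list_ignored is a hypothetical helper built on standard find and grep, and it assumes the comment is written as in the example above (an XML comment whose text begins with prettier-ignore).

```shell
# list_ignored DIR: print every *.dita and *.ditamap file under DIR that
# contains a comment starting with prettier-ignore.
# Hypothetical helper, independent of the plug-in's own detection logic.
list_ignored() {
    find "${1:-.}" -type f \( -name '*.dita' -o -name '*.ditamap' \) \
        -exec grep -l '<!--[[:space:]]*prettier-ignore' {} +
}

# Usage: list_ignored path/to/docs
```

The helper matches the comment anywhere in a file, which may be broader or narrower than what the plug-in itself checks.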
Formatting Rules
The pretty-dita DITA-OT plug-in is an opinionated code formatter: DITA files are formatted according to a well-defined set of rules.
Basic Block Elements
By default all DITA elements (not listed in the categories below) are indented one level further than the containing DITA element.
<topic id="basic-usage">
    <title>Basic usage</title>
    <body outputclass="language-markup">
        ...etc
    </body>
</topic>
Indented Block Elements
The following elements frequently contain a large body of text within them. The opening and closing tags are therefore always placed on a separate line before displaying the text found within them:
- Topic elements: <abstract>, <shortdesc>
- Body elements: <p>, <li>, <note>, <lq>
<ul>
    <li>
        This is an item in an unordered list.
    </li>
    <li>
        To separate it from other items in the list, the formatter puts a bullet beside it.
    </li>
    <li>
        The following paragraph, contained in the list item element, is part of the list
        item which contains it.
        <p>
            This is the contained paragraph.
        </p>
    </li>
    <li>
        This is the last list item in our unordered list.
    </li>
</ul>
Inline Elements
The following elements are treated as inline elements: they do not warrant an additional line and are kept within the surrounding text.
- Body elements: <ph>, <codeph>, <synph>, <term>, <xref>, <cite>, <q>, <boolean>, <state>, <keyword>, <option>, <tm>, <fn>
- Programming elements: <parmname>, <apiname>
- Typographic elements: <b>, <i>, <sup>, <sub>, <tt>, <u>
- Software elements: <filepath>, <msgph>, <userinput>, <systemoutput>, <cmdname>, <msgnum>, <varname>
- User interface elements: <uicontrol>, <menucascade>, <wintitle>
- XML Mention Domain: <numcharref>, <parameterentity>, <textentity>, <xmlatt>, <xmlelement>, <xmlnsname>, <xmlpi>
<p>
    <b>STOP!</b> This is <b>very</b> important! Unplug the unit <i>before</i> placing the
    metal screwdriver against the terminal screw.
</p>
Text Elements
Text on a single line is kept within the containing element. Text spanning multiple lines is indented one level further than the surrounding text. By default, long lines of text are wrapped at approximately 80 characters (see args.print-width) by inserting a carriage return. Carriage returns are usually placed so as not to split inline elements, but this is sometimes not feasible within the line limit, so a line break may occur before an attribute of an inline element.
<p>
    The <xref format="html" scope="external"
    href="https://www.w3.org/TR/html5/grouping-content.html#the-pre-element">recommended
    way to mark up a code block</xref> (both for semantics and for Prism) is a <codeph>&lt;pre&gt;</codeph>
    element with a <codeph>&lt;code&gt;</codeph> element inside, like so:
</p>
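One way to check the effect of the line limit is to look for lines that still exceed it after formatting. The following sketch uses standard awk and is not part of the plug-in; long_lines is a hypothetical helper, and the limit of 80 in the usage line simply mirrors the documented default.

```shell
# long_lines MAX FILE...: print "file:line: length" for every line in the
# given files that is longer than MAX characters.
# Hypothetical helper for reviewing formatter output.
long_lines() {
    max=$1
    shift
    awk -v max="$max" 'length($0) > max { print FILENAME ":" FNR ": " length($0) }' "$@"
}

# Usage: long_lines 80 topic.dita
```

Lines inside whitespace-sensitive elements, or lines holding a single long attribute value such as a URL, can legitimately remain longer than the limit.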
Whitespace sensitive elements
The following elements are whitespace sensitive and require special processing:
- <codeblock>, <lines>, <msgblock>, <pre>, <foreign>
The opening tag of a <codeblock> is indented normally; the text within a <codeblock> (if any) is not offset by any additional indentation:
<topic id="basic-usage">
    <title>Basic usage</title>
    <body outputclass="language-java">
        <p>
            Hello World in Java
        </p>
        <codeblock>public class java {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}</codeblock>
        ...etc
<codeblock> elements containing <coderef> elements are indented as shown:
<codeblock outputclass="language-markup"><coderef href="../src/logo.svg"/>
</codeblock>
Other whitespace-sensitive elements such as <lines> are supported in a similar manner. If processing is found to be incorrect due to embedded elements, it is suggested that the author use the prettier-ignore directive to maintain whitespace.