2016-08-31

Writing a book with AsciiDoc

During the summer of 2016 I've been working on a book. Its contents aren't relevant to this article; what I want to describe here is how I've written it. A few years ago I started writing the same book in OpenOffice. Unfortunately the files were destroyed, some years passed, and eventually I decided to rewrite everything from the start. This time I wanted to keep the files and their versions in a repository, and to write the book in a plain-text format instead of in OpenOffice. Some googling later, I found a tool called AsciiDoc, which seemed perfect!

I'm used to writing documentation in wiki markup and Markdown, and I've even written a couple of Markdown parsers in PHP and C, so a Markdown-like format for writing books was exactly what I was looking for. Shortly afterwards I had installed the AsciiDoc software and written a few chapters. It wasn't quite as simple as that, however.

Writing the book itself in AsciiDoc is no trouble: since the format is similar to Markdown, all you need is a text editor, and every chapter is stored as a plain-text .adoc file. The difficulties arose when I tried to build a PDF from the sources. I realised that to style the PDF output I had to learn no fewer than three XML-based languages: DocBook, DocBook XSL, and XSL-FO. AsciiDoc text can easily be transformed into a PDF, but the result uses a default style template, which has to be modified unless you're satisfied with the style you get, and I wasn't.

Eventually I found the documentation I needed to learn all of these languages and to set up a working environment for completing my book. When I was writing in OpenOffice (and it would have been exactly the same with Word or any other word processor), I noticed that more time went into styling the text than into actually writing. With AsciiDoc everything is very clear: all the styling is in the text. For example, let's say I have a set of keywords that I use throughout the book, and I want every instance of them set in a special font. In a word processor, I'd have to create the style, then mark each such word and apply the style to it.

With the system I've designed, all keywords are written within curly braces: "{keyword}". I've created a Makefile for generating the PDF, and a PHP script that pre-processes all files before they are built. The PHP script looks up each keyword in lists that say how to transform it. Some keywords are output as text, others as Unicode symbols, and in the script I can select which style to use for each keyword. Since all files are plain text, advanced regex search-and-replace operations are also fast and reliable.
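
To give a feel for the idea, here is a minimal sketch of the keyword expansion using sed instead of my PHP script. It is only a stand-in under assumptions: the function name expand_keywords is made up for this example, the three rules and their role names mirror the sample configuration shown further down, and prefixes and suffixes are not handled at all.

```shell
# Hypothetical, simplified stand-in for the PHP pre-processor.
# Reads AsciiDoc text on stdin and writes the expanded text to stdout.
# Braces that match no rule, such as {version}, pass through unchanged.
expand_keywords() {
  sed -E \
    -e 's/\{[Tt]erm\}/[style-term]_Term_/g' \
    -e 's/\{word\}/[style-word]_word_/g' \
    -e 's/\{right-arrow\}/[style-symbol]*➙*/g'
}

printf '%s\n' 'A {term} {right-arrow} a {word}.' | expand_keywords
```

The output is ordinary AsciiDoc in which each keyword has become a role-tagged piece of inline formatting, and the role is what the stylesheet later hooks into.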

This may seem like more work than using a word processor such as OpenOffice or Word, but it's a one-time setup; once the stylesheets and other files are prepared, all the time can be put into writing the book. What actually takes time is learning the tools, and after that you can just write away.

AsciiDoc documents can be converted into many formats, HTML and EPUB for example. This post focuses only on producing PDF documents.

So, this is basically the system I've designed for building my book:

├─ book/*.adoc  <- All chapters are stored here as individual files
├─ docbook/fo.xsl
├─ docbook/fop.xml
├─ fonts/*.ttf  <- All fonts used are stored in the project
├─ icons/*.svg  <- Icons for admonitions, as SVG images
├─ images/*.svg  <- All images are Scalable Vector Graphics
├─ asciidocx  <- PHP script for pre-processing keywords
├─ asciidocx-keywords  <- PHP script containing lists of keywords
├─ book.adoc  <- Main AsciiDoc file
├─ Makefile  <- Makefile for building the PDF
└─ book.pdf  <- The book is output to PDF

The Makefile looks like this:

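BOOK_NAME = book

# No target names a file it creates, so declare them all phony.
# Note: recipe lines must be indented with a tab character, not spaces.
.PHONY: all docbook5 fo pdf clean

all: docbook5 pdf

# Copy the chapters, expand the {keywords}, then convert to DocBook 5 XML.
docbook5:
	mkdir -p asciidoc
	cp book/*.adoc asciidoc
	./asciidocx asciidoc/*.adoc
	asciidoctor -o book.xml -b docbook5 -d book book.adoc

# Transform the DocBook XML into XSL-FO using the custom stylesheet.
fo: docbook5
	xsltproc -o book.fo docbook/fo.xsl book.xml

# Render the XSL-FO file to PDF with the fonts configured in fop.xml.
pdf: fo
	fop -c docbook/fop.xml -pdf $(BOOK_NAME).pdf -fo book.fo

# Remove all generated files.
clean:
	rm -rf asciidoc book.xml $(BOOK_NAME).html book.fo $(BOOK_NAME).pdf docbook/titlepage.xsl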

You may notice that three programs are needed: asciidoctor, xsltproc, and fop. "asciidoctor" is a Ruby implementation of AsciiDoc (http://asciidoctor.org). "xsltproc" is a command-line XSLT processor from the libxslt project; XSLT is an XML language for transforming XML documents. "fop" is a Java tool developed by the Apache Software Foundation that renders XSL-FO to PDF (https://xmlgraphics.apache.org/fop/).

To install all three on Ubuntu or Debian, open a terminal and run:

sudo apt-get install asciidoctor xsltproc fop
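
On other systems the package names may differ, so it can be worth checking that all three commands ended up on the PATH before running make. A small sketch; the function name missing_tools is just something I made up for this example:

```shell
# Print the name of every given command that cannot be found on the PATH.
# Prints nothing when all of them are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing_tools asciidoctor xsltproc fop
```

If the last line prints nothing, the build chain should at least be able to start.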

Now we will look at the main book document "book.adoc":

= My Book
Per Löwgren <per.lowgren@gmail.com>
v0.1, 2016-07-22
:homepage: http://asciidoctor.org
:doctype: book
:listing-caption: Listing
:imagesdir: images
:stylesdir: css
:lang: en
:toc:
:toclevels: 2
:icons: font
:version: 0.1

include::asciidoc/introduction.adoc[]
include::asciidoc/chapter-about.adoc[]
include::asciidoc/chapter-1.adoc[]
include::asciidoc/chapter-2.adoc[]
include::asciidoc/chapter-3.adoc[]
include::asciidoc/chapter-etc.adoc[]
include::asciidoc/appendix.adoc[]
include::asciidoc/bibliography.adoc[]

[index]
[[index]]
== Index

To learn how to write AsciiDoc, there is very good online documentation:

  1. http://asciidoctor.org/docs/
  2. http://asciidoctor.org/docs/user-manual/
  3. https://powerman.name/doc/asciidoc/
  4. http://asciidoctor.org/docs/asciidoc-recommended-practices/
  5. http://chimera.labs.oreilly.com/books/1234000001578/ch02.html

The chapters are kept in separate files. The Makefile copies them into the asciidoc directory, where the PHP script "asciidocx" rewrites them in place:

#!/usr/bin/env php
<?php

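// Drop the script name so that $argv holds only the files to process.
array_shift($argv);

// Defines DEFAULT_STYLE and the $keywords and $symbols arrays.
require(__DIR__.'/asciidocx-keywords');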

foreach($argv as $file) {
 $txt = file_get_contents($file);
 if($txt) {
  // Match {keyword}, {^keyword}, {~keyword} and {keyword: suffix}.
  $txt = preg_replace_callback(
   '/\{([\^\~]?)([\w\-]+)(?:\:?\s*([^\}]+?)\s*)?\}/',
   function($m) {
    global $keywords,$symbols;

    $prefix = $m[1];
    $str = $m[2];
    $key = strtolower($str);
    $suffix = isset($m[3])? $m[3] : false;

    if(isset($symbols[$key])) {
     $sym = $symbols[$key];
     $role = $sym[0];
     $ret = '';
     if(!$suffix) $ret = $sym[1];
     else {
      $arr = explode(' ',$key.' '.strtolower($suffix));
      foreach($arr as $s)
       if(isset($symbols[$s])) {
        $sym = $symbols[$s];
        if($sym[0]==$role) $ret .= $sym[1];
        else echo "Improper tag \"{$m[0]}\", symbol \"{$s}\" has different style.\n";
       }
     }
     return "[{$role}]*{$ret}*";
    } elseif(isset($keywords[$key])) {
     $kw = $keywords[$key];
     $role = $kw[0];
     $key = $kw[1];
     if($key!==false) {
      if(!ctype_upper($str[0])) $str = $key;
      else $str = strtoupper($key[0]).substr($key,1);
     }
     if($role===false || $prefix=='^') return "{$str}{$suffix}";
     return "[{$role}]_{$str}{$suffix}_"; 
    } elseif($prefix=='~') {
     return "[".DEFAULT_STYLE."]_{$str}_";
    } else {
     return $m[0];
    }

   },
   $txt
  );
  // Write the file back only if it was read successfully and is not empty.
  echo "{$file}\n";
  file_put_contents($file,$txt);
 }
}

And the PHP script "asciidocx-keywords":

<?php

define('DEFAULT_STYLE','style');

$keywords = array(
 'term'           => array('style-term',   'Term'),
 ...

 'word'           => array('style-word',   false),
 ...
);

$symbols = array(
 'right-arrow'    => array('style-symbol', '➙'),
 ...
);
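
To see which keywords the chapters actually use, and so which entries the arrays need, something like the following sketch can help. The function name list_keywords is made up for this example, and the pattern is a simplification of the one in the PHP script: it ignores the ^ and ~ prefixes and any suffix, and it also lists ordinary AsciiDoc attribute references such as {version}.

```shell
# List every distinct {keyword} name found in the text read from stdin,
# lower-cased and sorted, one per line.
list_keywords() {
  grep -oE '\{[~^]?[A-Za-z0-9_-]+' |
    sed -E 's/^\{[~^]?//' |
    tr '[:upper:]' '[:lower:]' |
    sort -u
}

# Usage from the project root:
#   cat book/*.adoc | list_keywords
```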

Fill in the arrays with your keywords and the styles to use for them. The styles themselves are defined in the XSL stylesheet "docbook/fo.xsl":

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                xmlns:db="http://docbook.org/ns/docbook" version="1.0">
<xsl:import href="/usr/share/xml/docbook/stylesheet/docbook-xsl-ns/fo/docbook.xsl"/>

<xsl:output method="xml" indent="yes"/>
<xsl:param name="use.extensions">0</xsl:param>
<xsl:param name="fop1.extensions">1</xsl:param>

<xsl:param name="base.dir"></xsl:param>

<xsl:param name="alignment">justify</xsl:param>
<xsl:param name="hyphenate">true</xsl:param>
<xsl:param name="draft.mode">no</xsl:param>
<xsl:param name="show.comments">1</xsl:param>

<xsl:param name="header.rule">0</xsl:param>
<xsl:param name="footer.rule">0</xsl:param>

<xsl:param name="body.start.indent">0pt</xsl:param>

<xsl:param name="body.font.family">serif</xsl:param>
<xsl:param name="body.font.master">10</xsl:param>
<xsl:param name="dingbat.font.family">Symbola</xsl:param>
<xsl:param name="monospace.font.family">monospace</xsl:param>
<xsl:param name="sans.font.family">sans-serif</xsl:param>
<xsl:param name="title.font.family">Cinzel,serif</xsl:param>
<xsl:param name="symbol.font.family">Symbola,FontAwesome,Lucida Sans Unicode</xsl:param>

<xsl:param name="footnote.font.size">9pt</xsl:param>

<xsl:param name="toc.max.depth">3</xsl:param>

<xsl:param name="part.autolabel">I</xsl:param>
<xsl:param name="chapter.autolabel">1</xsl:param>
<xsl:param name="section.autolabel">1</xsl:param>
<xsl:param name="appendix.autolabel">A</xsl:param>
<xsl:param name="reference.autolabel">I</xsl:param>

<xsl:param name="section.autolabel.max.depth">2</xsl:param>
<xsl:param name="section.label.includes.component.label">1</xsl:param>

<xsl:param name="xref.with.number.and.title">1</xsl:param>
<xsl:param name="insert.xref.page.number">yes</xsl:param>

<xsl:template match="db:*[@role='style-term']">
 <fo:inline font-family="Cinzel" font-weight="bold" font-style="normal" font-size="9.5pt">
  <xsl:apply-templates/>
 </fo:inline>
</xsl:template>

<xsl:template match="db:*[@role='style-word']">
 <fo:inline font-family="Cinzel" font-weight="normal" font-style="italic" font-size="9.5pt">
  <xsl:apply-templates/>
 </fo:inline>
</xsl:template>

<xsl:template match="db:*[@role='style-symbol']">
 <fo:inline font-family="Symbola" font-weight="normal" font-style="normal">
  <xsl:apply-templates/>
 </fo:inline>
</xsl:template>

</xsl:stylesheet>

This is not my entire XSL stylesheet, only an example. Designing the stylesheet takes some studying, but first you need to learn DocBook XML, which is what asciidoctor generates. In other words, asciidoctor transforms the .adoc files into a single XML document, called "book.xml" in the Makefile. We won't look at "book.xml" here, but this documentation is useful for learning DocBook XML:

  1. http://www.docbook.org/tdg5/en/html/docbook.html
  2. http://doccookbook.sourceforge.net/html/en/index.html
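
Before writing templates, it can help to check which role attributes actually end up in the generated XML, since those are what the templates in "docbook/fo.xsl" match on. A rough sketch; the function name list_roles is made up for this example, and it is a plain text search rather than a real XML parse:

```shell
# List every distinct role="..." attribute found in the XML read from stdin.
list_roles() {
  grep -o 'role="[^"]*"' | sort -u
}

# Usage, after "make docbook5" has produced the XML:
#   list_roles < book.xml
```

Any role in this list that has no matching template in the stylesheet will simply get the default formatting.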

Now we're almost ready to learn DocBook XSL, but first we need to look at XSL-FO, the language that the DocBook document is transformed into before it is in turn transformed into the PDF. Fortunately we don't have to learn this language in full, only understand its basics:

  1. https://xmlgraphics.apache.org/fop/fo.html
  2. https://www.w3.org/2002/08/XSLFOsummary.html

DocBook XSL stylesheets play a role similar to ordinary CSS, but they are much more complex and are written in XSLT, a Turing-complete language, so you can program more or less exactly how you want your books to look. DocBook XSL has its own set of parameters and attribute sets, and once you know some XSL-FO you can also write templates that style every part in detail.

  1. http://www.sagehill.net/docbookxsl/index.html
  2. http://nwalsh.com/docs/articles/dbdesign/
  3. http://docbook.sourceforge.net/release/xsl/current/doc/fo/index.html
  4. https://www.w3.org/TR/2001/REC-xsl-20011015/xslspec.html
  5. http://www.w3schools.com/xsl/

With this information we can now write books and design them any way we like! Oh, I forgot! We need one more file, "docbook/fop.xml", which tells FOP where to find the fonts:

<?xml version="1.0"?>
<fop version="1.0">
 <renderers>
  <renderer mime="application/pdf">
   <fonts>
    <font kerning="yes" embed-url="./fonts/Cinzel-Regular.ttf" embedding-mode="subset">
     <font-triplet name="Cinzel" style="normal" weight="normal"/>
    </font>
    <font kerning="yes" embed-url="./fonts/Cinzel-Bold.ttf" embedding-mode="subset">
     <font-triplet name="Cinzel" style="normal" weight="bold"/>
    </font>
    <font kerning="yes" embed-url="./fonts/fontawesome-webfont.ttf" embedding-mode="subset">
     <font-triplet name="FontAwesome" style="normal" weight="normal"/>
    </font>
    <font kerning="yes" embed-url="./fonts/fontawesome-webfont.ttf" embedding-mode="subset">
     <font-triplet name="FontAwesome" style="normal" weight="bold"/>
    </font>
    <font kerning="yes" embed-url="./fonts/Symbola.ttf" embedding-mode="subset">
     <font-triplet name="Symbola" style="normal" weight="normal"/>
    </font>
    <font kerning="yes" embed-url="./fonts/Symbola.ttf" embedding-mode="subset">
     <font-triplet name="Symbola" style="normal" weight="bold"/>
    </font>
   </fonts>
  </renderer>
 </renderers>
</fop>
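
A mistyped font path is easy to miss, so here is a sketch of a quick check that every font file named in the configuration exists. The function name missing_fonts is made up for this example, it assumes the embed-url values are plain relative paths as above, and it has to be run from the directory those paths are relative to (the project root in my setup):

```shell
# Print every embed-url path in the given FOP configuration file that
# does not exist on disk. Prints nothing when all font files are present.
missing_fonts() {
  grep -o 'embed-url="[^"]*"' "$1" | cut -d'"' -f2 | sort -u |
    while IFS= read -r path; do
      [ -e "$path" ] || printf '%s\n' "$path"
    done
}

# Usage from the project root:
#   missing_fonts docbook/fop.xml
```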

There we are: now we can use the fonts we like too. Simple as that! Should you run into any trouble designing your books, perhaps I can help. Just let me know!
