Document Conversion and Docs-as-Code
If you have a document in any of pandoc’s many supported file formats, converting it to any of the others is a cinch. That’s a handy tool to have!
But the real power of pandoc becomes apparent when you use it as the basis of a simple docs-as-code system. The premise of docs-as-code is to adopt some of the techniques and principles of software development and apply them to writing documentation, especially for software development projects. You can apply it to the development of any kind of documentation, though.
Software developers use their favorite editor or integrated development environment (IDE) to write their programs. The code they type is saved in text files. These contain the source code for the program.
They use a version control system, or VCS (Git is the most popular), to capture changes to the source code as it’s developed and enhanced. This means the programmer has a complete history of all versions of the source code files. He or she can quickly access any previous version of a file. Git stores files in a repository. There’s a local repository on each developer’s computer and a central, shared, remote repository that’s often cloud-hosted.
When they’re ready to produce a working version of the program, they use a compiler to read the source code and generate a binary executable.
By writing your documents in a lightweight, text-based markup language, you can use a VCS to version control your writing. When you’re ready to distribute or publish a document, you can use pandoc to generate as many different versions of your documentation as you need, including web-based (HTML), word-processed or typeset (LibreOffice, Microsoft Word, TeX), portable document format (PDF), e-book (ePub), and so on.
You can do all of this from one set of version-controlled, lightweight text files.
Installing pandoc
To install pandoc on Ubuntu, use this command:
On Fedora, the command you need is the following:
On Manjaro, you need to type:
You can check which version you have installed by using the --version option:
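As a minimal sketch, the install and version-check steps look something like the following. The package-manager commands in the comments are the usual ones for each distribution and are an assumption here, so check them against your distribution’s documentation. The pandoc_status variable is a hypothetical name used only for illustration.

```shell
# Usual install commands (assumed; run the one that matches your distribution):
#   sudo apt-get install pandoc    # Ubuntu
#   sudo dnf install pandoc        # Fedora
#   sudo pacman -S pandoc          # Manjaro

# Report the installed version, or say so if pandoc is not on the PATH.
if command -v pandoc >/dev/null 2>&1; then
    pandoc --version
    pandoc_status="installed"
else
    echo "pandoc is not installed yet"
    pandoc_status="missing"
fi
```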
Using pandoc Without Files
If you use pandoc without any command-line options, it also accepts typed input. You just press Ctrl+D to indicate you’ve finished typing. pandoc expects you to type in Markdown format, and it generates HTML output.
Let’s look at an example:
We’ve typed a few lines of Markdown and are about to hit Ctrl+D.
As soon as we do, pandoc generates the equivalent HTML output.
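The same thing can be tried without typing interactively by piping text into pandoc. This is only a sketch: the two lines of Markdown and the html_output variable are made up for illustration.

```shell
# With no options, pandoc reads Markdown from standard input and writes an
# HTML fragment to standard output, so a pipe can stand in for typed input.
if command -v pandoc >/dev/null 2>&1; then
    html_output=$(printf '# A Heading\n\nSome *emphasized* text.\n' | pandoc)
    printf '%s\n' "$html_output"
else
    html_output=""
    echo "pandoc is not installed; skipping the conversion"
fi
```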
To do anything useful with pandoc, though, we really need to use files.
Markdown Basics
Markdown is a lightweight markup language, and special meaning is given to certain characters. You can use a plain text editor to create a Markdown file.
Markdown can be read easily, as there are no visually cumbersome tags to distract from the text. Formatting in Markdown documents resembles the formatting it represents. Below are some of the basics:
To emphasize text with italics, wrap it in asterisks: *This will be emphasized*
To bold text, use two asterisks: **This will be in bold**
Headings are represented by the number sign/hash mark (#). Text is separated from the hash by a space. Use one hash for a top-level heading, two for a second-level, and so on.
To create a bulleted list, start each line of the list with an asterisk and insert a space before the text.
To create a numbered list, start each line with a digit followed by a period, and then insert a space before the text.
To create a hyperlink, enclose the name of the site in square brackets ([]) and the URL in parentheses (()), like so: [Link to How to Geek](https://www.howtogeek.com/).
To insert an image, type an exclamation point immediately before the brackets (![]). Type any alternative text for the image in the brackets. Then, enclose the path to the image in parentheses (()). Here’s an example: ![The Geek](HTG.png).
We’ll cover more examples of all of these in the next section.
Converting Files
File conversions are straightforward. pandoc can usually work out which file formats you’re working with from their filenames. Here, we’re going to generate an HTML file from a Markdown file. The -o (output) option tells pandoc the name of the file we wish to create:
Our sample Markdown file, sample.md, contains the short section of Markdown shown in the image below.
A file called sample.html is created. When we double-click the file, our default browser will open it.
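Here is a rough sketch of that conversion. The contents written to sample.md are a hypothetical stand-in for the sample file, not the original text.

```shell
# Write a small stand-in for sample.md (hypothetical content).
cat > sample.md <<'EOF'
# Sample Document

This is *emphasized* and this is **bold**.

* First bullet
* Second bullet

1. First step
2. Second step

[Link to How to Geek](https://www.howtogeek.com/)
EOF

# pandoc infers Markdown input and HTML output from the file extensions.
if command -v pandoc >/dev/null 2>&1; then
    pandoc -o sample.html sample.md
else
    echo "pandoc is not installed; skipping the conversion"
fi
```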
Now, let’s generate an Open Document Format text document we can open in LibreOffice Writer:
The ODT file has the same content as the HTML file.
A neat touch is that the alternative text for the image is also used to automatically generate a caption for the figure.
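The ODT conversion described above might look like this. The one-line stand-in file is an assumption, created only if sample.md doesn’t already exist.

```shell
# Create a stand-in sample.md if one is not already present (hypothetical content).
[ -f sample.md ] || printf '# Sample Document\n\nSome text.\n' > sample.md

# The .odt extension tells pandoc to write an OpenDocument text file.
if command -v pandoc >/dev/null 2>&1; then
    pandoc -o sample.odt sample.md
else
    echo "pandoc is not installed; skipping the conversion"
fi
```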
Specifying File Formats
The -f (from) and -t (to) options are used to tell pandoc which file formats you want to convert from and to. This can be useful if you’re working with a file format that shares a file extension with other related formats. For example, TeX and LaTeX both use the “.tex” extension.
We’re also using the -s (standalone) option so pandoc will generate all the LaTeX preamble required for a document to be a complete, self-contained, and well-formed LaTeX document. Without the -s (standalone) option, the output would still be well-formed LaTeX that could be slotted into another LaTeX document, but it wouldn’t parse properly as a standalone LaTeX document.
We type the following:
If you open the “sample.tex” file in a text editor, you’ll see the generated LaTeX. If you have a LaTeX editor, you can open the TEX file to see a preview of how the LaTeX typesetting commands are interpreted. Shrinking the window to fit the image below made the display look cramped, but, in reality, it was fine.
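As a sketch, a command that names both formats explicitly and asks for a standalone document could look like this. The stand-in sample.md is an assumption, created only if the file is missing.

```shell
# Create a stand-in sample.md if one is not already present (hypothetical content).
[ -f sample.md ] || printf '# Sample Document\n\nSome text.\n' > sample.md

# -f names the input format, -t the output format, and -s adds the preamble
# that makes the result a complete LaTeX document.
if command -v pandoc >/dev/null 2>&1; then
    pandoc -f markdown -t latex -s -o sample.tex sample.md
else
    echo "pandoc is not installed; skipping the conversion"
fi
```

Dropping -s from the command should leave only the body of the document, without a preamble.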
We used a LaTeX editor called Texmaker. If you want to install it in Ubuntu, type the following:
In Fedora, the command is:
In Manjaro, use:
Converting Files with Templates
You’re probably starting to understand the flexibility that pandoc provides. You can write once and publish in almost any format. That’s a great feat, but the documents do look a little vanilla.
With templates, you can dictate which styles pandoc uses when it generates documents. For example, you can tell pandoc to use the styles defined in a Cascading Style Sheets (CSS) file with the --css option.
We’ve created a small CSS file containing the text below. It changes the spacing above and below the level header one style. It also changes the text color to white, and the background color to a shade of blue:
The full command is below. Note that we also used the standalone option (-s):
pandoc uses the single style from our minimalist CSS file and applies it to the level one header.
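A minimal sketch of this step follows. The file name sample.css, the margin values, and the particular shade of blue are all assumptions chosen for illustration, as is the stand-in sample.md.

```shell
# Create a stand-in sample.md if one is not already present (hypothetical content).
[ -f sample.md ] || printf '# Sample Document\n\nSome text.\n' > sample.md

# A one-rule stylesheet for level one headers (hypothetical values).
cat > sample.css <<'EOF'
h1 {
  margin-top: 1em;
  margin-bottom: 1em;
  color: #ffffff;
  background-color: #3c6eb4;
}
EOF

# -s produces a complete HTML page, and --css links the stylesheet from it.
if command -v pandoc >/dev/null 2>&1; then
    pandoc -s --css sample.css -o sample.html sample.md
else
    echo "pandoc is not installed; skipping the conversion"
fi
```

pandoc may warn that the document has no title when generating a standalone page; the file should still be written.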
Another fine-tuning option you have available when working with HTML files is to include HTML markup in your Markdown file. This will be passed through to the generated HTML file as standard HTML markup.
This technique should be reserved for when you’re only generating HTML output, though. If you’re working with multiple file formats, pandoc will ignore the HTML markup for non-HTML files, and it will be passed to those as text.
We can specify which styles are used when ODT files are generated, too. Open a blank LibreOffice Writer document and adjust the heading and font styles to suit your needs. In our example, we also added a header and footer. Save your document as “odt-template.odt.”
We can now use this as a template with the --reference-doc option:
Compare this with the ODT example from earlier. This document uses a different font, has colored headings, and includes headers and footers. However, it was generated from the exact same “sample.md” Markdown file.
Reference document templates can be used to indicate different stages of a document’s production. For example, you might have templates that have “Draft” or “For Review” watermarks. A template without a watermark would be used for a finalized document.
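The reference document step might be sketched like this. It assumes you have already saved odt-template.odt from LibreOffice Writer as described above. The styled variable and the stand-in sample.md are hypothetical.

```shell
# Create a stand-in sample.md if one is not already present (hypothetical content).
[ -f sample.md ] || printf '# Sample Document\n\nSome text.\n' > sample.md

# Use the styles from the template, but only if both pandoc and the template exist.
if command -v pandoc >/dev/null 2>&1 && [ -f odt-template.odt ]; then
    pandoc --reference-doc=odt-template.odt -o sample.odt sample.md
    styled="yes"
else
    echo "pandoc or odt-template.odt is missing; skipping the conversion"
    styled="no"
fi
```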
Generating PDFs
By default, pandoc uses a LaTeX engine (pdflatex) to generate PDF files. The easiest way to make sure you have the appropriate LaTeX dependencies satisfied is to install a LaTeX editor, such as Texmaker.
That’s quite a big install, though, as TeX and LaTeX are both pretty hefty. If your hard drive space is limited, or you know you’ll never use TeX or LaTeX, you might prefer to generate an ODT file. Then, you can just open it in LibreOffice Writer and save it as a PDF.
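If the LaTeX dependencies are in place, a PDF conversion can be sketched as follows. The check for pdflatex, the pdf_status variable, and the stand-in sample.md are assumptions for illustration.

```shell
# Create a stand-in sample.md if one is not already present (hypothetical content).
[ -f sample.md ] || printf '# Sample Document\n\nSome text.\n' > sample.md

# Attempt the PDF only when both pandoc and a LaTeX engine are available.
if command -v pandoc >/dev/null 2>&1 && command -v pdflatex >/dev/null 2>&1; then
    if pandoc -o sample.pdf sample.md; then
        pdf_status="generated"
    else
        pdf_status="failed"
        echo "pandoc could not build the PDF; a LaTeX package may be missing"
    fi
else
    pdf_status="skipped"
    echo "pandoc or pdflatex is not installed; skipping the PDF"
fi
```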
Docs-as-Code
There are several advantages to using Markdown as your writing language, including the following:
Working in plain text files is fast: They load faster than similarly sized word processor files, and you tend to move through the document faster, too. Many editors, including gedit, Vim, and Emacs, use syntax highlighting with Markdown text.
You’ll have a timeline of all versions of your documents: If you store your documentation in a VCS, such as Git, you can easily see the differences between any two versions of the same file. However, this only really works when the files are plain text, as that’s what a VCS expects to work with.
A VCS can record who made any changes, and when: This is especially helpful if you often collaborate with others on large projects. It also provides a central repository for the documents themselves. Many cloud-hosted Git services, such as GitHub, GitLab, and Bitbucket, have free tiers in their pricing models.
You can generate your documents in multiple formats: With just a couple of simple shell scripts, you can pull in the styles from CSS and reference documents. If you store your documents in a VCS repository that integrates with Continuous Integration and Continuous Deployment (CI/CD) platforms, they can be generated automatically whenever the software is built.
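One way such a shell script might look is sketched below. The build directory, the list of formats, the built counter, and the stand-in sample.md are all assumptions, and a real script would add the CSS and reference document options where they apply.

```shell
# Create a stand-in sample.md if one is not already present (hypothetical content).
[ -f sample.md ] || printf '# Sample Document\n\nSome text.\n' > sample.md

# Collect the generated files in one place (hypothetical directory name).
mkdir -p build
built=0

# Generate one output file per format from the same Markdown source.
for ext in html odt docx epub; do
    if command -v pandoc >/dev/null 2>&1; then
        pandoc -s -o "build/sample.$ext" sample.md && built=$((built + 1))
    fi
done

echo "Generated $built file(s) in build/"
```

A CI/CD job could run a script like this on every commit so the published formats never drift from the source text.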
Final Thoughts
There are many more options and features within pandoc than what we’ve covered here. The conversion processes for most file types can be tweaked and fine-tuned. To learn more, check out the excellent examples on the official (and extremely detailed) pandoc web page.