Designing Patterns

View from the Acropolis

A high level look at the theory and practice of software engineering




Characteristics of a Good Software Documentation Pipeline

My previous entry discussed software documentation standards and the importance of establishing one for a software development group. This is not sufficient to ensure that a group produces quality software documentation, however. A documentation creation and publishing pipeline also must be built to support the standard. Such a pipeline transforms software documentation created by programmers into formatted documents, such as web pages, and then publishes these documents on one or more repositories. A pipeline must satisfy a number of requirements in order for a group to derive the greatest benefit from creating documentation while expending the least effort and minimizing developer pain, which is crucial to ensuring wholehearted compliance with and peer enforcement of the standard.

The heart of a documentation pipeline is a tool or series of tools that read software documentation and generate formatted documents; these tools are called documentation generators. Such a tool might process a Ruby source file containing a class and generate a web page listing all of the class’ methods and displaying each method’s comments. Different documentation generation tools support different languages; in general, the more popular a language, the more tools will be available for it. Some tools, such as Doxygen, support a number of programming languages while others, such as RDoc, focus on only a single language, Ruby in this case. If a tool focuses on only one language, it likely will provide much better support for that language than does a more general tool. RDoc, for instance, is the only tool I know of that supports Ruby’s metaprogramming constructs.

When a group works with several languages, using the optimal tool for each language must be weighed carefully against using one tool for multiple languages. This is a nuanced decision that should be made separately for each language. It depends on how tightly coupled the language’s code is with that of the other languages, how long it will take the developers to master the language’s optimal tool, and how much benefit that tool offers over a multi-language tool. If a group maintains a code base that has different languages calling into each other, then I recommend using a single tool that can handle all of these languages and that preferably can understand the relationships across the language boundaries. At a prior employer I worked in a group that maintained a system written in C, C++, Fortran 77, and Fortran 95, and a documentation tool that could have understood the inter-language relationships would have been very valuable. RDoc, for example, provides this functionality for C and Ruby; although its purpose is to document Ruby code, it can parse C and understands the C/Ruby calling conventions, allowing it to properly document libraries that mix C and Ruby, which are common.
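
To make the idea of a documentation generator concrete, here is a toy sketch in Ruby. It only illustrates the "read comments, emit a web page" step; it is not how RDoc or any real generator parses code, and the names `extract_method_docs`, `render_html`, and the sample `Temperature` class are invented for this example.

```ruby
# Toy documentation generator (illustration only, not RDoc's algorithm).
# It collects the comment block directly above each "def" line and
# renders the result as an HTML fragment.

HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def escape_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Returns a hash mapping each method name to the comment text above it.
def extract_method_docs(source)
  docs = {}
  pending = [] # comment lines seen since the last non-comment line
  source.each_line do |line|
    stripped = line.strip
    if stripped.start_with?("#")
      pending << stripped.sub(/\A#\s?/, "")
    elsif (match = stripped.match(/\Adef\s+([A-Za-z_]\w*[?!=]?)/))
      docs[match[1]] = pending.join("\n")
      pending = []
    else
      pending = [] # any other line detaches the comment from later methods
    end
  end
  docs
end

# Renders one class as a heading followed by a definition list of methods.
def render_html(class_name, docs)
  items = docs.map do |name, comment|
    "<dt>#{escape_html(name)}</dt><dd>#{escape_html(comment)}</dd>"
  end
  "<h1>#{escape_html(class_name)}</h1>\n<dl>\n#{items.join("\n")}\n</dl>"
end

# Hypothetical input: a small Ruby class with commented methods.
SAMPLE_SOURCE = <<~RUBY
  class Temperature
    # Converts degrees Celsius to degrees Fahrenheit.
    def to_fahrenheit(celsius)
      celsius * 9.0 / 5 + 32
    end

    # Returns true when the reading is below freezing.
    def freezing?(celsius)
      celsius < 0
    end
  end
RUBY

docs = extract_method_docs(SAMPLE_SOURCE)
puts render_html("Temperature", docs)
```

A real generator works from a proper parse of the language rather than line-by-line pattern matching, which is part of why a single-language tool can support its language so much better than a general one.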

Aside from generating documents from code and comments, most widely-used documentation generation tools also define a markup language that can describe elements of the software documentation. A key characteristic of such a markup language is its expressiveness, particularly whether the markup can describe all program elements properly. Beyond being able to describe common elements like lists, headings, and emphasis, the markup also should be able to describe code constructs like method parameters, return values, and possible exceptions. While markup is not needed to generate basic documents, it greatly increases the expressiveness of the documentation and enhances the information conveyed by the generated documents. Describing elements with markup allows document formatting to be tweaked to display the elements in special ways, such as using a particular color for method parameters. Without expressive markup, developers often fall back on primitive markup in order to format documentation properly. RDoc, for instance, does not offer markup for parameters and return values, and so my company simulates such markup by using the less descriptive markup for headings and lists. Not only is the required markup longer, more painful to write, and duplicated throughout our code base, but RDoc also does not understand the elements properly and so cannot generate optimal documents with them.
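
As a minimal sketch of this kind of workaround, the comment below uses ordinary RDoc headings (`====`), bullet lists (`*`), and typewriter text (`+word+`) to stand in for parameter, return value, and exception markup. The `Account` class and this particular layout are assumptions made for illustration; the post does not spell out the exact convention used.

```ruby
class Account
  attr_reader :balance

  def initialize(balance)
    @balance = balance
  end

  # Withdraws money from the account.
  #
  # ==== Parameters
  # * +amount+ - The amount to withdraw, as a positive number.
  #
  # ==== Returns
  # The balance remaining after the withdrawal.
  #
  # ==== Raises
  # * ArgumentError - If +amount+ is not positive or exceeds the balance.
  def withdraw(amount)
    raise ArgumentError, "amount must be positive" unless amount.positive?
    raise ArgumentError, "insufficient funds" if amount > @balance

    @balance -= amount
  end
end

account = Account.new(100)
puts account.withdraw(30)
```

To RDoc, "Parameters" here is just a heading and `+amount+` is just formatted text, so the generated page cannot treat parameters specially, and every method comment has to repeat the same scaffolding by hand.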

Another important characteristic of a documentation generation tool’s markup language is how easily it can be understood, modified, and written. XML and HTML, for instance, are very verbose (since most tags must be closed), and I find that the tags obscure the content. Such complicated languages also are difficult to write correctly and so are painful for developers to use to write comments. By contrast, there are a variety of lightweight markup languages, such as Markdown, that are easier to read and write (wikis often use lightweight markup for this reason). It is vital that marked up comments be legible as text, since programmers working directly with the source code must be able to read them. This also is an important consideration if the package-level documentation is distributed as text files (for example, RDoc transforms a README that I maintain into beautiful HTML, but the text file remains very easy to read).

A defining characteristic of a documentation generation tool is the format of the documents that it generates. At a minimum, it should be able to generate web pages, since web pages can be viewed on virtually every platform, provide a plethora of stylistic options, make cross-references very easy to follow (via links), and can be searched easily. Additionally, if the web pages are published on the public internet, they will be indexed by search engines and so will advertise the software. Finally, web pages can easily be proofread by developers before releasing documentation. It is crucial that the default look of the web pages be as aesthetically pleasing and as readable as possible, so that the documents will be a pleasure to read. The look also should be customizable, so that the group’s developers can tweak it, and so that developers around the world can craft better looks. In addition to being more usable, appealing web pages are strong positive reinforcement for developers writing documentation. The tool also should generate cross-reference links to the appropriate documents for class names, method names, modules, and other source code entities, as such links make documentation much easier to use. The tool additionally should be able to link methods to syntax-highlighted source code, allowing readers to browse the source easily in the context of the documentation. Finally, the tool should be able to generate and embed class diagrams. Javadoc, Doxygen, and RDoc are examples of tools that offer these features, in some cases through options or add-ons.

Depending on the readers of a group’s software documentation, it may be desirable to publish documentation in other formats besides web pages. Some UNIX programmers, for instance, prefer to read software documentation on the command line, and tools exist for this, such as man for UNIX utilities, perldoc for Perl scripts and libraries, pydoc for Python libraries, and ri for Ruby libraries. Windows programmers, on the other hand, often read software documentation in Windows Help, and so many tools can generate documents in this format. In order to make the documentation as usable as possible, the pipeline should generate all applicable output formats from a single documentation source, either with one tool or a series of tools. Javadoc, Doxygen, and RDoc all can generate documents in multiple formats.

After generating documents, a pipeline must publish them to at least one repository in order to allow developers to view them. The nature of this repository will, of course, depend on the document format. At a minimum, however, it should allow developers to search the documents. In addition, it should be accessible from wherever the developers will work; if developers log into a corporate system from home and code, for instance, then they also should be able to access the documentation repository from home. Depending on the software’s release cycle, the repository may need to store multiple versions of documentation, tracking different versions of the software. Lastly, it must be kept up to date, so that developers always can trust that it reflects the code base faithfully. This should be automated in order to eliminate human error and in order to save developers yet another chore when moving code. A prior employer ran a scheduled job that generated documentation nightly from the production code. At Designing Patterns, our build system generates and publishes documentation when releasing new versions of our open-source packages.
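
The publishing step described above can be sketched as a small Ruby helper that a release task or nightly job might call. This is an assumption-laden illustration, not the build system mentioned in this post: the helper names (`doc_output_dir`, `rdoc_command`), the `<root>/<package>/<version>` directory layout, and the example paths are all hypothetical. The only real interface used is the `rdoc` command with its `--op` (output directory) and `--title` options.

```ruby
require "shellwords"

# Hypothetical layout: <root>/<package>/<version>/ holds one generated
# documentation set, so several releases can be published side by side.
def doc_output_dir(root, package, version)
  File.join(root, package, version)
end

# Builds the rdoc command line as an argument array without running it.
# --op sets the output directory and --title sets the page title.
def rdoc_command(root:, package:, version:, sources:)
  ["rdoc",
   "--op", doc_output_dir(root, package, version),
   "--title", "#{package} #{version}",
   *sources]
end

command = rdoc_command(root: "/var/www/docs", package: "mygem",
                       version: "1.2.0", sources: ["lib", "README"])

# Print the command for inspection; an automated job would run it
# instead, for example with system(*command).
puts Shellwords.join(command)
```

Keeping each version in its own directory addresses the need to track multiple releases, and building the command in one place means the same helper can serve a release task, a nightly job, or a version control hook.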

While it may take some work to establish, a good documentation pipeline will justify the effort by increasing programmer productivity and happiness, positively reinforcing the documentation standard.



2 Responses to “Characteristics of a Good Software Documentation Pipeline”

  1. What’s your strategy for keeping documentation in sync when the underlying source code changes?

  2. At Designing Patterns, our build system (we’ve released a beta snapshot at http://sourceforge.net/projects/pattmake/) automatically rebuilds and publishes online documentation for our Ruby gems whenever we release a gem.

    Likewise, whenever we download a new gem, the gem system automatically builds documentation and publishes it locally for the new gem.

    Other strategies that I like:
    1.) Nightly rebuilds of documentation based on the production source.
    2.) A hook/trigger in the version control system that rebuilds documentation after a commit.

    One might be more attractive than the other for a particular group depending on how much documentation needs to be rebuilt for a commit and how many commits there are in a day.
