Designing Patterns

View within the Urban Jungle

Discussing nitty gritty software development details that Tony finds interesting



View within the Urban Jungle RSS FeedSite Feed

Non-recursive Make Builds always are Better than Recursive Make Builds

Pattmake is the gmake-based build system used by Designing Patterns (I discuss why I chose to use gmake here). I had three main requirements when designing pattmake:

  1. That it be non-recursive
  2. That it easily accommodates simultaneous builds for different architectures
  3. That files generated by builds (build artifacts and targets, such as object files) be put in a different directory than that of the associated source code

I will discuss the reasons for and the implementation of each requirement in my next blog entries.

The first requirement might be a bit surprising, since recursive build systems still are very common. They are fundamentally and irretrievably flawed, however. Briefly, in a recursive build system, make spawns a new make process in order to build a subdirectory. This seems like a wonderful idea because:

  • Make works best when the source files are in make's current working directory. Developers do not have to take different source directories into account when writing rules, for instance.
  • The makefile for a given directory is completely independent of those for other directories.
  • Writing a recursive makefile requires minimal knowledge of the build infrastructure, since a recursive makefile is not very different than a standalone makefile.

These are very compelling advantages, and so recursive build systems often are used despite their fundamental flaw. A bit of background knowledge about make's implementation is required in order to understand the flaw, however. Make works in two phases

  1. Make processes the makefile, finding all possible targets, associating prerequisites with those targets, and storing commands that transform prerequisites into targets. From this information, it constructs a dependency graph.
  2. Make locates the target that it is supposed to build and works backward through the dependency graph from that target, rebuilding anything that is out of date using the commands stored in the first phase.

Recursive build systems are flawed because they restrict each make instance to creating a dependency graph with only the build information contained in the directory the instance is processing, resulting in truncated graphs. I’m going to prove this with the build for an example hello_world program (you have three chances to guess the program’s functionality, and the first two don’t count), the source for which can be downloaded here . This program’s build should work with any reasonably modern make, although I’ve only tested with gmake. The directory structure is:

  non_recursive.mk
  recursive.mk
  rules.mk
  bin:
      hello_world.cc
      non_recursive.mk
      recursive.mk
  lib:
      hello.cc
      hello.h
      non_recursive.mk
      recursive.mk

The package actually contains two build systems for the hello_world program, one recursive (consisting of rules.mk and all of the recursive.mk files; build with make -f recursive.mk all) and one non-recursive (consisting of rules.mk and all of the non_recursive.mk files; build with make -f non_recursive.mk all). Here is bin/recursive.mk:


TOP_LEVEL_DIR=..

include $(TOP_LEVEL_DIR)/rules.mk

hello_world: hello_world.o ../lib/hello.a
        $(CXX) $(LDFLAGS) -o $@ $^

When make recurses and launches a new instance for the bin directory, the new instance knows almost nothing about lib/hello.a. The only thing that it knows is the library’s time of last modification, which, if more recent than hello_world's time of last modification, will trigger a relink of hello_world. It does not know, however, how lib/hello.a is itself built and thus whether or not it needs to be rebuilt before linking bin/hello_world. The new instance has incomplete dependency information and thus can produce an incorrect bin/hello_world build. If the hello_world() function declaration and definition in lib/hello.h and lib/hello.cc are changed without lib being rebuilt, for instance, a broken bin/hello_world will be built. bin/hello_world will link against a stale lib/hello.a and get the old function definition, because the make instance for bin cannot trigger a rebuild for lib/hello.a as it is built in a different subdirectory. bin/hello_world.o will be compiled with the new function declaration in lib/hello.h, however. This will result in a likely fatal mismatch between the hello_world() function that bin/hello_world.cc tries to call and the hello_world() function that actually exists in lib/hello.a. The textbook solution for this problem is to duplicate the dependency of bin/hello_world on lib/hello.a in the top-level recursive.mk like so:


TOP_LEVEL_DIR=.

include $(TOP_LEVEL_DIR)/rules.mk

.PHONY: \
        all \
        bin \
        lib \
        clean

all: bin

bin: lib
        (cd $@; $(MAKE) -f recursive.mk)

lib:
        (cd $@; $(MAKE) -f recursive.mk)

#
# This is not a good way to implement clean,
# but for this simple example...
#
clean:
        $(RM) bin/*.o bin/hello_world lib/*.o lib/*.a

Make is told that both bin and lib are .PHONY targets. .PHONY is a directive (not a real target) that informs make that the targets that it modifies are not actual file system objects (and so are not physically built by make) and always should be rebuilt by make if they appear in the dependency chain (that is, .PHONY targets’ prerequisites should be evaluated and any associated commands should be run). Since they are not really file system objects, make never will try to check .PHONY targets’ modification times. .PHONY targets can be used to alias a group of physical targets. In this case, bin and lib are aliases for all of the targets in the given subdirectory, resulting in make always recursing into the given subdirectory. The dependency of bin/hello_world on lib/hello.a is duplicated in the dependency of bin on lib, which forces make to recurse first into lib and then into bin. This textbook solution is flawed in two ways. Firstly, duplication of dependency information is used to paper over the underlying problem, and it is very easy to modify the dependency in one place without changing it in the other place (resulting in a subtly broken build). Secondly, this band-aid only works if the build is kicked off at the top-level; a programmer who accidentally builds bin/hello_world in bin will create a broken executable if lib/hello.a is stale (the error scenario described above).

The problem described in the example may seem manageable, but the negative effects of a recursive build become corrosively bad in much larger software systems that have many more executables and libraries, a much more complicated web of dependencies, and code generation. A former employer had such a situation, and broken builds due to dependencies not being duplicated properly were quite common. The band-aid described above became very fragile. Peter Miller’s seminal paper on non-recursive builds mentions that many shops in fact decide to forgo the textbook band-aid and, instead, build the same code multiple times in order to paper over the recursive make instances’ lack of full dependency knowledge. This “solution” leads to a dramatically slower build, which can become a development bottleneck very quickly for large systems.

The alternative, a non-recursive build system, has a higher initial setup cost (the build system infrastructure is significantly more complicated than that of an equivalent recursive build system) and carries with it a steeper learning curve, but, by providing make with the full dependency graph, is the superior solution. The simplest non-recursive build system consists of a single makefile that is responsible for building multiple directories. This might be fine for small to medium projects but does not scale well. A far better alternative and the one that pattmake uses is to have a single makefile per directory that handles building the directory but to have make include these directory-specific makefiles rather than recursively processing them. This allows each directory to have a makefile that is independent of those for other directories, just as in recursive build systems. The top-level non_recursive.mk in the hello_world example demonstrates this technique:


TOP_LEVEL_DIR=.

include rules.mk

include bin/non_recursive.mk
include lib/non_recursive.mk

.PHONY:
        all \
        clean

all: $(PROGRAMS)

#
# This is not a good way to implement clean,
# but for this simple example...
#
clean:
        $(RM) bin/*.o bin/hello_world lib/*.o lib/*.a

Here is bin/non_recursive.mk:


bin/hello_world: bin/hello_world.o lib/hello.a
        $(CXX) $(LDFLAGS) -o $@ $^

PROGRAMS += bin/hello_world

Here is lib/non_recursive.mk:


lib/hello.a: lib/hello.o

While the makefiles may look a bit odd because sources and targets have to have a directory specified (make builds everything with the top-level directory as its current working directory in this system, which is quite different than in the recursive build where make changes to the directory to be processed before spawning a new make instance), developers quickly will get used to this. In fact, even recursive build systems may have such complication if the targets are being put into a different directory than the source, which is quite common for multi-architecture builds. The benefits of a non-recursive build system are:

  • Make always is able to build a full dependency graph, without any duplication of dependencies on the part of the programmer. Note that the top-level non_recursive.mk in the hello_world example does not contain any dependency relationship between bin and lib.
  • Unlike recursive build systems, in which only full, top-level system builds can be correct, non-recursive build systems can build specified targets (say a particular program) correctly, since, regardless of what it’s told to build, make always creates a full dependency graph. Note that make -f non_recursive.mk bin/hello_world works perfectly well while make -f recursive.mk bin/hello_world fails horribly. It is very easy for a non-recursive build system to provide this functionality in whichever subdirectory it is invoked, which pattmake does (although the hello_world example does not).
  • Well organized, complex build systems, either recursive or non-recursive, often have large build rule files (containing, for instance, implicit rules to transform .c files into .o files) that need to be included in each makefile. This ensures that the same build rules are used throughout the build. Even the simple hello_world example has such a file, rules.mk. Each recursive makefile has to include such system-wide files (every recursive.mk in the hello_world example does this), forcing make to process such files many times (once per each directory). Non-recursive makefiles, however, are processed by being included by some top-level makefile, and so only the top-level makefile needs to include the system-wide makefiles (rules.mk only is included by the top-level non_recursive.mk in the hello_world example). Thus, make only needs to process the system-wide makefiles once per non-recursive build.

The ability of non-recursive build systems to deliver correct builds consistently, without any duplication of dependency information, should lead to their always being chosen over recursive build systems.


You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Leave a Reply