Comp 15

System Building with make

These notes were originally written for another course that used C. I'm providing them as-is for those who want a very brief overview of the Unix make utility.

System Building

The principles of abstraction and modularity lead us to break a problem (and therefore the corresponding program) up into smaller pieces. For example, we define functions (methods, procedures, whatever your language calls them) to encapsulate solutions to subproblems that can be combined into a solution to a larger problem. (One way to build modular abstractions is to base them on common idioms/patterns in your code.)

Similarly, we may break a program up into separate modules, each of which implements a set of related abstractions (types, data structures, and functions). In most systems, this notion is bound up with the idea of the source file: the basic unit of editing and printing. In C, files and modules are exactly the same: the language has no construct for modules (e.g., class, unit, module, cluster) at all.

This same divide and conquer approach applies to larger systems as well. For example, a project might involve building a searchable file system (Google for the desktop). Such a system will involve at least two processes: indexing and searching. Searching will involve a single server program with many components. Indexing will involve a process that coordinates a host of smaller mission-specific programs (each able to index files of a given type).

Breaking problems up this way has many advantages. Decomposing a problem into smaller pieces makes the whole problem easier to solve. It also makes the solution clearer to us and to others. Another benefit is that different people or groups can work on different pieces, which speeds up development. Finally, well-modularized code is easier to maintain because bug fixes or feature additions can be localized to one (or a few) components. Widespread changes are not only harder to make; they also risk introducing more bugs!

Nothing is free, however. All these advantages come with a price: managing all the pieces we've created becomes very complicated very quickly. The problems fall into two broad, related categories: building the program or system, and version control.

When everything is in one file, the build process is very simple: run the compiler on the file. As one of the homeworks shows, having just 2 or 3 files in the mix makes life a lot more complicated. It is common to spend hours debugging code that is not actually broken, because some source file was not recompiled and the resulting failure shows up in another module. In fact, the author of the Unix make facility (the first widespread build tool) started the project in direct response to two episodes like this. In industry, it is quite common to have a whole group of programmers who work as a build team separate from the actual development team. (Industry also separates project testing into a quality assurance (QA) team.)

Version management is related: how do we ensure that we have the most recent version of all the files, or at least that a build is working with a consistent set of program components? The problem really comes to the fore when more than one person is writing code: without help, the programmers will soon be spending more time coordinating their updates (and fixing inconsistencies introduced by multiple developers working concurrently) than writing code. In the Unix community, CVS and, more recently, SVN are the standard version management tools, though the Linux kernel community uses a tool called git. We will not be discussing version management further in this document.

make: Automating Builds

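The make program's purpose is to keep track of file dependencies and figure out the minimum set of commands (e.g., compilations) that must be executed to generate some target. For example, an executable depends on its object (.o) files, and each object file depends on the C source file it is compiled from and on the header files that source file includes. (It might be a good idea to review the various phases of compilation.)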

The make program takes such rules (and other information that we'll see below) and effectively constructs the transitive closure of the resulting dependency graph. Once this is done, make can build the entire system.
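The traversal behind this can be sketched in C as a depth-first search that lists every file a target transitively depends on, placing each prerequisite before the targets that need it. This is an illustrative model, not make's source code: struct node, build_order, and MAX_DEPS are hypothetical names, and the sketch assumes the dependency graph has no cycles (real make reports circular dependencies instead).

```c
#define MAX_DEPS 8  /* hypothetical per-node limit for this sketch */

/* One node per file or target; deps lists its direct prerequisites. */
struct node {
    const char *name;
    struct node *deps[MAX_DEPS];
    int ndeps;
    int visited;    /* set once the node has been placed in the order */
};

/* Depth-first, post-order traversal: every prerequisite is appended to
 * order[] before any target that needs it, and a file shared by several
 * targets appears only once. Assumes the graph is acyclic.
 * Returns the new number of entries in order[]. */
int build_order(struct node *n, const char *order[], int count)
{
    int i;

    if (n->visited)
        return count;
    n->visited = 1;
    for (i = 0; i < n->ndeps; i++)
        count = build_order(n->deps[i], order, count);
    order[count++] = n->name;
    return count;
}
```

For a hypothetical graph in which foo depends on foo.o and bar.o, and each object file depends on its own .c file plus a shared bar.h, the traversal from foo yields foo.c, bar.h, foo.o, bar.c, bar.o, foo. Note that bar.h appears only once even though two targets need it.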

The genius of make is that it uses dependency information and file modification times to figure out exactly what needs to be recompiled. For example, you might want to make a program, say test_strncat, that depends on two .o files, each of which depends on one C source file and one header file. If only one of the C source files has changed since the last build (that is, the source file is newer than the files built from it in the dependency graph), then only that source file must be recompiled, and anything that depends on the resulting .o file (here, test_strncat itself) must be rebuilt. The other .o file is left alone.
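The timestamp test at the heart of this can be sketched in C. This is a simplified model under stated assumptions, not make's actual implementation: the names stale_by_times, target_is_stale, and MAX_DEPS are hypothetical, it compares whole-second st_mtime values obtained from the POSIX stat() call, and it checks a single rule only (real make first brings each prerequisite up to date, recursively, before making this comparison).

```c
#include <stdio.h>     /* fprintf, NULL */
#include <sys/stat.h>  /* stat() and struct stat (POSIX) */
#include <time.h>      /* time_t */

#define MAX_DEPS 16    /* hypothetical limit for this sketch */

/* The core rule: a target must be rebuilt if it does not exist or if
 * any prerequisite was modified strictly later than the target. */
int stale_by_times(int target_exists, time_t target_mtime,
                   const time_t dep_mtimes[], int ndeps)
{
    int i;

    if (!target_exists)
        return 1;
    for (i = 0; i < ndeps; i++)
        if (dep_mtimes[i] > target_mtime)
            return 1;
    return 0;
}

/* File-based wrapper: looks up modification times with stat().
 * Returns 1 if the target must be rebuilt, 0 if it is up to date,
 * and -1 if a prerequisite is missing or there are too many of them. */
int target_is_stale(const char *target, const char *const deps[], int ndeps)
{
    struct stat st;
    time_t dep_mtimes[MAX_DEPS];
    int i, exists;

    if (ndeps > MAX_DEPS)
        return -1;
    for (i = 0; i < ndeps; i++) {
        if (stat(deps[i], &st) != 0) {
            fprintf(stderr, "no such prerequisite: %s\n", deps[i]);
            return -1;
        }
        dep_mtimes[i] = st.st_mtime;
    }
    exists = (stat(target, &st) == 0);
    return stale_by_times(exists, exists ? st.st_mtime : 0,
                          dep_mtimes, ndeps);
}
```

Splitting the pure comparison from the stat() calls keeps the decision itself easy to reason about: a prerequisite with the same timestamp as its target does not trigger a rebuild, while a strictly newer one does.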

This information is encoded in something called a make file, which is traditionally named Makefile or makefile. A make file contains variable definitions and dependency rules. Variable definitions look like this:


   CFLAGS   = -Wall -g -ansi -pedantic
   PROGRAMS = foo baz 
You can refer to the values of these variables in either of two ways: $(PROGRAMS) or ${PROGRAMS}, and references are replaced like macro definitions (think #define in a .h file). By convention, variables are given uppercase names.
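Because references are replaced textually, a single expansion pass can be sketched in C. This is a simplified illustration under stated assumptions: the variable table and the names expand and lookup are hypothetical, undefined variables expand to the empty string (as they do in make), and unlike make the sketch neither re-expands references that appear inside a variable's value nor handles $$.

```c
#include <string.h>  /* strlen, strncmp, strchr, memcpy */

struct var { const char *name; const char *value; };

/* Hypothetical table mirroring the two definitions shown above. */
static const struct var vars[] = {
    { "CFLAGS",   "-Wall -g -ansi -pedantic" },
    { "PROGRAMS", "foo baz" },
};

/* Find a variable by name (len characters); undefined names give "". */
static const char *lookup(const char *name, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++)
        if (strlen(vars[i].name) == len && strncmp(vars[i].name, name, len) == 0)
            return vars[i].value;
    return "";
}

/* Copy in to out, replacing $(NAME) and ${NAME} with their values.
 * Returns 0 on success, or -1 if out (outsz bytes) is too small. */
int expand(const char *in, char *out, size_t outsz)
{
    size_t n = 0;

    if (outsz == 0)
        return -1;
    while (*in) {
        if (in[0] == '$' && (in[1] == '(' || in[1] == '{')) {
            char closer = (in[1] == '(') ? ')' : '}';
            const char *start = in + 2;
            const char *end = strchr(start, closer);
            if (end != NULL) {
                const char *val = lookup(start, (size_t)(end - start));
                size_t vlen = strlen(val);
                if (n + vlen >= outsz)
                    return -1;
                memcpy(out + n, val, vlen);
                n += vlen;
                in = end + 1;   /* resume after the closing delimiter */
                continue;
            }
        }
        if (n + 1 >= outsz)
            return -1;
        out[n++] = *in++;       /* ordinary character: copy as-is */
    }
    out[n] = '\0';
    return 0;
}
```

With the definitions above, expanding "gcc $(CFLAGS) -o x" yields "gcc -Wall -g -ansi -pedantic -o x", and "${PROGRAMS}" yields "foo baz".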

Some variables are predefined and others have developed conventional uses. For example, CC holds the name of the C compiler. If you want to use another C compiler, you can set CC to your choice, and all the standard build rules will use the new value. CFLAGS holds the flags to be passed to the C compiler in every compilation; the standard rules use it, and your own rules should too.

Dependency rules consist of a target name, a colon, and a list of the files the target depends on, all on one line (you can use \ as a line-continuation character if you need to). After this line come zero or more actions, which are shell commands, each preceded by a tab character. This insistence on the tab character is one of the most famous bone-headed decisions in Unix history.


   foo:   foo.o bar.o
           $(CC) $(CFLAGS) -o foo foo.o bar.o 
says that foo requires foo.o and bar.o. Once these prerequisites have been built, make builds foo by running the program stored in the variable CC with the appropriate arguments.
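What make does with a rule's actions can be sketched in C. In this simplified model, which is not make's implementation, each action is echoed and then handed to the shell through the standard system() function, and the remaining actions are skipped once one fails. make behaves this way by default, but the function name run_actions is hypothetical, and the sketch ignores make's command prefixes such as @ (do not echo) and - (ignore errors).

```c
#include <stdio.h>   /* printf, fflush */
#include <stdlib.h>  /* system */

/* Run a rule's actions in order, echoing each one first, and stop at
 * the first command whose exit status is nonzero.
 * Returns the index of the failing action, or -1 if all succeeded. */
int run_actions(const char *const actions[], int nactions)
{
    int i;

    for (i = 0; i < nactions; i++) {
        printf("%s\n", actions[i]);
        fflush(stdout);   /* keep the echo ahead of the command's own output */
        if (system(actions[i]) != 0)
            return i;
    }
    return -1;
}
```

Calling run_actions on { "true", "exit 3", "true" } echoes the first two commands and returns 1 without running the third.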

If you just type make and there is a file named Makefile or makefile in the current directory, then make will build the first target specified in the file. Therefore, the first target is traditionally the entire system (and the target name is usually all) or the item users most often want to build. The target named install usually builds a system and then installs the result on the current machine, e.g., by copying the program files to places like /usr/local/bin. Often anyone can build and run a system in a private directory, but installation requires administrator privileges. Another commonly used target name is clean, which usually has no dependencies. make clean should remove all generated files (like executables and .o files) so that a fresh build from scratch can take place. Here is a common entry:


   clean:
        rm -rf foo *.o
which would remove the executable foo (from the example above) and all the object files.

Because the actions in a make file are arbitrary shell commands, they can be used for maintaining all sorts of things. For example, one can automate the building of a printable version of a book that involves processing input files for the text, running separate programs for building an index, and converting the result to a PDF file. One can also use make to build a web site from templates and some form of customization information.

Mark A. Sheldon (msheldon@cs.tufts.edu)
Last Modified 2017-Sep-07