Comp 40:
A simple introduction to Compile Scripts and Makefiles
Noah Mendelsohn

This note provides a brief introduction to the use of shell scripts, makefiles and other tools for building and testing your programs.

Table of Contents

  1. What is a script?
  2. Why use scripts to build your programs?
  3. What tools can you use to create your scripts?
  4. Using build scripts for testing and packaging
  5. Hints for writing and understanding compile scripts written in sh
    1. Finding documentation on sh
    2. Filename globbing
    3. Arguments to the script
    4. Back quotes
    5. Conditionals
    6. A realistic example
    7. The shebang #! convention
  6. Hints for writing makefiles
    1. Tabs and spaces
    2. Variables
    3. Targets and Dependencies
    4. Pattern rules and automatic variables
    5. A realistic example
  7. Build scripts as documentation
  8. APPENDIX I: A COMP 40 Compile Script
  9. APPENDIX II: A COMP 40 Makefile
  10. APPENDIX III: Additional reading

What is a script?

In computing, the term script refers to a program that invokes and coordinates the activities of other programs. An application build script is one that invokes compilers, linkers and other tools to compile and link your executable application. As we will see, some build tools are smart enough to understand which pieces of code depend on which others (e.g. which .c files include .h files) and to only recompile what's necessary when small changes are made.

Why use scripts to build your programs?

When you started programming in C++ or in C you probably learned to build your programs this way:

g++ -o myprog myprog.cpp
or
gcc -o myprog myprog.c

These commands use the GNU compiler to compile a single source file into a runnable Linux executable. Directly calling the compiler this way is perfectly acceptable for simple programs, but as you start building more complex programs a number of problems become obvious: you have to remember and retype long lists of compiler options, you have to name every source file, and every build recompiles everything from scratch, even files that haven't changed.

Application build scripts automate the process of building an application or other program and they can also automate related tasks such as running tests. Build scripts make it much easier to rebuild your program, and that in turn should encourage you to recompile and test frequently as you create your program. Build scripts also help other programmers learn how to recompile and link your code.

In COMP 40, we rely on compile scripts with a known interface to build your homework submissions. That is, when you submit your program you will include a build script, and it must be invokable in just the way we specify. Typically we require a shell script named compile which, when run with no arguments, builds all the code and applications we've asked you to submit. Our test scripts depend on that compile script to build your code for testing. (Some assignments may ask for compile scripts in other forms, such as using a makefile, as described below).

What tools can you use to create your scripts?

In principle, any programming language or tool that can invoke other command line programs can be used to create application build scripts. These include general purpose scripting languages such as Python, PERL and Ruby; shell languages like sh, bash, csh and ksh; and tools like make, mk, ant, raven, maven, etc. that are designed specifically for creating application build scripts. Of these many choices, there are two (or maybe three) that you should know about for COMP 40:

  1. Shell languages (e.g. sh)

    When you are typing on the console of your Unix login session you are talking to a program known as a shell. This is a program that reads what you type, does some work like expanding filename wildcards, and then invokes the programs you've asked to run. In fact, the shell languages are themselves small programming languages; although you usually type one command at a time, you can actually use them to write little loops, if statements, etc. right from the console. The same shell command processors can be used to run shell programs stored in files, but the situation is confused somewhat by the fact that over the years a number of competing shells have been built, each with its own language.

    Regardless of the shell you're running to interpret your console commands, the tradition in COMP 40 is to write compile scripts using the /bin/sh shell language. A section below provides hints for writing and understanding compile scripts written in sh.

  2. make

    make is the most famous application build language. Although make is somewhat clumsy and ad-hoc in its syntax, it's very powerful and very widely used. If only for that reason, every good programmer should learn to use make.

    The key feature of make and of many other languages designed specifically for building applications is that it allows you to state which pieces of your program depend on which others. When you run make it checks the dates on all your files to determine what's been changed since the last time you built your application, and it rebuilds only files that depend, directly or indirectly, on changed files. You tell make how to do each step in the build process, e.g. how to build multiply.o from multiply.c, and make invokes only the steps that are needed.

    The rules for building each program or application are placed into a file known as a makefile. When you invoke make you can supply the name of the makefile to use:

                 make -f mymakefile myprog
        

    ...will use the makefile named mymakefile to build the program named myprog. If no makefile is named, make will try to find makefile or Makefile (in that order). The following will build myprog using one of those two makefiles.

                 make myprog
        

    When code is distributed for public use it's very common to find a makefile or Makefile along with the code. This is usually a signal that make is the tool to use for building that program.

  3. mk

    Even Stu Feldman, who many years ago created make, has acknowledged its problems. Among the tools he recommends as an alternative is mk (pronounced "muck"), which was created to support the experimental Plan 9 operating system at Bell Labs.

    The syntax of mk is arguably cleaner and easier to learn than that of make and mk has some powerful features that may be useful when you are building truly complex systems. Norman Ramsey gives a spirited defense of mk in How to Write Compile Scripts.

We may in some cases provide you with tools that are built with make or mk, and you are strongly advised to learn at least the basics of make. It's a skill that will serve you well in your career and one that potential employers may value. Some Hints for writing makefiles are also provided below.

mk is not nearly as widely used as the other tools discussed here but it's got a cleaner design than make and it also is the language we've used behind the scenes to build many of the COMP 40 course materials. You may occasionally see scripts with names like somescript.mk; that .mk suffix likely means that mk is the language being used.

Using build scripts for testing and packaging

Consider using make or other similar tools to integrate testing and packaging with the build process. For example, you might have a make target that does unit testing:

             make test-results

Write your makefile so that the target test-results depends on the executable program(s) being tested. Now if you change some source files and run the above command, make will automatically rebuild your application and immediately run tests on it. That can be a wonderful way to be sure that none of the changes you've made has broken anything that had been working. We strongly encourage you to build your test cases early in the development process and to retest your code after every significant change to the source.
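
As a sketch, such a rule might look like the following (the program name myprog and the script tests/run_tests.sh are hypothetical placeholders, not part of any actual assignment):

```make
# Hypothetical example: 'myprog' and tests/run_tests.sh are placeholders.
# Because test-results depends on myprog, make rebuilds myprog first if
# any of its sources changed, then reruns the tests and saves the output.
test-results: myprog
	sh tests/run_tests.sh > test-results 2>&1
```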

In the same spirit, build targets can be used to invoke tools like valgrind:

             make valgrind-tests

You should have this target depend on an up-to-date build of your system, and then have it run one or more tests of your code using valgrind. Doing this will likely encourage you to run valgrind often, and to be sure that your code is up-to-date when you do.

If you're creating a system that will be distributed to others then a build target to create distribution packages can be very useful:

             make distribution

This would (if necessary) recompile and relink any code that's to be distributed to users in object form and then package the code, perhaps into a tar file, a zip file, or some other form that's convenient for distributing to others.
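
As a sketch (the file names here are hypothetical), such a target might bundle the build products into a tar file:

```make
# Hypothetical example: mylib.a, its headers, and README are placeholders.
distribution: mylib.a
	tar -czf mylib-dist.tar.gz mylib.a *.h README
```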

Of course, all these things can be done equally well with mk or with shell scripts.

Hints for writing and understanding compile scripts written in sh

Here are some hints that may be useful for writing and understanding compile scripts in sh:

Finding documentation on sh

The first place to look for documentation on sh is in the man pages:

man sh

Sometimes doing this gives you the man page for an implementation called bash, and indeed on our Tufts CS systems /bin/sh is just a link to /bin/bash; bash has some extensions, but for most of what you'll be doing, sh and bash are compatible. So, either man page is good enough for COMP 40 purposes. In any case, the man pages document the complete syntax of the language and give some instructions for using it. There are also lots of sh and bash tutorials on the Web.

Filename globbing

As we've seen, sh (like bash, csh, tcsh, ksh, etc.) is a language that's designed to interpret commands at an interactive command prompt. One of the services it provides is globbing, in which strings containing characters like * are replaced with a list of filenames matching the expression:

   ls mult*.c    # all files starting with "mult" and ending with .c
   cat [ab]*     # all files starting with "a" or "b"

This is done by replacing the expressions with lists of filenames that become the arguments to commands. So, the second line above might actually turn into the command:

   cat albert anthony.c bertha birtday.html bigdeal.cpp

...if those files happened to be in the current directory. Because sh runs your commands the same way when they come from a script file as from the console, the same globbing is done on every command in your scripts!
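
Here is a small runnable sketch showing globbing at work inside a script (it uses a scratch directory, /tmp/glob-demo, so it won't disturb your real files):

```shell
#!/bin/sh
# Globbing works the same inside a script as at the console.
# Set up a scratch directory with a few sample files.
rm -rf /tmp/glob-demo
mkdir -p /tmp/glob-demo
cd /tmp/glob-demo
touch multiply.c multiply.h main.c notes.txt

ls mult*.c    # the shell expands this to: ls multiply.c
ls [mn]*      # all four files start with "m" or "n", so all are listed
```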

Arguments to the script

Your compile script is itself a program that will be invoked (most likely) from the command line. So, your script can be given arguments.

compile -link fgroups

has two arguments: -link and fgroups. You will often see within the compile script code that accesses these parameters. For example:

echo $2

would call the echo command, passing it the second parameter (fgroups for the command above). You can also find out how many parameters there are:

echo $#

echos the number "2", because the script was passed two parameters. You can also work with all the parameters together. If the number of arguments is nargs then:

echo $@    

...is the same as:

echo $1 $2 ... $nargs   

Surprisingly, it's also possible to update the argument list. For example:

shift
echo $@

This would echo only "fgroups", because the shift command shifts all the arguments to the left, and thus "loses" the "-link" argument. Such shifting is very commonly done to eliminate arguments such as switches after they have been processed by the script.

set *.c

Somewhat confusingly, the set command completely replaces the command line arguments! After the above command, the original arguments (-link fgroups) are gone, replaced with a list of all .c files. Specifically, globbing is done on *.c, so the set command is given a list of all .c files in the current directory. The set command then sets the argument list to that list of files.

One of the reasons scripts update the argument list is that some important internal script commands default to operating on the argument list:

for cfile 
do
  gcc -c $cfile
done

There are at least two important things to learn from the above example: the for loop iterates over the arguments in $@ (which, after the earlier set command is a list of all the user's .c files). Also, shell scripts have named variables that can be substituted into commands using the $variable_name syntax. So, the above command loops through all the files named in the argument list and calls gcc on each one.
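
Putting these pieces together, here is a small sketch (the name argdemo and the arguments are hypothetical examples; for determinism it uses set a b c rather than set *.c):

```shell
#!/bin/sh
# A sketch pulling the argument-handling pieces together.  Save as
# (say) argdemo and run:   sh argdemo -link fgroups

echo "got $# arguments: $@"

case "$1" in
  -link) shift ;;        # drop the -link switch once we've seen it
esac
echo "after shift: $@"

set a b c                # replace the argument list entirely
for arg                  # with no "in ..." part, iterates over $@
do
  echo "processing $arg"
done
```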

It's also common to set variable names using the = operator. These lines of code are taken from a COMP 40 build script:

CC=gcc

CFLAGS="-I. -I/comp/40/include $CIIFLAGS"
FLAGS="-g -O -Wall -Wextra -Werror -Wfatal-errors -std=c99 -pedantic"

for cfile 
do
  $CC $FLAGS $CFLAGS -c $cfile
done

You should now be more or less able to understand what commands are executed by that code.

Back quotes

One thing missing from the above is the assignment to $CIIFLAGS. In the actual script, that turns out to use a feature you haven't seen:

CIIFLAGS=`pkg-config --cflags cii40`

Note carefully the backquotes ` surrounding the pkg-config command on the right. Also note that pkg-config has nothing to do with the shell language itself: it's an ordinary command just like gcc, cat or ls.

When a command in a script is in backquotes, the command is run and the output of the command is substituted in place of the quoted command text. So, in this example, the pkg-config command is run with the arguments --cflags cii40; the output of that pkg-config command is assigned to the shell variable CIIFLAGS. (To learn what pkg-config does, try man pkg-config, or else you can just guess that it probably figures out some switches we'll want to pass to the compiler when building something to do with "cii40"...which turns out to be the package name for the Hanson code we use in COMP 40!)
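
Here is a small sketch of backquotes in action, using the ordinary commands date and wc in place of pkg-config:

```shell
#!/bin/sh
# The output of each backquoted command is substituted into the
# assignment, just as pkg-config's output is in the compile script.
NOW=`date`
echo "the date command printed: $NOW"

NFILES=`ls /bin | wc -l`
echo "there are $NFILES entries in /bin"
```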

Conditionals

sh has if statements, for loops and many other features of traditional imperative programming languages. You'll need to check the man page for details, but you'll often be able to guess from the syntax what these constructs are doing. One feature for which sh uses a somewhat unusual syntax is conditionals like these:

if [ $linked = no ]; then
  # do something here if variable linked is equal to "no"
fi 

An expression in square brackets is called a conditional, and it evaluates to a boolean true or false. As you can see from the example above, string literals like no need not be quoted unless they contain spaces; variable references must begin with a dollar sign $.

Although the expression in brackets may look like part of the if statement syntax, it's not; the same conditional syntax can be used elsewhere, for example:

[ -n "$2" ] || { echo "You need a second argument to this script" >&2; exit 1; }

Note that the above is of the form: a || b. The expression a, which is the conditional [ -n "$2" ], is evaluated; if it's false, then expression b on the right is evaluated. That happens to be a compound statement of the form: { stmt1; stmt2; }.

The key to understanding the above example is that [ -n "$2" ] is a conditional that tests the length of a string to ensure it's not empty. So, this example writes an error message to standard error (and exits) only if there is no second argument to the script.

Many conditionals test the existence or nature of named files. For example:

[ -e "/tmp/somefile" ]    # True if /tmp/somefile exists
[ -f "/tmp/somefile" ]    # True if /tmp/somefile exists and is a file (not a dir, etc.)
[ -d "/tmp/somedir" ]     # True if /tmp/somedir is a directory

There are many, many other useful conditionals. See the man page for details.
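
Here is a short sketch exercising these file-test conditionals (it creates /tmp/somefile and /tmp/somedir so the tests have something to look at; note the mandatory spaces just inside each bracket):

```shell
#!/bin/sh
# Create a file and a directory, then test for them.
touch /tmp/somefile
mkdir -p /tmp/somedir

if [ -e /tmp/somefile ]; then echo "somefile exists"; fi
if [ -f /tmp/somefile ]; then echo "somefile is a regular file"; fi
if [ -d /tmp/somedir ]; then echo "somedir is a directory"; fi

# the a || b form from earlier in this section
[ -d /tmp/no-such-dir ] || echo "no-such-dir is missing"
```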

A realistic example

You don't have to understand all of the above in detail on day 1 of COMP 40 but you should rapidly develop an intuition about which parts of each compile script are doing what. That way you will be able to make small modifications to the scripts and to debug them when they break.

Now, take a look at the compile script for a recent version of COMP 40 HW assignments 1; with luck, you'll be able to understand most of what it's doing (several of the fragments above are taken from that script).

The shebang #! convention

This is a good time to learn a Unix scripting technique that's illustrated on the first line of the sample compile script, which says:

#!/bin/sh

It starts with a #, so the script processor considers it a comment, but it serves a very specific purpose. There are actually two ways you could run this script:

  1. sh compile -link fgroups: when you do it this way, that first line doesn't matter. You're telling the system to invoke the sh shell and you're giving it the name of the script to execute. Whoever does this needs to know that it's an sh script.
  2. ./compile -link fgroups: done this way, we're treating the script itself as an executable command, but how does Unix know whether to run this using sh (or maybe Python or csh?). That's what that first "shebang" line is for: when you invoke a command that appears to be a text file, Unix (or Linux) looks at the first line. If it starts with:
    #!...some executable name here
    
    Then the named executable (in our case sh) is run and passed the file as a first argument!

So, we've been able to use our script as if it was a new command named compile (assuming that's the filename of the script). There is one more detail: for this to work you'll need to give your file execute permissions. A likely way to do that is:

chmod u+x compile

which uses the chmod Unix command to indicate that the user (that's the u) who owns the file is to have execute (x) permissions. Of course, you should see man chmod for more information on setting the permission mode.
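
Here is the whole workflow as a sketch, using a throwaway script in /tmp:

```shell
#!/bin/sh
# Create a tiny script, mark it executable, then run it directly
# as a command; the shebang line tells Unix to use /bin/sh.
cat > /tmp/hello <<'EOF'
#!/bin/sh
echo "hello from a script"
EOF

chmod u+x /tmp/hello    # owner may now execute the file
/tmp/hello              # prints: hello from a script
```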

Oh, and why is it called the "shebang" convention? The "#" character is the "sharp sign" or "hash" (or "number sign"), and the "!" character programmers call "bang". So, shebang. You'll often hear a programmer who's reading a string to another programmer pronounce it in this style: "hash bang slash bin slash sh", etc.

(By the way, if you're not bored yet, you'll surely want to know that the etymology of the word shebang is surprisingly varied and ambiguous...and if that's not sufficiently obscure detail, the # sign is also known as the octothorpe!)

Hints for writing makefiles

There are many make tutorials and sample makefiles available on the Web. Of course, you should also do man make, and also consult the extensive reference manual for GNU Make (do try to avoid using GNU extensions unless needed). Here we give a short introduction to what make can do.

Tabs and spaces

One important warning about make: tabs and spaces aren't interchangeable! When you see a rule like this in a makefile:

brightness.o: brightness.c
	gcc -c brightness.c

there is a tab character (not spaces!) ahead of the command "gcc" on the second line. If your editor has a feature that replaces tabs with spaces, you'll have to turn that off when editing makefiles! (This is just an example of why people complain about make. Still, it's a very powerful tool that can do very useful things for you.)
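
One way to check (a sketch using grep; the file /tmp/demo.mk is just a throwaway example created for the demonstration) is to search for lines that begin with a literal tab:

```shell
#!/bin/sh
# Write a tiny two-line makefile, then count the lines that
# begin with a literal tab character.
printf 'brightness.o: brightness.c\n\tgcc -c brightness.c\n' > /tmp/demo.mk

TAB=`printf '\t'`
grep -c "^$TAB" /tmp/demo.mk    # prints 1: one line begins with a tab
```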

Variables

Like shell scripts, makefiles allow you to set and substitute named variables:

EXECUTABLES = brightness fgroups

all: $(EXECUTABLES)

is more or less equivalent to:

EXECUTABLES = brightness fgroups

all: brightness fgroups

Targets and Dependencies

Consider again that same fragment:

all: brightness fgroups

This tells make that if the user says:

make all

...then it's necessary to ensure that fgroups and brightness are up to date. We say that the target all depends on fgroups and brightness. But how can the make system know whether those program(s) actually need to be rebuilt? Somewhere in the makefile should be another rule telling what those two programs depend on, and how to build them. A very simple rule might be:

brightness: brightness.o
	gcc -o brightness brightness.o 

which says to build brightness by linking brightness.o. Note that each rule starts with a target name in the left margin, followed by a colon, and optionally followed by any dependencies. If there are one or more commands needed to build the target, they are listed below, with tabs starting each such line.

Of course, we then need a rule for being sure that brightness.o is up to date, and telling us how to build that:

brightness.o: brightness.c
	gcc -c brightness.c

So, what happens if we issue the command "make brightness"? Make starts by building a tree of dependencies. What do we need to build brightness? Answer: an up-to-date brightness.o. What do we need to build that? There's another rule that tells us, and so on. After building the whole tree of dependencies, the system decides that the first check to be made is whether brightness.c has been changed since brightness.o was last built, and if so that code will be recompiled. Then, if the brightness executable is older than the (possibly recompiled) brightness.o, the executable is relinked. In short, only necessary work is done.

Note that, unlike a script, a makefile is not a list of steps to be performed in order; it's a set of rules, and each rule is considered as needed according to the circumstance. Rules are chosen by pattern matching on filenames and by checking whether any dependencies must be rebuilt. Once a rule is chosen, there may indeed be one or more steps (actually zero or more) specified to build the target.

Pattern rules and automatic variables

Make has many many other sophisticated features, but there's one more you should learn early. Consider this fragment:

# 
#    To get any .o, compile the corresponding .c
#
%.o:%.c 
	gcc $(FLAGS) $(CFLAGS) -c $<

Our first approach required a separate rule for each .o, but all of those rules look almost the same. The fragment above uses what make calls pattern rules (e.g. %.c) and automatic variables (e.g. $<). You'll have to read the GNU make documentation for all the details, but what the fragment above says is: to build any file ending in .o, find the .c file with the same stem, and run the gcc command shown, in which the automatic variable $< stands for the name of that .c file.

So, a single rule suffices to build a .o file from any corresponding .c file. Also shown in the gcc command line is the substitution of two variables FLAGS and CFLAGS that we can assume have been set with the compiler options that we're using on this project. We can thus ensure that the same compiler options are automatically applied to each compilation. Obviously we can (and should) do something similar on the commands that link the executables.

A realistic example

Appendix II contains a makefile that builds HW assignments 1. With the hints given above you should be able to figure out most of what it's doing.

Right now, the COMP 40 homework submission process expects a compile script, not a makefile; be sure to provide a working compile script or your homework may not be graded. Still, it's good to know how makefiles work, and we may sometime soon switch to using them for some assignments.

Build scripts as documentation

Like any other code you write, your build scripts and makefiles should be designed as documentation to be read by other programmers. Especially if you use languages like make or mk, your build scripts formally document which pieces of your system depend on which others. Of course, your scripts also document the compiler switches and other settings used.

Make sure your build scripts are cleanly structured and easy to read. When it's not obvious from the code, include comments explaining what various build targets do, and any other information that might be needed by someone maintaining or modifying your code. Be especially careful to document non-obvious dependencies on tools or libraries, e.g.:

# This build runs against the graphics library 
# version 2.8, which is current as of July, 2013.  The 
# program depends on features such as double buffering
# introduced in v2.8.  
# NEEDSWORK We should add an automatic configuration 
# test to ensure double buffering is available.

APPENDIX I: A COMP 40 Compile Script

This is the compile script for a recent version of COMP 40 HW 1. That assignment requires the development of two executables, named brightness and fgroups.

#!/bin/sh
#########################################################
#                     compile
#
#     Compiles all .c files and then links brightness and/or
#     fgroups, the two programs required for the comp40
#     intro assignment.
#
#     Options:
#        -nolink          #just compile, don't link
#        -link exe_name   # name of executable to build,
#                         # e.g. fgroups
#        -link all        # build all executables (default)
#
#     Note that this script supports use of the comp 40
#     versions of Hanson's C Interfaces and Implementations
#
#########################################################

#########################################################
#                         Setup
#########################################################

set -e    # halt on first error

# check command line parameters

link=all  # link all binaries by default
linked=no # track whether we linked

case $1 in  
  -nolink) link=none ; shift ;;  # don't link
  -link)   [ -n "$2" ] || { echo "You need to say *what* to link" >&2; exit 1; }
           link="$2" ; shift ; shift ;;  # link only one binary
esac

# Choose compilers and set compiler flags

# use 'gcc' as the C compiler (at home, you could try 'clang')
CC=gcc

#  Use the pkg-config utility to get the correct include file flags
#  (-I) and library search flags (-L and -l) for the COMP 40 version
#  of Hanson's "C Interfaces and Implementations" (the package called
#  cii40).
#
CIIFLAGS=`pkg-config --cflags cii40`
CIILIBS=`pkg-config --libs cii40`

# the next three lines enable you to compile and link against 
# course software by setting the compiler search path for 
# includes of .h files (the -I switch) and the search
# path for libraries containing .o files (-L and -l)
#
CFLAGS="-I. -I/comp/40/include $CIIFLAGS"
LIBS="$CIILIBS -lm"    # might add more libraries for some projects
LFLAGS="-L/comp/40/lib64"

# these flags max out warnings and debug info
FLAGS="-g -O -Wall -Wextra -Werror -Wfatal-errors -std=c99 -pedantic"

#########################################################
#     Clean out old object files and compile everything
#########################################################

rm -f *.o  # make sure no object files are left hanging around

case $# in
  0) set *.c ;; # if no args are given, compile all .c files
esac

# compile each argument to a .o file
for cfile 
do
  $CC $FLAGS $CFLAGS -c $cfile
done

#########################################################
#     Link the .o files and libraries to create an
#     executable program
#########################################################

# One case statement per executable binary

case $link in
  all|brightness) $CC $FLAGS $LFLAGS -o brightness brightness.o -lpnmrdr $LIBS 
                  linked=yes ;;
esac

case $link in
  all|fgroups)    $CC $FLAGS $LFLAGS -o fgroups    fgroups.o             $LIBS 
                  linked=yes ;;
esac

# put out error msg if asked to link something we didn't recognize
if [ $linked = no ]; then
  case $link in  # if the -link option makes no sense, complain 
    none) ;; # OK, do nothing
    *) echo "`basename $0`: don't know how to link $link" 1>&2 ; exit 1 ;;
  esac
fi

APPENDIX II: A COMP 40 Makefile

The following makefile is more or less equivalent to the compile script shown above in Appendix I. Like that script, this makefile defaults to building the brightness and fgroups executables. So, if we assume this file is called Makefile, here are some commands we could issue:

make all          # builds fgroups and brightness and any needed .o files
make              # same as "make all" because all is the default target
make brightness   # builds brightness, and any .o files it needs
make fgroups.o    # recompiles fgroups.o only if fgroups.c is changed
make clean        # remove all .o files and executables fgroups and brightness

Here is the sample makefile:

# 	         Makefile for COMP 40 Homework 1
#
#     Author: Noah Mendelsohn (adapted from Norman Ramsey's compile script)
#
#  Maintenance targets:
#
#
#    all         - (default target) make sure everything's compiled
#    clean       - clean out all compiled object and executable files
#    brightness  - compile just the brightness program
#    fgroups     - compile just the fingerprint group program.
#
#

# Executables to be built using "make all"

EXECUTABLES = brightness fgroups

#
#  The following is a compromise. You MUST list all your .h files here.
#  If any .h file changes, all .c files will be recompiled. To do better,
#  we could make an explicit target for each .o, naming only the .h
#  files it really uses.
#
# Add your own .h files to the right side of the assignment below.

INCLUDES = 

# Do all C compiles with gcc (at home you could try clang)
GCC = gcc

# Comp 40 directory

COMP40 = /comp/40

# the next two lines enable you to compile and link against CII40
CIIFLAGS = `pkg-config --cflags cii40`
CIILIBS = `pkg-config --libs cii40`

# the next three lines enable you to compile and link against course software
CFLAGS = -I. -I$(COMP40)/include $(CIIFLAGS)
LIBS = $(CIILIBS) -lm    
LFLAGS = -L$(COMP40)/lib64

# these flags max out warnings and debug info
FLAGS = -g -O -Wall -Wextra -Werror -Wfatal-errors -std=c99 -pedantic

# 
#    'make all' will build all executables
#
#    Note that "all" is the default target that make will build
#    if nothing is specifically requested
#
all: $(EXECUTABLES)

# 
#    'make clean' will remove all object and executable files
#
clean:
	rm -f $(EXECUTABLES) *.o


# 
#    To get any .o, compile the corresponding .c
#
%.o:%.c $(INCLUDES)
	$(GCC) $(FLAGS) $(CFLAGS) -c $<

#
# Individual executables
#
#    Each executable depends on one or more .o files.
#    Those .o files are linked together to build the corresponding
#    executable.
#
brightness: brightness.o
	$(GCC) $(FLAGS) $(LFLAGS) -o brightness brightness.o -lpnmrdr $(LIBS)

fgroups: fgroups.o
	$(GCC) $(FLAGS) $(LFLAGS) -o fgroups  fgroups.o $(LIBS)

echo:
	echo "$(CIIFLAGS)"

APPENDIX III: Additional reading

Professor Norman Ramsey offers a more extensive explanation of How to Write Compile Scripts. It will give you much more information about pros and cons of building scripts in different ways. This note should be enough to get you started quickly.
Author: Noah Mendelsohn
Last Modified: 3 August 2014