COMP40 Assignment: Analysis of AMD64 Assembly Code
(Defusing a Binary Bomb)

COMP 40 Assignment: Analysis of AMD64 Assembly Code

Assignment due Tuesday, November 1 at 11:59 PM.

Table of Contents

Purpose and overview

The purpose of this assignment is to take the first step toward one of the central goals of this course: to acquire a working knowledge of the AMD64 instruction-set architecture. [Now that Intel also supports AMD64, it is more officially known as \texttt{x86\_64}, but certain old dogs have trouble learning new tricks, not to mention that credit should be given where credit is due. Down with revisionist history!] To that end you will read, analyze, and understand a half-dozen or so C procedures given access only to an executable binary. An important secondary objective is for you to learn to debug programs using the Data Display Debugger (DDD).

Fundamental questions about assembly code include

The homework requires analysis of assembly-language programs. Important questions include

You'll mix three kind of activities:

You'll hand in a written report of your analysis; your work on defusing the bomb is reported automatically.

Making a game of it: the binary bomb

A ``binary bomb'' is a program that consists of a sequence of phases, each of which can be defused by the proper input. In a Comp 40 bomb, each phase expects one line on standard input; if the phase receives the input it expects, it is defused, and you continue to the next phase. If the phase receives any other input, the bomb explodes: it prints

BOOM!!!
The bomb has blown up.
Worse, it sends a message to the class ``bomb server,'' and you you lose 1/40 of the available credit for that phase. Be careful!

The bomb is divided into six phases. If you defuse a phase successfully, you will receive either full credit (if your bomb has never blown up) or partial credit (if your bomb blew up at any point). If you fail to defuse a phase, you receive no credit for defusing. The phases get progressively harder to defuse, but the increasing difficulty should be offset by the expertise you gain as you move from phase to phase. The final phase will challenge even the best students—don't start at the last minute.

We've doctored the bomb to help you avoid repetetive work and slips of the finger. First, the bomb ignores blank input lines. Second, if you run your bomb with a command-line argument, for example,

 
  ./bomb inputs1-3.txt
then the bomb reads input lines from inputs1-3.txt, which might contain the inputs expected by phases 1 to 3. Afterward, when the bomb reaches end of file on inputs1-3.txt, it switches to reading standard input. Use this feature to your advantage by putting correct phrases into a file!

How to get a bomb

On this assignment you are encouraged to work in pairs. Each pair should use one and only one bomb. Bombs are available from http://bomb.cs.tufts.edu:54321 (think countdown). This URL works only from inside the firewall; if you are not in the lab, ssh to linux and download your bomb with elinks or w3m. Each bomb is an executable binary which has been compiled from C; it is packaged in a tar file which can be unpacked with, e.g.,

   tar xvf bomb4.tar
Save the bombk.tar file to the directory in which you plan to do your work. Unpack the tar file by
  tar xvf bombk.tar
This command will create a directory called ./bombk with three files: If you make a mistake, such as losing your bomb or typing in the wrong group members, request another bomb.

Preparing to defuse your bomb

Run objdump -d bomb to get a printout of the assembly code of your bomb. You'll be especially interested in functions main (to which you have the source) and functions phase_1 through phase_6. You will also be interested in other functions, but anything before main in the printout is probably not interesting.

You must defuse the bomb on linux.cs.tufts.edu or one of the machines in Halligan 116, 118, or 120. If you wish to work remotely, you will need either an X windows client or a VNC client. If you are not already familiar with X clients and ssh forwarding, I recommend you use a VNC client, available from http://www.realvnc.com. Information about how to set up VNC is available at http://www.cs.tufts.edu/comp/40/howto-vnc/, with emphasis on using the Windows PuTTY client to get through the firewall. If you are using Linux you can SSH tunnel by using the -via option on the VNC viewer.

Connect to a class server, launch the Data Display Debugger (DDD) on your bomb, and start stepping through the code.

Defusing the bomb

You will need access to volumes of information, including

Please see the Resources section below. There is too much information, but the critical parts are As for the rest, you are not expected to understand much of it—just use it judiciously to get through your bomb.

Once you are comfortably seated on this indigestible brick of information, launch DDD and start single-stepping. Key skills are

Using the debugger is a critical skill; time invested now will be repaid manyfold throughout your career.

What we expect from you

Results from defusing the bomb are automatically sent to your instructor, so there is no need to hand them in. You should validate your results by checking the page at

http://www.cs.tufts.edu/comp/40/bombstats.html
This web page is updated frequently and shows everyone's progress.

In addition to defusing the bomb, we expect you to write down the results of your analysis:

We therefore expect you to submit the following:

  1. A README file which

  2. A defuse file that gives the input lines needed to defuse your bomb (for as many phases as you have successfully defused).

  3. At your option, a file pictures.pdf containing diagrams that support your README file

  4. A file phases.c which gives your notion of C code that is equivalent to the machine code in Phases 1 through 6
Hand-drawn pictures can be scanned into PDF using the machine in Halligan 102.

Resources

Tools

The Data Display Debugger

The Data Display Debugger is actually a user interface that operates on top of gdb. Here are some things to know:

The full documentation for DDD may be found at http://www.gnu.org/manual/ddd/. Our class lab on DDD highlights some especially useful commands.

The GNU debugger

DDD is actually ``just'' a user interface. To control the underlying program, DDD, relies on the the GNU debugger, gdb. gdb is a command-line debugger that supports a handful of languages and a great many target machines. With luck, you'll avoid most of gdb, but you'll want to keep in mind that the gdb window at the bottom of the DDD interface allows you to interact directly with gdb. All of DDD's abilities to single step, show data, set breakpoints, and so on, are also present in gdb.

Full documentation for gdb can be found at http://sourceware.org/gdb/documentation. It is probably more useful to browse the textbook site at http://csapp.cs.cmu.edu/public/students.html, which has a number of gdb resources. In particular, the ``Quick GDB x86-64 reference'' contains a very short summary of commands, most of which are in common with DDD. This is the place to learn what ``Stepi'' and ``Nexti'' actually mean.

The objdump program

The command objdump -d is invaluable to print assembly language for all of the code in the bomb. You can also just look at individual functions. Having a printed copy of the assembly code, which you can write on, is an invaluable tool to keep track of your growing understanding of the bomb.

The nm program

Running nm -p bomb will show all the names defined and used in the bomb. These names include all of the bomb's functions and global variables as well as all the functions the bomb calls. You may learn something by looking at the function names!

You'll also see many undefined names which should look vaguely familiar; they refer to names defined in the GNU C Library. This library is loaded dynamically, after the bomb starts. (Dynamic loading is also responsible for most of the junk between _init and _start. These are functions that are called from the bomb, which in turn make indirect calls through something called the Global Offset Table. At this point, sensible people run screaming from the room.)

For more information, some people like objdump -t, but I prefer the simplicity of nm. Check out the man page!

The lowly but useful strings program

It won't get you far, but

  strings bomb
will display the printable strings in your bomb. Sometimes strings squeezes a surprising amount of information out of a reticent program. If nothing else, strings will teach you not to store your passwords in the clear!

Documentation

You may find the following documentation useful:

How to analyze assembly code

The primary technique involved is to maintain at each program point an account of the contents of machine registers and the procedure's stack frame. [You are not yet expected to know what a stack frame is; it is covered in your book and will be covered in class.] Registers and stack slots that you don't care about can be omitted from description or can be written using the ``don't care'' value _|_ (pronounced ``undefined''). The technique is best applied both backwards and forwards:

Backwards analysis is what Sherlock Holmes used to love: reasoning from effects to what causes must produce those effects. But to validate your reasoning, you'll want to indulge in a little forwards analysis. This is sometimes called ``symbolic execution.'' You can do forward analysis in your head, but it's much easier to let DDD do it for you. You can set a breakpoint anywhere you like, use DDD to change the values in registers and memory, then single-step forward to see what happens. Just don't go too far, or you might blow yourself up!

Acknowledgements

The ``binary bomb'' idea and the associated code were developed by David O'Hallaron, who has a twisted sense of humor. [Terry Pratchett helped with the footnotes.] Many of the informational resources were suggested by Soha Hassoun. Backwards dataflow analysis was invented by Sherlock Holmes, who was invented by Arthur Conan Doyle.