Lecture notes for COMP 40

Table of Contents

9 Sep 2009: Intro to Comp 40

Blackboard: Machine structure and assembly-language programming

Introduce course staff


Blackboard: Machine-level programming

But also

Back to the machine level

Who has an idea what abstraction is?

Answers

We will see lots of abstraction, but let's talk about the machine level:


How good is this abstraction?


So (if it can't even do arithmetic) what's so special about the machine level of abstraction?

A problem: AMD64 is too complex to be mastered in one semester. So, you'll learn to analyze AMD64 code, but you won't write much. To understand assembly code and machine instructions in depth, later in the term we'll work with a virtual machine. Most students find this pretty interesting.

Representation is at the heart of programming: Let's consider data and the machine

Group Exercise: Ways to represent data in C++


Followup on blackboard:

14 Sep 2009: Programming with interfaces and implementations

Admin:

Syllabus sketch:

Followup from last class

My question about data was

In C, the answers are driven firmly by questions of representation:

The high points:

We'll hit the details


I asked about the problem with the Hanson book, and evidently the instructor in the summer course made it recommended rather than required. I'm very sorry for the nuisance.

Hanson's library deconstructed:

High-level abstractions: sequence, set, finite map, integers, threads

Programming with abstractions

Let's hear about data structures you used in COMP 15, but I want the user's-eye view from the world of ideas, not the implementor's view from the world of code.

The many flavors of sequences


Group exercise (skipped):

The Table interface

Detailed analysis of the Table interface, with special attention to void type and C-style polymorphism.

Start with parametric polymorphism:

And now, 16 variations on a theme (computer demo)

High points:

16 Sep 2009: Interface analysis continued; invariants

Admin:

Where are we and where are we going?

Orientation:

  1. Learning how to think in terms of big building blocks for software

  2. Soon we'll be creating new building blocks

  3. Then we'll use these skills to tackle interesting problems in machine-level programming

Why:

Syllabus sketch:

Invariants

Picture:

World of Ideas
      ^
      |
      V
the black box    <-------- invariants

Important bits of thinking:

An invariant

A representation invariant for an abstract data type

Invariants are usually just about the implementation, but the idea of invariant property is also useful when relating code to the world of ideas.

Example invariants

What are the invariants for

Which functions rely on these invariants?

Return to analysis of Table_T in 16 slides (computer demo)

High points:

Preparation for arrays

Group Exercise: what is a variable?

The Array interface

Classification of functions (works for any abstraction):

  1. Creator
  2. Destroyer (needed only in C/C++)
  3. Observer

and sometimes

  1. Mutator (mutable abstractions only)
  2. Producer (give me one, make a new one)

18 Sep 2009: Lab: Points, lines, and planes

At end of lab, don't forget to pick up

Procedure for today (walk students through)

  1. Work in groups of four
  2. Split across two rooms evenly
  3. Identify a record keeper, who logs in
  4. Copy question file locally
  5. Discuss answers within group and type in (30 to 40 minutes)
  6. Submit answers
  7. Full discussion in each room (reaching consensus)
  8. Instructor's perspective

21 Sep 2009: Entering the world of design

p9213554 p9213555 p9213556 p9213557 p9213558 p9213559

Who of those present are engineers? Who are mathematicians?

Alert! We are entering realm of design

Design tasks in your homework:

  1. Design an interface for two-dimensional arrays, including a contract for each function, as on your design checklist
  2. Design an interface for two-dimensional bitmap images
  3. Design and build implementations, including invariants and how invariants help functions meet their contracts
  4. (Minor design content in Sudoku problem: what's the repeated work? Can you find a way to abstract over it?)
  5. Major design content in removing black edges:

    Topic of Friday's lab.

This week's lectures:

Administrative things it is needful to know

Don't forget to look at the course web pages. Details and Grading are important! Also lots of technical goodness there for you.

It's your job to get to know the faculty. 5% of your course grade is to visit me at office hours. 11 students have already done so. Some of you have conflicts; make an appointment.

Pair programming

Homework

Lateness and extension tokens

Grades

Our evaluation serves two purposes:

  1. So that you know how you are doing, you get a grade.

    Grades range from No Credit up to Excellent. Consistent Very Good grades represent A work; for more detail, see the web pages.

  2. So that you know how to improve, you get comments on your work.

    At Tufts we have a saying: you don't have to be bad to get better. Our job is to help you learn as much as you can and to get as good as you can.

Grades also contribute to your final course letter grade:

  1. Written assignments 55%
  2. Exams 30%
  3. Participation in class 10%
  4. Office visit 5%

Group exercise: Function pointers

How many names are in the table?

Review of Friday's lab

Overview:

Polymorphism:

Polymorphic containers: pointlike, linelike, and planelike

Not everyone grasps deep connection between pointers and variables - a variable is a container that has a name - a pointer points to a container, probably anonymous

23 Sep 2009: Anatomy of an abstraction: Hanson's arrays

p9233560 p9233561 p9233562 p9233563 p9233564 p9233565 p9233566 p9233567 p9233568 p9233569 p9233570 p9233571 p9233572 p9233573 p9233574 p9233575 p9233576 p9233577 p9233578 p9233579

Modularity

Modular metaphor---picture the boxes and the functions

Abstraction is a tool by which we can achieve modularity

As clients, do we open the box?

Application: Of the code you wrote for the previous homework, what is most difficult to understand?

Casting

Casting is dangerous because the compiler trusts you implicitly

Example:

FILE *fp = fopen(argv[1], "r");
...
fclose((FILE *)fp);

What's wrong with this picture?

What if the truth changes? Becomes

const char *fp = argv[1];
...
fclose((FILE *)fp);

Limit casting to the essentials!

Only these casts are acceptable:

Every cast must be justified in writing. Unjustified casts will be downgraded. In particular,

Review of Friday's lab, continued

Polymorphic containers: pointlike, linelike, and planelike

Review: - a variable is a container that has a name - a pointer points to a container, probably anonymous

Given any variable, you can create a pointer that refers to the same container:

int n;
sscanf(s, "%d", &n);  // read number, place in location pointed to

Address of a local variable is safe only because the pointer is not used after the death of the procedure. We say "the pointer does not escape."

Converse not so: some locations pointed to aren't named by any variable:

int *p = malloc(sizeof(*p));

Why sizeof?

What about polymorphism?

Much confusion on this point. Sample answers from the same group:

What is said of each is true and is also true of the other

Another point of confusion: variables need not be temporary! Global variables, for example, live forever. You can decide how long a container lives independent of whether you refer to the container using a variable or a pointer.

Lesson:

A variable x and pointer location *p refer to exactly the same kind of thing—a container—and they have the same properties in common

Abstractions:

In the world of code, what would be a good representation of a single array element?

Digression: know when a narrative is not called for!

to access the rep, you begin at first and skip over i elements

A good representation is not the same as how to find

I'd prefer an answer that says "we don't know what you mean by representation, but here's how to find it".

It ain't what you don't know that gets you into trouble. It's what you know that ain't so." ---Mark Twain

2D arrays from 1D arrays

Analysis and design of arrays (lab report)

The Array interface

Array slides:

  1. The whole interface

  2. Classification as 'creator', 'destroyer', 'observer', 'mutator', 'producer'

  3. What does the void * point to?

  4. How to mutate without losing object identity

  5. Representation exposed!

  6. Contracts

Array_put and Array_get and the pitfalls of pointers

Making a good design better with mutable locations

28 Sep 2009: Anatomy of an Abstraction, continued

p9283581 p9283582 p9283583

Preliminaries

Homeworks to be returned; fgroups to be graded by Wednesday

Concatenating literals

Refer to enumeration literals by name!

Use sizeof and NELEMS

If you can use a for loop, use a for loop. while is for loops that don't fit the for model.

Review

Learning to program with abstraction and invariants so that we can tackle interesting problems instead of boring problems.

Most likely source of bugs in your code:

The design checklist will help you debug faster

Today's class: Invariants, contracts, and implementation

Wednesday: compiling and linking, starting to look at machine structure

Once again, Hanson's arrays

30 Sep 2009: Transition --- Compiling, linking, testing, libraries

p9303584 p9303585 p9303586 p9303587 p9303588 p9303589 p9303590 p9303591 p9303592 p9303593 p9303594 p9303595 p9303596 p9303597 p9303598 p9303599

Review of Monday's class exercise

Correct one-to-one mapping, test proposals (correct 2D array, two passes):

Correct ownnership:

Retrospective and prospective

3 weeks since first day of class:

You have built a lot of skills:

Some people call these methods object-oriented design, but because everybody wants to be called "object-oriented", the world looks disorderly. Don't believe everything you see on Wikipedia.

You will continue to practice and build on these skills, but starting next week, we will focus more on the machine structure and what's happening at that level.

Today, we're going to talk in detail about the concepts underlying compiling and linking. On demand we may look at compile scripts.

From source code to a running process image: compiling, linking, and libraries

Your best source to read about this material is my Short Guide to Working With Unix Libraries.

Questions:

  1. Can we create a program just from Array2? What would that mean?

  2. Where does every C program (and C++ start executing)?

  3. How would we run the unit tests we talked about at the beginning of class?

5 Oct 2009: From Newton to Einstein---locality and the costs of storage

pa053600 pa053601 pa053602 pa053603 pa053604 pa053605 pa053606 pa053607 pa053608 pa053609 pa053610 pa053611 pa053612 pa053613 pa053614 pa053615 pa053616

Announcements:

1st homework deadline remains 10/8 (Thursday).

2nd deadline extended to 10/13 (Tuesday).

Homework:

I know the .i files are new technology. A couple of points:

One other point about reading PNM files:

Questions?


Storage

Newtonian (classical) picture of storage:

Classical picture works in everyday cases, but breaks down at the extremes.

To get from Newton to Einstein, we need to know more about storage


A register is a one-word container

It is impossible to create a pointer to a register

New terms of art (for language designers and compiler writers)

Very occasionally you'll hear a location called a "mutable cell".

This distinction is important because we need to understand the difference between a location (which a program can assign to) an a value (which cannot be assigned to).


Address space is a sequence of 2-raised-to-64 one-byte containers (but only 2-raised-to-48 are usable on today's machines)

Address-space layout (software convention)

+-------------+
| OS reserved |   
+-------------+
|   ...       |
|   Dynamic   |
+-------------+
|    Stack    |
|   ...       |
|  (unmapped) |
|   ...       |
|  heap       |
+-------------+
|    data     |
+-------------+
|    text     |
+-------------+
| unmapped    |
+-------------+

Wait a minute! There's not enough memory!

Fast is small; big is slow

Registers < 300B

L1 cache from 8KB to 128KB

L2 cache 512KB to 8MB

L3 cache 2-6MB

Note on the numbers:

07 Oct 2009: The how and why of caching

pa073617 pa073618 pa073619 pa073620 pa073621 pa073622 pa073623 pa073624 pa073625 pa073626 pa073627 pa073628 pa073629 pa073630 pa073631

Review

Why caching works

Programs show a property called locality

Idea: keep recently used bytes/words and nearby words in the cache

How caches work

Sets of fast containers (not as fast as registers; faster than memory)

Two new terms of art (Bryant and O'Hallaron, page 506):

Today's machines have a line size of 64B (8 words).

Example: load a word

Phenom 9850 has very large 128KB L1 cache

2048 cache lines

To deliver two words per cycle, cannot possibly search all 2048 lines. Therefore lines are divided into sets.

Search through sets is done by very simple hardware

Phenom 9850 uses 2 lines per set, so 1024 sets

  1. Split address into pieces (see Bryant and O'Hallaron, page 488)

     64                           16              6      0
       +--------------------------------------------------+
       |  tag                       |  set index   |offset|
       +--------------------------------------------------+
    
  2. A set holds two lines:

  3. Cache-search algorithm:

    1. Take 10-bit set index from address, choose a 2-line set

    2. In parallel, compare tags and check valid bits

    3. If tags match and line is valid, use offset to index into block

    4. Otherwise, we have a cache miss

What happens on a cache miss

Go to next level down and get a block

Find a place to put the block in the cache:

  1. Determine which set by looking at bits of address

  2. Within set, is any line invalid?

Eviction:

Question: is there a scenario in which the program uses just 3 bytes in memory, but they cannot all fit in L1 cache?

13 October 2009: Split caches, prediction hardware; arithmetic

pa133632 pa133633 pa133634 pa133635 pa133636 pa133637 pa133638 pa133639 pa133640 pa133641 pa133642 pa133643 pa133644

Compile scripts

A couple of points people are not getting:

Also, people are unclear on the role of the compile script:

Homework invariants

Ambitious design: one-to-one mapping from blocked array to a sequence of elements. Very clean implementation:

struct coordinate {
   int col;
   int row;
}; 

static int getIndex(Array2b_T A, int col, int row);
static struct coordinate* getCord(Array2b_T A, int index);

Missing: statement of algebraic laws:

getCord(a, getIndex(a, i, j)).col == i
getCord(a, getIndex(a, i, j)).row == j
getIndex(a, getCord(a, n).col, getCord(a, n).row) == n

Also excellent basis for testing!

Homework performance

If your results are not as expected

  1. Compile with -O2

  2. Make sure your image is at least 40 to 50 megapixels

Review

Address split

     64                           16              6      0
       +--------------------------------------------------+
       |  tag                       |  set index   |offset|
       +--------------------------------------------------+

Set of two cache lines (2-way **associativity):

         +---+  +------------------------+    +---------------------+
         | V |  |  tag   (48 bits)       |    |  Block  (64 bytes)  |
         +---+  +------------------------+    +---------------------+

         +---+  +------------------------+    +---------------------+
         | V |  |  tag   (48 bits)       |    |  Block  (64 bytes)  |
         +---+  +------------------------+    +---------------------+

Example: Conflict on a set of size 2 with three addresses

Three reasons a block might not be in the cache

Einstein #1: more addresses than will fit in a set

Einstein #2: more addresses than will fit in the cache

Non-Einsteinian #3:

14 Oct 2009: Wrapping up the cache; arithmetic

pa143647 pa143648 pa143649 pa143650 pa143651 pa143652

Class exercise: modeling the cache

Assume cache lines are loaded on demand only

Examine graphs:

  1. For stride of size k, how many misses per load?

  2. Which hardware conforms to the model?

  3. What is going on with the Core2 Quad Q6700?

Other questions

AMD64 alignment

Type         size  alignment
bool           1       1
char           1       1
short          2       2
int            4       4
long           8       8
pointer        8       8
float          4       4
double         8       8
long double   16      16    (80 bits)

call stack    --      16

Extra hardware to avoid misses

Block diagram in photo:

Locality homework

How many have already submitted?

Sketch of image compression

Representation is the essence of programming. A *2x2 block of pixels is represented as

  1. 12 or 24 bytes on disk in PPM format

  2. RGB values (scaled integers)

  3. RGB values (floating-point numbers)

  4. Component video [Y/PB/PR] (floating-point numbers)

  5. Average chroma plus DCT (a, b, c, d) of luminance Y (floating-point numbers)

  6. 6 scaled integers (3 signed, 3 unsigned) with different scales

  7. One 32-bit codeword (an unsigned integer)

  8. 4 bytes on disk (big-endian order)

19 Oct 2009: Finite-precision decimal arithmetic

Arithmetic more fun to teach than to learn:

Topics:

Arithmetic with limited precision

TopicBase 10 Base 2
Positional notation1 7
Natural numbers (unsigned ints) 2 8
Signed-magnitude integers 3 not used
Two's-complement integers impossible 10
Fixed-point numbers (scaled integers) 4 ---
Floating-point numbers5 9
Software emulation (rationals) --- ---
Software emulation (arbitrary precision) 6 ---
Bit vectors and logical operations senseless 11

Natural numbers using positional notation with finite precision

n digits base b

d_{n-1} * b^{n-1} + ... + d_1 * b^1 + d_0 * b_0

Can represent any number k in the range 0 <= k < b^n

Examples: 112, 894

Addition: 112 + 894 = 1006

Question: how many additional digits needed to hold sum in general?

Question: how do you prove a theorem?

Multiplication: 112 * 894 = 100128

Question: how many digits are required to hold the product of an n-digit number and an m-digit number?

Question: do any of these theorems depend on the choice of base?

Signed-magnitude representation

Add a plus or minus sign to a number in positional notation

No examples; you've all done this to death.

Fixed-point representation of real numbers

Finite precision: 3 digits

Want to represent real numbers.

Idea: scaled integers

If the denominator is a power of 10, we can use the world-famous decimal point. (It works just as well with any other base.) Let's use the denominator 10:

Addition works as addition of integers:

112 + 89 = 201

11.2 + 8.9 = 20.1

11.2 + 8.94 = 20.14

But

8.94 + 8.94 = 17.88  (to three digits, 17.9 or 179/10)

8.9 + 8.9 = 17.8     (loss of accuracy in calculation)

Multiplication requires dividing by the scale factor

11.2 * 8.94 = 100.128 = 100.0 (


11.2 * 8.9 = 99.68

112 * 89 = 9968    (overflow!)

112 / 10 * 89 = 11 * 89 = 979

21 Oct 2009: Binary arithmetic

pa213653 pa213654 pa213655 pa213656 pa213657 pa213658 pa213659 pa213660 pa213661 pa213662 pa213663 pa213665 pa213666 pa213667 pa213669 pa213671 pa213672 pa213673 pa213674 pa213675 pa213676 pa213677 pa213678 pa213679 pa213680 pa213681 pa213682 pa213683 pa213686 pa213687 pa213688

Midterm course evaluations

Your chance to effect change!

https://www.eecs.tufts.edu/cscourse/feedback.php

Crucial information for homework

  1. Computation on finite representations of real numbers introduces errors (so-called "rounding error")

  2. Converting to a representation with fewer bits introduces errors ("quantization error")

  3. Today: binary arithmetic

  4. Later: the details of how precision is lost in floating-point computation.

All the world's a bit vector

What's this?

1111 0100

Ideas?

Bit stands for binary digit.

It's positional arithmetic, base 2.

Two's-complement arithmetic

Most significant bit is interpreted as sign bit

So

1111 0100

-128 + 64 + 32 + 16    +  4

    -64   + 32 + 16    +  4

         -32   + 16    +  4
              -16      +  4
                     -12

What therefore is the 6-bit

 11 0100 ?

What about 16-bit

1111 1111 1111 0100 ?

And 5-bit

1 0100

Truth: given a signed integer in two's-complement representation, you can extend the sign bit as far as you like.

Questions

Bitwise operations

Treat word as a bit vector, do Boolean operations in parallel:

&  bitwise and
|  bitwise or
~  bitwise complement

Complement changes 0 to 1 and 1 to 0. So if we write

k = -d{n-1}*2^{n-1} + d_{n-2}*2^{n-2} + ... + d1 * 2^1 + d0 * 2^0

Then what are the digits of ~k

     (1-d_{n-1}) ... (1-d0)

So what is the integer value of ~k + 1?

     -(1-d{n-1})*2^{n-1} + (1-d_{n-2})*2^{n-2} + ... + (1-d1)*2 + (1-d0+1)*2^0

     -(1-d{n-1})*2^{n-1} + (1-d_{n-2})*2^{n-2} + ... + (1-d1)*2 + (2-d0)*2^0

     -(1-d{n-1})*2^{n-1} + (1-d_{n-2})*2^{n-2} + ... + (1-d1+1)*2 + (-d0)*2^0

     -(1-d{n-1})*2^{n-1} + (1-d_{n-2})*2^{n-2} + ... + (2-d1)*2 + (-d0)*2^0

                                    (2-d_i)*2^i -> 1 * 2^{i+1} - d_i*2^i

And eventually

     (1-(1-d_{n-1}))*2^{n-1} + (-d_{n-2})*2^{n-2} + ... + (-d1)*2 + (-d0)*2^0

       d_{n-1}*2^{n-1} + (-d_{n-2})*2^{n-2} + ... + (-d1)*2 + (-d0)*2^0
      - (-d_{n-1})*2^{n-1} + (-d_{n-2})*2^{n-2} + ... + (-d1)*2 + (-d0)*2^0

      - k

Multiple interpretations of bit vectors

Most important are the signed and unsigned integers.

What are 'Big-Endian' and 'Little-Endian'

Serial formats:

Big-Endian is most significant byte first

Little-Endian is most significant byte first

We now reveal technology for extracting bytes (and implementing Bitpack)

Shifting

Shift right and left, >> and <<

Suppose the two most significant bits are identical?

What happens on a right shift if the least significant bits are not zero?

Combining bitwise operations with shifts

Boolean algebra plus shifter.

Recall our example byte:

1111 0100

Suppose we want to replace 101 with 010?

Or more generally, given

111xxx00

and given abc, how do we produce

111abc00?

Next problem: given two bytes

mnopqrst

and

abcdefgh

how do you compute

mnofghst ?

How about

unsigned field = Bitpack_getu(abcdefgh, 3, 0);
return Bitpack_newu(mnopqrts, 3, 2, field);

26 October 2009: Floating-point arithmetic

pa263691 pa263692 pa263693 pa263694 pa263695

Changes in course staff

Lost services of Lee Tibbert

Observations on the homework

Time spent on class varies widely:

Looking for

Simplify!

Plans for the next two weeks

No new homework for a week!

Friday's lab is cancelled.

New homework and new topic one week from today.

The upcoming midterm

Most students find my exams difficult

Two reasons:

Why?

How to prepare:

How to take the exam:

Floating-point numbers

Most important concept: normalized numbers

Anatomy of a floating-point number:

Formula;

x = +/- m * b^n      where n typically is exp - k

Q: What happens if we subtract 1 from n and multiply S times b?

x = +/- (b*m) * b^(n-1)

To avoid this problem, we have normalized numbers:

Examples:

Q: Can every number be normalized?

Examples base 10 -- 3 digit significand

x = 10.1    normalized 1.01e1
y = 9.93    normalized 9.93e0
z = .707    normalized 7.07e-1

Typical coding exponent 10^n where n = exp - 5 and exp is one digit

Q: Are there nonzero numbers that can't be normalized?

Q: Which ones can be represented as denormalized numbers?

Zero is a denormalized number

Floating-point numbers and algebraic laws

Decent, self-respecting laws do not hold

Let's see why!

Floating-point unit must add apples to apples

Before two numbers can be added or subtracted, exponents must match

Example real arithmetic: 10.1 - 9.93 = 0.17

But

  10.1   =   1.01e1  -->   1.01e1
-  9.93  =  -9.93e0  --> - 0.99e1
                       ------------
                           0.02e1  --> 2.00e-3 = 0.20

Yow! 0.17 != .20, and result is wrong in every digit.

Add a guard digit and the result is exact* (demo).

Leads to the cardinal rule of the IEEE floating-point standard

This is a gold standard (no hardware could possibly do better).

And yet errors accumulate!

Take-home messages:

  1. One operation is "exact then round"

    Result is answer + eps ("error term"; "rounding error"; "quantization error")

  2. Further computation with rounded results increases error

  3. Extreme example: (x + eps) - (x + eps') = eps - eps'

    Cancellation (here, catastrophic---only error is left)

Cancellation example

Subtracting rounded results can leave only error!

x = 1.01e-2            x^2 = 1.02e-4
y = 9.93e-3            y^2 = 9.86e-5

x - y = 1.70e-4
x + y = 2.00e-2

x^2 - y^2   = 9.21e-4
(x-y)*(x+y) = 3.40e-6

Exact answer: 3.4051e-6

Exact answer, rounded: 3.41e-6

Floating point for the amateur

Minimum exp holds much more than zero:

Things to know:

For the deeper mysteries, let Dave Goldberg be your guide.

2 Nov 2009: Debugging and assembly code

pb023696 pb023697 pb023698 pb023699 pb023700 pb023701 pb023702 pb023703

Debugger demo

Compiling with -g -O0:

report40 demo-req*-bm.lua40

cp /comp/40/require/tests/locality/required-rot90-bm/rand105x104.0 /tmp/box.pbm
ddd --debugger "gdb -d /comp/40/cii" ppmtrans-block

Demo:

Parameter passing

DDD's machine-code window

Defusing a binary bomb

Translation examples:

4 Nov 2009: Control flow

pb043704 pb043705 pb043706 pb043707 pb043708 pb043709 pb043710 pb043711 pb043712 pb043713 pb043714 pb043715

Parameter registers:

How to analyze:

Conditition codes

All decision making funnelled through condition codes, which Intel calls flags.

Translating control flow

Four basic tools:

Translating control flow

Basic conditional:

L':

Short-circuit Booleans:

if (C1 && C2) goto L;

becomes

 if (!C1) goto Lnot;
 if (C2) goto L;
Lnot:

Short-circuit Booleans:

if (C1 || C2) goto L;

becomes what?

How about

while (C) S;

Try Forrest Baskett!

 goto Ltest;
Lhead:
 S
Ltest:
 if (C) goto Lhead;

And also

for (exp1; C; exp2) S;

do S while(C);

Finally

switch (exp) {
  case 0: SS0;
  case 1: SS1;
  case 2: SS2;
  case 3: SS3:
  default: SSd;
}
...

Uses a jump table (Pidgin C):

static code *jump_table[4] = { L0, L1, L2, L3 }; // addresses

 t = exp
 if ((unsigned)t < 4) {
    t = jump_table[t];
    goto t;
 } else
    goto Ld;
L0: SS0;
L1: SS1;
L2: SS2;
L3: SS3:
Ld: SSd;
Lend: ...    

And break becomes goto Lend

9 Nov 2009: Indirect branches, returns, and calls

pb093720 pb093721 pb093722 pb093723

Review

Instructions set flags

test
cmp
fcom
[u]comis[sd]

Decisions read flags:

jcc  (je, jl, jae)
cmov (cmovnz, ...)

Q: If the field in %rdi that is 4 bits wide and has least significant bit 4 is not zero, explode the bomb. What is the code?

Higher-level control flow is built from conditional and unconditional branch

How about

while (C) S;

Try Forrest Baskett!

 goto Ltest;
Lhead:
 S
Ltest:
 if (C) goto Lhead;

Computed branches

The target is not coded into the instruction:

jmp *(%rax)
jmp *0x40b980(,%rax,8)

And what else?

Calls and returns

In the beginning was FORTRAN. No recursion, no stack!

No excuse for a recursive factorial function, except it makes a good example:

int fact(int n) {
  if (n == 0)
    return 1;
  else
    return n * fact(n-1);
}

Q: When fact(4) is calling fact(3), where is n-1?

Q: OK, then where is n?

Q: Where was n on procedure entry?

Q: We have a problem. What's the general shape of the solution?


Attack from a fully general point of view (timeless truth):

Here's the life cycle of a procedure:

What's wrong with this picture? Is it adequate to describe factorial?

Suppose we let the program run, and we suddenly freeze it at an arbitrary point:

Refine our idea of state:

If f() calls g() calls h(), how are their lifetimes related?

Is there any way that an activation of g() can outlive an activation of h()?

Therefore, what is the right data structure for activations?

11 Nov 2009: Procedure calls and a virtual machine

pb113725 pb113726 pb113727 pb113728 pb113729 pb113730 pb113731 pb113732 pb113733 pb113734 pb113735

Review:

The implementation of procedure calls: Calling conventions

Conventional contract or agreement:

Focus on two procedures: caller, callee

Conventions for

Universal truths:

Problem:

Problem:


Bonus

More universal truths:

Problem:

Truths of C programs:

A Universal Machine

A Turing-complete machine in 14 instructions

Why the Universal machine?

Architecture of the Universal Machine

A 32-bit machine:

A segmented memory architecture:

Word-oriented, RISC instruction set architecture

Computational capabilities deliberately limited:

Instruction set

Emulating the Universal Machine

Keep representation of machine state:

Interface: given an initial 'array 0' (binary program), run until reaching a halt instruction, then return.

Design of UM software

Asking for a more modular design:

16 Nov 2009: Comments on the arith homework; refactoring; how to write documentation

pb163736 pb163737 pb163738 pb163739 pb163740 pb163741 pb163742 pb163743

Examples from the arith homework

What's wrong with this code?

if (n <= max)
    return true;
else
    return false;

What can you tell me about this value?

uint64_t word = ...;
return (word << (64-w)) >> w;

Which of these is guaranteed to do a 64-bit shift?

1 << w
1l << w
1ul << w
(uint64_t)1 << w

Best of all:

static const uint64_t one = 1, zero = 0;

Simplify, simplify, simplify!

Suggested procedure:

  1. Work with your programming partner

  2. When session is over, print out code

  3. Each of you can review code on printout

  4. At next meeting, discuss proposed revisions and simplifications

  5. Implement the best ones

Example: Bitpack_newu in 21 lines:

uint64_t Bitpack_newu(uint64_t word, unsigned width, 
                      unsigned lsb, uint64_t value) {
  if (width + lsb > 64) {
    RAISE(Bitpack_Overflow);
  }

  if(!Bitpack_fitsu(value, width)) {
     RAISE(Bitpack_Overflow);
  }

  uint64_t one = 1;

  uint64_t filter = shift_left64(one, width) - 1;
  filter = shift_left64(filter, lsb);

  uint64_t result = word & (~filter);
  uint64_t insert = shift_left64(value, lsb);
  result = result | insert;

  return result;
}

Simplified version in 9 lines (assuming global static uint64_t one = 1):

uint64_t Bitpack_newu(uint64_t word, unsigned width, 
                      unsigned lsb, uint64_t value) {
  if (width + lsb > 64 || !Bitpack_fitsu(value, width))
    RAISE(Bitpack_Overflow);

  uint64_t mask =  shift_left64(one, width) - 1;
           mask = ~shift_left64(mask, lsb);
  return (word & mask) | shift_left64(value, lsb);
}

Factors of two matter!!

Documentation (as expected for UM)

Typical error on arith:

Here is what I expect for the Universal Machine:

18 Nov 2009: Documentation and Implementation of the Universal Machine

Overview:

What I'm looking for in your code and documentation

Documentation:

Code:

Implementation Issues

UM segmented memory architecture

World of ideas:

World of implementation

Other issues

23 Nov 2009: Performance

The performance killers

1 Jan 2010 (the indefinite future): stuff we will cover later

Assembly code vs machine code

malloc/free vs new/delete (no "constructors")

Hanson's NEW and FREE; use of sizeof

Using function pointers and void *.

The implemntation of Array, how it maps to bytes and words. Why put and get are truly redundant. What would be a better interface? What would be a better implementation?

1 Jan 2010 (the indefinite future): stuff we will cover later



Back to class home page