Blackboard: Machine structure and assembly-language programming
Introduce course staff
Myself: programming language guy, but deeply involved in compilers, assemblers, linkers, debuggers, all of which work with machine structures and assembly language.
Call me 'Professor' or 'Professor Ramsey' or 'Norman'
I like questions! If you have a question, chances are several other people also have a similar question.
Blackboard: Machine-level programming
But also
Raise programming to another level
11's building blocks were statements and expressions
15's building blocks were functions and methods
40's building blocks are modules
Engineering problem-solving
The example of the two GPS's
The Yellow Thing
The Driver's Thing
Both have time to destination
How do you think it works? What are the algorithms and data structures?
Which one works better?
Actually the Yellow Thing works better. Why? Traffic and red lights.
Example engineering problem: how should the Driver's Thing account for traffic and red lights?
Engineering problem for 40: how many have access to an 8-megapixel digital camera? Congratulations---we will turn it into a 300dpi scanner.
Who has an idea what abstraction is?
Answers
Parameterizing over something, where something about the parameter is not known.
Connecting the world of ideas to the world of implementation, and showing that the world of implementation faithfully reflects the world of ideas. (You may substitute "homework problems" for "ideas".)
We will see lots of abstraction, but let's talk about the machine level:
Computer instruction set is the AMD64
Based on an 8-bit microcontroller almost 40 years old
Took on its modern form over 20 years ago (Visicalc runs on Windows 7)
What do you think is going on in the hardware? Not what it says in the manual! But the hardware is faithful to the manual.
How good is this abstraction?
Can the computer do arithmetic on integers?
Somebody give me an integer
OK, now somebody give me a method, using arithmetic, of making a bigger integer.
How many digits???
What's wrong with this picture?
How many digits in a byte?
How many bytes are available?
Can we do arithmetic?
Answers:
The machine throws up its hands around 4 billion, or if you try a little harder, 16 billion billions.
Software libraries can go bigger---until you run out of memory. They're called "bignums".
Languages like Haskell, Scheme, and Smalltalk have this built in
C and C++ give you what the machine has
So (if it can't even do arithmetic) what's so special about the machine level of abstraction?
Some faults can be diagnosed only if you understand machine-level programming.
To understand and improve performance, you must understand the machine level.
The foundation of the world's computing infrastructure is built at this level, in C or C++.
A problem: AMD64 is too complex to be mastered in one semester. So, you'll learn to analyze AMD64 code, but you won't write much. To understand assembly code and machine instructions in depth, later in the term we'll work with a virtual machine. Most students find this pretty interesting.
Group Exercise: Ways to represent data in C++
Followup on blackboard:
Admin:
Don't be put off by the paper storm. You have to read the course policy pages; they will tell you important stuff like we don't take submissions that have tab characters in them or that are wider than 80 columns.
For the rest, read when you are stuck or are moving more slowly than you would like on problems. For example,
If you know how to do something in C++ but not in C, that would be the time to look at the C idioms.
If you are having trouble with error messages, that might a time to look at the libraries handout.
For tonight's design document, we just want an explanation of what data structures you will use to solve the problem of the fingerprint groups, and how the contents of those data structures relate to the input. A couple of well-written paragraphs will do the job. Certainly no more than a page.
Note: we expect your code to perform reasonably on inputs of several hundred thousand lines
More admin details in a future class.
We're polling for office hours! Check the course web page; poll will close tonight.
Syllabus sketch:
My question about data was
NOT "what different places does data live?", but rather
What kinds of data can you talk about, no matter where it lives?
In C, the answers are driven firmly by questions of representation:
A scalar is a quantity that might live in a machine register
(Machine register is a kind of storage that is very fast, maybe 100x or more faster than memory, but in very short supply. You can put billions of integers in memory; in registers you can only put 16.)
This covers all the integer, character, boolean, and floating-point types
Representationally, these are the words, halfwords, doublewords, quadwords, quarterwords, bytes, and so on.
Scalars are boring.
A pointer to a sequence of things in memory
This one comes in flavors:
Sequence of identical things == as close as we get to array
Sequence of mixed things (heterogeneous sequence) == struct
Sequence of machine instructions == function
Sequence of thing to be determined at run time == union!
Representationally, on today's machines there is but one kind of pointer, and it is always a full word (64 bits in our case)
Pointers are where it's at!
The high points:
No more strings---only pointers
Pointer equality is not string equality
struct and pointer to struct is not the same thing
Almost everything interesting is some kind of pointer
We'll hit the details
I asked about the problem with the Hanson book, and evidently the instructor in the summer course made it recommended rather than required. I'm very sorry for the nuisance.
I've written to Hanson and he's asked his publisher to print more copies; apparently they are print on demand.
Meanwhile you can buy online access through Safari
Simulate exceptions
An important language feature
Found in C++ and many other languages
No obvious analog at machine level
Therefore not found in C
Atom interface: less power than C++ strings, but better than C's
bare pointers. A sweet spot in the design space.
Picture: two pointers to "Hello"
Pointer equality no good
strcmp a nuisance, linear time
Reduce string to pointers by
Atom_string(s1) == Atom_string(s2) iff strcmp(s1, s2) == 0
Memory-management utilities to reduce possibility of pilot error
String formatting and string analysis (Fmt, Str, Text)
Abstract data types!!!
High-level abstractions: sequence, set, finite map, integers, threads
Let's hear about data structures you used in COMP 15, but I want the user's-eye view from the world of ideas, not the implementor's view from the world of code.
Call for data structures
Deal with each one abstractly
The many flavors of sequences
Group exercise (skipped):
fgroups? Is the
cost model important? Table interfaceDetailed analysis of the Table interface, with special attention to void type and C-style polymorphism.
Two of the most important concepts in programming
Parametric polymorphism
Function closures
Neither concept is inherently difficult, but
Both are unfamiliar
Both are awkward in C++
Both are extremely awkward in C
Had you studied Scheme or Haskell or ML, the compiler would handle it, and you would barely even notice what was happening. In C, you must make it all happen yourself.
Start with parametric polymorphism:
What are the type parameters? (If you prefer, template parameters)
Let's modify the interface by identifying which voids represent key and value (looking at just the put and get methods):
So, can you define a Table_T in which the values are integers?
And now, 16 variations on a theme (computer demo)
High points:
void * is a pointer to an unknown typevoid * might also be an unknown pointer type
void * pointers enables powerful abstractions like iterationsAdmin:
Orientation:
Learning how to think in terms of big building blocks for software
Soon we'll be creating new building blocks
Then we'll use these skills to tackle interesting problems in machine-level programming
Why:
Abstraction (the relation of code to the world of ideas) is new, but it allows you to think bigger. Often you can decide on implementation independently, once you know more about the best cost model.
Invariants are a powerful tool for getting your implementation right.
You can often connect the two: even though the relationship between your code and the world of ideas doesn't appear in the code itself, you can think of it
Syllabus sketch:
Build up your skills at programming with modules
Gradually introduce more and more emphasis on machine-level topics
Syllabus on the web is a work in progress.
Picture:
World of Ideas
^
|
V
the black box <-------- invariants
Important bits of thinking:
How does the world of ideas relate to the interface?
How does the world of ideas relate to the implementation?
What is true about the implementation?
An invariant
Is a property whose truth or falsehood can be checked at a single point in time
Does not require any narrative description of actions taken as time passes
BUT is a property that holds at multiple single points throughout the program
A representation invariant for an abstract data type
Is true any time code is executing outside the black box
Enables what's called rely/guarantee reasoning
Each exported function may assume (rely upon) the invariant on entry
Each exported function must guarantee the invariant on exit (and therefore entry to the next function)
Invariants are usually just about the implementation, but the idea of invariant property is also useful when relating code to the world of ideas.
What are the invariants for
Which functions rely on these invariants?
Table_T in 16 slides (computer demo)High points:
void * is a pointer to an unknown typevoid * might also be an unknown pointer type
void * pointers enables powerful abstractions like iterationsGroup Exercise: what is a variable?
Classification of functions (works for any abstraction):
and sometimes
At end of lab, don't forget to pick up
Procedure for today (walk students through)
Who of those present are engineers? Who are mathematicians?
Alert! We are entering realm of design
Design tasks in your homework:
Major design content in removing black edges:
Topic of Friday's lab.
This week's lectures:
We will analyze and improve Hanson's Array interface (and look at programming idioms)
We will study carefully Hanson's Array implementation, seeing how contracts and invariants come into play
(The contract is also known as a precondition and postcondition.)
But first,
Don't forget to look at the course web pages. Details and Grading are important! Also lots of technical goodness there for you.
It's your job to get to know the faculty. 5% of your course grade is to visit me at office hours. 11 students have already done so. Some of you have conflicts; make an appointment.
It is never acceptable for a partnership to divide work:
If you are working on code, both partners must be present. (If there's a problem, talk to the course staff.)
Example: disaster if one of you does sudoku and the other
does unblackedges
At most three assignments with any one partner
Please pay attention to detail on instructions, especially standard input, standard output, and command line
Pay attention to abstraction, modularity, and reuse:
Homeworks must print in 80 columns and must have no tab characters
Acknowledge all help received, including Wikipedia, classmates, stackoverflow
The 11:59 deadline has a 10-minute grace period
You can extend at most 2 times on any one assignment
A single token extends all parts of an assignment, including design and implementation
You have five tokens for the term. Use them wisely.
Our evaluation serves two purposes:
So that you know how you are doing, you get a grade.
Grades range from No Credit up to Excellent. Consistent Very Good grades represent A work; for more detail, see the web pages.
So that you know how to improve, you get comments on your work.
At Tufts we have a saying: you don't have to be bad to get better. Our job is to help you learn as much as you can and to get as good as you can.
Grades also contribute to your final course letter grade:
How many names are in the table?
Overview:
I scored higher than on some previous exercises, since almost everyone answered the questions I was trying to ask, which must have been aligned with questions written down.
Many diverse answers, including many diverse right answers
Almost all metaphors and mathematical ideas were useful, but some are more useful than others
Analogy with bridges again:
My job is to steer you toward the more useful metaphors
Polymorphism:
An important but rich topic. Takes several weeks in COMP 80 to cover the basics precisely.
Our job in 40 is to get a feel for polymorphism. Important ideas:
We will deal exclusively in parameteric polymorphism, where at least in the world of ideas, type parameters are lurking. For intuition, think "template parameters"
Polymorphism question: in the world of ideas, what is an array parametric in?
Not everyone grasps deep connection between pointers and variables - a variable is a container that has a name - a pointer points to a container, probably anonymous
Modular metaphor---picture the boxes and the functions
Abstraction is a tool by which we can achieve modularity
As clients, do we open the box?
Application: Of the code you wrote for the previous homework, what is most difficult to understand?
Casting is dangerous because the compiler trusts you implicitly
Example:
FILE *fp = fopen(argv[1], "r");
...
fclose((FILE *)fp);
What's wrong with this picture?
What if the truth changes? Becomes
const char *fp = argv[1];
...
fclose((FILE *)fp);
Limit casting to the essentials!
Only these casts are acceptable:
const or volatileEvery cast must be justified in writing. Unjustified casts will be downgraded. In particular,
void * pointer
Review: - a variable is a container that has a name - a pointer points to a container, probably anonymous
Given any variable, you can create a pointer that refers to the same container:
int n;
sscanf(s, "%d", &n); // read number, place in location pointed to
Address of a local variable is safe only because the pointer is not used after the death of the procedure. We say "the pointer does not escape."
Converse not so: some locations pointed to aren't named by any variable:
int *p = malloc(sizeof(*p));
Why sizeof?
What about polymorphism?
Much confusion on this point. Sample answers from the same group:
What is said of each is true and is also true of the other
Another point of confusion: variables need not be temporary! Global variables, for example, live forever. You can decide how long a container lives independent of whether you refer to the container using a variable or a pointer.
Lesson:
A variable xand pointer location*prefer to exactly the same kind of thing—a container—and they have the same properties in common
Abstractions:
In the world of code, what would be a good representation of a single array element?
Digression: know when a narrative is not called for!
to access the rep, you begin at first and skip over
ielements
A good representation is not the same as how to find
I'd prefer an answer that says "we don't know what you mean by representation, but here's how to find it".
It ain't what you don't know that gets you into trouble. It's what you know that ain't so." ---Mark Twain
Array interfaceArray slides:
The whole interface
Classification as 'creator', 'destroyer', 'observer', 'mutator', 'producer'
What does the void * point to?
How to mutate without losing object identity
Representation exposed!
Contracts
Array_put and Array_get and the pitfalls of pointers
Making a good design better with mutable locations
Homeworks to be returned; fgroups to be graded by Wednesday
Concatenating literals
Refer to enumeration literals by name!
Use sizeof and NELEMS
If you can use a for loop, use a for loop. while is for loops that don't fit the for model.
Learning to program with abstraction and invariants so that we can tackle interesting problems instead of boring problems.
Most likely source of bugs in your code:
Invariant does not properly relate the world of ideas to your implementation, or
Invariant is violated
The design checklist will help you debug faster
Today's class: Invariants, contracts, and implementation
Wednesday: compiling and linking, starting to look at machine structure
Correct one-to-one mapping, test proposals (correct 2D array, two passes):
Place distinct values in distinct containers
Write all zeroes, write and check for ones
Use map functions and compare addresses (requires development, but seeking uniqueness of addresses). Won't detect if two containers map to overlapping memory blocks.
Write values and then print them. Not good for unit test because it requires too much human judgment. (And too much work to repeat for a new implementation.)
Correct ownnership:
Look for segfaults. Nice idea, but won't work. Segfault doesn't know what you own. By the end of next week you will know exactly what segfault means and what it does and does not do for you.
Free memory and look for leaks. No protection against reading memory you don't own. But does protect against leakage.
Check value of the representation of every container (pointer
returned by get function). Make sure it's in the range of
memory allocated. Works, but only by exposing the
representation. Not robust against change in representation.
Practical only if the array is a single block of memory.
Otherwise, quite likely to have bugs in test.
Use valgrind on any program that reads every container.
3 weeks since first day of class:
You have built a lot of skills:
Programming with interfaces and implementations
Working with other people's interfaces
Programming in the world of ideas
Invariants and the relationship of the world of ideas to your code
Designing your own interfaces
Design checklists for when you have bugs or get stuck
Some people call these methods object-oriented design, but because everybody wants to be called "object-oriented", the world looks disorderly. Don't believe everything you see on Wikipedia.
You will continue to practice and build on these skills, but starting next week, we will focus more on the machine structure and what's happening at that level.
Today, we're going to talk in detail about the concepts underlying compiling and linking. On demand we may look at compile scripts.
Your best source to read about this material is my Short Guide to Working With Unix Libraries.
Questions:
Can we create a program just from Array2? What would that
mean?
Where does every C program (and C++ start executing)?
How would we run the unit tests we talked about at the beginning of class?
1st homework deadline remains 10/8 (Thursday).
2nd deadline extended to 10/13 (Tuesday).
I know the .i files are new technology. A couple of points:
The .i file maintains a single point of truth about how to
rotate an image.
The macros are new technology, but they are buying you compile-time type checking (in place of function pointers with casts).
Try gcc ... [many options] -E foo.c | less, which will expand
all macro definitions and write the results to standard output.
With the .i trick and the function-pointer trick, you now know
all the tricks I know for supporting polymorphism in C. It's a
critical skill.
One other point about reading PNM files:
The Pnmrdr was good when we had no 2D arrays
Switch now to the Pnm interface; it exploits your 2D arrays
Function pointers to the max
Questions?
Newtonian (classical) picture of storage:
Address space is a big blob
Variables are somewhere
All costs are equal
Classical picture works in everyday cases, but breaks down at the extremes.
To get from Newton to Einstein, we need to know more about storage
A register is a one-word container
16 Integer registers: general-purpose
16 SSE registers (XMM): floating-point, vector
8 legacy floating-point registers: 80-bit floating-point
control word
status word
It is impossible to create a pointer to a register
New terms of art (for language designers and compiler writers)
Location is the container
Value is the thing contained
Very occasionally you'll hear a location called a "mutable cell".
This distinction is important because we need to understand the difference between a location (which a program can assign to) an a value (which cannot be assigned to).
Address space is a sequence of 2-raised-to-64 one-byte containers (but only 2-raised-to-48 are usable on today's machines)
Byte containers can be aggregated to form word containers
8 bytes make a 64-bit word
We'll look at details in arithmetic
Address-space layout (software convention)
+-------------+
| OS reserved |
+-------------+
| ... |
| Dynamic |
+-------------+
| Stack |
| ... |
| (unmapped) |
| ... |
| heap |
+-------------+
| data |
+-------------+
| text |
+-------------+
| unmapped |
+-------------+
Wait a minute! There's not enough memory!
Address space: 2-raised-to-48 canonical addresses (256 terabytes)
Physical memory: 1-16 gigabytes typical (30-34 bits' worth)
Solution: Addresses are translated, also called mapped
The mapping is partial but is injective (one to one) on its domain
However, just counting shows that almost all addresses are unmapped
What happens when a program tries to use a container whose address is unmapped???
More on this in COMP 111 (Operating Systems)
Registers < 300B
L1 cache from 8KB to 128KB
L2 cache 512KB to 8MB
L3 cache 2-6MB
Note on the numbers:
The ideas are timeless, but the numbers change frequently
Every new chip/process changes cache size and speeds
A new architectural revision may add registers
Registers are small and fast and live happily with the CPU
The C compiler uses registers and memory
Local variables and formal parameters whose addresses are not taken go into registers.
Everything else goes in memory
Memory and registers are connected by a sequence of caches
Programs show a property called locality
Temporal locality: referring to the same address repeatedly, close together in time
Spatial locality: referring to an address close to another address in space
Idea: keep recently used bytes/words and nearby words in the cache
Sets of fast containers (not as fast as registers; faster than memory)
Two new terms of art (Bryant and O'Hallaron, page 506):
The cache line is the name for the container, plus extra information about what is in the container
The block is the name for the thing contained
Today's machines have a line size of 64B (8 words).
Phenom 9850 has very large 128KB L1 cache
2048 cache lines
To deliver two words per cycle, cannot possibly search all 2048 lines. Therefore lines are divided into sets.
Sets are organized for best performance
Architects do massive simulation
Alternatives are very well covered in Chapter 6
I will hit the high points
Search through sets is done by very simple hardware
Phenom 9850 uses 2 lines per set, so 1024 sets
Split address into pieces (see Bryant and O'Hallaron, page 488)
64 16 6 0
+--------------------------------------------------+
| tag | set index |offset|
+--------------------------------------------------+
A set holds two lines:
Each holds valid bit, tag, and block
+---+ +------------------------+ +---------------------+
| V | | tag (48 bits) | | Block (64 bytes) |
+---+ +------------------------+ +---------------------+
+---+ +------------------------+ +---------------------+
| V | | tag (48 bits) | | Block (64 bytes) |
+---+ +------------------------+ +---------------------+
Cache-search algorithm:
Take 10-bit set index from address, choose a 2-line set
In parallel, compare tags and check valid bits
If tags match and line is valid, use offset to index into block
Otherwise, we have a cache miss
Go to next level down and get a block
Find a place to put the block in the cache:
Determine which set by looking at bits of address
Within set, is any line invalid?
If yes, store the block and the tag and make the line valid
If not, evict a block to make room
Eviction:
Choose block that is least recently used or not used recently
May need to write the block to the next level down (write-back)
Block may already have been written (write-through)
Question: is there a scenario in which the program uses just 3 bytes in memory, but they cannot all fit in L1 cache?
A couple of points people are not getting:
How many implementations of Array2b_map should be included in
your submission?
How many implementations of Bit2_loc should have been included
in a previous submission?
In order to have a single point of truth about your code, and
to preserve abstraction and representation-independence,
you must understand .h files and linking.
Previous submission duplicated the key Bit2 functions: one set
in bit2.c, which was never used for anything, and another set in
unblackedges.c. Don't make this mistake!
Also, people are unclear on the role of the compile script:
sh compile should cleanly build all the binaries for
the current assignment, and only those binaries. In particular,
compile should not try to build stale binaries left over from
previous assignments.Ambitious design: one-to-one mapping from blocked array to a sequence of elements. Very clean implementation:
struct coordinate {
int col;
int row;
};
static int getIndex(Array2b_T A, int col, int row);
static struct coordinate* getCord(Array2b_T A, int index);
Missing: statement of algebraic laws:
getCord(a, getIndex(a, i, j)).col == i
getCord(a, getIndex(a, i, j)).row == j
getIndex(a, getCord(a, n).col, getCord(a, n).row) == n
Also excellent basis for testing!
If your results are not as expected
Compile with -O2
Make sure your image is at least 40 to 50 megapixels
pnmscale will enlarge an image Address split
64 16 6 0
+--------------------------------------------------+
| tag | set index |offset|
+--------------------------------------------------+
Set of two cache lines (2-way **associativity):
+---+ +------------------------+ +---------------------+
| V | | tag (48 bits) | | Block (64 bytes) |
+---+ +------------------------+ +---------------------+
+---+ +------------------------+ +---------------------+
| V | | tag (48 bits) | | Block (64 bytes) |
+---+ +------------------------+ +---------------------+
Example: Conflict on a set of size 2 with three addresses
Einstein #1: more addresses than will fit in a set
conflict misses
easy with 2-way associativity (L1)
harder with 12-way or 16-way associativity (L2)
Einstein #2: more addresses than will fit in the cache
not in this class, but imagine an 8MB data structure in a 4MB class
capacity misses (rare in today's apps)
Non-Einsteinian #3:
compulsory misses
load a word you've never loaded before
Friday's lab an example
Image rotation an example
Assume cache lines are loaded on demand only
Examine graphs:
For stride of size k, how many misses per load?
L.L loads with stride k. How many lines
are loaded?Which hardware conforms to the model?
What is going on with the Core2 Quad Q6700?
Other questions
Type size alignment
bool 1 1
char 1 1
short 2 2
int 4 4
long 8 8
pointer 8 8
float 4 4
double 8 8
long double 16 16 (80 bits)
call stack -- 16
Block diagram in photo:
How many have already submitted?
Do we need extra TAs?
Shall I sketch the next homework, or wait until Monday?
(I can do it again Monday.)
Representation is the essence of programming. A *2x2 block of pixels is represented as
12 or 24 bytes on disk in PPM format
RGB values (scaled integers)
RGB values (floating-point numbers)
Component video [Y/PB/PR] (floating-point numbers)
Average chroma plus DCT (a, b, c, d) of luminance Y (floating-point numbers)
6 scaled integers (3 signed, 3 unsigned) with different scales
One 32-bit codeword (an unsigned integer)
4 bytes on disk (big-endian order)
Arithmetic more fun to teach than to learn:
First trip through, hard to see how it all fits together
I will try!
Topics:
| Topic | Base 10 | Base 2 |
| Positional notation | 1 | 7 |
| Natural numbers (unsigned ints) | 2 | 8 |
| Signed-magnitude integers | 3 | not used |
| Two's-complement integers | impossible | 10 |
| Fixed-point numbers (scaled integers) | 4 | --- |
| Floating-point numbers | 5 | 9 |
| Software emulation (rationals) | --- | --- |
| Software emulation (arbitrary precision) | 6 | --- |
| Bit vectors and logical operations | senseless | 11 |
Positional notation for integers
Representations of real numbers:
Scaled integers sharing a denominator (fixed point) and their arithmetic
Rational numbers (each number its own denominator, no hardward support)
Floating point: scaled integers, with a limited set of scales and a compact representation
Binary arithmetic
Unsigned arithmetic
Two's-complement signed arithmetic
Bit vectors
n digits base b
d_{n-1} * b^{n-1} + ... + d_1 * b^1 + d_0 * b_0
Can represent any number k in the range 0 <= k < b^n
Examples: 112, 894
Addition: 112 + 894 = 1006
Question: how many additional digits needed to hold sum in general?
Question: how do you prove a theorem?
Multiplication: 112 * 894 = 100128
Question: how many digits are required to hold the product of an n-digit number and an m-digit number?
Question: do any of these theorems depend on the choice of base?
Add a plus or minus sign to a number in positional notation
Represent negative numbers with a minus sign
Negate a number by changing signs
Subtraction enters the picture (can represent k1 - k2 where k2 > k1)
Arithmetic means addition, subtraction, and multiplication on magnitudes, along with clever manipulation of signs.
No examples; you've all done this to death.
Finite precision: 3 digits
Want to represent real numbers.
Idea: scaled integers
Divide by implicit denominator
All numbers share the same denominator
If the denominator is a power of 10, we can use the world-famous decimal point. (It works just as well with any other base.) Let's use the denominator 10:
Real number 11.2 represented as 112 (implicit division by 10)
Real number 8.94 represented as 089 (loss of precision)
Addition works as addition of integers:
112 + 89 = 201
11.2 + 8.9 = 20.1
11.2 + 8.94 = 20.14
But
8.94 + 8.94 = 17.88 (to three digits, 17.9 or 179/10)
8.9 + 8.9 = 17.8 (loss of accuracy in calculation)
Multiplication requires dividing by the scale factor
11.2 * 8.94 = 100.128 = 100.0 (
11.2 * 8.9 = 99.68
112 * 89 = 9968 (overflow!)
112 / 10 * 89 = 11 * 89 = 979
Your chance to effect change!
https://www.eecs.tufts.edu/cscourse/feedback.php
Computation on finite representations of real numbers introduces errors (so-called "rounding error")
Converting to a representation with fewer bits introduces errors ("quantization error")
Today: binary arithmetic
Later: the details of how precision is lost in floating-point computation.
What's this?
1111 0100
Ideas?
Bit stands for binary digit.
It's positional arithmetic, base 2.
Most significant bit is interpreted as sign bit
NOT sign-magnitude
Instead,
-d7*2^7 + d6*2^6 + d5*2^5 + ... + d1 * 2^1 + d0 * 2^0
So
1111 0100
-128 + 64 + 32 + 16 + 4
-64 + 32 + 16 + 4
-32 + 16 + 4
-16 + 4
-12
What therefore is the 6-bit
11 0100 ?
What about 16-bit
1111 1111 1111 0100 ?
And 5-bit
1 0100
Truth: given a signed integer in two's-complement representation, you can extend the sign bit as far as you like.
Questions
Smallest integer representible in 5 signed bits (two's complement)?
x xxxx
How do I complete the bits?
What is the value?
Largest integer representible in 5 signed bits (two's complement)?
x xxxx
How do I complete the bits?
What is the value?
What is the representation of -1?
Almost true that a 5-bit signed integer can represent the integers -15 to +15.
Treat word as a bit vector, do Boolean operations in parallel:
& bitwise and
| bitwise or
~ bitwise complement
Complement changes 0 to 1 and 1 to 0. So if we write
k = -d{n-1}*2^{n-1} + d_{n-2}*2^{n-2} + ... + d1 * 2^1 + d0 * 2^0
Then what are the digits of ~k
(1-d_{n-1}) ... (1-d0)
So what is the integer value of ~k + 1?
-(1-d{n-1})*2^{n-1} + (1-d_{n-2})*2^{n-2} + ... + (1-d1)*2 + (1-d0+1)*2^0
-(1-d{n-1})*2^{n-1} + (1-d_{n-2})*2^{n-2} + ... + (1-d1)*2 + (2-d0)*2^0
-(1-d{n-1})*2^{n-1} + (1-d_{n-2})*2^{n-2} + ... + (1-d1+1)*2 + (-d0)*2^0
-(1-d{n-1})*2^{n-1} + (1-d_{n-2})*2^{n-2} + ... + (2-d1)*2 + (-d0)*2^0
(2-d_i)*2^i -> 1 * 2^{i+1} - d_i*2^i
And eventually
(1-(1-d_{n-1}))*2^{n-1} + (-d_{n-2})*2^{n-2} + ... + (-d1)*2 + (-d0)*2^0
d_{n-1}*2^{n-1} + (-d_{n-2})*2^{n-2} + ... + (-d1)*2 + (-d0)*2^0
- (-d_{n-1})*2^{n-1} + (-d_{n-2})*2^{n-2} + ... + (-d1)*2 + (-d0)*2^0
- k
Most important are the signed and unsigned integers.
Differ only in the interpretation of the most significant bit
-2^{n-1}
or
+2^{n-1}
Serial formats:
a 32-bit integer has 4 bytes
a 64-bit integer has 8 bytes
Big-Endian is most significant byte first
Little-Endian is most significant byte first
We now reveal technology for extracting bytes (and implementing Bitpack)
Shift right and left, >> and <<
Suppose the two most significant bits are identical?
Then shift left 1 is equivalent to multiplying by 2
A right shift is always equivalent to a division
There are two right shift instructions.
Both notated as >> in C
Compiler chooses arithmetic or logical shift depending on type of operand.
What happens on a right shift if the least significant bits are not zero?
Boolean algebra plus shifter.
Recall our example byte:
1111 0100
Suppose we want to replace 101 with 010?
Or more generally, given
111xxx00
and given abc, how do we produce
111abc00?
Next problem: given two bytes
mnopqrst
and
abcdefgh
how do you compute
mnofghst ?
How about
unsigned field = Bitpack_getu(abcdefgh, 3, 0);
return Bitpack_newu(mnopqrts, 3, 2, field);
Lost services of Lee Tibbert
Advantage: your work will be evaluated by people with more experience. In particular, I'll be grading much of the homework myself.
Disadvantage: Backlog in grading homework; will take time to clear.
Disadvantage: Fewer people to cover the mailing list
I've started a Twitter feed to let you know when I will and will not be able to cover.
If you think of other information that would be useful for the Twitter feed, please let us know.
Especially for compile and link errors, please don't forget stackoverflow.com.
Time spent on class varies widely:
Aim is roughly 1.5 hours per day
total 15 hours for 10-day assignment
Reported times ranging from triple to one-third
If you are finding yourself spending more time, please see me
Looking for
Good names for variables and fields
More attention to modular reasoning
Simplify!
Many people attempted extra credit with multiple image transformations. Few realized that only two questions matter:
Are the dimensions of the target image the same as the source image, or are they equal to the source image with the width and height swapped?
Given a pixel in the source image at (i, j), to what location should it be copied in the target image?
Q: what's the simplest way to represent the answers?
No new homework for a week!
Friday's lab is cancelled.
New homework and new topic one week from today.
Most students find my exams difficult
Two reasons:
Bad: I have a bad tendency to ask too many questions.
Good: I mostly ask questions you don't know the answers to.
Why?
Not interested in knowing whether you can reproduce something you've already seen.
Interested in knowing if you can apply knowledge to new problems and new situations.
How to prepare:
Revisit all homeworks and their solutions
Revisit class exercises
Labs may also be helpful, but labs tend to be a focused exercise on part of the homework
Make a crib sheet with notes on both sides (maximum size 8.5x11, font or handwriting as small as you like)
How to take the exam:
Write something for every question
If you need to make up syntax or a formula, go ahead and be imaginative; just state your assumptions
Don't expect to get all the points; plenty of A grades leave lots of points on the table
Most important concept: normalized numbers
Anatomy of a floating-point number:
Sign bit
Exponent (offset coded)
Significand
Formula;
x = +/- m * b^n where n typically is exp - k
Q: What happens if we subtract 1 from n and multiply S times b?
x = +/- (b*m) * b^(n-1)
To avoid this problem, we have normalized numbers:
Examples:
100 is +100e0 (not normalized)
100 is +1.0e2 (normalized)
Q: Can every number be normalized?
Examples base 10 -- 3 digit significand
x = 10.1 normalized 1.01e1
y = 9.93 normalized 9.93e0
z = .707 normalized 7.07e-1
Typical coding exponent 10^n where n = exp - 5 and exp is one
digit
Range from 9.99e4 to 1.00e-4
Exp = 0 does not code 1e-5; instead it signals a denormalized number.
Q: Are there nonzero numbers that can't be normalized?
Q: Which ones can be represented as denormalized numbers?
Zero is a denormalized number
Decent, self-respecting laws do not hold
x - y == x does not imply y == 0
x squared - y squared does not equal (x-y)*(x+y)
(x+y)+z does not equal x+(y+z)
Let's see why!
Before two numbers can be added or subtracted, exponents must match
Example real arithmetic: 10.1 - 9.93 = 0.17
But
10.1 = 1.01e1 --> 1.01e1
- 9.93 = -9.93e0 --> - 0.99e1
------------
0.02e1 --> 2.00e-3 = 0.20
Yow! 0.17 != .20, and result is wrong in every digit.
Add a guard digit and the result is exact* (demo).
Leads to the cardinal rule of the IEEE floating-point standard
*Every operation must perform as if done by exact real arithmetic and then rounded to an adjacent representible number.
Four rounding modes:
Round toward plus infinity (round up)
Round toward minus infinity (round down)
Round toward zero (truncate)
Round to nearest
This is a gold standard (no hardware could possibly do better).
And yet errors accumulate!
Take-home messages:
One operation is "exact then round"
Result is answer + eps ("error term"; "rounding error"; "quantization error")
Further computation with rounded results increases error
Extreme example: (x + eps) - (x + eps') = eps - eps'
Cancellation (here, catastrophic---only error is left)
Subtracting rounded results can leave only error!
x = 1.01e-2 x^2 = 1.02e-4
y = 9.93e-3 y^2 = 9.86e-5
x - y = 1.70e-4
x + y = 2.00e-2
x^2 - y^2 = 9.21e-4
(x-y)*(x+y) = 3.40e-6
Exact answer: 3.4051e-6
Exact answer, rounded: 3.41e-6
Minimum exp holds much more than zero:
"Denormalized numbers"
Plus and minus infinity
NaN (not a number) [Example: sqrt(-1.0)]
Things to know:
After rounding, == is nearly useless
NaNs don't compare with numbers, so no "law of excluded middle":
x < NaN? No!
x > NaN? No!
x == NaN? No!
In fact x == x is not true if x is NaN!
Beware cancellation!
For the deeper mysteries, let Dave Goldberg be your guide.
Compiling with -g -O0:
report40 demo-req*-bm.lua40
cp /comp/40/require/tests/locality/required-rot90-bm/rand105x104.0 /tmp/box.pbm
ddd --debugger "gdb -d /comp/40/cii" ppmtrans-block
Demo:
set breakpoint in main
single step
display expressions, e.g., argv[i]
continue
Parameter passing
DDD's machine-code window
Defusing a binary bomb
string comparison
read integers with scanf, examine properties
structs linked with pointers
Geometrically decreasing scale of points for each stage; first three stages will be worth about 70% of the credit.
Translation examples:
Parameter registers:
Integral scalars: %rdi, %rsi, %rdx, %rcx, %r8, and %r9
64-bit pointers or integers
32-bit come in %edi and so on
Floating-point scalers: %xmm0 .. %xmm7
How to analyze:
Is the value used before it is written?
Infer number and types of parameters
All decision making funnelled through condition codes, which Intel calls flags.
Mutated by a cmp or test instruction
Observed by a Jcc or cmov instruction
Flags are set "for free" by many computational instructions --- why?
Four basic tools:
Label (assembly) or address (machine) of instruction---target of a branch
L:
Unconditional branch (jump) to a labelled address
goto L;
Conditional branch depending on flags (they code for conditions,
hence "condition codes"). Combined with test or cmp we have
if (x == y) goto L;
if (x < y) goto L;
And so on for eq, ne, lt, le, gt, ge,
ltu, leu, gtu, geu. Also jo, jno, jc, jnc, others.
N.B. It is easy to negate the condition without introducing new instructions!
Computed branch jumps to an address in a register
j %rsi
Inexpressible in C but supported as a GCC extension.
Basic conditional:
L':
Short-circuit Booleans:
if (C1 && C2) goto L;
becomes
if (!C1) goto Lnot;
if (C2) goto L;
Lnot:
Short-circuit Booleans:
if (C1 || C2) goto L;
becomes what?
How about
while (C) S;
Try Forrest Baskett!
goto Ltest;
Lhead:
S
Ltest:
if (C) goto Lhead;
And also
for (exp1; C; exp2) S;
do S while(C);
Finally
switch (exp) {
case 0: SS0;
case 1: SS1;
case 2: SS2;
case 3: SS3:
default: SSd;
}
...
Uses a jump table (Pidgin C):
static code *jump_table[4] = { L0, L1, L2, L3 }; // addresses
t = exp
if ((unsigned)t < 4) {
t = jump_table[t];
goto t;
} else
goto Ld;
L0: SS0;
L1: SS1;
L2: SS2;
L3: SS3:
Ld: SSd;
Lend: ...
And break becomes goto Lend
Instructions set flags
test
cmp
fcom
[u]comis[sd]
Decisions read flags:
jcc (je, jl, jae)
cmov (cmovnz, ...)
Q: If the field in %rdi that is 4 bits wide and has least
significant bit 4 is not zero, explode the bomb. What is the code?
Higher-level control flow is built from conditional and unconditional branch
How about
while (C) S;
Try Forrest Baskett!
goto Ltest;
Lhead:
S
Ltest:
if (C) goto Lhead;
The target is not coded into the instruction:
jmp *(%rax)
jmp *0x40b980(,%rax,8)
And what else?
In the beginning was FORTRAN. No recursion, no stack!
No excuse for a recursive factorial function, except it makes a good example:
int fact(int n) {
if (n == 0)
return 1;
else
return n * fact(n-1);
}
Q: When fact(4) is calling fact(3), where is n-1?
Q: OK, then where is n?
Q: Where was n on procedure entry?
Q: We have a problem. What's the general shape of the solution?
Attack from a fully general point of view (timeless truth):
Suppose you are computing factorial, or unblackedges, or the binary bomb, or anything else. How many procedures actually are executing in the processor at one time?
What are all the other procedures doing?
Here's the life cycle of a procedure:
The procedure is born when it is first calls.
It is then alive or active.
It may call other procedures.
It runs until it returns or an exception is raised.
Then it's dead.
What's wrong with this picture? Is it adequate to describe factorial?
Suppose we let the program run, and we suddenly freeze it at an arbitrary point:
What is the oldest procedure activation in the system (that we know about)?
What is the youngest activation?
What activations are using the processor and its registers?
What are all the other activations doing?
Refine our idea of state:
Activation may be running
Activation may be suspended at a call site
If f() calls g() calls h(), how are their lifetimes related?
Is there any way that an activation of g() can outlive an activation
of h()?
Yes in other languages, but no in C.
This is why 'return address of a local variable' leads to faults---you are returning a pointer to a dead container.
Therefore, what is the right data structure for activations?
Every callee is younger than the procedure that called it.
Every procedure, when it returns or otherwise dies, is the youngest activation in the system.
The oldest activation is main
Structure of language dictates stack implementation, not the other way around.
Conventional contract or agreement:
All procedures
Operating system!
Focus on two procedures: caller, callee
Conventions for
Passing values and results
Representation and invariants of the stack
Who has what rights to what registers?
Universal truths:
Dedicated stack pointer
Some parameters (maybe none) passed in registers
Remaining parameters passed on stack
Return address may be in register or on the stack (just a special parameter)
Invariants of stack alignment and contents at a call site
Some registers may be volatile (changed during a call)
Some registers may be involatile (preserved across calls)
Problem:
n and the return
address). Where do you put them?Problem:
Bonus
More universal truths:
Allocation by moving the stack pointer toward the young end of the stack
Deallocation by moving the stack pointer toward the old end of the stack
Two parties may allocate or deallocate: caller, callee
Problem:
Truths of C programs:
Caller allocates and deallocates space for parameters and results
Callee allocates and dealloctes space for
local variables
saved registers
A Turing-complete machine in 14 instructions
Why the Universal machine?
Learn to distinguish binary machine code from assembly code
Learn how a sensible binary instruction set works
Learn what the assembler and linker do
(Develop a complete if shallow picture of what happens from source code to execution.)
Will write a little assembly code if time permits
Finally, processor emulation interesting in its own right
A 32-bit machine:
A segmented memory architecture:
Most resembles the Intel 8086 (first IBM PC)
Access to any memory element requires two registers:
"segment" register (called "array") by UM specification
offset within the segment
Segmentation is out of fashion
Word-oriented, RISC instruction set architecture
No byte instructions; smallest addressable unit is the word (Like CRAY-1 supercomputer)
All computation in registers, pure load/store (RISC)
Every instruction is 32 bits wide (RISC) --- can find "next instruction" without decoding current instruction
Only two binary machine-code formats (RISC) --- very fast instruction decoding
Computational capabilities deliberately limited:
Simpler to implement and debug
Limitations can be worked around by a compiler
Purpose is partly instructional, e.g., learn how to implement subtraction in terms of other operations
No floating-point operations
Instruction set
Computation: Add, multiply, divide, NAND
Data movement: load and store from segment ("array index", "array update"), load value
Control flow: computed goto ("run instruction in segment")
Decision-making: conditional move
Create and destroy segment (activate/inactivate array)
Input and output
Halt
Keep representation of machine state:
Contents of 8 registers
Contents and sizes of 1 or more segments (arrays); segment 0 is always the program
Value of the program counter
Interface: given an initial 'array 0' (binary program), run until
reaching a halt instruction, then return.
Asking for a more modular design:
UM execution should be a standalone component with its own API
UM loading (from a file) should be a standalone component
main() should be almost empty---just stitch together
The biggest design task is simulating segments
What's wrong with this code?
if (n <= max)
return true;
else
return false;
What can you tell me about this value?
uint64_t word = ...;
return (word << (64-w)) >> w;
Which of these is guaranteed to do a 64-bit shift?
1 << w
1l << w
1ul << w
(uint64_t)1 << w
Best of all:
static const uint64_t one = 1, zero = 0;
Suggested procedure:
Work with your programming partner
When session is over, print out code
Each of you can review code on printout
At next meeting, discuss proposed revisions and simplifications
Implement the best ones
Example: Bitpack_newu in 21 lines:
uint64_t Bitpack_newu(uint64_t word, unsigned width,
unsigned lsb, uint64_t value) {
if (width + lsb > 64) {
RAISE(Bitpack_Overflow);
}
if(!Bitpack_fitsu(value, width)) {
RAISE(Bitpack_Overflow);
}
uint64_t one = 1;
uint64_t filter = shift_left64(one, width) - 1;
filter = shift_left64(filter, lsb);
uint64_t result = word & (~filter);
uint64_t insert = shift_left64(value, lsb);
result = result | insert;
return result;
}
Simplified version in 9 lines (assuming global static uint64_t one = 1):
uint64_t Bitpack_newu(uint64_t word, unsigned width,
unsigned lsb, uint64_t value) {
if (width + lsb > 64 || !Bitpack_fitsu(value, width))
RAISE(Bitpack_Overflow);
uint64_t mask = shift_left64(one, width) - 1;
mask = ~shift_left64(mask, lsb);
return (word & mask) | shift_left64(value, lsb);
}
Factors of two matter!!
Typical error on arith:
All information poured into README, with too much detail (cognitive overload)
6-line header comment for a 4-line function
Undocumented representations
Undocumented private functions
Here is what I expect for the Universal Machine:
README to document the architecture of the system. Each interface may get a short description, but no more.
Rule of thumb: data types may be mentioned (in the world of ideas) but functions should not be mentioned.
Each interface to be documented with
Overall header comment briefly explaining
The general purpose (in many cases one line is enough)
The declared data types and what they stand for in the world of ideas
General knowledge needed to understand and use the interface (e.g., references to man pages or other info), if any
Well-chosen names!! (critical form of documentation)
Short documentation for each function. Aim for about one line per function---but recognize that sometimes it is clearer to document a small group of functions and let the names and types speak for themselves.
Don't forget to document the element types of arrays!!
Each implementation to be documented with
Descriptions for each declared type and global variable. What does it represent in the world of ideas? What are its invariants?
Descriptions of any functions private to the implementation:
These functions should have prototypes and be declared
static
The prototypes should be documented in much the same way as the function prototypes in a .h file
Bodies of functions should be documented only where code is tricky, long, or difficult to understand. Most of you are getting this part right already.
Overview:
Phase diagram for C:
Representations: C, IR, Assembly, .o, a.out
Phases: compilation (front and back ends), assembly, linking, execution
For UM, just two phases:
Much simpler modular structure than image compression
Existing modules including Bitpack, Seq, Array, Table,
and so on. Ample choices available.
A module to manage UM memory access
A module to run UM codes (execution phase)
A module to create UM codes for unit testing (assembly-linking phase)
Some main functions
Documentation:
Stratify into three layers: architecture, interfaces, implementations.
Internal functions should be static and be documented like
interface functions, but not inline without good reason.
(Usually the compiler makes good inlining decisions on static
functions.)
Don't forget to try compiling with both -O1 and -O2 to see
about your performance target---mention results in README
Code:
Notice that the entire UM execution phase sits in just two modules: the UM run module and the array module. This means you have a lot of freedom to change design decisions within modules.
An important aspect of modular reasoning is to identify design decisions that are likely to change. Many of you have already done so, especially around UM arrays. I want those decisions hidden behind a layer of abstraction.
Otherwise, I'm mainly looking for good names:
There are a lot of numbers in binary code. The number 3 means something different depending on whether it occurs in opcode position, in register position, or in value position (in the Load Value) instruction. I expect you to use distinct names in your code. Where you need to, define enumeration types.
Make sure your naming conventions distinguish the name of a container (register 3) from the contents of that container (the integer 17, a 32-bit array identifier).
If you choose to define auxiliary functions, remember modular reasoning, and be sure that you separate these two concerns:
Instruction decoding
How the segmented memory unit works
For example, if you have an auxiliary function that finds a
memory location, its parameters should be called segment
register and offset register, not rB and rC. In fact,
the load and store instructions (Array Index and Array Update)
don't even use the same registers!
Remember Monday's exercise on refactoring: be judicious when naming intermediate computations, and choose names wisely.
Performance alert: malloc without free
Run under valgrind for 20 seconds, control-C, check total number of bytes allocated
Run again this time for 40 seconds, control C, check total number of bytes allocated
The sandmark benchmark runs in 5.5 billion instructions, with
91M malloc()s for a total of 1.9GB allocated, but there are also
91M free()s, and there is never
more than 4MB live at once, so it can comfortably fit in L2 cache.
Friday's lab, backpatching, and the um-test binary
Implementing the UM's segmented memory architecture with its 32-bit array identifiers
Other issues
World of ideas:
Q: An active 32-bit identifier is associated with what?
We have a bijection between active identifiers and sequences
Q: How is the bijection used and changed by the instruction set?
Q: When an identifier makes the transition from active to inactive, what happens to its sequence?
World of implementation
How is a 32-bit identifier represented?
How can we distinguish active from inactive identifiers?
How is a sequence of mutable 32-bit locations represented?
To what degree are you required to expose the state of the Universal Machine?
How does the answer to this question give you freedom?
What other designs can you imagine where state would be exposed?
Debugging
Single stepping
'core dump'
The performance killers
Assembly code vs machine code
malloc/free vs new/delete (no "constructors")
Hanson's NEW and FREE; use of sizeof
Using function pointers and void *.
The implemntation of Array, how it maps to bytes and words. Why put and get are truly redundant. What would be a better interface? What would be a better implementation?
Never give up performance accidentally!
Bytes and words
Intel Core 2—words on steroids:
Native 64-bit words
Legacy 32-bit words
Double legacy 16-bit words!!
C: a high level language near the bytes and words
C++ to C to machine data
Is a function data?
Pecularities of C:
Incomplete struct
Need for typedef