This assignment is all about the cache and locality. You'll implement blocked two-dimensional arrays, which you'll then use to evaluate the performance of image rotation using three different array-access patterns with different locality properties. You'll also see how to write code that is polymorphic in an array type.
The assignment has two parallel tracks:
As described in Section [->], in this assignment we provide you with a lot of code and information. It will take time to assimilate. You can get some of the code by running the commands
git clone linux.cs.tufts.edu:/comp/40/git/locality cd localitywhich will create and enter a directory called locality. Do this at the start of the assignment.
In this part of the assignment, you will implement a standard technique for improving locality: blocking. The idea is best expressed in a picture. Here is a 10-by-10 array organized in 4-by-4 blocks:
#ifndef UARRAY2B_INCLUDED #define UARRAY2B_INCLUDED #define T UArray2b_T typedef struct T *T; extern T UArray2b_new (int width, int height, int size, int blocksize); /* new blocked 2d array: blocksize = square root of # of cells in block */ extern T UArray2b_new_64K_block(int width, int height, int size); /* new blocked 2d array: blocksize as large as possible provided block occupies at most 64KB (if possible) */ extern void UArray2b_free (T *array2b); extern int UArray2b_width (T array2b); extern int UArray2b_height(T array2b); extern int UArray2b_size (T array2b); extern int UArray2b_blocksize(T array2b); extern void *UArray2b_at(T array2b, int i, int j); /* return a pointer to the cell in column i, row j; index out of range is a checked run-time error */ extern void UArray2b_map(T array2b, void apply(int i, int j, T array2b, void *elem, void *cl), void *cl); /* visits every cell in one block before moving to another block */ /* it is a checked run-time error to pass a NULL T to any function in this interface */ #undef T #endifInterface for blocked arrays [Figure uarray2b]
Since you have already been through a very similar design exercise, I will not ask you to repeat it. Instead, I am specifying an interface, and I suggest a design which you may use if you wish. The interface you are to implement, to be called UArray2b, appears in figure "uarray2b". The blocksize parameter to UArray2b_new counts the number of cells on one side of a block, so the actual number of cells in a block is blocksize * blocksize. The number of bytes in a block is blocksize * blocksize * size. The blocksize parameter has no effect on semantics, only on performance.
The UArray2b_new_64K_block allows you to default the blocksize; it is similar to UArray2_new. It chooses a blocksize that is as large as possible while still allowing a block to fit in 64KB of RAM. If a single cell will not fit in 64KB, the block size should be 1. On almost any machine built in the last five years, the L1 data cache will hold 128KB of data, so if you create arrays using UArray2b_new_64K_block, you can fit two blocks in cache at one time.
If you wish, you may use your own design and architecture for the implementation of UArray2b, or you may use one of mine described as follows:
Here is a simple architecture for UArray2b. Because of the many layers of abstraction, it does not perform very well, but it is relatively easy to implement.If you implement this design successfully, it is not too difficult to modify the code such that your blocked array is stored in a single, contiguous area of memory. Once you have the address arithmetic right, you can get a substantial speedup by avoiding all the memory references involved in going indirectly through the UArray2 and Array abstractions. But the focus of this assignment is not on performance, and a faster implementation is purely optional.
- An UArray2b_T can be represented as an UArray2_T, each element of which contains one block.
- A block should be represented as a single UArray_T. This representation guarantees that cells in the same block are in nearby memory locations.
- To find the cell at index (i, j), first find the block at index (i / blocksize, j / blocksize). Within that block, use the cell at index blocksize * (i % blocksize) + j % blocksize.
- Your mapping function should visit all the cells of one block before moving onto the cells of the next. Blocks on the bottom and right edges may have unused cells, and your mapping function must not visit these cells.
I have written two solutions to this problem. The one that uses the design sketched above is about 175 lines of C, 50 of which appear at the end of this assignment. I then wrote another, faster solution which is about 130 lines of C. The faster solution has a significantly more complicated invariant and was correspondingly more difficult to get right.
typedef void *A2; // an unknown type that represents a 2D array of 'cells' typedef void A2Methods_Object; // an unknown sequence of bytes in memory // (element of an array) typedef void A2Methods_applyfun(int i, int j, A2 array2, A2Methods_Object *ptr, void *cl); typedef void A2Methods_mapfun(A2 array2, A2Methods_applyfun apply, void *cl); typedef void A2Methods_smallapplyfun(A2Methods_Object *ptr, void *cl); typedef void A2Methods_smallmapfun(A2 a2, A2Methods_smallapplyfun f, void *cl); typedef struct A2Methods_T { // operations on 2D arrays // it is a checked run-time error to pass a NULL 2D array to any function, // and except as noted, a NULL function pointer is an *unchecked* r. e. A2 (*new)(int width, int height, int size); // creates a distinct 2D array of memory cells, each of the given 'size' // each cell is uninitialized // if the array is blocked, uses a default block size A2 (*new_with_blocksize)(int width, int height, int size, int blocksize); // creates a distinct 2D array of memory cells, each of the given 'size' // each cell is uninitialized // if the array is blocked, the block size given is the number of cells // along one side of a block; otherwise 'blocksize' is ignored void (*free)(A2 *array2p); // frees *array2p and overwrites the pointer with NULL // observe properties of the array int (*width) (A2 array2); int (*height) (A2 array2); int (*size) (A2 array2); int (*blocksize)(A2 array2); // for an unblocked array, returns 1 A2Methods_Object *(*at)(A2 array2, int i, int j); // returns a pointer to the object in column i, row j // (checked runtime error if i or j is out of bounds) // mapping functions void (*map_row_major) (A2 array2, A2Methods_applyfun apply, void *cl); void (*map_col_major) (A2 array2, A2Methods_applyfun apply, void *cl); void (*map_block_major)(A2 array2, A2Methods_applyfun apply, void *cl); void (*map_default) (A2 array2, A2Methods_applyfun apply, void *cl); // each mapping function visits every cell in array2, and for each // cell it calls 'apply' with these arguments: // i, the column index of the cell // j, the row index of the cell // array2, the array passed to the mapping function // cell, a pointer to the cell // cl, the closure pointer passed to the mapping function // // These functions differ only in the *order* in which they visit cells: // - row_major visits each row before the next, in order of increasing // row index; within a row, column numbers increase // - col_major visits each column before the next, in order of // increasing column index; within a column, row numbers increase // - block_major visits each block before the next; order of // blocks and order of cells within a block is not specified // - map_default uses a default order that has good locality // // In any record, map_block_major may be NULL provided that // map_row_major and map_col_major are not NULL, and vice versa. // alternative mapping functions that pass only cell pointer and closure void (*small_map_row_major) (A2 a2, A2Methods_smallapplyfun apply, void *cl); void (*small_map_col_major) (A2 a2, A2Methods_smallapplyfun apply, void *cl); void (*small_map_block_major)(A2 a2, A2Methods_smallapplyfun apply, void *cl); void (*small_map_default) (A2 a2, A2Methods_smallapplyfun apply, void *cl); } *A2Methods_T;
Polymorphic interface for manipulating two-dimensional arrays [Figure a2methods]
#include <stdlib.h> #include <a2plain.h> #include "uarray2.h" // define a private version of each function in A2Methods_T that we implement static A2Methods_UArray2 new(int width, int height, int size) { return UArray2_new(...); } static A2Methods_UArray2 new_with_blocksize(int width, int height, int size, int blocksize) { (void) blocksize; return UArray2_new(...); } ... many more private (static) definitions follow ... // now create the private struct containing pointers to the functions static struct A2Methods_T uarray2_methods_plain_struct = { new, new_with_blocksize, ... other functions follow in order, with NULL for those not implemented ... }; // finally the payoff: here is the exported pointer to the struct A2Methods_T uarray2_methods_plain = &uarray2_methods_plain_struct;Boilerplate for implementing a struct pointer of type A2Methods_T [Figure template]
typedef void Array2_apply(int row, int col, void *elem, void *cl); extern void Array2_map_row_major(Array2_T a2, Array2_apply apply, void *cl);
struct a2fun_closure { A2Methods_applyfun *apply; // apply function as known to A2Methods void *cl; // closure that goes with that apply function A2Methods_UArray2 array2; // array being mapped over }; static void apply_a2methods_using_array2_prototype (int row, int col, void *elem, void *cl) { struct a2fun_closure *f = cl; // this is the function/closure originally passed f->apply(col, row, f->array2, elem, f->cl); } static void map_row_major(A2Methods_UArray2 array2, A2Methods_applyfun apply, void *cl) { struct a2fun_closure mycl = { apply, cl, array2 }; Array2_map_row_major(array2, apply_a2methods_using_array2_prototype, &mycl); }Mediating between map/apply functions that use different prototypes [Figure mediate]
You now have two different representations of two-dimensional arrays: UArray2, which supports column-major and row-major mapping, and UArray2b, which supports block-major mapping. In order not to duplicate code, we want to write image rotations that can operate on either kind of array. To achieve this kind of reuse, we resort again to polymorphism: we define an interface A2Methods that can represent either kind of two-dimensional array. You write one image-rotation program against this interface, and you can use it with two implementations.
The A2Methods interface uses the same principles as the declaration of an abstract class in a language like C++, C#, Java, or Smalltalk. Instead of calling functions by name, you will call through pointers to functions. Those pointers live in a method suite of type A2Methods_T, which is a pointer to a record of function pointers (Figure [->]). For each of these function pointers, you will you will need to create a static function that calls into UArray2. To implement your method suite, you put pointers to those functions into a struct. Looking at a2blocked.c will show you a complete example, and in figure "template" we also provide a template for your a2plain.c.
The interface you are to implement is defined in a2plain.h:
You should write file a2plain.c, which implements this interface. It should look something like the template in figure "template". The only tricky bit is resolving differences in your apply functions. Let's suppose that your UArray2 apply and row-major map functions look like this:#include <a2methods.h> extern A2Methods_T uarray2_methods_plain; // functions for normal arrays
This ``inner'' apply function is compatible with UArray2, but it is not the same as the ``outer'' apply function used in the A2Methods interface shown in figure "a2methods". The exported mapping function will receive an ``outer'' apply function that is compatible with the A2Methods interface, and you will have to create an ``inner'' apply function that is compatible with your own personal UArray2 interface.typedef void Array2_apply(int row, int col, void *elem, void *cl); extern void Array2_map_row_major(Array2_T a2, Array2_apply apply, void *cl);
Here is a summary of your obligations for this part:
Using the A2Methods abstraction, implement program ppmtrans, which is modelled on jpegtran and performs some simple image transformations. Program ppmtrans offers a subset of jpegtran's functionality. The image-transformation options you may support are as follows:
-rotate 90 Rotate image 90 degrees clockwise. -rotate 180 Rotate image 180 degrees. -rotate 270 Rotate image 270 degrees clockwise (or 90 ccw). -rotate 0 Leave the image unchanged. -flip horizontal Mirror image horizontally (left-right). -flip vertical Mirror image vertically (top-bottom). -transpose Transpose image (across UL-to-LR axis).You must implement both 90-degree and 180-degree rotations. Other options may be implemented for extra credit; if you choose not to implement them, reject the unimplemented options with a suitable error message written to stderr and a nonzero exit code.
Significant requirements:
-row-major Copy pixels from the source image using map_row_major -col-major Copy pixels from the source image using map_col_major -block-major Copy pixels from the source image using map_block_major
Why this problem is interesting from a cache point of view:
If cells in a row are stored in adjacent memory locations, processing cells in a row has good spatial locality, but it's not clear about processing cells in a column. If cells in a column are stored in adjacent memory locations, processing cells in a column has good spatial locality, but it's not clear about processing cells in a row. In a 90-degree rotation, processing a row in the source image means processing a column in the destination image, and vice versa. Thus, the locality properties of 90-degree rotation are not immediately obvious.In a 180-degree rotation, rows map to rows and columns map to columns. Thus, whatever locality properties are enjoyed by the source-image processing are enjoyed equally by the destination-image processing. If you understand how your data structure works, then, you should find it easier to predict the locality of 180-degree rotation.
In a blocked representation, the mapping of blocks to blocks is not obvious. To understand the locality properties of blocked array processing, you will have to think carefully.
My solution to this problem is about 150 lines of code.
This part of the assignment is to be completed at the same time as your design work for parts A and C. Please estimate the expected cache hit rate for reads of each of the six operations in the table below. Assume that the images being rotated are much too large to fit in the cache.
row-major access | column-major access | blocked access | |
90-degree rotation | |||
180-degree rotation |
Justify your estimates on the grounds of expected cache misses and locality. Your justifications will form a significant fraction of your grade for this part.
Unfortunately, measuring hit rates is not so easy (although valgrind can do a lot). But what we are really interested in is the effect of locality on performance. We are therefore also asking you to predict the relative performance of each algorithm. But do make some simplifying assumptions:
Please submit the expected work per pixel in a table like the following:
Kind of rotation | adds/subs | multiplies | divs/mods | compares | loads | hit rate | stores | hit rate |
180-degree row-major | ||||||||
180-degree column-major | ||||||||
180-degree block-major | ||||||||
90-degree row-major | ||||||||
90-degree column-major | ||||||||
90-degree block-major |
Once you have estimated the expected cost per pixel, please estimate the expected speed of each of the six operations in the table below. Your speed estimate should include the cost of stores as well as the cost of loads.
row-major access | column-major access | blocked access | |
90-degree rotation | |||
180-degree rotation |
To complete this problem successfully, you will need to understand the material presented in class and in Chapter 6 of Bryant and O'Hallaron.
This part of the assignment is to be completed after you have a complete, working implementations of ppmtrans. Please measure the speed of each of the operations in following table:
row-major access | column-major access | blocked access | |
90-degree rotation | |||
180-degree rotation |
In order to see any effects, you must use images that are too large to fit in the cache. Your fastest rotation should take several seconds; if it does not, you need a larger image.
djpeg /comp/40/images/from-wind-cave.jpg | pnmscale 3.5 | /usr/bin/time ./ppmtrans -rotate 90 | display - djpeg /comp/40/images/wind-cave.jpg | pnmscale 1.2 | /usr/bin/time ./ppmtrans -rotate 90 | display -
https://www.eecs.tufts.edu/userguide/forms/quota.phpand list me as faculty sponsor.
This section identifies infrastructure you can use for this assignment.
The interfaces below appear in /comp/40/include. Do not copy these files. You should be able to compile against any of these interfaces by using the option -I/comp/40/include with gcc.
The Pnm interface uses the A2Methods interface.
You should be able to link against our implementation of pnm.h by using the options
-L/comp/40/lib64 -l40locality -lnetpbmwith gcc.
Once you can build a2test, run valgrind ./a2test.
sh compile uarray2b.c # compile just the one .c file sh compile -nolink # compile all .c, but don't link anything sh compile -link a2test # build executable binary a2testYou get all these sources by
git clone linux.cs.tufts.edu:/comp/40/git/localitywhich will create a subdirectory locality. We recommend you begin the assignment by creating a directory using git clone.
What's important about this assignment is how locality stores affects performance, not how to rotate images. We therefore inform you that we believe
Your preliminary submission should include your design work for parts A and C as well as all of part D.
But we expect you to pay special attention to the representation and its invariants. Please be sure your submission explains
Please submit two files:
Submit using submit40-locality-design.
Your implementation, to be submitted using submit40-locality, should include
sh compileencounters no errors and builds two executable binaries: a2test and ppmtrans.
Here are the mistakes most commonly made on this project:
To deal with command-line options in ppmtrans.c, consider the code below. This code does not help you decide if a file has been named on the command line, which determines whether you read from that file or from standard input. To make this decision, you will need to examine the values of i and argc.
#include <stdio.h> #include <string.h> #include <stdlib.h> #include "assert.h" #include "a2methods.h" #include "a2plain.h" #include "a2blocked.h" #include "pnm.h" int main(int argc, char *argv[]) { int rotation = 0; A2Methods_T methods = uarray2_methods_plain; // default to UArray2 methods assert(methods); A2Methods_mapfun *map = methods->map_default; // default to best map assert(map); #define SET_METHODS(METHODS, MAP, WHAT) do { \ methods = (METHODS); \ assert(methods); \ map = methods->MAP; \ if (!map) { \ fprintf(stderr, "%s does not support " WHAT "mapping\n", argv[0]); \ exit(1); \ } \ } while(0) int i; for (i = 1; i < argc; i++) { if (!strcmp(argv[i], "-row-major")) { SET_METHODS(uarray2_methods_plain, map_row_major, "row-major"); } else if (!strcmp(argv[i], "-col-major")) { SET_METHODS(uarray2_methods_plain, map_col_major, "column-major"); } else if (!strcmp(argv[i], "-block-major")) { SET_METHODS(uarray2_methods_blocked, map_block_major, "block-major"); } else if (!strcmp(argv[i], "-rotate")) { assert(i + 1 < argc); char *endptr; rotation = strtol(argv[++i], &endptr, 10); assert(*endptr == '\0'); // parsed all correctly assert(rotation == 0 || rotation == 90 || rotation == 180 || rotation == 270); } else if (*argv[i] == '-') { fprintf(stderr, "%s: unknown option '%s'\n", argv[0], argv[i]); exit(1); } else if (argc - i > 2) { fprintf(stderr, "Usage: %s [-rotate <angle>] " "[-{row,col,block}-major] [filename]\n", argv[0]); exit(1); } else { break; } } ... }