These instructions are somewhat long, in part because there are a lot of details you'll eventually have to get right. We suggest you set aside some time to read through them casually first, to get a general idea of the scope of the project. Then go back, and try to note the key steps you'll have to take, and maybe also note things that don't make sense. With that done, you and your partner should get together to compare what each of you has learned, and to start making a plan. Also: remember that Piazza is the place to ask questions if you are confused.
Goals
In this programming assignment, which is the first really significant one of the term, you will deeply explore the use of the end-to-end principle, and the design of packet-based protocols. Your goal will be to build a system that copies a directory full of files from a client on one machine to a server on another. Your focus here will not be to build the most optimized and efficient protocol, but rather to cleanly separate end-to-end recovery logic from other lower-level error recovery, and to build a system that successfully copies the files. You will also have the opportunity to explore idempotence, text vs. binary formats (which we will discuss in detail later), and other key concepts we've discussed in the course.
Your submission will consist of a client program, a server program, a Makefile (which can be adapted from the one supplied), and an HTML file explaining your project, documenting its protocols and design tradeoffs and answering questions set out in the report template.
This is not an easy assignment, and complete success is not required for a good grade! There is a section below outlining Success criteria. In your report, you will indicate how far you think you've gotten, and that will guide our review of your submission. For example, we will ask you which "nastiness" levels you think are appropriate for testing your code, and why, and we will test it accordingly: if you know your code doesn't work for network-nastiness > 1, then there's no sense in our trying to test it beyond that.
To complete this assignment, you will work in teams of two, doing pair programming. Please be sure you are familiar with the course rules on pair programming: in short you do not split the work; both partners must be present (in person or via video) when design decisions are made, when coding or debugging is done, and when the report is being written. Both must contribute to all phases, and both must agree to any decisions before they are implemented. The same grade will be awarded to both partners. (Exceptions allowing you to work alone will be very rare, and must be approved in advance by the professor.)
Overview
Your specific task will be to write a UDP-based client and a corresponding
server program that will copy all the files
in a source directory on the client machine to a target directory
on the server machine. (*See note below).
Unfortunately, this will be greatly complicated by two types of challenges,
one of which you dealt with in the pingtest
program:
- All your network traffic must be sent using the same c150nastydgmsocket class that you used in the pingtest programs (note that in the ping programs, the nasty version is used only in the server; please use the nasty version for both client and server in this assignment, and run with the same nastiness levels on client and server).
- All your file reads and writes will be done with a class you haven't yet used: c150nastyfile, which is documented in c150nastyfile.h. Its API is almost identical to the usual Unix fopen/fread/fwrite/fseek, etc., but you can guess the twist: you give the constructor a "nastiness" level that may cause it to sometimes give you erroneous data when reading file data, or to corrupt the file when writing or closing. Note that a sample program called nastyfile is provided for you, with source, in the FileCopy project directory. If you take a look at that, and try commands like man fread, you will quickly see that the class is just a thin (if occasionally malicious) wrapper over well-documented Unix functions.
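Because c150nastyfile mirrors stdio, the usual pattern for pulling a whole file into memory (say, to hash it for the end-to-end check) looks roughly like the sketch below. It uses plain std::fopen/std::fread rather than c150nastyfile, so check c150nastyfile.h for the exact method names before adapting it; readWholeFile is a hypothetical helper name. With file nastiness > 0 the bytes you get back may be wrong, which is exactly what your end-to-end design has to cope with.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: read an entire file into memory using the plain
// stdio calls that c150nastyfile is modeled on. Returns false on any error.
// With a nasty file object, a "successful" read may still return bad data.
bool readWholeFile(const std::string& path, std::vector<unsigned char>& out) {
    std::FILE* fp = std::fopen(path.c_str(), "rb");
    if (fp == nullptr) return false;

    // Find the file size by seeking to the end, then rewind.
    if (std::fseek(fp, 0, SEEK_END) != 0) { std::fclose(fp); return false; }
    long size = std::ftell(fp);
    if (size < 0 || std::fseek(fp, 0, SEEK_SET) != 0) { std::fclose(fp); return false; }

    // Read the whole file in one call; binary data (including nulls) is fine.
    out.resize(static_cast<size_t>(size));
    size_t got = size > 0 ? std::fread(out.data(), 1, out.size(), fp) : 0;
    std::fclose(fp);
    return got == out.size();
}
```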
To make sure that you don't declare success on any files
that aren't properly copied, you will implement an end-to-end checking
protocol that reads each file back from the disk after it's copied, and
sends your choice of the entire file, or a sha1
hash code back to the client. The client then verifies whether the file has been
correctly copied.
To learn about sha1, look it up on the Internet and also see the sha1test.cpp
sample that's provided for you (details below). You may steal code from the sample
(don't forget the header files you need to include, and be sure to
get the -lssl and -lcrypto switches into your Makefile,
or you'll spend a lot of time wondering why your builds don't work!)
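Once you have the 20-byte SHA-1 digest (for example from OpenSSL's SHA1() function, which sha1test.cpp shows how to call), you still have to get it into a packet. If you choose to send digests as text, a minimal sketch of the conversion is below; digestToHex and kSha1Len are hypothetical names (OpenSSL itself calls the length SHA_DIGEST_LENGTH). Two hex strings then compare with a simple ==.

```cpp
#include <cstdio>
#include <string>

// A SHA-1 digest is 20 bytes; redefined here so the sketch needs no OpenSSL headers.
const size_t kSha1Len = 20;

// Convert a raw digest into 40 lowercase hex characters so it can travel
// inside a text packet and be compared with a plain string comparison.
std::string digestToHex(const unsigned char digest[kSha1Len]) {
    std::string hex;
    char buf[3];  // two hex digits plus the terminating null
    for (size_t i = 0; i < kSha1Len; i++) {
        std::snprintf(buf, sizeof(buf), "%02x", digest[i]);
        hex += buf;
    }
    return hex;
}
```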
There's still a problem though. After the above steps,
the client knows whether the copy
has succeeded, but the server doesn't. Therefore, when you first write
each file in the target directory,
you will do what many network file copy programs do: you
will store it under a temporary name like "filename.TMP", i.e. with .TMP appended to the name of the temporary copy.
When the client discovers that the copy operation has succeeded, it must tell the server to rename its file to the proper name; optionally,
you can also try to alert the server in the case of failure, to clean up its
temporary file.
(Renaming files can be done with the rename system call;
look it up in Kerrisk, or else do a man 2 rename.)
Note that this design gives you an interesting and important invariant: except for files with the .TMP suffix, every file at the server is known to be a correct copy of the file at the source. Many "real world" systems use similar tricks to ensure that users are never trusting incomplete or incorrect file data (many editors write to temporary files first, then rename once the entire file is safely written). Also note that in modern Linux systems rename is an atomic operation; if there's a crash right while rename is running, then after restart you will find either the old name or the new, but never both.
There are other protocol variations you could use, e.g. sending sha codes to the server and checking there, but then you'd need to ensure that the server waits to be sure the client knows about success. The rest of your protocol is likely driven by the client, and in such cases it's usually easier to have all of it driven by the client. That's why this design is suggested, even though it may add a bit of overhead.
The result of all the above trickery is that if your end-to-end checking is good, then it will never incorrectly claim to have successfully copied a file! This is the most important part of the assignment. Occasionally failing to successfully copy a file in the face of significant nastiness may impact your grade a little. Claiming even once that you successfully copied a file when you didn't shows us that your end-to-end check is faulty, and that will cost you significantly more.
So, we'll be asking you to implement and test your end-to-end check first: that submission is due on Tuesday October 08. Note that you can do this without any network file copy code at all. Just preload your target directory with files that are either good or bad copies of the source files, and make sure your end-to-end check can successfully detect any corrupt or missing files. Also at that time we ask you to submit a very preliminary design document, outlining your plans for eventually doing the file copying.
Once that's done, you can start with very simple, dumb copying algorithms. They may not succeed when nastiness is high, but at least you'll know when they failed, and when nastiness is low, they may work anyway. That's the essence of the end-to-end principle in action! We separated checking for success from using sophisticated techniques to improve the chances of success. As an example of a simple strategy, you can blindly send the data to the server, hope it gets there, and rely on the end-to-end check to have you redo the whole thing if necessary. Of course, if the nastiness gets high enough, then it might be a very, very long time before the whole file makes it successfully. As you proceed, you can make your packet protocols increasingly sophisticated in an effort to get more files copied in the face of higher levels of disk and network nastiness. That said, you won't come up with a good design for high nastiness by incremental "hacking". Quite early in your work, after you've done some successful experiments with low nastiness, you should try to come up with a clean, well-organized approach to efficiently handling both disk and network nastiness. Your final submission, including your filecopy code and a detailed report, is due on Tuesday October 15.
Success criteria
Again, the main points of the project are to get a feel for how end-to-end checking can organize your whole approach to reliability, and to give you a feel for the challenges of designing protocols using unreliable datagrams.
Success in this project is a matter of degree. Showing that your end-to-end check works is the bare minimum for a passing grade. Your grade will improve somewhat if you implement a simple file copy protocol that copies files successfully in the case where nastiness=0 (no disk or network errors), and that might succeed sometimes on small files even when nastiness>0. Improving your grade further depends on two related factors:
- Improving your protocol to succeed with higher levels of disk and network nastiness
- Showing in your report that you understand the relationship between the design choices you make and the correctness of your results (correctly reporting that you have failed to copy a file is a correct answer, and indeed it may be the only practical answer when the nastiness is high -- most real systems abort when error rates get sufficiently high)
Don't under-estimate the second factor. Understanding and explaining the choices you make in this project is as important as writing code that runs. Furthermore, it can be almost impossible to figure out how a protocol works by reading just the code, so we will be depending on your written explanation of what you've done as a guide to our grading of the code too.
What to do
Here is a summary of what you will do. Although these steps are outlined in order and should be coded and tested incrementally, it's also essential that you start thinking ahead. The two protocols you'll design, i.e. the end-to-end check and the file copy, are the two hardest parts of this assignment, and they'll ultimately have to work together: file copy packets that are delayed in the network may show up in the middle of your end-to-end check once you turn up the nastiness. Stray packets from an earlier file copy may show up during a later one. You'll probably want to start thinking about how you'll handle tough challenges like that, while coding the easier versions that don't deal with high nastiness levels.
You should do the following roughly in order. Remember, you can get a decent if not spectacular grade by merely copying the directory with nastiness=0 or nastiness=1. You are strongly urged to get that much working first using a straightforward protocol, while thinking in advance about the changes that might deal with higher error rates. Handling the higher nastiness levels is very tricky, and if you don't create (and keep!) a version of the code that handles the easier cases, you risk having nothing running when the assignment is due! (git is a great tool for this if you know how to use it, or if not, just make a habit of copying your entire development directory into a dated backup copy every few hours, and especially when you've reached useful milestones. If you're not using git, which has commit messages, we suggest you leave yourself a note in each backup indicating what level of function it provides. That way, if you get in trouble, it will be easy to go back to something that works.)
Getting Started
- Install a copy of the FileCopy project directory, including sample code and Makefile, just as you did for the ping projects. You'll probably do this with a command like:
      cd <yourParentDir>
      cp -R /comp/117/files/FileCopy/ .
- Make sure your environment variable COMP117 is set to /comp/117. (See course info page.)
- Unlike the ping assignment, there is no single base file that you will start with, but you will find that pingserver.cpp and pingclient.cpp are useful for getting some include files right, as a guide to c150nastydgmsocket, and as a starting point for command line parsing for network nastiness, etc. If you take code from ping then you are responsible for updating the comments to be accurate for your submission. (You don't need to be quite as detailed, but you shouldn't have stray comments referring to ping's function.)
- Read, understand and try the nastyfiletest.cpp program. Not only does it explain the c150nastyfile class you'll be using, it's got useful code you can steal for iterating through directories at the client. Furthermore, as you'll see in a moment, it can be useful for creating TARGET directories you can use to test your end-to-end protocol before you've implemented file copy over the network.
Implementing end-to-end checks
There's a nice trick that makes it easy to try out your end-to-end checks
without implementing file copy code at all: our servers and clients share
a filesystem!
So, for this first step, your program won't copy files at all!
Rather, you will use the nastyfiletest
program
to fill the TARGET directory for you. Depending on the nastiness,
it will either make clean copies, or it will include some
files with errors, and your end-to-end code should catch those.
Specifically, you should:
- Using a combination of pingclient.cpp, pingserver.cpp and nastyfiletest.cpp as a starting point (you may freely steal code from them), create a skeleton of a client and a server. The command line arguments should be as documented below.
- Be sure to copy the code from the start of nastyfile.cpp that says:
      //
      // DO THIS FIRST OR YOUR ASSIGNMENT WON'T BE GRADED!
      //
      GRADEME(argc, argv);
  Put that in each of your programs. The purpose speaks for itself :-). Make sure you #include "c150grading.h" too, or this won't work. Also: you will note that the grading framework may sometimes be writing GRADELOG files to your current directory. You can delete these.
- As you go through the steps below, be sure to put entries in the GRADELOG as specified in What to put in the grading logs. Even though you are only implementing the end-to-end check, you must put in all pertinent messages. For example, when the client starts working on a file it must log the message:
      File: <name>, beginning transmission, attempt <attempt>
  For this end-to-end check, your attempt number will always be 1, but later when you actually do retries the attempt number may increment.
- The client should loop through all the filenames in the source directory named on its command line. For each one, it should use its end-to-end protocol to tell the server that a check is necessary.
- The server should do its part of the end-to-end check, and inform the client. (Exactly how to split the checking between server and client is a design decision you will have to make.)
- The client should then confirm to the server that it knows about the success (or failure), at which point the server should indicate in its output the name of the file, and whether there was success or failure. Be sure to write your log GRADELOG entries as specified in What to put in the grading logs. (This is the point where, in later versions, the server will either rename or delete the file).
- The server should acknowledge to the client, and if the client times out waiting, it should resend the confirmation. (The confirmation is idempotent!). Of course, that means the server needs to quietly flush any duplicate confirmations it might get too. The client should retry several times, and if none of those are acknowledged, it should declare the network down and give up.
- After a directory is successfully checked, the client should exit.
- The server should then wait for a request from the client to check another directory.
- As described below, your submitted end-to-end programs will take the same command line arguments as for your final filecopy submission, including specifications for file and network nastiness. In preparation for your final filecopy you should pass these parameters on to the appropriate socket and nastyfile constructors, which means you can test your end-to-end submission with nastiness > 0 if you like. Don't panic if you can only get it to work with nastiness==0 for this submission, but plan on improving it later. Your grade will be based on the nastiness handled by your ultimate filecopy submission.
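One way for the server to "quietly flush" duplicate confirmations is to remember which confirmations it has already acted on. The sketch below assumes your confirmation packets carry the file name and attempt number; ConfirmTracker and Outcome are hypothetical names, and the client's resend-on-timeout loop is left to your own socket code.

```cpp
#include <map>
#include <string>
#include <utility>

// Outcome the client reported for a given file copy attempt.
enum class Outcome { Success, Failure };

// Hypothetical server-side record of confirmations already handled.
class ConfirmTracker {
public:
    // Returns true if this confirmation is new, meaning the server should act
    // on it (rename or delete the .TMP file, write the log entry). Returns
    // false for a duplicate, which the server should just re-acknowledge.
    bool onConfirm(const std::string& file, int attempt, Outcome outcome) {
        return done_.emplace(std::make_pair(file, attempt), outcome).second;
    }

    bool handled(const std::string& file, int attempt) const {
        return done_.count(std::make_pair(file, attempt)) > 0;
    }

private:
    std::map<std::pair<std::string, int>, Outcome> done_;
};
```

Keying on the attempt as well as the name matters: a later retry of the same file is a new event and must not be mistaken for a duplicate.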
Note that Linux comes with cmp
and diff
programs you can use to see for yourself which files are clean and which are
corrupt, so that makes it easy to see if your end-to-end checks agree.
Your first submission will consist of the end-to-end check code described in this section, along with a design document for your file copy protocol.
Your program's command line arguments
Your programs should have the following names, and take the following arguments:
    fileclient <server> <networknastiness> <filenastiness> <srcdir>
    fileserver <networknastiness> <filenastiness> <targetdir>
Among the tests we will likely run on your programs is:
    fileclient <server> <networknastiness> <filenastiness> /comp/117/files/FileCopy/SRC
    fileserver <networknastiness> <filenastiness> TARGET
SRC
is provided to you, and it's pre-stocked with files
for you to copy.
For the final filecopy
submission, TARGET
should start out empty before each test, and will
hold the results written by your server.
Of course, you should run other tests too, but the supplied SRC directory is a good one to start with.
For your initial end-to-end
submission, the commands, arguments
and SRC
directories are the same as for the filecopy submission,
but as described above in Implementing end-to-end checks
you should pre-populate the TARGET directory either
with a correct copy of SRC, or with your choice of missing
and/or corrupted files.
In your report, you will tell us the nastiness levels we should use and how much function you believe is working (if you've only managed to do end-to-end checks and no file copying, you will need to give us a TARGET.TST directory with the nasty'd files you've used for checking).
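A minimal sketch of checking the client's arguments before handing them to the socket and nastyfile constructors, assuming the ranges given in the Hints section (network nastiness 0..4, file nastiness 0..5). ClientArgs, parseInRange and parseClientArgs are hypothetical names; the same idea applies to fileserver's three arguments.

```cpp
#include <cstdlib>
#include <string>

// Parsed form of: fileclient <server> <networknastiness> <filenastiness> <srcdir>
struct ClientArgs {
    std::string server;
    int networkNastiness;
    int fileNastiness;
    std::string srcDir;
};

// Parse s as an integer in [lo, hi], rejecting junk such as "2x" or "".
static bool parseInRange(const char* s, long lo, long hi, int& out) {
    char* end = nullptr;
    long v = std::strtol(s, &end, 10);
    if (end == s || *end != '\0' || v < lo || v > hi) return false;
    out = static_cast<int>(v);
    return true;
}

// Returns false (so main can print a usage message) on any bad argument.
bool parseClientArgs(int argc, char* argv[], ClientArgs& args) {
    if (argc != 5) return false;
    args.server = argv[1];
    if (!parseInRange(argv[2], 0, 4, args.networkNastiness)) return false;
    if (!parseInRange(argv[3], 0, 5, args.fileNastiness)) return false;
    args.srcDir = argv[4];
    return true;
}
```

The constructors will throw on out-of-range values anyway; checking up front just lets you give a clearer error message.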
Implementing a simple file copy protocol
If the above is all you do, you'll get a passing grade. Next it's time to improve that, by doing your own file copies through the network. Again, we strongly urge you to start with a simple protocol that will work with nastiness=0. Getting that to work will raise your grade very significantly, and building protocols to run over lossy networks is very tricky.
- Save a working copy of your code from the steps above! If you don't get file copies working with nastiness>0, then it's your only proof that you got end-to-end checks to work (since they never fail unless nastiness is turned on).
- Modify your code to include a simple file copy protocol, and use it to try to copy the SRC directory to the TARGET,
with both disk and network nastiness set to zero. There's no guarantee packets won't be lost on the network anyway, but in practice on our virtual servers
packet loss is rare unless you let your client send huge amounts of
data faster than the server can read it.
What's a simple protocol? Well, you could blindly send all the data to the server and not even check whether the packets arrived. Your end-to-end check will tell you later if they did. A slightly more complicated variation would be to do that, but retry the whole file a few times if the end-to-end check fails. That might get you past a little bit of nastiness. Do keep in mind that for very large files the network may drop packets if you send the whole file faster than the server can read it. To handle large files, you may need to find a way to keep the client from getting too far ahead of the server.
- In any case, for each file your client must send the filename as well as the data.
- On the server, append .TMP to the name of each file you write. That's because it's temporary until the end-to-end check confirms success.
- As each file transmission completes, your end-to-end check should
be used to determine success (since nastiness=0, a failure is almost
surely a bug you should fix).
When the client discovers success, it should write on its console:
"File <name> end-to-end check SUCCEEDS -- informing server".
Likewise, if an end-to-end check fails, write on the client either:
"File <name> end-to-end check FAILS -- giving up"
or
"File <name> end-to-end check FAILS -- retrying"
depending on whether you are going to retry or not.
- When the server hears that the check was successful, it should rename the file to drop the .TMP suffix, and write on the server console:
"File <name> copied successfully".
As noted above, the Unix rename system call is documented in Kerrisk and in man 2 rename.
- You must not close and reopen the c150nastydgmsocket between files; please leave it open until the whole directory is copied (or you give up!)
- See Instrumenting your code for grading for instructions on output you must generate in the grading log files.
- The server should then wait for a request from the client to copy another directory.
- After a directory is successfully copied, the client should exit.
Improving your protocol
Your grade will improve significantly if you can reliably handle nastiness levels >0. Improve your protocol to recover from errors and to retry. Suggestion: always keep copies of earlier versions that worked. That way, if you get in trouble, you'll have something to hand in.
Hints
We'll add to these as we get questions from students, so check back here often.
- Good practice is that UDP packets not be large, and the framework enforces a limit of 512 bytes (the constant MAXDGMSIZE is defined for you). So, there's no way you can put a whole large file in one packet. Indeed, you'll need to leave space for control information, and we're not judging you on efficiency (within reason!). It's up to you, but sending only 256 bytes of file data/packet is fine, or you can go higher and fill any space not needed by your control information.
- c150nastydgmsocket is UDP-based: it will do things like dropping packets (and worse!), but it will not scramble the bits within a given packet. The usual UDP checksums are in place, and if the packet arrives at all, you can assume the bits in it are correct.
- The range of network nastiness is 0..4 and file nastiness is 0..5. The constructors will throw exceptions if you try a value that is out of range.
- As noted above, the packet queues in the underlying Linux software are not infinitely deep; if one node sends lots of data ahead of what the receiver reads, packets may be dropped even with network nastiness=0.
- File data may be binary (none of our samples are, except for nulls in some cases), and I suggest you send that in the packets just as you read it from the disk. You should decide whether to make the control information in your packets text or binary, but for ease of debugging, I strongly urge you to use text.
- Many programmers don't notice that on Linux and Unix systems,
it's not necessary to create the contents of a particular file "in order".
You can
fseek
to any byte offset in a file, and start writing there. If you later go back and read the file, any earlier parts of the file you have not explicitly written will read back as zeros. Knowing this gives you some interesting options for dealing with packets that arrive out of order: if you like, you can write the data to a file in any order that it arrives. Of course, your end-to-end checks will ultimately catch anything that doesn't match the source file. (Interestingly, the end-to-end check will likely not catch the case where you dropped a packet in the middle of the file that was writing all zeros...that's ok, because the file still winds up with the right content! That really demonstrates the power of the end-to-end principle; what matters is that the result is correct, not how you got it.)
- When going across machine architectures there can be problems using C/C++ structs to map the information in your packets. Don't worry about it for this assignment. Our virtual servers are the same architecture, so you can use structs to map out the information in your packets. If it's all character data, almost all compilers will pack the fields in the obvious way: in order, with no gaps. You can always compute (MAXDGMSIZE - sizeof(struct mystruct)) to see how much space is left in your packet.
- As noted above, once network nastiness goes up, packets from file transfers may get interspersed with those from end-to-end checks and from other file transfers. To handle network nastiness>1, you will need a strategy for either dropping or sorting out such traffic. You may want to label each packet with some indicator of the copy activity that it goes with. On your server, you'll have to decide whether to let a 2nd file copy start while the first is still finishing (i.e. because packet reordering caused you to see the "start of 2nd transfer" ahead of the final checking packets for the first), or whether to punt when this happens. Either strategy is OK, but allowing the two to overlap will probably get you success in some cases of high nastiness that punting won't. Implementing overlap is not necessary for a good grade, but do let us know in your report if you did it, and whether you actually saw it happen.
- There is no need to copy subdirectories of SRC. The sample code in nastyfiletest.cpp does a good job of finding the ordinary files and skipping the others; steal that. The sample SRC we provide won't have subdirs, but if you happen to wind up with any, they should be skipped.
- Remember that the network framework has debugging hooks you learned to use in the ping assignment. Use them to see what's happening with your packet traffic. Right now, the c150nastyfile class does not have them, but we could probably add them if it's helpful, or you can just add your own debug output to the log for file traffic.
- Makefiles can be useful for testing. For example, you could implement targets like "testserver12" and "testclient12" in your Makefile that would "depend" on clean builds of your client and server and do steps like:
  - For testserver, remove all files from the TARGET directory (you don't have to handle that in your C++ code if it's too much trouble; just put a rm TARGET/* in your makefile if you prefer).
  - Run the server or client (respectively) with the nastiness level you want to test. This means you'll need different targets for different nastiness, but you might name them something like testserver23 to get networknastiness=2 and filenastiness=3. Of course, you'll have to put in the right calls like fileclient <server> 2 3 SRC to make that happen.
- There are a couple of utility routines declared in c150utility.h that may be helpful. Note, this is not a signal that you will likely need these. It depends on how you decide to write your code. They're available if you want to use them:
      void printTimeStamp(ostream& os);
      void printTimeStamp(string& s);
  These are two interfaces to the same underlying code, which will print into the supplied string or stream a timestamp of the same form used in the debug log files. This may be useful to you for your own debugging or logging, or possibly for putting (rather verbose but easily readable) timestamp information in your packets, should you wish to. The first version takes a stream, and the timestamp is appended to the stream; the second version takes a reference to a string, and the timestamp replaces any contents of the string. (I probably should have implemented << on the stream version, but never got around to it.) The source code in src/c150utility.cpp also can be used as an example of manipulating time of day information from the operating system, should you wish to.
- 
      string &cleanString(string& s);
      void cleanString(string::iterator start, string::iterator end);
These again are two interfaces to similar code. There may be times when you wish to print to the screen or a log file a packet that you have received, but which may contain (either because you've decided to send binary data or due to bugs), characters that are not printable. These routines will take a supplied string and, altering it in place, replace any non-printable characters with '.' The first version is the one that most of you would likely use if you need it (which you may not). The second may be handy for those who happen to know about C++ iterators, but otherwise you can ignore that.
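Several of the hints above fit together in a receiver: a text header inside a MAXDGMSIZE-limited packet, a label identifying which copy activity a packet belongs to, and fseek to write data wherever it belongs. The sketch below assumes a hypothetical header "DATA <transferId> <offset> <length>\n" followed by the raw file bytes. The packet layout, the transferId scheme and the function names are all illustrative; it uses plain stdio in place of c150nastyfile and redefines the 512-byte limit locally instead of including the framework's header.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Stand-in for the framework's MAXDGMSIZE (512 bytes).
const size_t kMaxDgmSize = 512;

// Build "DATA <transferId> <offset> <length>\n" plus raw bytes into pkt.
// Returns the total packet length, or 0 if it would not fit.
size_t buildDataPacket(char* pkt, int transferId, long offset,
                       const char* data, size_t len) {
    int hdr = std::snprintf(pkt, kMaxDgmSize, "DATA %d %ld %zu\n",
                            transferId, offset, len);
    if (hdr < 0 || static_cast<size_t>(hdr) >= kMaxDgmSize ||
        static_cast<size_t>(hdr) + len > kMaxDgmSize) return 0;
    std::memcpy(pkt + hdr, data, len);
    return static_cast<size_t>(hdr) + len;
}

// Parse a received packet; false if it is not a well-formed DATA packet.
bool parseDataPacket(const char* pkt, size_t pktLen, int& transferId,
                     long& offset, const char*& data, size_t& len) {
    const char* nl = static_cast<const char*>(std::memchr(pkt, '\n', pktLen));
    if (nl == nullptr) return false;
    std::string header(pkt, static_cast<size_t>(nl - pkt));
    if (std::sscanf(header.c_str(), "DATA %d %ld %zu", &transferId, &offset, &len) != 3)
        return false;
    size_t hdrLen = static_cast<size_t>(nl - pkt) + 1;
    data = nl + 1;
    return pktLen - hdrLen == len;  // payload length must match the header
}

// Write a packet's data at its offset, so blocks may arrive in any order.
// Packets labeled with another transferId (strays from an earlier copy)
// are dropped. fp must be open for update, e.g. mode "w+b".
bool applyDataPacket(std::FILE* fp, int currentTransfer,
                     const char* pkt, size_t pktLen) {
    int id;
    long off;
    const char* data;
    size_t len;
    if (!parseDataPacket(pkt, pktLen, id, off, data, len)) return false;
    if (id != currentTransfer) return false;  // stray packet: ignore it
    if (std::fseek(fp, off, SEEK_SET) != 0) return false;
    return std::fwrite(data, 1, len, fp) == len;
}
```

Writing at offsets like this is only one option; a server that insists on in-order delivery is equally valid, as long as the end-to-end check has the final say.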
Instrumenting your code for grading
To help us understand what your program is doing, you MUST include in your program statements to output information into a grading log that will alert us to your program's progress. Doing this should be straightforward and shouldn't take much time. This section gives the specific instructions.
First, as noted above you must copy the code from the start of nastyfile.cpp that says:
    //
    // DO THIS FIRST OR YOUR ASSIGNMENT WON'T BE GRADED!
    //
    GRADEME(argc, argv);
into the main program of each of your executables. This will cause the program to create a new GRADELOG_XXXX.txt file each time you run your program. You will not submit these log files, but you are welcome to look at them and check them to help verify your program. We will generate new logs ourselves when we test your code.
By default the grading files will not have much useful information, but you MUST instrument your code with additional output statements as described here. Read your GRADELOG files during testing to make sure they're working. Note that grading log filenames include the program name and date, so server and client automatically get their own logs, and successive runs don't wipe out older files. You'll be deleting any you create before submission anyway, so there's no need to keep extra ones around.
Writing to the grading logs
An output stream pointer with the global name GRADING is created for you automatically when you issue the GRADEME call shown above.
As long as you include c150grading.h in a source file, you can write to the grading log like this:
    *GRADING << "The sum of 100 + 20 + 3 is: " << 100+20+3 << endl;
In other words, this is an ordinary C++ ostream and you can do all the usual things with it. Note that GRADING is a pointer so you must write to *GRADING, including the splat (*).
IMPORTANT: what to put in the grading logs
You MUST use the technique shown above to log significant events
to the grading log. You'll need to put grading log information in
both your client and server.
We need to see each time a client attempts to send a new file,
each time a server starts writing a new file, when transmission
of a file completes at the client, when end to end checks
succeed and fail, etc. Some of these events will be common
to everyone's work, since
everyone must try to send a file somehow.
We've standardized a set of logging message formats for these common
events. Please follow the formats when logging those events.
For the client:
Event | Format |
---|---|
When the client starts to send the file <name> at attempt #<attempt> (If your protocol does not support retries, then make <attempt> 0.) | File: <name>, beginning transmission, attempt <attempt> |
When the client finishes sending all of the file <name> to the server during attempt #<attempt> | File: <name> transmission complete, waiting for end-to-end check, attempt <attempt> |
When the end-to-end check for the file is successful | File: <name> end-to-end check succeeded, attempt <attempt> |
When the end-to-end check for the file is unsuccessful | File: <name> end-to-end check failed, attempt <attempt> |
And for the server:
Event | Format |
---|---|
When the server starts to receive the file <name> | File: <name> starting to receive file |
When the server has finished receiving the file and an end-to-end check is starting | File: <name> received, beginning end-to-end check |
When the end-to-end check for the file is successful | File: <name> end-to-end check succeeded |
When the end-to-end check for the file is unsuccessful | File: <name> end-to-end check failed |
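Because these formats are fixed strings, it can help to produce them in one place so the client and server never drift from the tables. The following is a minimal sketch, assuming you pass *GRADING as the stream; all function names are hypothetical and are not part of the supplied c150 utilities.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helpers: each writes one standardized grading-log line.
// In the real programs you would pass *GRADING as the stream.

// Client: starting a transmission attempt (use attempt 0 if you never retry).
void logClientBegin(std::ostream &log, const std::string &name, int attempt) {
    log << "File: " << name << ", beginning transmission, attempt " << attempt << std::endl;
}

// Client: all data for this attempt has been sent.
void logClientSent(std::ostream &log, const std::string &name, int attempt) {
    log << "File: " << name << " transmission complete, waiting for end-to-end check, attempt "
        << attempt << std::endl;
}

// Client: outcome of the end-to-end check for this attempt.
void logClientCheck(std::ostream &log, const std::string &name, int attempt, bool ok) {
    log << "File: " << name << " end-to-end check " << (ok ? "succeeded" : "failed")
        << ", attempt " << attempt << std::endl;
}

// Server: the first data for a file has arrived.
void logServerReceiving(std::ostream &log, const std::string &name) {
    log << "File: " << name << " starting to receive file" << std::endl;
}

// Server: file fully received, end-to-end check about to start.
void logServerReceived(std::ostream &log, const std::string &name) {
    log << "File: " << name << " received, beginning end-to-end check" << std::endl;
}

// Server: outcome of the end-to-end check.
void logServerCheck(std::ostream &log, const std::string &name, bool ok) {
    log << "File: " << name << " end-to-end check " << (ok ? "succeeded" : "failed") << std::endl;
}
```

For example, logClientBegin(*GRADING, "myfile.txt", 1) would write the first row of the client table.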
There will likely be other significant events to log depending on the protocols you've designed.
For example, if you happen to use an approach where you resend just part of a file that's in error, then you should log something like "File: myfile.txt resending bytes 5000-6000 attempt 2". If your program does something different, like giving up on an entire file in case of an error, then log that.
It is also helpful to include information about the type of end-to-end check you're performing, and to log something like "File: myfile.txt sending sha1 checksum edef9723...af5bc00 to client" (or "sending whole file to client").
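To log a checksum in that human-readable way, the 20 raw bytes produced by SHA1() have to be converted to text. A minimal sketch, with toHex as a hypothetical helper name; it prints lowercase hex, the same notation used by openssl dgst -sha1, so you can compare log entries against that command by eye.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Convert a binary digest (e.g. the 20 bytes filled in by SHA1()) into a
// lowercase hex string, two characters per byte.
std::string toHex(const unsigned char *digest, std::size_t len) {
    std::ostringstream hex;
    hex << std::hex << std::setfill('0');          // both settings persist
    for (std::size_t i = 0; i < len; ++i)
        hex << std::setw(2)                        // setw applies to one output only
            << static_cast<unsigned>(digest[i]);   // print as a number, not a char
    return hex.str();
}
```

A log line could then be written as *GRADING << "File: " << name << " sending sha1 checksum " << toHex(hash, 20) << " to client" << endl;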
In other words, the logs will be our guide to what your program is doing. Try to make sure they tell the story. We'll read your report and your code as a guide and a cross check. When nastiness=0, we'll expect to see just a few lines per file, indicating start/end of transmission, successful end-to-end check etc. When nastiness is higher, we'll expect to see more about your attempts to retransmit or recover.
Guideline: we do not want to see a line in the log for things like successful transmission of an individual packet or file block, because that would generate too many entries. We do want to see when you use strategies for error recovery, e.g. when you re-attempt sending a file, when you ask for or receive retransmission of a missing packet, or when you do anything in particular to recover from disk nastiness.
Typically, except where you are doing lots of error recovery, we would expect a few lines for each file transmission attempt.
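The "few lines per attempt" guideline can be sketched as a client-side retry loop that writes the standardized lines once per attempt and nothing per packet. This is a minimal sketch under stated assumptions: sendOnce and checkOnce are hypothetical placeholders for your own protocol code, retrying the whole file is only one possible recovery strategy, and the final "giving up" line is an example of a protocol-specific message, not a required format.

```cpp
#include <functional>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical driver: retry an entire file up to maxAttempts times.
// sendOnce(attempt) transmits the file; checkOnce(attempt) runs the
// end-to-end check and returns true on success. Neither logs per packet.
bool copyWithRetries(std::ostream &log, const std::string &name, int maxAttempts,
                     const std::function<void(int)> &sendOnce,
                     const std::function<bool(int)> &checkOnce) {
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        log << "File: " << name << ", beginning transmission, attempt " << attempt << std::endl;
        sendOnce(attempt);
        log << "File: " << name << " transmission complete, waiting for end-to-end check, attempt "
            << attempt << std::endl;
        bool ok = checkOnce(attempt);
        log << "File: " << name << " end-to-end check " << (ok ? "succeeded" : "failed")
            << ", attempt " << attempt << std::endl;
        if (ok)
            return true;
    }
    // Example of logging a protocol-specific decision (not a standard format).
    log << "File: " << name << " giving up after " << maxAttempts << " attempts" << std::endl;
    return false;
}
```

With nastiness=0 this produces three lines for a file that succeeds on the first attempt, which matches the volume described above.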
The grading logs vs. the debug logs
In case there's confusion: here's how to think about the difference between the debug logs and the grading logs:
- The debug logs are mainly for you. Use them as an aid to debugging your program, or don't use them if they're not helpful. We may turn on debug logs if it helps us figure out what you're doing, but you're under no obligation to reflect any of your application logic in the debug logs.
- The grading logs are mainly for us: we will use them to understand what your program is doing.
Of course, you may sometimes want to duplicate information in the two logs, e.g. so you can see significant events in your debug log as they occur: that's up to you. If you do that a lot, you can always make a little helper function that will write the same message to both.
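A helper of that kind might look like the sketch below. It assumes both logs can be reached as std::ostream objects (pass *GRADING as the first argument); if your debug log is written through some other interface, adapt the second write accordingly. The name logBoth is hypothetical.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write the same message to the grading log and to a
// debug stream, one line each.
void logBoth(std::ostream &grading, std::ostream &debug, const std::string &msg) {
    grading << msg << std::endl;
    debug << msg << std::endl;
}
```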
Preparing your report
The report you submit with your project will be as significant for grading as the code itself. In fact, the report will be our guide to understanding your code.
A template for your report is available for download from the links below: the procedure is similar to what we've been using for question/answer assignments. Download the HTML file and adapt it to include your report.
Please keep the heading information intact, but feel free to adapt the rest of the HTML to your needs, and to include <style> tags with CSS at the top if necessary. Remember that the <pre> HTML tag is very useful for quoting multi-line code fragments (like structs to explain your packet structures).
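As an illustration of the level of commentary that helps, here is a hypothetical packet header of the sort you might quote inside <pre>. The fields are invented for this example only; your own packet layout will depend on the protocol you design.

```cpp
#include <cstdint>

// Hypothetical packet header, shown only as an example of a well-commented
// struct. Multi-byte fields are assumed to be sent in network byte order.
struct PacketHeader {
    std::uint32_t fileId;      // which file this packet belongs to (lets the
                               // receiver discard delayed packets for older files)
    std::uint32_t seqNum;      // position of this packet within the file
    std::uint16_t type;        // e.g. DATA, ACK, CHECK_REQUEST (protocol-defined)
    std::uint16_t payloadLen;  // number of valid payload bytes after the header
};

// Ordering fixed-width fields largest-first avoids compiler padding, so the
// header occupies exactly 12 bytes.
static_assert(sizeof(PacketHeader) == 12, "unexpected padding in PacketHeader");
```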
There are a few questions at the top of the template that you should be sure to answer. These will tell us how far you think you got with the assignment, how you recommend testing your code, etc. In particular be sure to indicate the highest nastiness levels at which your code should succeed in copying the entire directory.
The main part of the report will be an explanation of your approach, what you think you've achieved, and what we should expect when we test your code. We want to know what your code is doing, and we want you to explain why it does or does not work in particular cases. For example, if you know that delayed packets will cause your program to get confused as to which file it's working on, tell us. If you think you're handling that case, explain the technique you're using to keep things straight.
Overall, your report should cover:
- Overview: what did you do? What worked and what didn't?
- Which cases do you think your code handles and why?
- Describe your protocol:
- What is the sequence of packets you send in the normal cases and for recovery?
- What is the structure of your packets? Briefly explain each significant field. You can copy/paste the actual structs if you like, but include enough commentary (in the structs or below them) that we can figure out what's going on.
- What's your approach to dealing with lost packets? Packets that are reordered?
- Are there any invariants that give you confidence in the correctness of your protocol (e.g. "My rename is done after my end-to-end check succeeds, so any TARGET file without a .TMP suffix is correct")
- Do you expect your code to succeed when there are errors reading or writing the disk? What ensures that it will succeed, or why do you think it might not?
- Are there bugs or shortcomings you know about? Are they indicated in comments in the code with NEEDSWORK? (see commenting and code quality)
- What should we look for in the grading logs? Please relate this to your explanation of the protocol you've invented (if you like, you can combine the two, indicating gradelog entries as you explain the protocol, or you can explain the gradelog separately).
- Which cases are you aware of (e.g. high nastiness levels or particular combinations of reordering) that you aren't trying to handle correctly? In such cases, will your code detect the problem and abort (OK) or will it silently produce incorrect results?
- If your code has to give up copying one file, will it go on and try others? This can be a good thing to do. Certain protocols will tend to succeed on short files, but not on longer ones. If this is true of yours, can you explain why? If you have this problem, what file sizes do you handle at which nastiness levels, and how long does it typically take for them to be copied?
- Are there any cases for which your code doesn't do what you expect? Do you have any intuition why that might be?
- What did you learn from this assignment?
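The example invariant in the list above ("My rename is done after my end-to-end check succeeds, so any TARGET file without a .TMP suffix is correct") can be sketched on the server side as follows. This assumes the server always writes incoming data to <name>.TMP; commitIfVerified is a hypothetical name. std::rename returns 0 on success; on POSIX systems it replaces an existing target atomically, while on other platforms that behavior is implementation-defined.

```cpp
#include <cstdio>
#include <string>

// Hypothetical commit step: data is always written to <finalPath>.TMP, and
// the file only receives its real name once the end-to-end check passes.
bool commitIfVerified(const std::string &finalPath, bool checkPassed) {
    const std::string tmpPath = finalPath + ".TMP";
    if (!checkPassed)
        return false;  // leave the .TMP file alone; a retry may rewrite it
    return std::rename(tmpPath.c_str(), finalPath.c_str()) == 0;
}
```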
Include anything else that will demonstrate your understanding of this assignment and your results. Your comments on the assignment, and suggestions for future revisions of it, are also welcome. Also, please include a statement confirming that both team members were present for (substantially) all coding, and that both worked out the design together (obviously, you can do some individual design work, but it should be roughly balanced, and you and your teammate must make all final decisions together, and with shared understanding).
Template for report - Download template for report
Submitting your work for grading
The following are the steps you should take to prepare and submit your work. If you are on a team, one student should follow these steps, and the other should follow the instructions under team submission below.
Preparing your work for submission
As described below, you will be making two submissions:
- A preliminary submission demonstrating your end-to-end check and including a design document for your proposed file copy protocol.
- A final submission with your running file copy code, and the report described above.
There are sections below telling you how to do each of these submissions. For both of them, please consider the following checklist before submitting:
- Ensure that all your programs build correctly with make all. Our test scripts depend on this. It's also desirable to update the rule for make clean to remove any executables you create, temporary data files, etc. The idea of make clean is to put your directory back in a state where the code can be rebuilt from scratch. Then, before submitting, run make clean so that you will not be submitting your executables and .o files.
- It's essential that you not be using any private versions of the shared libraries in /comp/117/files/c150Utils.
- Do a "make clean" to remove all object and executable files
- Either delete your SRC and TARGET directories, or ensure that they are empty. WARNING: to avoid wasting tons of space, provide is set to reject submissions larger than 1M for this assignment. If you don't clean out the SRC directory, your project will not be accepted! You should also do a make clean before issuing provide, as your executables may be large, and we will rebuild them anyway.
- Delete any GRADELOGs, debug logs, or other additional files that may have wound up in your directory.
- (Final submission only) Make sure you have followed the instructions in Preparing your report. As with your code, the report must be joint work, with all team members contributing equally.
Submitting your preliminary end-to-end check and design document (due Tuesday October 08)
Approximately one week ahead of the final due date for the project, you must submit the following (see the assignment calendar for the exact due date):
- Code implementing only your end-to-end check, as described above. The server and client command line interface should be as for your final submission. The only differences are:
  - You should manually pre-populate the target directory with the files to be checked.
  - Your distributed protocol should do the end-to-end checks, but nothing else. Be sure your program clearly reports in the grading log which files passed the check (source and target versions the same) and which didn't (source and target different). You may in addition report this on the console if you like, but be sure to indicate it in the grading log.
- A design document (design.pdf or design.txt) outlining the protocol you intend to use for your file copy in your final submission. You should describe the different packets you will send, and how they will be used by your overall protocol to implement file copying.
It will be fine if you later change your design, but we want you to think about and document a design before you start coding and debugging.
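For the preliminary submission, the heart of the check is comparing a digest computed on one side with the digest received from the other, then recording which files matched. A minimal sketch, assuming SHA1 digests are carried as 20-byte std::array values and using the standard server-side log format; Digest and reportCheck are hypothetical names.

```cpp
#include <array>
#include <ostream>
#include <sstream>
#include <string>

using Digest = std::array<unsigned char, 20>;  // 20 bytes = SHA1 output size

// Hypothetical helper: compare the local and remote digests for one file
// and record the outcome in the grading log. Returns true if they match.
bool reportCheck(std::ostream &log, const std::string &name,
                 const Digest &local, const Digest &remote) {
    bool same = (local == remote);  // std::array compares element by element
    log << "File: " << name << " end-to-end check " << (same ? "succeeded" : "failed") << std::endl;
    return same;
}
```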
One team member from each team should submit the code and design document:
cd <parent-directory>
provide comp117 filecopycheck FileCopy design.pdf <explain.txt>
Note that the submission name for this preliminary submission is filecopycheck.
You may submit design.txt in place of design.pdf if you prefer.
FileCopy must be the name of the directory in which your code is built. design.pdf or design.txt is your design document. In most cases explain.txt need not be provided, but this is the place for the submitting team member to give explanations of any personal issues that might need attention (explanations for lateness, illness, etc.).
For this preliminary submission, just note the partner's name and login in the design document. As described below, we use a more formal procedure for the final submission.
Your final submission (due Tuesday October 15)
The same team member who made the preliminary submission should submit the code and your report. Be sure to reread the instructions for the report; it is not the same as your design document (though you are welcome to copy pieces from your earlier design document submission, be sure to use the supplied report template).
cd <parent-directory>
provide comp117 filecopy FileCopy filecopyreport.html <explain.txt>
FileCopy must be the name of the directory in which your code is built. filecopyreport.html is your report. In most cases explain.txt need not be provided, but this is the place for the submitting team member to give explanations of any personal issues that might need attention (explanations for lateness, illness, etc.). Information related to actually running and grading the submission should be in the report itself.
Instructions for team members
If you are a member of a team, then one of you should submit your complete project as described above. Immediately after that's done, the other should:
provide comp117 filecopy teamreport.txt <explain.txt>
teamreport.txt should be a short text file indicating your name, and your team member's name. It should indicate: "I hereby certify that the submission by <partner's name> on <date> and <time> is our joint submission. Both the code and the report included with that submission are our joint work, and should be the basis for my grade for the file copy assignment."
Again, to emphasize what's stated in the first paragraph of this section: the student who submits the code must not make an additional submission with a teamreport; only the student(s) who do not do the code submissions submit the team report.
If there is any additional information, e.g. relating to personal issues (illness etc.), then either partner can provide that in an explain.txt file, as usual. As noted above, information related to actually running and grading the submission should be in the report itself.
A note on commenting and code quality
In programs of this complexity it's particularly important that you organize and comment your code so that a grader can figure out how it works. If your code is not pleasant to read, well organized, and reasonably well commented, you will lose credit. Even code that works may not be judged well if we can't easily figure out why it works.
Please be sure to follow the CS 117 coding standards.
Utility programs supplied to you
SHA1 Code
You'll note in the project directory a program named sha1test.cpp. We encourage you to look at it, and steal code from it if you like. It provides SHA1 hashes that may be useful in your end-to-end checks.
To try sha1test, pass it a list of filenames and it will compute SHA1 hashes for each.
The crucial call that does the checksum looks like this:
#include <openssl/sha.h>             // declares SHA1()
const char *buffer;                  // buffer with your file data
size_t length;                       // length of your file data
unsigned char shaComputedHash[20];   // hash goes here (SHA1 output is 20 bytes)
SHA1((const unsigned char *)buffer, length, shaComputedHash);
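The SHA1 call needs the whole file in memory as a buffer and a length. A minimal sketch of getting it there, assuming plain std::ifstream is acceptable; in your actual program you may need to read through whatever file interface you use to cope with disk nastiness. readWholeFile is a hypothetical name.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read an entire file into memory so it can be hashed with
// SHA1(buffer.data(), buffer.size(), shaComputedHash).
bool readWholeFile(const std::string &path, std::vector<unsigned char> &buffer) {
    std::ifstream in(path, std::ios::binary);  // binary: no newline translation
    if (!in)
        return false;                          // file missing or unreadable
    buffer.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    return !in.bad();
}
```

Opening in binary mode matters: the hash must cover the exact bytes on disk, or the client and server may disagree about identical files.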
Also, it's very useful for testing to know that you can compute sha1 hashes from the command line using:
openssl dgst -sha1 file [file...]
You do not have to use SHA1 hashes for your end-to-end checks, but if you decide to, sha1test.cpp may be a useful guide.
Important: if you are using the supplied SHA1 method shown above, you must link against the ssl and crypto libraries. So, the Makefile entry for sha1test looks like this:
sha1test: sha1test.cpp
	$(CPP) -o sha1test sha1test.cpp -lssl -lcrypto
Makedatafile
Also in the project directory is a file called makedatafile.cpp.
I wrote this for my own testing, and it may be useful to you.
It's a simple, not particularly well-written program that takes a filename and a linecount. It creates a file of the given name, with the specified number of lines.
The lines are filled with ascending numbers.
When debugging the file copy protocol, files like this can be useful, because it's easy to figure out which buffer goes where!
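To show why such files help, here is a sketch of the same idea. It is not makedatafile.cpp itself, and its exact output format is an assumption: every line holds the next few numbers in sequence, so a misplaced or duplicated block shows up immediately in a diff. writeNumberedLines is a hypothetical name.

```cpp
#include <ostream>
#include <sstream>

// Write lineCount lines; numbers keep ascending across lines, separated by
// single spaces, so every line in the file is distinct.
void writeNumberedLines(std::ostream &out, int lineCount, int numbersPerLine) {
    int next = 0;
    for (int line = 0; line < lineCount; ++line) {
        for (int i = 0; i < numbersPerLine; ++i)
            out << (i ? " " : "") << next++;
        out << '\n';
    }
}
```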
It's provided as-is "without any warranty". Bug reports welcome, as always.
Collaboration
Most of you will work in teams. The rules we will follow will be those set out for pair programming in COMP 40.
That said, this is hard stuff! Distributed programming is tricky: even simple bugs can be difficult to find, and if you fall into the (common) pitfall of designing a messy protocol, things can head downhill fast. So, you should get help when you need it.
Piazza should be your first stop for getting help. As usual, if your question is of general interest, please post publicly. If your question discloses any aspect of your design or design ideas, then please post privately. You are also welcome to mail us design documents or notes for comment, and we will do our best to take a look and help you. Getting reports about what's confusing you also helps us to know what we need to clarify for everyone. When in doubt, send us your questions, and we'll do our best to get you some help.
Our TAs are also available to meet with you and help you. Similarly, you may seek out help from others who are expert in networking who are not currently taking the course. When you get such help, it MUST be acknowledged with your report, and you must explain what kind of help you got. In general, it's appropriate for helpers to point out flaws in your design, and to point you to useful references. It is never appropriate for someone (other than the course staff) to fix your design or code for you.