Comp111 Final Exam
Review Sheet 3
Fall 2005

Open book and notes. No electronics of any kind allowed. Please turn off cellphones, pagers, laptops, radios, etc.

Takehome questions:

  1. (25 points) In class, we described a scheme for representing files in the filesystem based upon two kinds of blocks. A link block consists of an array of pointers to other blocks, with one next pointer that points to the next link block. A data block consists entirely of data. A file is described by one or more link blocks that themselves point to data blocks.

    Give a complete list of advantages and disadvantages of such a scheme for representing files. Consider file sizes, paging, scheduling, and read-only versus read-write access to files in your answer. Hint: in a normal filesystem, a one-byte file occupies one block.
    This is a simple case of trading space for time. If a file is a simple linked list of blocks, it takes n operations to get to the nth block. If it is implemented as in the problem, and there are k links to a link block, then it takes at most n/k accesses to get to the nth block. This means that sequential access of the two kinds of files occurs at roughly the same speed, while random access to blocks in large files is much faster for the new representation than for the naive and simple one. In getting this speed, we pay for it in space. The average file has at least two blocks in it; one link block and one data block. In the simpler representation, there is only one block required for the shortest file.

  2. You have to decide between two options for web service: one machine with a clock rate of 500MHZ or two machines with clock rates of 250MHZ each. The software the machines will run is identical. There is only static content: you are just serving up unchanging files. You may assume that the process of selecting between machines to respond to requests takes negligible time (it usually does). You may assume that the machines have no other tasks other than web service.
    1. (10 points) Estimate which solution will give the faster throughput response time, utilizing M/M/c queueing theory. Assume that the load is the same in either case and that the processing rate μ on a 500 MHZ machine is exactly twice that of μ on a 250 MHZ machine.
      • Plugging in c=2 and μ into the M/M/c equations, where p=λ/μ:
        • S0 is 1/[1+p+p2(1/(1-(1/2)p))].
        • The mean time spent waiting is (1/4μ)(p2)/[(1-(1/2)p)(1 + (1/2)p + (1/2)p2)]
        • The mean time till service is (1/4μ)(p2)/[(1-(1/2)p)(1 + (1/2)p + (1/2)p2)] + 1/μ
      • Plugging in c=1 and 2μ into the M/M/c equations, where p=λ/μ:
        • S0 is 1/[1+(1/2)p(1-(1/2)p)]
        • The mean time spent waiting is (1/(4μ))p
        • The mean time till service is (1/(4μ))p+ 1/(2μ)
      We know that 0<=p<1. If p=0, then response for two servers is 1/μ and response for one server is 1/2μ, which makes the one server faster. For p=1, the response for two servers is 5/4μ while the response for one server is 3/4μ making the one server faster. Other cases are similar. We conclude that one server with twice the service rate is superior to two with a service rate.
      • Perfect: +10
      • Forgot to add time of service: +7
      • Incorrect calculation, correct answer based upon calculation: +7
      • Mistakes in both calculation and conclusion: +5
      • Anything of value: +3.
    2. (15 points) Explain exactly how and why the estimate you just gave is inaccurate. Is it too low, or too high, and why?
      The queueing theory estimate is completely ridiculous, because the process of providing web service is I/O bound rather than CPU-bound. To service a request, one must read the file from disk and then send it. Reading from disk is orders of magnitude slower than the CPU, so that having multiple disks is a definite advantage, and the actual performance of two machines can be close to double of that of one machine, regardless of CPU speed (provided it is not just too slow to cope with I/O). Given the effects of disk page cacheing, double speed is easily approachable.
      • Perfect: +15
      • All true statements, but didn't figure out disk bound: +10
      • Some false and true statements: +5.

    Some students pointed out that "throughput" is the wrong thing to think about, because by the balance equations, throughput is the same by definition. They are correct. The differences are in response time rather than throughput.

Non-takehome questions:

  1. (25 points) In a journalling filesystem, what constraints are there on writing the journal entries to disk? Should I write them as late as possible or as early as possible? Why?
    Since the journal is intended to cope with power failures, it has a different purpose than the main disk. The most common strategy is to flush the journal on a regular schedule, which is more frequent than late flush but less frequent than "as soon as possible". Of "as late" and "as soon" as possible, the better answer is "as soon"; otherwise the journal can't appropriately recover from a power failure.
  2. (25 points) One hot topic in current operating systems is "virtualization" of the operating system. It is possible to run two operating systems on the same computer using, e.g., the commercial VMWare software product or the open-source Xen virtual machine. The most difficult problem in virtualization is how to share physical devices between two operating systems. Explain precisely why sharing a physical device (e.g., a disk or a printer) between two operating systems is problematic.
    The problem is that one must maintain state of the physical device in some way. Having two copies of that state won't work, any more than having two kernel descriptors for the same device will work. In practice, both VMWare and Xen contain device drivers that replace the device drivers in the operating systems it supports, and all device access is virtualized by using the VMWare or Xen driver, thus providing a single point at which state is managed.