Name: _Answers_ Login: _couch_

Comp111 - Operating Systems
Final Exam, Dec 16, 19, 22, 2011
Open Books and Notes

  1. Suppose you have the following situation as reported by ls -l:
     
    dr-xr-xr-x 1 root  root    4096 Dec 15 13:00 /
    drwx-w---x 1 couch faculty 4096 Dec 15 13:00 /foo
    -rw--w---x 1 cat   student 2965 Dec 15 13:25 /foo/bar
    
    1. (5 points) Can a user in group faculty create a file inside /foo? Why?

      Answer: foo is writeable to group, and has group faculty. The meaning of writeable for a directory is the ability to create and delete files. So yes, faculty can create (and delete) a file inside /foo. Note that execute access is also present for /, which is necessary.
    2. (5 points) Can a user in group student read the file /foo/bar? Why?

      Answer: Moving down the path hierarchy,
      • / is executable by other, and has owner and group root, so a user in group student falls under other and can search it. Thus /foo is visible to group student.
      • /foo is executable by other, and has owner couch and group faculty, so a user in group student again falls under other and can search it. Thus a user in group student can access /foo/bar by name.
      • /foo/bar is writeable to group student, but not readable.
      Thus /foo/bar cannot be read by group student.
    3. (5 points) Can couch in group faculty delete /foo/bar? Why?

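      Answer: couch owns /foo and has write and execute access to it as owner. By the same argument as in part 1 (write access to a directory is the ability to create and delete files), /foo/bar can be deleted, even though couch neither owns the file nor can write to it.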
    4. (5 points) Can couch in group faculty list the directory /foo? Why?

      Answer: There is read privilege for owner couch, so yes, listing will work.
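
The class-by-class reasoning used in the answers above can be sketched in a few lines. This is only an illustration, not how the kernel implements the check: the function name `effective_triple` and the user "sam" are hypothetical, and the sketch ignores root's override and any ACLs. It shows only which rwx triple applies to a given user; it does not decide whether a particular operation succeeds.

```python
def effective_triple(mode, owner, group, user, user_groups):
    """Return the rwx triple that applies to `user` for an ls -l mode string.

    Exactly one class applies: owner if the user owns the object,
    otherwise group if the user is in the object's group, otherwise other.
    """
    perms = mode[1:10]          # drop the leading file-type character
    if user == owner:
        return perms[0:3]
    if group in user_groups:
        return perms[3:6]
    return perms[6:9]

# Entries copied from the listing in the question.
ROOT = ("dr-xr-xr-x", "root", "root")
FOO = ("drwx-w---x", "couch", "faculty")
BAR = ("-rw--w---x", "cat", "student")

# "sam" is a hypothetical user whose only group is student.
for name, entry in (("/", ROOT), ("/foo", FOO), ("/foo/bar", BAR)):
    print(name, effective_triple(*entry, "sam", {"student"}))
```

For the hypothetical user, the triples are r-x for /, --x for /foo, and -w- for /foo/bar, which matches the walk down the path in part 2.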
  2. (20 points) Virtualization programs such as VMWare player or VirtualBox allow one to execute an operating system as a regular user program inside another operating system. How must an "OS player" program deal with I/O to and from the OS that is now running as a user process? Where is the true driver for each device located? Why?

    Answer: An OS player is itself an (unprivileged) user process in a "host" operating system, that executes a "guest" operating system within the player itself. This is different from a hypervisor situation, in which two operating systems are peers managed by a third overarching but tiny operating system. In the case of a player, there is a strict hierarchy between host, player, and guest. Due to the problem of state maintenance for devices, the real driver for the device must be in the host operating system, and drivers in the guest operating system must talk to the real driver via mechanisms in the player program. Otherwise, there will be confusion between device state in the host operating system and that of the same device in the guest operating system.

    (Some players deal with potential conflicts between host and guest use of devices by assigning devices to the guest or host operating system one at a time. More modern players utilize the hypervisor machine instructions to transparently virtualize I/O in the guest through the driver in the host.)

  3. We have extensively discussed what can and cannot be observed and/or done by a user or a subsystem of the operating system.
    1. (5 points) Explain why it is difficult for a user to directly measure the speed with which a file is written to disk.

      Answer: The only mechanism the user has for determining such information is to time calls to write and fdatasync. Unfortunately, the times for these calls do not reflect how fast data is written to the disk, but rather how fast data is written to the paging subsystem and to the journal, respectively. When one calls write for a file, the call returns after the buffer is copied into the paging subsystem, but before the file is written to disk. If one calls fdatasync, the call returns after the page cache is flushed to a permanent place; that does not mean that the block is in its final state on disk, only that it is in a persistent location (e.g., a journal). Thus the time for a write is much too small, while the time for a write plus an fdatasync is potentially too small or too large, depending upon the kind of filesystem!
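
A minimal sketch of the only measurement available to a user, assuming a Unix-like system. The function name `time_write_and_sync` and the 1 MiB size are arbitrary choices for illustration, and the sketch falls back to fsync where fdatasync is not available. The two numbers it prints are the times of the calls, which, as argued above, are not the speed of the disk.

```python
import os
import tempfile
import time

def time_write_and_sync(nbytes=1 << 20):
    """Time write() alone, then the sync call that follows, on a temporary file."""
    data = b"x" * nbytes
    # fdatasync is missing on some platforms; fall back to fsync there.
    sync = getattr(os, "fdatasync", os.fsync)
    fd, path = tempfile.mkstemp()
    try:
        t0 = time.perf_counter()
        written = os.write(fd, data)   # returns once the kernel has accepted the data
        t1 = time.perf_counter()
        sync(fd)                       # returns once the data is reported persistent
        t2 = time.perf_counter()
    finally:
        os.close(fd)
        os.unlink(path)
    return written, t1 - t0, t2 - t1

written, write_seconds, sync_seconds = time_write_and_sync()
print(f"write: {write_seconds:.6f}s  sync: {sync_seconds:.6f}s  bytes: {written}")
```

The results vary with the filesystem holding the temporary directory, which is the point of the answer.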
    2. (5 points) Does a process need to know whether its read-only (and copy-on-write) pages are shared with other processes? Why or why not?

      Answer: There is no need whatever for a process to know (and no mechanism for it to discover) whether its read-only and copy-on-write pages are shared with other processes.

      A read-only page cannot be changed, so there is no potential for critical sections that write and read data at the same time. Thus there is no need for knowledge of sharing of a read-only page because there is nothing to coordinate.

      A copy-on-write page is nothing more than a special kind of read-only page, that becomes writeable when one process tries to write to it. That process gets a writeable copy while it remains read-only to the other processes. From the process's point of view, the page is always writeable, in the sense that a write will always succeed (by some mechanism unknown to the process). Again, no coordination between sharing processes is necessary, because the process of becoming writeable is transparent to them.
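
The transparency described above can be observed from user space with fork, assuming a Unix-like system. This is a sketch of the visible behavior only: the function name `child_write_is_private` is hypothetical, and the Python interpreter itself touches many pages, so the example says nothing about which pages the kernel actually copies.

```python
import os

def child_write_is_private():
    """Fork, let the child overwrite a buffer, and report what each process sees."""
    buf = bytearray(b"original")
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the write simply succeeds; any page copying is invisible here.
        os.close(read_end)
        buf[:] = b"modified"
        os.write(write_end, bytes(buf))
        os._exit(0)
    # Parent: collect the child's view, then compare it with its own.
    os.close(write_end)
    child_view = os.read(read_end, 64)
    os.close(read_end)
    os.waitpid(pid, 0)
    return bytes(buf), child_view

parent_view, child_view = child_write_is_private()
print(parent_view, child_view)
```

Neither process coordinates with the other, and neither can tell from this code whether the buffer's page was ever shared.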

    3. (5 points) Does the filesystem driver need to know how the disk paging subsystem works? Why or why not?

      Answer: The whole point of filesystem design is to avoid this necessity. A filesystem driver is a high-level driver built on top of the raw disk driver. The raw disk driver copes with the paging subsystem.

      There is one thing the filesystem driver exploits about the raw disk driver and paging subsystem: local references to the same page are cached. This is the only knowledge the filesystem driver has to have about the paging subsystem.

    4. (5 points) Does the raw disk driver need to know the kind of disk it is controlling? Why or why not?

      Answer: The raw disk driver communicates with the actual device. By nature, device protocols vary: talking to a flash drive is very different from talking to a regular disk. So yes, the raw disk driver must account for differences in devices. That is its function.
    1. (10 points) Please explain why I claim that the disk paging subsystem is a producer/consumer architecture. Identify both producers and consumers and describe what each does.

      Answer: There are two cases, "write" and "read".

      In the case of "writing", processes serve as producers for the paging subsystem, by making changes in pages that become a job queue of things to post to the disk. The "consumer" is the "update" process (also known as the disk scheduler), which writes these page changes to disk.

      In the case of "reading", P/C relationships are more difficult to describe. The processes make "requests", which form the producer queue. The paging subsystem services these requests by reading the requested pages into memory, and so acts as the consumer of "requests".

      Note that the architectures of the read and write cases are quite different.
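
The "write" case can be sketched with threads and a queue. This is an illustration under stated assumptions, not the kernel's design: producer threads stand in for processes that dirty pages, one consumer thread stands in for the "update" process, and a dictionary stands in for the disk. All names, including `run_write_pipeline`, are hypothetical.

```python
import queue
import threading

def run_write_pipeline(num_producers=3, pages_per_producer=4):
    """Producers queue dirty pages; one consumer 'flushes' them to a dict."""
    dirty = queue.Queue()
    disk = {}

    def producer(pid):
        for n in range(pages_per_producer):
            dirty.put((pid, n, f"data from process {pid}"))

    def consumer():
        while True:
            item = dirty.get()
            if item is None:          # sentinel: no more work
                break
            pid, n, data = item
            disk[(pid, n)] = data     # stands in for the actual disk write

    flusher = threading.Thread(target=consumer)
    flusher.start()
    workers = [threading.Thread(target=producer, args=(p,))
               for p in range(num_producers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    dirty.put(None)                   # all producers are done
    flusher.join()
    return disk

disk = run_write_pipeline()
print(len(disk), "pages flushed")
```

The producers return as soon as their pages are queued, without waiting for the consumer, which mirrors a write call returning before the data reaches the disk.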

    2. (10 points) Does adding a journal to a filesystem result in an additional producer/consumer relationship? Why or why not?

      Answer: Yes, for the write case.

      (In the read case, no extra levels are generated by journalling. The multiple-level access (first to cache, then to journal, then to disk) is not a producer-consumer relationship because there is no well-defined queue of things that passes from producers to consumers.)
  4. (20 points) The design of an operating system involves many tradeoffs between predictability and efficiency. Predictability refers to the fact that processes have predictable behavior and results, while efficiency refers to the fact that useful work predominates and overhead is minimized. List some tradeoffs between predictability and efficiency, and explain the impacts of the choices that linux designers made among the alternatives.

    Answer: This was a really hard question. Those of you who looked ahead to the next question got an important clue.

    Predictability has many forms. The most important form is that process execution is not unnecessarily probabilistic or unnecessarily prone to race conditions. There are two main tradeoffs between predictability and efficiency:

    • Initialization: newly allocated memory starts in a known state rather than holding arbitrary values.
    • Atomicity: system calls such as write are atomic.

    Both of these take overhead to accomplish. The alternative, in each case, is an astounding lack of predictability.

    For a complete description of the initialization problem, see the answer to the next problem. Uninitialized data exposes bugs that are very difficult to locate, in which the initial value of a variable is used before initialization.

    A classic example of the second issue -- atomicity -- is to consider what would happen if calls to write were not atomic. This would allow many more outcomes from a pair of competing write calls (e.g., write(1,"hi there\n", 9) and write(1,"ho ho ho\n", 9)) than the two outcomes that we discussed in class. Further, these variants would occur extremely infrequently, making it very difficult to debug programs, and would require programs to do their own explicit I/O locking to achieve predictable results. Thus, the operating system provides this locking transparently.
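
The pair of competing writes can be tried out with a small sketch. It swaps in a pipe for file descriptor 1, because POSIX explicitly guarantees that a write of at most PIPE_BUF bytes to a pipe is atomic. The function name `competing_writes` and the round count are arbitrary, and threads stand in for the competing processes.

```python
import os
import threading

def competing_writes(rounds=100):
    """Two threads each issue `rounds` single write() calls on one pipe."""
    messages = (b"hi there\n", b"ho ho ho\n")
    read_end, write_end = os.pipe()

    def writer(msg):
        for _ in range(rounds):
            os.write(write_end, msg)   # one system call per message

    threads = [threading.Thread(target=writer, args=(m,)) for m in messages]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    os.close(write_end)

    # Drain the pipe; the total is small enough to fit in the pipe buffer.
    chunks = []
    while True:
        chunk = os.read(read_end, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_end)
    return b"".join(chunks).splitlines(keepends=True)

lines = competing_writes()
intact = all(line in (b"hi there\n", b"ho ho ho\n") for line in lines)
print(len(lines), "lines, all intact:", intact)
```

The order of the lines varies from run to run, but no line is ever a mixture of the two messages. Without the atomicity guarantee, the program would need its own locking around each write to get the same result.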

  5. (10 points EXTRA CREDIT) It is a little-known fact that when memory is allocated via sbrk in linux, its initial state is all zeros. Everything I have told you about operating systems makes this decision counter-intuitive: data structures should self-initialize; programmers should expect anything for initial values of allocated storage. Why did the designers of memory allocation make this rather obvious exception to the rule of saving time whenever possible?

    Answer: The best way to understand this behavior is to evaluate the alternatives. In both cases, however, there is a more subtle problem. One goal of the operating system is to make processes execute as repeatably as possible. You already know that not initializing a variable before use is a bug. This is just as true for heap variables as for stack variables.

    Thus, the real reason that heap frames are initialized to zero is that it avoids exposing initialization bugs in processes for the heap, which would make processes much more difficult to debug.
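
The zero-fill behavior can be observed from user space. Python does not expose sbrk, so this sketch swaps in an anonymous mmap region, which is likewise fresh memory handed over by the kernel. It is an analogous check, not a test of sbrk itself, and the function name `fresh_memory_is_zero` is hypothetical.

```python
import mmap

def fresh_memory_is_zero(nbytes=mmap.PAGESIZE * 4):
    """Map fresh anonymous memory and check that every byte reads as zero."""
    region = mmap.mmap(-1, nbytes)     # anonymous mapping, not backed by a file
    try:
        return region[:] == bytes(nbytes)
    finally:
        region.close()

print(fresh_memory_is_zero())
```

Because the result is the same on every run, a program that forgets to initialize such memory at least misbehaves repeatably.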