Comp150CPA: Clouds and Power-Aware Computing
Classroom Exercise 5
Scalability and elasticity
Spring 2011

group member 1: ____________________________ login: ______________

group member 2: ____________________________ login: ______________

group member 3: ____________________________ login: ______________

group member 4: ____________________________ login: ______________

group member 5: ____________________________ login: ______________

In class we have studied the concepts of server scaling and service switching, and how they relate to AppEngine development caveats. Let's explore a bit more about these concepts.

  1. List the disadvantages of flow-based switching that motivate a flowless switching model.
    Answer:
  2. Draw a time-space diagram of a service that makes two queries into the datastore, and mark the parts that service switching can control.
    Answer: I expect something like:
     
    client  --------------------------------------------------------
              \                   / 
    service ---==------===------==---------------------------------
                 \read/   \read/
    cloud   --------------------------------------------------------
    
    where "=" represents the time when the server is actually processing data as opposed to waiting for the cloud.
  3. In the (strongly consistent) AppEngine datastore, writes don't block. Draw a time-space diagram of a service that only writes data into the datastore and does not read anything back from the datastore. What parts of this diagram can be controlled via service switching?
    Answer: I expect something like:
     
    client  --------------------------------------------------------
              \                   / 
    service ---===================---------------------------------
                     |write 
    cloud   --------------------------------------------------------
    
    where "=" represents the time when the server is actually processing data.
  4. Is an array of 10 server instances with flowless service switching about "twice as fast" as an array of 5 server instances that host the same services? Why or why not?
    Answer: "Twice as fast" is exactly what scaling does not accomplish. Each client instance interacts with one server instance, and cannot respond any faster than one server can respond. What scaling does is to handle request volume rather than decreasing response time. An array of 10 instances handles roughly twice the request volume at the same speed.
  5. (Advanced) Suppose that you have a datastore that exhibits eventual consistency, combined with a flowless service switch. If the datastore is not strongly consistent, how can the application be written so that subsequent interactions with the datastore (e.g., service requests to fetch the results of a prior transaction) retrieve strongly consistent data?
    Answer: This is really hard because I didn't allow you to mess with the datastore itself. There is no reasonable answer without more assumptions. In my basic approach, I used two of them:
    1. The eventually consistent datastore exhibits transactional integrity, in the sense that a query returns either the object version before a transaction, or the version after a transaction, and nothing in-between. To paraphrase an old joke, you always get "an old worm" or "a new worm", but never "half a new worm". This is realistic; most eventually consistent datastores do exhibit this.
    2. For at least one kind of object, eventual consistency is bounded in clock time, i.e., if I make a request at time t, then the datastore will be consistent at time t+t0 for some known delay t0.
    My basic construction involves constructing one kind of object whose consistency can be measured. This is an object that takes two possible states, "empty" and "nonempty", and one can measure whether it is consistent in the nonempty state. Consider, e.g., an object that either contains an integer token or not. Suppose we can ensure that it is not changed other than making it empty when it is nonempty, and vice-versa, and cannot change it from containing one non-empty value to containing another. This is called a mailbox. Here's a simple one without the persistence annotations:
     
    class mailbox { 
       private Date whenGloballyConsistent; 
       private int token; 
       public Date getDate(); 
       public void setDate(Date d); 
       public int getToken(); 
       public void setToken(int t); 
    }; 
    

    To write something into the mailbox, we might use:

     
    mailbox m; 
    success=false; 
    while (!success) { 
      pm.begin(); 
      if ((m.getDate()==null 
        || m.getDate() < new Date()) 
       && m.getToken()==0) { // if consistent and empty
        m.setDate(new Date()+t0); // date arithmetic: time when it will be consistent
        m.setToken(t); // token is t!=0
        pm.commit(); 
        success=true; 
      } else { // not consistent or not empty
        pm.abort(); 
      } 
    } 
    
    In other words, to write a token, wait until the mailbox is empty and strongly consistent, and then write the token into it.

    To read something from the mailbox, we might use:

     
    mailbox m; 
    success=false; 
    while (!success) { 
      pm.begin(); 
      if (m.getDate()!=null 
       && m.getToken()!=0 
       && m.getDate() < new Date()) { // if consistent and nonempty
        t = m.getToken(); 
        m.setDate(new Date()+t0); // when this will be consistent
        m.setToken(0);         // empty
        pm.commit(); 
        success=true; 
      } else { // not consistent or not nonempty
        pm.abort(); 
      } 
    }
    
    In other words, wait until the mailbox is full and strongly consistent, and then read the value and erase the token. The point of this mailbox is that the first transaction will only succeed when the mailbox is empty and consistent, and will switch it from empty to nonempty. Likewise, the second will only succeed if it is full and consistent, and each changes the state of the mailbox from empty to full and vice-versa. This is -- in essence -- a semaphore for communication of consistency between service instances!

    The way this is used in the application is that if we have one object that is consistent, we can make any other object consistent with it by sharing a value between the objects and waiting for the shared values to agree. Suppose, for example, that we are concerned that references to a specific object should exhibit strong consistency.

    1. When we write a new value, we wait for the mailbox to be empty and increment a serial number in the object and store it in the mailbox.
    2. When we try to read the object later, we check the mailbox for consistency and the same serial number. If it is the same as the serial number of the object, the object exhibits strong consistency.
    3. When we are done with the object, we empty the mailbox in preparation for changing something else.
    The reason this works is that the datastore exhibits transactional integrity, which means that the transactions either occur or not, and not halfway, which means that if the mailbox is consistent within the global datastore and also consistent with the token in the target object, then the object is globally consistent in the datastore as well!

    The reason I used a mailbox, and not the object itself, to measure consistency, is that I needed an object limited to simple transactions. In general, I cannot get away with assuming global consistency after time t0 unless I keep my manipulations of the mailbox simple!

    If the datastore does not exhibit some form of transactional integrity, then all bets are off. One cannot even assure that the version of mailbox data read by a later service request is even complete, much less consistent. One potential avenue is to include a checksum or hash field in the distributed object that can be checked for validity and assure transactional integrity of the mailbox message, e.g. But this is an imperfect strategy, in the sense that there will be object states that verify against the hash and are not consistent!

    This is just one way to try to synthesize strong consistency in a weak consistency environment. All of them require more information to work.