Comp150CPA: Clouds and Power-Aware Computing
Classroom Exercise 2 Answers
Consistency and Concurrency
Spring 2011

group member 1: ____________________________ login: ______________

group member 2: ____________________________ login: ______________

group member 3: ____________________________ login: ______________

group member 4: ____________________________ login: ______________

group member 5: ____________________________ login: ______________

In class we have studied the concepts of strong and weak (eventual) consistency and strong and weak (optimistic) concurrency. Let's explore a few aspects of these concepts.

  1. Why are concurrency and consistency not typically a problem when creating and storing new individual entities?

    Answer: If an entity has no relationships with any others, then one can avoid any problems of concurrency by delaying the makePersistent() call until the object is completely instantiated. That way, the object does not exist in the cloud store at all until it is written there in one atomic operation.
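
    The idea can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: the class CreateThenPersistDemo, the Person class, the static store map, and this makePersistent(key, object) signature are all hypothetical stand-ins, not the real persistence API used in class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a cloud datastore (not a real persistence API).
class CreateThenPersistDemo {
    // Immutable entity: every field is set before the object can be shared.
    static final class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // The "cloud store" visible to every application instance.
    static final Map<String, Person> store = new ConcurrentHashMap<>();

    // Plays the role of makePersistent(): a single atomic put of a finished object.
    static void makePersistent(String key, Person p) {
        store.put(key, p);
    }

    public static void main(String[] args) {
        // Build the object entirely in local memory first.
        Person p = new Person("John", "Smith");

        // Until the put, other instances see nothing at all for this key.
        System.out.println(store.get("jsmith") == null);   // prints "true"

        makePersistent("jsmith", p);

        // After the put, they see the complete object, never a half-built one.
        Person seen = store.get("jsmith");
        System.out.println(seen.firstName + " " + seen.lastName);   // prints "John Smith"
    }
}
```

    Because the object is complete before the single put, a reader either finds no entry or finds the whole entity.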

  2. On Twitter, when you create an account, it does not appear in searches for a while. From the user's point of view, are Twitter account objects strongly or eventually consistent? Why?

    Answer: Since user actions are not reflected in queries immediately after the action, this is an eventual consistency situation.

    One group pointed out (rightly) that the fact that something doesn't appear in search results immediately doesn't mean that Twitter as a whole isn't strongly consistent, but only that Twitter search isn't strongly consistent.

  3. With a two-column concurrency diagram like the one used in lecture, with two application instances in columns and time increasing in Y, demonstrate how naively changing the name "John Smith" to "Jon Smythe" in one instance of an application, without using transactions, can result in printing the name "Jon Smith" in another instance of the application (the access methods are called setFirstName(String), getFirstName(), setLastName(String), getLastName()).

    Answer: Consider the following schedule.
    instance 1               instance 2
    setFirstName("Jon")
                             getFirstName()
                             getLastName()
    setLastName("Smythe")
    In this schedule, both "gets" occur between the first and second "sets", so instance 2 reads the new first name and the old last name and prints "Jon Smith".
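
    The schedule can be replayed deterministically in a single thread. The sketch below is a minimal illustration: the class names TornReadDemo and Person and the helper replaySchedule() are hypothetical, and only the four accessor names come from the exercise.

```java
// Hypothetical shared record; only the accessor names follow the exercise.
class TornReadDemo {
    static final class Person {
        private String firstName = "John";
        private String lastName = "Smith";

        void setFirstName(String s) { firstName = s; }
        void setLastName(String s) { lastName = s; }
        String getFirstName() { return firstName; }
        String getLastName() { return lastName; }
    }

    // Replays the schedule step by step and returns what instance 2 would print.
    static String replaySchedule() {
        Person shared = new Person();

        shared.setFirstName("Jon");               // instance 1
        String first = shared.getFirstName();     // instance 2
        String last = shared.getLastName();       // instance 2
        shared.setLastName("Smythe");             // instance 1

        return first + " " + last;
    }

    public static void main(String[] args) {
        System.out.println(replaySchedule());     // prints "Jon Smith"
    }
}
```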

  4. Still considering the above situation, describe how begin() and commit() calls should be placed to circumvent the problem. Then exhibit another two-column concurrency diagram (with two application instances) that causes the transaction that is trying to change the first and last name together to throw an exception.

    Answer: The easiest solution is to surround the writes with a begin()-commit() block, as follows:
    instance 1               instance 2
    begin()
    setFirstName("Jon")
                             getFirstName()
                             getLastName()
    setLastName("Smythe")
    commit()
    In this case, instance 2 will get the old information, because the commit has not occurred.

    To make the commit() throw an exception, consider:
    instance 1               instance 2
    begin()
    setFirstName("Jon")
                             setFirstName("Joan")
    setLastName("Smythe")
    commit()
    Instance 1 doesn't actually try to modify the datastore until the commit(). Meanwhile, instance 2 has violated the transactional integrity of the begin()-commit() block, by changing the data affected inside the block. Thus the commit() throws a (retryable) exception.
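
    One common way to get this behavior is a version check at commit time. The sketch below assumes that mechanism purely for illustration; the source does not say how the datastore detects the conflict. The classes OptimisticCommitDemo, Person, and Transaction, the version field, and the helper commitThrows() are all hypothetical, and the schedule is replayed in a single thread.

```java
import java.util.ConcurrentModificationException;

// Hypothetical optimistic-concurrency sketch; not the real datastore implementation.
class OptimisticCommitDemo {
    static final class Person {
        String firstName = "John";
        String lastName = "Smith";
        long version = 0;   // assumed counter, bumped on every write that reaches the store

        // Non-transactional write, as instance 2 performs in the schedule.
        synchronized void setFirstName(String s) {
            firstName = s;
            version++;
        }
    }

    static final class Transaction {
        private final Person target;
        private long versionAtBegin;
        private String newFirst;
        private String newLast;

        Transaction(Person target) { this.target = target; }

        void begin() { versionAtBegin = target.version; }

        // Writes are only buffered here; nothing reaches the store before commit().
        void setFirstName(String s) { newFirst = s; }
        void setLastName(String s) { newLast = s; }

        void commit() {
            synchronized (target) {
                // Someone else wrote since begin(): refuse to apply the buffered writes.
                if (target.version != versionAtBegin) {
                    throw new ConcurrentModificationException("entity changed since begin()");
                }
                if (newFirst != null) target.firstName = newFirst;
                if (newLast != null) target.lastName = newLast;
                target.version++;
            }
        }
    }

    // Replays the schedule; returns true if commit() throws.
    static boolean commitThrows(boolean instance2Interferes) {
        Person shared = new Person();
        Transaction tx = new Transaction(shared);
        tx.begin();                                               // instance 1
        tx.setFirstName("Jon");                                   // instance 1
        if (instance2Interferes) shared.setFirstName("Joan");     // instance 2
        tx.setLastName("Smythe");                                 // instance 1
        try {
            tx.commit();                                          // instance 1
            return false;
        } catch (ConcurrentModificationException e) {
            return true;   // retryable: instance 1 could begin() again and redo its work
        }
    }

    public static void main(String[] args) {
        System.out.println(commitThrows(true));    // prints "true"
        System.out.println(commitThrows(false));   // prints "false"
    }
}
```

    Buffering the writes until commit() is also why, in the earlier schedule, instance 2 still reads the old name while the transaction is open.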

    Note that one cannot force an exception to occur in a situation where a begin()-commit() block contains only reads, because the writes are all that are committed!

  5. (Advanced) Is it possible to construct a cloud datastore so that changes to attributes other than the accessed or changed attributes between a begin() and a commit() do not cause consistency exceptions? Why or why not?

    Answer: Transactional integrity requires that anything used to create new attributes remain unchanged during the transaction period. This does not mean that other data cannot change. But the cloud has no way of knowing whether you have actually used what you access or not. So, the best one can do is to mark anything accessed between begin() and commit() as needing to remain unchanged, and then throw an exception if any of those attributes do change, whether they determine the results of the begin()-commit() block or not.

    In fact, this is exactly what the persistence factory does. It marks which persistent data is used in the code between a begin() and a commit(), and during the commit(), checks whether any of it has changed. If so, the commit() fails. Conversely, if another instance modifies something persistent that is not referenced inside the begin()-commit() block, this does not cause the commit() to fail.

    This is clever, but it is possible to trick it. For example, suppose that I am in the begin()-commit() block shown above to change first and last name, but inside that block, I read the salary for no particularly good reason. Then if another instance changes the salary while the block is executing, the commit will fail, even though the salary change didn't affect the commit. The moral of this story is that the analysis done by the factory code is clever, but not extremely clever, and it can reject a commit for spurious reasons.
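
    The salary example can be sketched with per-attribute version tracking. This is an assumption-laden illustration, not the factory's actual code: the classes ReadSetDemo, Store, and Transaction, the attribute-name strings, the salary values, and the helper renameCommits() are all hypothetical.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "mark everything touched, check it all at commit".
class ReadSetDemo {
    static final class Store {
        final Map<String, String> values = new HashMap<>();
        final Map<String, Long> versions = new HashMap<>();

        // Every write bumps the version of the attribute it changes.
        void put(String attr, String value) {
            values.put(attr, value);
            versions.merge(attr, 1L, Long::sum);
        }
    }

    static final class Transaction {
        private final Store store;
        private final Map<String, Long> touched = new HashMap<>();   // attribute -> version first seen
        private final Map<String, String> pending = new HashMap<>(); // buffered writes

        Transaction(Store store) { this.store = store; }

        // Record the version of an attribute the first time the block touches it.
        private void mark(String attr) {
            touched.putIfAbsent(attr, store.versions.getOrDefault(attr, 0L));
        }

        String get(String attr) {
            mark(attr);
            return pending.getOrDefault(attr, store.values.get(attr));
        }

        void set(String attr, String value) {
            mark(attr);
            pending.put(attr, value);
        }

        void commit() {
            // The store cannot tell whether a value that was read was actually used,
            // so any change to any touched attribute rejects the commit.
            for (Map.Entry<String, Long> e : touched.entrySet()) {
                Long current = store.versions.getOrDefault(e.getKey(), 0L);
                if (!e.getValue().equals(current)) {
                    throw new ConcurrentModificationException(e.getKey() + " changed during the transaction");
                }
            }
            pending.forEach(store::put);
        }
    }

    // Returns true if the rename commits while another instance changes the salary.
    static boolean renameCommits(boolean alsoReadSalary) {
        Store store = new Store();
        store.put("firstName", "John");
        store.put("lastName", "Smith");
        store.put("salary", "50000");

        Transaction tx = new Transaction(store);
        tx.set("firstName", "Jon");
        if (alsoReadSalary) tx.get("salary");   // read for no particularly good reason
        store.put("salary", "60000");           // another instance changes the salary
        tx.set("lastName", "Smythe");
        try {
            tx.commit();
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(renameCommits(false));   // prints "true"
        System.out.println(renameCommits(true));    // prints "false"
    }
}
```

    The second call fails only because the block touched the salary; the rename itself never depended on it, which is the spurious rejection described above.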