Comp150CPA: Clouds and Power-Aware Computing
Classroom Exercise 11 Answers
Service-Oriented Datastores
Spring 2011

group member 1: ____________________________ login: ______________

group member 2: ____________________________ login: ______________

group member 3: ____________________________ login: ______________

group member 4: ____________________________ login: ______________

group member 5: ____________________________ login: ______________

In class we have studied how to think of a cloud application as a service, including choices for front-facing cloud datastores. Let's explore that in more detail.

  1. What is the CAP class of a database server running on a single machine? Why?

    Answer:
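    It is at least in class CA. With only one copy of the data, every read sees the most recent write (consistency), and the server answers every request while it is up (availability). It has no protection against failure, though: an outage on the server can lead to data loss. Arguably, since a single machine cannot be partitioned, one could consider it to be in class P as well.
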
  2. Is there a reasonable use for cloud datastores in CAP class CA? What might it be?

    Answer:
    A datastore in class CA has consistency and availability but not partition resilience. The appropriate uses of such a datastore include services like the following.

    The Google search service is in class A and thus class CA. It doesn't need to worry about consistency, because its results are all best-effort. It doesn't need to protect against partitioning, because it is read-only once it starts and consists of farms of duplicate servers.

  3. I claimed in lecture that LinkedIn stores the results of a Pig job in a NoSQL datastore. For assignment 4, what is the key in this datastore, and what is the value?

    Answer:
    The key is the identity of one user, and the value is a structure of potential friends and the friends-in-common.
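
As a rough sketch of that layout, the datastore can be pictured as a Python dictionary. The user names, the `recommendations` variable, and the `potential_friends` helper are made up for illustration; they are not taken from LinkedIn or from the assignment.

```python
# Hypothetical picture of the key/value layout described above.
# Key:   the identity of one user.
# Value: each potential friend mapped to the friends the two have in common.
recommendations = {
    "alice": {
        "dave": ["bob", "carol"],  # alice and dave both know bob and carol
        "erin": ["bob"],
    },
    "bob": {
        "frank": ["alice"],
    },
}


def potential_friends(user):
    """Return (candidate, friends_in_common) pairs, most shared friends first."""
    value = recommendations.get(user, {})  # one key lookup, no joins
    return sorted(value.items(), key=lambda kv: len(kv[1]), reverse=True)
```

The point of the design is that the Pig job does all the expensive work ahead of time, so serving a page needs only one lookup by user.
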
  4. In Amazon Dynamo, the datastore key for shopping cart contents is the content of a cookie stored locally in your browser. Suppose that one user opens two instances of a shopping cart in two panes of the same browser, and proceeds to update each one by deleting a different item. Draw a picture showing why this is a conflict in the vector clock algorithm. Then describe what the vector clock algorithm might do to resolve this situation, to business advantage.

    Answer:
    The vector clock algorithm stores both versions of the object along with their version timestamps. In this case, the situation is something like:
     
    Version 1 (initial)
    Version 2 derived from Version 1 at time 2
    Version 3 derived from Version 1 at time 3
    ...
    
    As a picture, this looks like:
     
           Version 1
    delete /        \ delete
      Version 2      \
                  Version 3
    
    The vector clock algorithm resolves this discrepancy according to business rules, in this case, merging the carts:
     
           Version 1
    delete /        \ delete
      Version 2      \
          \       Version 3
           \        /
           Version 4 (merge of Version 2 and Version 3) 
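
The picture above can be sketched in Python. This is a minimal illustration, not Dynamo's actual code: the node names `Sx`, `Sy`, `Sz`, the cart contents, and the choice of set union as the merge rule are all assumptions made for the example.

```python
# A vector clock is modeled as a dict: node name -> update counter.

def descends(a, b):
    """True if clock a is equal to, or a descendant of, clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def in_conflict(a, b):
    """Two versions conflict when neither clock descends from the other."""
    return not descends(a, b) and not descends(b, a)


def reconcile(x, y, node):
    """Merge two conflicting (clock, cart) versions; `node` handles the write."""
    clock = {n: max(x[0].get(n, 0), y[0].get(n, 0)) for n in x[0].keys() | y[0].keys()}
    clock[node] = clock.get(node, 0) + 1
    # Assumed business rule: keep every item present in either cart.
    return clock, x[1] | y[1]


# Version 1, then two deletions handled by different (hypothetical) nodes.
v1 = ({"Sx": 1}, {"book", "lamp", "mug"})
v2 = ({"Sx": 1, "Sy": 1}, {"lamp", "mug"})   # one pane deleted "book"
v3 = ({"Sx": 1, "Sz": 1}, {"book", "mug"})   # other pane deleted "lamp"

v4 = reconcile(v2, v3, "Sx")                 # Version 4 in the picture
```

Here `in_conflict(v2[0], v3[0])` is true because each clock has a counter the other lacks. With a union merge, both deleted items reappear in Version 4. From a business point of view that is the safer error: the customer may have to delete an item again, but nothing they wanted to buy is lost.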
    
  5. (Advanced) A key feature of Amazon's Dynamo is what is called "business-logic-based recovery". If a server is lost during a post, so that the database is in an inconsistent state, consistency is restored according to business rules rather than computer science concepts. What are the appropriate business rules for merging versions of a purchase transaction, or a return authorization? Why?

    Answer:
    The business resolution rules must take into account what a customer expects. In the case of duplicate purchase transactions, the most likely problem is that a user changed his or her mind. So the appropriate resolution is to delete the earlier one and act on the later one.

    In the case of a return authorization, the appropriate resolution is to merge if possible. It is quite possible that two different return authorization requests were made for the same order.
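
The two rules can be sketched as a pair of resolver functions. This is only an illustration under stated assumptions: the `Purchase` and `ReturnAuth` record types, their fields, and the idea that timestamps are comparable across servers are all hypothetical, and every version passed to a resolver is assumed to belong to the same order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Purchase:
    order_id: str
    timestamp: int       # assumed comparable across servers
    items: frozenset


@dataclass(frozen=True)
class ReturnAuth:
    order_id: str
    timestamp: int
    items: frozenset


def resolve_purchases(versions):
    """Later wins: the user most likely changed their mind."""
    return max(versions, key=lambda v: v.timestamp)


def resolve_returns(versions):
    """Merge: authorize the return of every item named in any version."""
    latest = max(v.timestamp for v in versions)
    items = frozenset().union(*(v.items for v in versions))
    return ReturnAuth(versions[0].order_id, latest, items)
```

The two resolvers differ on purpose: acting on both purchases would charge the customer twice, while merging return requests only combines what the customer asked to send back.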