Comp150CPA: Clouds and Power-Aware Computing
Classroom Exercise 11 Answers
Service-Oriented Datastores
Spring 2011

group member 1: ____________________________ login: ______________

group member 2: ____________________________ login: ______________

group member 3: ____________________________ login: ______________

group member 4: ____________________________ login: ______________

group member 5: ____________________________ login: ______________

In class we have studied how to think of a cloud application as a service, including choices for front-facing cloud datastores. Let's explore that in more detail.

  1. What is the CAP class of a database server running on a single machine? Why?

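    It is in at least class CA: with one copy of the data on one machine, reads and writes are consistent, and the data is available whenever the server is up. It is not clearly partition-tolerant, because an outage on the server can lead to data loss. Arguably, since a single machine cannot be partitioned, one could consider it to be in class P as well.
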
  2. Is there a reasonable use for cloud datastores in CAP class CA? What might it be?

    A datastore in class CA has consistency and availability but not partition resilience. Appropriate uses include services like the following example.

    The Google search service is in class A and thus class CA. It doesn't need to worry about consistency, because its returns are all best-effort. It doesn't need to protect against partitioning, because it is read-only once it starts and consists of farms of duplicate servers.

  3. I claimed in lecture that LinkedIn stores the results of a Pig job in a NoSQL datastore. For assignment 4, what is the key in this datastore, and what is the value?

    The key is the identity of one user, and the value is a structure of potential friends and the friends-in-common.
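
    The key/value layout above can be sketched as follows. This is a minimal illustration only: a plain Python dict stands in for the NoSQL datastore, and the user names and the `lookup` helper are hypothetical, not LinkedIn's actual record format.

```python
# Hypothetical stand-in for the key-value datastore: a plain dict.
# Key:   the identity of one user.
# Value: potential friends, each mapped to the friends-in-common.
suggestions = {
    "alice": {
        "dave": ["bob", "carol"],   # alice and dave both know bob and carol
        "erin": ["bob"],
    },
    "bob": {
        "frank": ["alice"],
    },
}

def lookup(user):
    """Return (potential friend, friends-in-common) pairs, most in common first."""
    candidates = suggestions.get(user, {})
    return sorted(candidates.items(), key=lambda kv: len(kv[1]), reverse=True)
```

    One lookup by user key returns everything needed to display suggestions for that user, which is why the Pig job's output fits a key-value store.
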
  4. In Amazon Dynamo, the datastore key for shopping cart contents is the content of a cookie stored locally in your browser. Suppose that one user opens two instances of a shopping cart in two panes of the same browser, and proceeds to update each one by deleting a different item. Draw a picture showing why this is a conflict in the vector clock algorithm. Then describe what the vector clock algorithm might do to resolve this situation, to business advantage.

    The vector clock algorithm stores both versions of the object along with their version timestamps. In this case, the situation is something like:
    Version 1 (initial)
    Version 2, derived from Version 1 at time 2
    Version 3, derived from Version 1 at time 3
    As a picture, this looks like:
           Version 1
    delete /        \ delete
      Version 2      \
                  Version 3
    Versions 2 and 3 both derive from Version 1, but neither derives from the other, so they are in conflict. The discrepancy is resolved according to business rules, in this case by merging the carts:
           Version 1
    delete /        \ delete
      Version 2      \
          \       Version 3
           \        /
           Version 4 (merge of Version 2 and Version 3) 
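
    The conflict and the merge pictured above can be sketched in code. This is a simplified illustration, not Dynamo's implementation: the server names `sx`, `sy`, `sz`, the cart items, and the assumption that the two deletes are handled by different servers are all hypothetical, and the union merge is an assumed business rule.

```python
def descends(a, b):
    """True if clock a has seen everything clock b has (a >= b per node)."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def in_conflict(a, b):
    """Two versions conflict when neither clock descends from the other."""
    return not descends(a, b) and not descends(b, a)

def merge(v_a, v_b, node):
    """Reconcile two conflicting versions at the given node.

    Assumed business rule: take the union of the two carts.
    """
    nodes = set(v_a["clock"]) | set(v_b["clock"])
    clock = {n: max(v_a["clock"].get(n, 0), v_b["clock"].get(n, 0)) for n in nodes}
    clock[node] = clock.get(node, 0) + 1   # the merge is itself a new write
    return {"clock": clock, "items": v_a["items"] | v_b["items"]}

# Version 1: the initial cart, written through hypothetical server "sx".
v1 = {"clock": {"sx": 1}, "items": {"book", "lamp", "pen"}}
# Version 2: one pane deletes "book"; handled by server "sy".
v2 = {"clock": {"sx": 1, "sy": 1}, "items": {"lamp", "pen"}}
# Version 3: the other pane deletes "lamp"; handled by server "sz".
v3 = {"clock": {"sx": 1, "sz": 1}, "items": {"book", "pen"}}

# Version 4: the merge of Version 2 and Version 3.
v4 = merge(v2, v3, "sx")
```

    Under the union rule, both deleted items reappear in Version 4. That errs on the side of never losing something the customer put in the cart, which is the business advantage: the customer can delete an item again, but an item silently dropped is a lost sale.
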
  5. (Advanced) A key feature of Amazon's Dynamo is what is called "business-logic-based recovery". If a server is lost during a post, so that the database is in an inconsistent state, consistency is restored according to business rules rather than computer science concepts. What are the appropriate business rules for merging versions of a purchase transaction, or a return authorization? Why?

    The business resolution rules must take into account what a customer expects. In the case of duplicate purchase transactions, the most likely cause is that the user changed his or her mind, so the appropriate resolution is to delete the earlier transaction and act on the later one.

    In the case of a return authorization, the appropriate resolution is to merge if possible, because it is quite possible that two different return authorization requests were made for the same order.
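
    The two rules above might be sketched as follows. The record fields `time` and `items` and both function names are hypothetical, and the sketch assumes a merge of return authorizations is always possible, which a real system would have to check.

```python
def resolve_purchases(versions):
    """Purchases: act on the latest version only, discarding earlier ones."""
    return max(versions, key=lambda v: v["time"])

def resolve_returns(versions):
    """Return authorizations for one order: merge by taking the union of items."""
    merged = set()
    for v in versions:
        merged |= v["items"]
    return {"time": max(v["time"] for v in versions), "items": merged}

# Two conflicting versions of the same purchase: the later one wins.
purchase = resolve_purchases([
    {"time": 1, "items": {"book", "lamp"}},
    {"time": 2, "items": {"book"}},
])

# Two return requests against the same order: both are honored.
returned = resolve_returns([
    {"time": 1, "items": {"lamp"}},
    {"time": 2, "items": {"pen"}},
])
```

    The asymmetry is deliberate: a later purchase replaces an earlier intent, while separate return requests are more plausibly independent and so are combined.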