Comp150CPA: Clouds and Power-Aware Computing
Classroom Exercise 11 Answers
Service-Oriented Datastores
Spring 2011
group member 1: ____________________________ login: ______________
group member 2: ____________________________ login: ______________
group member 3: ____________________________ login: ______________
group member 4: ____________________________ login: ______________
group member 5: ____________________________ login: ______________
In class we have studied how to think of a cloud application as a service,
including choices for front-facing cloud datastores.
Let's explore that in more detail.
- What is the CAP class of a database server running on
a single machine? Why?
Answer:
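It is in at least class CA: a single server holds the only copy of the
data, so reads are consistent and the service is available whenever the
server is up. It is not fault-tolerant, because an outage on the server
can lead to data loss. Arguably, since a single machine cannot be
partitioned, one could consider it to be in class P as well.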
- Is there a reasonable use for cloud datastores in CAP class CA?
What might it be?
Answer:
A datastore in class CA has consistency and availability but not
partition resilience. Appropriate uses of such a datastore include:
  - Best-effort services where transaction resilience is not promised,
    e.g., social locator services.
  - Situations in which partitioning has no effect, including
    where a database is replicated precisely on a farm of servers.
The Google search service is in class A and thus class CA. It
doesn't need to worry about consistency, because its returns are all
best-effort. It doesn't need to protect against partitioning, because
it is read-only once it starts, and consists of farms of duplicate
servers.
- I claimed in lecture that LinkedIn stores the results of a Pig
job in a NoSQL datastore. For assignment 4,
what is the key in this datastore, and what is the value?
Answer:
The key is the identity of one user, and the value is a structure
of potential friends and the friends-in-common.
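As a rough illustration, one such record could be laid out as below. This
is only a sketch: the user names, the nesting of the value, and the
suggestions() helper are all hypothetical and are not LinkedIn's actual
schema.

```python
# Hypothetical key-value record: the key is one user's identity; the value
# maps each potential friend to the friends the two users have in common.
store = {
    "alice": {
        "dave": ["bob", "carol"],
        "erin": ["bob"],
    },
}


def suggestions(store, user):
    """Return potential friends for user, most friends-in-common first."""
    candidates = store.get(user, {})
    return sorted(candidates, key=lambda c: len(candidates[c]), reverse=True)
```

A single lookup by user key returns everything needed to render that
user's suggestions, which is why a key-value store fits this result.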
- In Amazon Dynamo,
the datastore key for shopping cart contents
is the content of a cookie stored locally in your browser.
Suppose that one user opens two instances of a
shopping cart in two panes of the same browser, and proceeds to update
each one by deleting a different item. Draw a picture showing why
this is a conflict in the vector clock algorithm. Then describe what
the vector clock algorithm might do to resolve this situation,
to business advantage.
Answer:
The vector clock algorithm stores both versions of objects and the
version timestamps. In this case, the situation is something like:
Version 1: initial
Version 2: derived from Version 1 at time 2
Version 3: derived from Version 1 at time 3
...
Which, as a picture, looks like this:
          Version 1
  delete /         \ delete
 Version 2          \
                  Version 3
The vector clock algorithm resolves this discrepancy according to
business rules, in this case, merging the carts:
          Version 1
  delete /         \ delete
 Version 2          \
        \         Version 3
         \         /
          Version 4 (merge of Version 2 and Version 3)
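The conflict and its resolution can be sketched in a few lines. This is a
minimal illustration, not Dynamo's implementation: the node names Sx, Sy,
and Sz, the cart contents, and the function names are hypothetical, and
the merge rule (set union of the two carts) is an assumption about the
business rule being applied.

```python
def descends(a, b):
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def concurrent(a, b):
    """True if neither clock descends from the other, i.e. a conflict."""
    return not descends(a, b) and not descends(b, a)


def reconcile(version_a, version_b, coordinator):
    """Merge two conflicting carts; the union rule is an assumption."""
    (clock_a, items_a), (clock_b, items_b) = version_a, version_b
    nodes = clock_a.keys() | clock_b.keys()
    # Take the element-wise maximum of the two clocks.
    clock = {n: max(clock_a.get(n, 0), clock_b.get(n, 0)) for n in nodes}
    # Record the merge itself as a new write by the coordinating node.
    clock[coordinator] = clock.get(coordinator, 0) + 1
    return clock, items_a | items_b


cart = {"book", "lamp", "pen"}
v1 = ({"Sx": 1}, cart)                      # Version 1 (initial)
v2 = ({"Sx": 1, "Sy": 1}, cart - {"book"})  # one pane deletes "book"
v3 = ({"Sx": 1, "Sz": 1}, cart - {"lamp"})  # other pane deletes "lamp"
v4 = reconcile(v2, v3, "Sx")                # Version 4 (merge)
```

Here concurrent(v2[0], v3[0]) is true, because each clock contains an
entry the other lacks. Under the assumed union rule, both deleted items
reappear in Version 4. The customer may have to delete them again, but
nothing is ever silently dropped from a cart, which favors the sale.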
- (Advanced) A key feature of Amazon's Dynamo is what is called
"business-logic-based recovery". If a server is lost during a post,
so that the database is in an inconsistent state, consistency is
restored according to business rules rather than computer science
concepts. What are the appropriate business rules for merging
versions of a purchase transaction, or a return authorization? Why?
Answer:
The business resolution rules must take into account what a customer
expects. In the case of duplicate purchase transactions, the most
likely problem is that a user changed his or her mind. So the
appropriate resolution is to delete the earlier one and act on the
later one.
In the case of a return authorization, the appropriate resolution
is to merge if possible. It is quite possible that two different
return authorization requests were made for the same order.
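The two rules above can be sketched as follows. This is an illustration
under stated assumptions: each conflicting version is assumed to carry a
timestamp, a return authorization is assumed to be a set of item
identifiers, and the record layout and function names are hypothetical.
The sketch also assumes a merge is always possible, which the rule above
does not guarantee.

```python
def resolve_purchases(versions):
    """Latest-wins: act only on the most recent purchase transaction."""
    return max(versions, key=lambda v: v["timestamp"])


def resolve_returns(versions):
    """Merge: authorize the return of every item named in any request."""
    merged = set()
    for v in versions:
        merged |= v["items"]
    return merged


purchases = [
    {"timestamp": 10, "items": {"book"}},
    {"timestamp": 12, "items": {"book", "lamp"}},
]
returns = [
    {"timestamp": 20, "items": {"book"}},
    {"timestamp": 21, "items": {"lamp"}},
]
kept = resolve_purchases(purchases)
authorized = resolve_returns(returns)
```

With this data, kept is the purchase made at time 12, and authorized
covers both the book and the lamp.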