Comp150CPA: Clouds and Power-Aware Computing
Classroom Exercise 6
Elasticity and deployment
Spring 2011
group member 1: ____________________________ login: ______________
group member 2: ____________________________ login: ______________
group member 3: ____________________________ login: ______________
group member 4: ____________________________ login: ______________
group member 5: ____________________________ login: ______________
In class we have studied the linkage between service
elasticity and speed of deployment. Let's explore that issue
in more detail.
- Suppose that each service request makes one read request of the
cloud, that t_cloud (including cloud network time)
is three times t_server, and that
t_server happens entirely after t_cloud, which
happens first (simulating a lookup of the session context). Give a
time-space diagram with time on X and three simultaneously arriving
requests on Y, that demonstrates how the requests are processed by one
application instance (with a single CPU) via latency hiding. For each
instance, distinguish between time spent waiting for the cloud,
waiting for CPU time, and using the CPU. Ignore effects of process
scheduling.
Answer: I am expecting something like:
r1 ============****
r2 ============wwww****
r3 ============wwwwwwww****
where = represents time waiting for the cloud,
* represents time processing, and w represents time waiting for
other requests to complete.
Obviously, this depends a lot on one's assumptions.
If each one has to wait to ask the cloud, then there would be a small
wait before the cloud request goes out:
r1 *============****
r2 w*============www****
r3 ww*============wwwwww****
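As a sanity check on the first diagram, here is a minimal Python sketch (the values t_server = 1 and "all cloud reads issued at t = 0" are assumptions matching the first diagram, with t_cloud = 3 * t_server) that computes when each request finishes under single-CPU latency hiding:

```python
# Sketch: single-CPU latency hiding for k simultaneous requests.
# Assumes all cloud reads are issued at t = 0 and t_cloud = 3 * t_server,
# so the CPU is idle until the first cloud read returns, then serializes.
def finish_times(k, t_server=1.0):
    t_cloud = 3 * t_server
    cpu_free = t_cloud          # CPU has nothing to do until t_cloud
    finishes = []
    for _ in range(k):
        start = cpu_free               # each request waits its turn for the CPU
        cpu_free = start + t_server    # then holds the CPU for t_server
        finishes.append(cpu_free)
    return finishes

print(finish_times(3))  # [4.0, 5.0, 6.0]
```

With the diagram's scale of four characters per t_server, these finish times (4, 5, and 6 units) match the row lengths of r1, r2, and r3 above.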
- Let P(k) represent the worst-case performance for simultaneous
arrival of k identical requests in problem 1. Using the above
diagram, estimate P(3) in terms of t_server and
t_network, assuming that t_network represents
only the network communication outside the cloud. Hint: t_cloud = 3*t_server, so t_cloud can be eliminated by substitution.
Answer:
From the first answer to problem 1 -- which more
accurately represents the assumptions -- P(3) <= 3*t_server + t_cloud + t_network = 6*t_server + t_network.
Note: t_network is the network overhead outside the cloud;
it is never added more than once.
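A quick numeric check of the substitution (t_server = 1 and t_network = 2 are illustrative values, not from the problem):

```python
# Check that substituting t_cloud = 3 * t_server into
# P(3) = 3*t_server + t_cloud + t_network yields 6*t_server + t_network.
t_server, t_network = 1.0, 2.0   # illustrative values
t_cloud = 3 * t_server
p3 = 3 * t_server + t_cloud + t_network
assert p3 == 6 * t_server + t_network
print(p3)  # 8.0
```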
- Let P(k) be the general solution to problem 2 and let M be the
number of server instances. Suppose that k requests arrive at the same time
and are equally distributed to the server instances by a (flowless) switch.
Suppose that P_SLA is the required response time according to an SLA.
What conditions on M will prevent an SLA violation? Why?
Answer:
If the switch is flowless and k requests arrive at the same time,
it will distribute k/M requests (up to rounding) to each server. Thus we want
P(k/M) <= P_SLA. In the ideal case above,
P(k) = (k+3)*t_server + t_network, so
P(k/M) = (k/M + 3)*t_server + t_network <= P_SLA.
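This condition can be turned into a provisioning rule: find the smallest M for which P(ceil(k/M)) <= P_SLA. A sketch using the ideal-case formula above (all parameter values are illustrative):

```python
import math

# Sketch: smallest number of instances M that keeps the SLA,
# using P(n) = (n + 3) * t_server + t_network from problem 2.
def min_instances(k, t_server, t_network, p_sla):
    def p(n):  # response time when one instance handles n requests
        return (n + 3) * t_server + t_network
    for m in range(1, k + 1):
        if p(math.ceil(k / m)) <= p_sla:
            return m
    return None  # SLA cannot be met even at one request per instance

# e.g. 30 simultaneous requests, t_server = 1, t_network = 2, SLA of 15
print(min_instances(30, 1.0, 2.0, 15.0))  # 3
```

Here each instance must handle at most 10 requests to meet the SLA (10 + 3 + 2 = 15), so three instances suffice for 30 requests.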
- Give an example of a plot of P against time (with
P_minimum, P_safe, and P_SLA depicted as
horizontal lines) that causes the simple algorithm for service
elasticity given in class to provision more resources for a service,
even though the service would not have violated its SLA even if no change in
provisioning were requested.
Answer: The key is that detecting a possible breach of
SLA is not the same thing as detecting a real breach. If response
time is increasing, that does not mean it will continue to increase.
So, response time can "look like" it is going to lead to a violation
and then suddenly turn downward, e.g.:
SLA-----------------------------------------
----
safe-------------/---|----------------------
/ |
---------------/ |
\---------------------
sufficient----------------------------------
min-----------------------------------------
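The false positive can be sketched in code. The assumption here is that the "simple algorithm" provisions as soon as measured response time exceeds P_safe, without waiting to see whether the trend reverses (threshold values and the trace are made up for illustration):

```python
# Sketch: a threshold trigger fires on a transient spike.
# Assumption: the simple algorithm provisions whenever measured response
# time P exceeds P_safe, with no trend analysis or damping.
P_SLA, P_SAFE = 10.0, 7.0

trace = [5.0, 6.0, 7.5, 8.0, 6.5, 5.5]  # rises above P_safe, then recovers

provisioned = any(p > P_SAFE for p in trace)
violated = any(p > P_SLA for p in trace)
print(provisioned, violated)  # True False: resources added, SLA never at risk
```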
- (Advanced) In class we went over a detailed estimation of
P_safe for an application instance, in terms of
P_SLA and t_deploy. How would one go about
estimating the acceptable performance bound t_sufficient
that determines how fast is "good enough"?
Answer:
This is really quite difficult to analyze, but I hinted at the answer
in class. The answer is based upon three observations:
- When a server is swinging from serving one app to another,
it is not doing useful work.
- This swing time is a significant fraction of the monitoring
interval Δt, e.g., 10 seconds versus 60 seconds.
- If t_sufficient is set too low, then it costs too much
in server footprint and power to assure it.
Thus we must choose t_sufficient as a balance between minimizing
swings and saving power. Both are forms of waste:
- If t_sufficient is too close to t_safe, then there
will be constant swings, wasting a measurable percentage of server
power on the swings themselves.
- If t_sufficient is too close to t_minimum, then
there will be too many servers allocated, and thus too much infrastructure
applied to the problem that could instead be applied to other applications or
powered down.
So, the appropriate strategy is to choose t_sufficient so that
there is enough room between t_sufficient and t_safe
to provide a comfortable (swing-free) operating zone, but no more than that.
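One way to encode that strategy is a simple hysteresis rule (a sketch; the fixed margin and the clamp at t_minimum are assumptions for illustration, not something derived in class):

```python
# Sketch: place t_sufficient a fixed hysteresis margin below t_safe,
# clamped so it never falls below t_minimum. The margin is a tuning
# knob that trades swing frequency against server footprint.
def choose_t_sufficient(t_safe, t_minimum, margin):
    return max(t_safe - margin, t_minimum)

print(choose_t_sufficient(8.0, 2.0, 3.0))  # 5.0
print(choose_t_sufficient(8.0, 2.0, 7.0))  # 2.0 (clamped at t_minimum)
```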