# Comp150CPA: Clouds and Power-Aware Computing, Classroom Exercise 6: Elasticity and Deployment (Spring 2011)

### group member 5: ____________________________ login: ______________

In class we have studied the linkage between service elasticity and speed of deployment. Let's explore that issue in more detail.

1. Suppose that each service request makes one read request of the cloud, that tcloud (including cloud network time) is three times tserver, and that the tserver work happens entirely after tcloud, which happens first (simulating a lookup of the session context). Give a time-space diagram, with time on the X axis and three simultaneously arriving requests on the Y axis, that demonstrates how the requests are processed by one application instance (with a single CPU) via latency hiding. For each request, distinguish between time spent waiting for the cloud, time spent waiting for CPU time, and time spent using the CPU. Ignore the effects of process scheduling.
Answer: I am expecting something like:
```
r1 ============****
r2 ============wwww****
r3 ============wwwwwwww****
```
where = represents time waiting for the cloud, * represents time processing, and w represents time waiting for other requests to complete.

Obviously, this depends a lot on one's assumptions. If each request must wait its turn to issue its cloud read, then there is a small wait before each cloud request goes out:

```
r1 *============****
r2 w*============www****
r3 ww*============wwwwww****
```
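The schedule in the first diagram can be sketched numerically. This little Python model (my own illustration, not from class) assumes, as the diagram does, that all cloud reads are issued at t = 0 and that each request's CPU burst runs only after both its cloud read and every earlier request's CPU burst have finished:

```python
def completion_times(k, t_server, t_cloud=None):
    """Completion time of each of k simultaneous requests on one CPU,
    assuming all cloud reads go out at t = 0 (ideal latency hiding)."""
    if t_cloud is None:
        t_cloud = 3 * t_server          # the ratio assumed in problem 1
    done = []
    cpu_free = 0.0                      # when the single CPU next becomes idle
    for _ in range(k):
        start = max(t_cloud, cpu_free)  # wait for the cloud result AND the CPU
        cpu_free = start + t_server
        done.append(cpu_free)
    return done

print(completion_times(3, 1.0))  # [4.0, 5.0, 6.0] in units of tserver
```

The last request finishes at 6*tserver, matching the P(3) estimate in problem 2 (before adding tnetwork).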
2. Let P(k) represent the worst-case performance for the simultaneous arrival of k identical requests in problem 1. Using the above diagram, estimate P(3) in terms of tserver and tnetwork, assuming that tnetwork represents only the network communication outside the cloud. Hint: tcloud = 3*tserver, so tcloud can be eliminated from the expression.
Answer: From the first diagram in problem 1 -- which more accurately reflects the stated assumptions -- P(3) <= 3*tserver + tcloud + tnetwork = 6*tserver + tnetwork.

Note: tnetwork is the network overhead outside the cloud; it is never added more than once.

3. Let P(k) be the general solution to problem 2 and let M be the number of server instances. Suppose that k requests arrive at the same time and are equally distributed to the server instances by a (flowless) switch. Suppose that PSLA is the required response time according to an SLA. What conditions on M will prevent an SLA violation? Why?
Answer: If the switch is flowless and k requests arrive at the same time, it will distribute k/M requests to each server. Thus we want P(k/M) <= PSLA. In the ideal case above, P(k) = (k+3)*tserver + tnetwork, so the condition is P(k/M) = (k/M + 3)*tserver + tnetwork <= PSLA.
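Solving the inequality in the answer for M gives the smallest safe number of instances. A minimal sketch under the ideal-case formula (function name and the feasibility check are my own additions):

```python
import math

def min_instances(k, t_server, t_network, p_sla):
    """Smallest M with P(k/M) = (k/M + 3)*t_server + t_network <= p_sla."""
    budget = p_sla - t_network - 3 * t_server  # time left for the k/M CPU bursts
    if budget < t_server:                      # even 1 request per server is too slow
        raise ValueError("SLA unreachable at any M")
    return max(1, math.ceil(k * t_server / budget))

print(min_instances(12, 1.0, 2.0, 10.0))  # 3: each server gets 4 requests, P(4) = 9 <= 10
```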
4. Give an example of a plot of P against time (with Pminimum, Psafe, and PSLA depicted as horizontal lines) that causes the simple algorithm for service elasticity given in class to provision more resources for a service, even though it would not have violated its SLA even if no change in provisioning were requested.
Answer: The key is that detecting a possible breach of SLA is not the same thing as detecting a real breach. If response time is increasing, that does not mean it will continue to increase. So, response time can "look like" it is going to lead to a violation and then suddenly turn downward, e.g.:
```
SLA        --------------------------------------------
                           ----
safe       ---------------/----|------------------------
                         /     |
           -------------/      |
                               \------------------------
sufficient ---------------------------------------------

min        ---------------------------------------------
```
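The false alarm in the sketch can be reproduced numerically. Assuming (my reading of the simple rule from class) that the algorithm provisions as soon as response time rises above Psafe, the made-up trace below triggers provisioning even though PSLA is never breached:

```python
# Thresholds and a response-time trace; all values are illustrative only
P_SLA, P_SAFE = 10.0, 7.0
trace = [4.0, 5.0, 6.5, 7.5, 8.0, 6.0, 4.5]  # rises past safe, then turns downward

provisioned = any(p > P_SAFE for p in trace)  # the naive rule fires here
violated    = any(p > P_SLA  for p in trace)  # yet the SLA was never violated
print(provisioned, violated)  # True False
```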
5. (Advanced) In class we went over a detailed estimation of Psafe for an application instance, in terms of PSLA and tdeploy. How would one go about estimating the acceptable performance bound tsufficient that determines how fast is "good enough"?
Answer: This is really quite difficult to analyze, but I hinted at the answer in class. The answer is based on three observations:
1. When a server is swinging from serving one app to another, it is not doing useful work.
2. This swing time can be a significant portion of the monitoring interval Δt, e.g., 10 seconds out of a 60-second interval.
3. If tsufficient is set too low, then it costs too much in server footprint and power to assure it.
Thus we must choose tsufficient as a balance between minimizing swings and saving power. Both are forms of waste.
• If tsufficient is too close to tsafe, then there will be constant swings, wasting some measurable percentage of server power on the swings themselves.
• If tsufficient is too close to tminimum, then there will be too many servers allocated, and thus too much infrastructure applied to the problem that could be applied to other applications or powered down.
So, the appropriate strategy is to choose tsufficient so that there is enough room between tsufficient and tsafe to provide a comfortable (swing-free) operating zone, but no more than that.
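One way to make this balance concrete is to place tsufficient a fixed fraction of the way down the gap between tsafe and tminimum. This is purely illustrative; the margin value is an assumption, not a number given in class:

```python
def choose_t_sufficient(t_safe, t_min, margin=0.5):
    """Pick t_sufficient between t_min and t_safe.

    margin in (0, 1) is the fraction of the gap kept as a swing-free
    zone below t_safe: a margin near 0 hugs t_safe (constant swings),
    while a margin near 1 hugs t_min (too many servers, wasted power)."""
    assert t_min < t_safe
    return t_safe - margin * (t_safe - t_min)

print(choose_t_sufficient(8.0, 2.0))  # 5.0, midway between safe and minimum
```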