Comp150CPA: Clouds and Power-Aware Computing
Classroom Exercise 6
Elasticity and deployment
Spring 2011
group member 1: ____________________________ login: ______________
group member 2: ____________________________ login: ______________
group member 3: ____________________________ login: ______________
group member 4: ____________________________ login: ______________
group member 5: ____________________________ login: ______________
In class we have studied the linkage between service
elasticity and speed of deployment. Let's explore that issue
in more detail.
- Suppose that each service request makes one read request of the
cloud, that t_cloud (including cloud network time)
is three times t_server, and that
t_server happens entirely after t_cloud, which
happens first (simulating a lookup of the session context). Give a
time-space diagram with time on X and three simultaneously arriving
requests on Y, that demonstrates how the requests are processed by one
application instance (with a single CPU) via latency hiding. For each
instance, distinguish between time spent waiting for the cloud,
waiting for CPU time, and using the CPU. Ignore effects of process
scheduling.
Answer: I am expecting something like:
r1 ============****
r2 ============wwww****
r3 ============wwwwwwww****
where = represents time waiting for the cloud,
* represents time processing, and w represents time waiting for
other requests to complete.
Obviously, this depends a lot on one's assumptions.
If each one has to wait to ask the cloud, then there would be a small
wait before the cloud request goes out:
r1 *============****
r2 w*============www****
r3 ww*============wwwwww****
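As a sanity check on the first diagram, here is a minimal Python sketch (the values t_server = 1 and "all cloud reads issued at t = 0" are assumptions matching the first diagram, with t_cloud = 3 * t_server) that computes when each request finishes under single-CPU latency hiding:

```python
# Sketch: single-CPU latency hiding for k simultaneous requests.
# Assumes all cloud reads are issued at t = 0 and t_cloud = 3 * t_server,
# so the CPU is idle until the first cloud read returns, then serializes.
def finish_times(k, t_server=1.0):
    t_cloud = 3 * t_server
    cpu_free = t_cloud          # CPU has nothing to do until t_cloud
    finishes = []
    for _ in range(k):
        start = cpu_free               # each request waits its turn for the CPU
        cpu_free = start + t_server    # then holds the CPU for t_server
        finishes.append(cpu_free)
    return finishes

print(finish_times(3))  # [4.0, 5.0, 6.0]
```

With the diagram's scale of four characters per t_server, these finish times (4, 5, and 6 units) match the row lengths of r1, r2, and r3 above.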
- Let P(k) represent the worst-case performance for simultaneous
arrival of k identical requests in problem 1. Using the above
diagram, estimate P(3) in terms of t_server and
t_network, assuming that t_network represents
only the network communication outside the cloud. Hint: t_cloud = 3*t_server, so t_cloud can be eliminated by substitution.
Answer:
From the first answer to problem 1 -- which more
accurately represents the assumptions -- P(3) <= 3*t_server + t_cloud + t_network = 6*t_server + t_network.
Note: t_network is the network overhead outside the cloud;
it is never added more than once.
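A quick numeric check of the substitution (t_server = 1 and t_network = 2 are illustrative values, not from the problem):

```python
# Check that substituting t_cloud = 3 * t_server into
# P(3) = 3*t_server + t_cloud + t_network yields 6*t_server + t_network.
t_server, t_network = 1.0, 2.0   # illustrative values
t_cloud = 3 * t_server
p3 = 3 * t_server + t_cloud + t_network
assert p3 == 6 * t_server + t_network
print(p3)  # 8.0
```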
- Let P(k) be the general solution to problem 2 and let M be the
number of server instances. Suppose that k requests arrive at the same time
and are equally distributed to the server instances by a (flowless) switch.
Suppose that P_SLA is the required response time according to an SLA.
What conditions on M will prevent an SLA violation? Why?
Answer:
If the switch is flowless and k requests arrive at the same time,
it will distribute k/M requests (up to rounding) to each server. Thus we want
P(k/M) <= P_SLA. In the ideal case above,
P(k) = (k+3)*t_server + t_network, so
P(k/M) = (k/M + 3)*t_server + t_network <= P_SLA.
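This condition can be turned into a provisioning rule: find the smallest M for which P(ceil(k/M)) <= P_SLA. A sketch using the ideal-case formula above (all parameter values are illustrative):

```python
import math

# Sketch: smallest number of instances M that keeps the SLA,
# using P(n) = (n + 3) * t_server + t_network from problem 2.
def min_instances(k, t_server, t_network, p_sla):
    def p(n):  # response time when one instance handles n requests
        return (n + 3) * t_server + t_network
    for m in range(1, k + 1):
        if p(math.ceil(k / m)) <= p_sla:
            return m
    return None  # SLA cannot be met even at one request per instance

# e.g. 30 simultaneous requests, t_server = 1, t_network = 2, SLA of 15
print(min_instances(30, 1.0, 2.0, 15.0))  # 3
```

Here each instance must handle at most 10 requests to meet the SLA (10 + 3 + 2 = 15), so three instances suffice for 30 requests.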
- Give an example of a plot of P against time (with
P_minimum, P_safe, and P_SLA depicted as
horizontal lines) that causes the simple algorithm for service
elasticity given in class to provision more resources for a service,
even though the service would not have violated its SLA even if no change in
provisioning were requested.
Answer: The key is that detecting a possible breach of
SLA is not the same thing as detecting a real breach. If response
time is increasing, that does not mean it will continue to increase.
So, response time can "look like" it is going to lead to a violation
and then suddenly turn downward, e.g.:
SLA-----------------------------------------
----
safe-------------/---|----------------------
/ |
---------------/ |
\---------------------
sufficient----------------------------------
min-----------------------------------------
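The false positive can be sketched in code. The assumption here is that the "simple algorithm" provisions as soon as measured response time exceeds P_safe, without waiting to see whether the trend reverses (threshold values and the trace are made up for illustration):

```python
# Sketch: a threshold trigger fires on a transient spike.
# Assumption: the simple algorithm provisions whenever measured response
# time P exceeds P_safe, with no trend analysis or damping.
P_SLA, P_SAFE = 10.0, 7.0

trace = [5.0, 6.0, 7.5, 8.0, 6.5, 5.5]  # rises above P_safe, then recovers

provisioned = any(p > P_SAFE for p in trace)
violated = any(p > P_SLA for p in trace)
print(provisioned, violated)  # True False: resources added, SLA never at risk
```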
- (Advanced) In class we went over a detailed estimation of
P_safe for an application instance, in terms of
P_SLA and t_deploy. How would one go about
estimating the acceptable performance bound t_sufficient
that determines how fast is "good enough"?
Answer:
This is really quite difficult to analyze, but I hinted at the answer
in class. The answer is based upon three observations:
- When a server is swinging from serving one app to another,
it is not doing useful work.
- This swing time is a significant fraction of the monitoring
interval Δt, e.g., 10 seconds versus 60 seconds.
- If t_sufficient is set too low, then it costs too much
in server footprint and power to assure it.
Thus we must choose t_sufficient as a balance between minimizing
swings and saving power. Both are forms of waste:
- If t_sufficient is too close to t_safe, then there
will be constant swings, wasting a measurable percentage of server
power on the swings themselves.
- If t_sufficient is too close to t_minimum, then
there will be too many servers allocated, and thus too much infrastructure
applied to the problem that could instead be applied to other applications or
powered down.
So, the appropriate strategy is to choose t_sufficient so that
there is enough room between t_sufficient and t_safe
to provide a comfortable (swing-free) operating zone, but no more than that.
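One way to encode that strategy is a simple hysteresis rule (a sketch; the fixed margin and the clamp at t_minimum are assumptions for illustration, not something derived in class):

```python
# Sketch: place t_sufficient a fixed hysteresis margin below t_safe,
# clamped so it never falls below t_minimum. The margin is a tuning
# knob that trades swing frequency against server footprint.
def choose_t_sufficient(t_safe, t_minimum, margin):
    return max(t_safe - margin, t_minimum)

print(choose_t_sufficient(8.0, 2.0, 3.0))  # 5.0
print(choose_t_sufficient(8.0, 2.0, 7.0))  # 2.0 (clamped at t_minimum)
```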