Queuing Theory

Requests arriving at a server are stored in a queue so that they can be processed efficiently and fairly. The server may consist of one or several processors and the requests may be served on a first-come-first-served (FCFS) basis or according to some other discipline. Kendall notation is a shorthand for specifying the parameters of a queue as A/B/c/K/m/Z as follows:

A is the interarrival time distribution
B is the service time distribution
c is the number of servers
K is the capacity of the queue
m is the number of customers
Z is the queue discipline

If the capacity and the number of customers are unbounded and the queue is FCFS this is shortened to A/B/c. Common distribution abbreviations are:

G for a completely general distribution
GI for a general distribution where interarrival or service times are independent
D for deterministic, or constant
M for memoryless, which leads to Poisson arrivals and exponential service
E for Erlang
H for hyperexponential
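
As a small illustration of the shorthand, the sketch below expands a three-field description such as M/M/1 into the full six-field form using the defaults stated above (unbounded capacity, unbounded number of customers, FCFS). The names KendallSpec and parse_kendall are hypothetical, and accepting "inf" or "∞" as the spelling of an unbounded field is an assumption made for this example only.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class KendallSpec:
    arrival: str        # A: interarrival time distribution
    service: str        # B: service time distribution
    servers: int        # c: number of servers
    capacity: float     # K: capacity of the queue (math.inf if unbounded)
    population: float   # m: number of customers (math.inf if unbounded)
    discipline: str     # Z: queue discipline


def parse_kendall(text: str) -> KendallSpec:
    """Parse 'A/B/c' or 'A/B/c/K/m/Z' (hypothetical helper, illustration only)."""
    parts = text.strip().split("/")
    if len(parts) not in (3, 6):
        raise ValueError(f"expected 3 or 6 fields, got {len(parts)}: {text!r}")
    arrival, service, servers = parts[0], parts[1], int(parts[2])
    if len(parts) == 3:
        # Short form: unbounded capacity and population, FCFS discipline.
        return KendallSpec(arrival, service, servers, math.inf, math.inf, "FCFS")

    def bound(field: str) -> float:
        # Assumed spelling of "unbounded"; not part of Kendall notation itself.
        return math.inf if field.lower() in ("inf", "∞") else int(field)

    return KendallSpec(arrival, service, servers,
                       bound(parts[3]), bound(parts[4]), parts[5])


print(parse_kendall("M/M/1"))
print(parse_kendall("M/D/2/10/inf/FCFS"))
```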

M/M/1 queue:

Given appropriate assumptions, a queue can be modelled as a Markov process whose state is the queue length, so s₀ stands for the queue being empty, s₁ stands for the queue having one request waiting to be served, etc. Let q = (q₀, q₁, q₂, ... ) be the stationary distribution for this process. If r is the rate at which new requests arrive at the queue and m is the rate at which a single server removes requests from the queue (here m is a service rate, not the Kendall parameter m above), we can write the following balance equations:

rq₀ = mq₁ and

(r + m)q_i = rq_{i-1} + mq_{i+1} for i > 0

We can rewrite the second equation as

rq_i - mq_{i+1} = rq_{i-1} - mq_i

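Each side has the same form with the index shifted by one, so every such difference equals rq₀ - mq₁, which is zero by the first equation. Hence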

q_i = (r/m)q_{i-1} = (r/m)ⁱq₀

The sum of these probabilities is a geometric series that must sum to 1, so q₀ = 1 - r/m, assuming r < m. If the arrival rate is greater than or equal to the service rate, there is no stationary distribution and the queue will grow without bound.
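
The derivation above can be checked numerically. The following is a minimal sketch, assuming illustrative rates r = 2 and m = 5 (not taken from the text) and a hypothetical helper named mm1_stationary. It builds the first 60 terms of the stationary distribution and verifies both balance equations on them.

```python
def mm1_stationary(r: float, m: float, n_states: int) -> list[float]:
    """Return q_0 .. q_{n_states-1} for an M/M/1 queue with r < m."""
    if not 0 < r < m:
        raise ValueError("a stationary distribution requires 0 < r < m")
    rho = r / m
    q0 = 1.0 - rho                      # from the geometric series summing to 1
    return [q0 * rho**i for i in range(n_states)]


r, m = 2.0, 5.0                         # illustrative rates, an assumption
q = mm1_stationary(r, m, 60)

# First balance equation: r*q_0 = m*q_1
assert abs(r * q[0] - m * q[1]) < 1e-12

# Remaining balance equations: (r + m)*q_i = r*q_{i-1} + m*q_{i+1}
for i in range(1, len(q) - 1):
    assert abs((r + m) * q[i] - (r * q[i - 1] + m * q[i + 1])) < 1e-12

# The truncated sum falls short of 1 only by the tail mass (r/m)**60.
print(sum(q))
```

Calling mm1_stationary with r >= m raises an error, matching the observation that no stationary distribution exists in that case.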

We can now evaluate the following (assuming r < m):

The server utilization is the proportion of time the server is busy. Assuming that the request at the head of the queue is the one being served, and that it is not removed from the queue until service is completed, the server is busy exactly when the queue is nonempty, so the server utilization is 1 - q₀ = r/m.
The queue length is geometrically distributed, with P(length = i) = (1 - r/m)(r/m)ⁱ for i >= 0, so the average number of requests in the system is r/(m-r).
Similarly, the variance of the number of requests in the system is rm/(m-r)².
Little's formula (which we don't prove) states that the average number of requests in the system is the arrival rate, r, times the average response time. Solving for the average response time gives 1/(m-r).
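
The four quantities above can be collected into one function. This is a sketch only: the name mm1_metrics, the dictionary keys, and the rates r = 2 and m = 5 are assumptions chosen for illustration. The closed-form mean is cross-checked against a direct sum over a truncated stationary distribution.

```python
def mm1_metrics(r: float, m: float) -> dict[str, float]:
    """Closed-form M/M/1 results for arrival rate r and service rate m."""
    if not 0 < r < m:
        raise ValueError("formulas are valid only for 0 < r < m")
    return {
        "utilization": r / m,
        "mean_in_system": r / (m - r),
        "variance_in_system": r * m / (m - r) ** 2,
        "mean_response_time": 1.0 / (m - r),
    }


r, m = 2.0, 5.0                          # illustrative rates, an assumption
metrics = mm1_metrics(r, m)

# Cross-check the mean against sum(i * q_i) over the first 200 states.
rho = r / m
q = [(1.0 - rho) * rho**i for i in range(200)]
mean_numeric = sum(i * qi for i, qi in enumerate(q))
assert abs(mean_numeric - metrics["mean_in_system"]) < 1e-9

# Little's formula: mean number in system = r * mean response time.
assert abs(metrics["mean_in_system"] - r * metrics["mean_response_time"]) < 1e-12

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```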

M/M/c queue:

If there are c > 1 servers then requests can be removed from the queue at a rate im when the queue size, i, is at most c and at a rate cm when the queue size is larger. In this case the balance equations become

rq₀ = mq₁

(r + im)q_i = rq_{i-1} + (i+1)mq_{i+1} for 0 < i < c and

(r + cm)q_i = rq_{i-1} + cmq_{i+1} for i >= c

Solving these in the same way as above gives

q_i = q₀ (r/m)ⁱ /i! for i < c and

q_i = q₀ (r/m)ⁱ /(c! c^(i-c)) for i >= c

Solving for q₀ gives

1/q₀ = cm(r/m)^c/(c! (cm - r)) + sum over i from 0 to c-1 of (r/m)ⁱ/i!, for r < cm
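
A minimal sketch of these formulas follows, with hypothetical helper names mmc_q0 and mmc_q and illustrative parameters that are assumptions rather than values from the text. It confirms that the probabilities sum to 1 and that the case c = 1 reduces to the M/M/1 result q₀ = 1 - r/m.

```python
from math import factorial


def mmc_q0(r: float, m: float, c: int) -> float:
    """Probability that an M/M/c system is empty, valid for r < c*m."""
    if not 0 < r < c * m:
        raise ValueError("a stationary distribution requires 0 < r < c*m")
    a = r / m
    head = sum(a**i / factorial(i) for i in range(c))          # states i < c
    tail = c * m * a**c / (factorial(c) * (c * m - r))         # states i >= c
    return 1.0 / (head + tail)


def mmc_q(i: int, r: float, m: float, c: int) -> float:
    """Stationary probability of having i requests in the system."""
    a = r / m
    q0 = mmc_q0(r, m, c)
    if i < c:
        return q0 * a**i / factorial(i)
    return q0 * a**i / (factorial(c) * c ** (i - c))


r, m, c = 4.0, 2.0, 3                    # illustrative values, an assumption
total = sum(mmc_q(i, r, m, c) for i in range(200))
print(mmc_q0(r, m, c), total)
```

The sum is truncated at 200 states, which is enough here because the terms decay geometrically with ratio r/(cm) = 2/3.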

From this we can derive the following:

The average number of requests in the system is
r/m + (r/m)^c * rmcq₀/((cm - r)² c!)
If M is the number of busy servers, then P(M = i) = q_i for i < c and P(M = c) is the sum of q_i over all i >= c, which is cmq_c/(cm - r) = cm(r/m)^c q₀/((cm - r) c!), so E[M] = r/m.
The probability that a new request has to wait is P(M = c), given by the above formula, known as Erlang's C formula.
Little's formula can again be used to calculate the average response time from the average number of requests in the system.
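
The results above can be combined as in the sketch below. The function name mmc_metrics, the dictionary keys, and the example parameters are assumptions for illustration. The response time is obtained from Little's formula by dividing the average number of requests in the system by the arrival rate.

```python
from math import factorial


def mmc_metrics(r: float, m: float, c: int) -> dict[str, float]:
    """Closed-form M/M/c results for arrival rate r, service rate m, c servers."""
    if not 0 < r < c * m:
        raise ValueError("formulas are valid only for 0 < r < c*m")
    a = r / m
    c_fact = factorial(c)
    q0 = 1.0 / (sum(a**i / factorial(i) for i in range(c))
                + c * m * a**c / (c_fact * (c * m - r)))
    # Erlang's C formula: probability that all c servers are busy.
    prob_wait = c * m * a**c * q0 / ((c * m - r) * c_fact)
    mean_in_system = a + a**c * r * m * c * q0 / ((c * m - r) ** 2 * c_fact)
    return {
        "q0": q0,
        "prob_wait": prob_wait,
        "mean_busy_servers": a,
        "mean_in_system": mean_in_system,
        # Little's formula: mean number in system = r * mean response time.
        "mean_response_time": mean_in_system / r,
    }


for name, value in mmc_metrics(4.0, 2.0, 3).items():   # illustrative values
    print(f"{name}: {value:.4f}")
```

With c = 1 the function reproduces the M/M/1 values, for example a mean response time of 1/(m-r).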

References:

Operating Systems, Second edition
H. M. Deitel
Addison-Wesley, 1990

Probability and Statistics with Reliability, Queuing, and Computer Science Applications
Kishor S. Trivedi
Prentice-Hall, 1982