Requests arriving at a server are stored in a queue so that they can be
processed efficiently and fairly. The server may consist of one or several
processors and the requests may be served on a first-come-first-served (FCFS)
basis or according to some other discipline. Kendall notation is a shorthand
for specifying the parameters of a queue as A/B/c/K/m/Z as follows:
- A is the arrival time distribution
- B is the service time distribution
- c is the number of servers
- K is the capacity of the queue
- m is the number of customers
- Z is the queue discipline
If the capacity and the number of customers are unbounded and the queue is FCFS
this is shortened to A/B/c. Common distribution abbreviations are:
- G for a completely general distribution
- GI for a general distribution where interarrival or service times are independent
- D for deterministic, or constant
- M for memoryless, which leads to Poisson arrivals and exponential service
- E for Erlang
- H for hyperexponential
M/M/1 queue:
Given appropriate assumptions, a queue can be modelled as a Markov process with
the state being given by the queue length, so s0 stands for the queue
being empty , s1 stands for the queue having one request waiting to
be served, etc. Let q = (q0, q1, q2, ... )
be the stationary distribution for this process. If r is the rate at which new requests
arrive at the queue and m is the rate at which a single server removes requests
from the queue, we can write the following balance equations:
rq0 = mq1 and
(r + m)qi = rqi-1 + mqi+1 for i > 0
We can rewrite the second equation as
rqi - mqi+1 = rqi-1 - mqi
which are all zero by the first equation, so
qi = (r/m) qi-1 = (r/m)iq0
The sum of these probabilities is a geometric series that must sum to 1,
so q0 = 1 - r/m, assuming r < m. If the arrival rate is greater than
or equal to the service rate, there is no stationary distribution and the queue
will grow without bound.
We can now evaluate the following (assuming r < m):
- The server utilization is the proportion of time the server is busy.
Assuming that the request at the head of the queue is the one being served
and that it is not removed from the queue until service is completed then
the server is busy exactly when the queue is nonempty, so the server
utilization is r/m = 1 - q0.
- We can use the fact that the queue length is a geometric random variable
with parameter r/m to compute the average number of requests in the system as
r/(m-r).
- Similarly the variance of the number of requests in the system is rm/(m-r)2
- Little's formula (which we don't prove) states that the average number of requests
in the system is the arrival rate, r, times the average response time. Solving for
the average response time gives 1/(m-r)
M/M/c queue:
If there are c > 1 servers then requests can be removed from the
queue at a rate im when the queue size, i, is at most c and
at a rate cm when the queue size is larger. In this case the balance equations
become
rq0 = mq1
(r + im)qi = rqi-1 + (i+1)mqi+1
for 0 < i < c and
(r + cm)qi = rqi-1 + cmqi+1
for i >= c
Solving these in the same way as above gives
qi = q0 (r/m)i /i! for i < c and
qi = q0 (r/m)i /(c! ci-c)
for i >= c
Solving for q0 gives
1/q0 = cm(r/m)c/(c! (cm-r) ) + sum from 0 to c-1 of
(r/m)i/i! for r < cm
From this we can derive the following:
References:
Operating Systems, Second edition
H. M. Deitel
Addison-Wesley, 1990
Probability and Statistics with Reliability, Queuing, and Computer Science
Applications
Kishor S. Trivedi
Prentice-Hall, 1982.