Applying Queuing Theory to Network Design
We think of waiting lines as the circumstance we encounter at the grocery store but in the telecommunications world lines can also form for packets waiting to be handled by routers or telephone calls waiting for a trunk to become available. Queuing theory defines a set of formulas that describe waiting line behaviors and can be applied to these and similar telecommunications situations. This newsletter will describe how to use queuing theory as a tool to predict and analyze several telecommunications situations using one simple formula that has many, easy to apply, capabilities.
A queue forms when demand for services is greater than the servicing capacity. Most service systems are designed so that the average servicing capacity is greater than the average arrivals for service. But even with this excess capacity to clear, a line will still form. If these arrivals were equally spaced, then the flow rate could be guaranteed for any time period – such as when a machine drops a part on a conveyor belt every 3 three seconds – then the arrival rate would be constant. In this case, no line would form since the system was designed with excess clearing capacity.
In a lot of real-world situations, there are random arrivals, a circumstance where arrivals are not equally spaced. In these cases, there will be time intervals where the number of arrivals is less than, equal to, or greater than the average arrival rate with a pattern or distribution that covers numerous values.
Another important related parameter to describe arrival patterns is dependency. Arrivals may be dependent or independent of one another. It is common to find that if the arrivals come from a very large population they are independent of one another and distributed randomly. Automobiles exiting an off-ramp of a highway generally do so independently of the actions of the surrounding cars on the same road, as an example. The arrival interval of a car to the stop light at the end of the ramp, the service entity, would be independent of the car arrival intervals of cars before it or after it.
Service times can be described in the same way. A constant service time means that, no matter what, each entity will get the same amount of attention – like every customer of a bank spending the exact same amount of time at the teller’s window. A more randomly distributed service time means that service times are widely variable around an average. Dependency is also a consideration with service; that is, the service time applied to one entity may be dependent upon the previous entities or be completely independent of one another.
In general, a given situation could be described in terms of the pattern or distribution of arrivals and the pattern or distribution of service with four possible combinations:
Systems that involve active participation of humans often fall into the last category – random arrivals and random service. This would be true for common telecommunications applications where people generate traffic throughout the day, often independently and randomly within their own sphere of activity, but almost always independently and randomly for a large population, like the entire office, neighborhood or city. An example of this would be the independent and random nature of calls arriving at a central office switch on a normal day. The average arrival rate may vary with time of day but the relationship of the call arrivals will be independent and random.
Telecommunications-based service activities can be independent and random as well. For example, the length of a telephone call or the size of a packet of data will impact the service time of telephone switches and routers. While a telephone call may average 4 minutes and a packet of data may average 700 bytes, some will be greater than, equal to, or less than this average. The length of the call or the size of the packet coming from a large population of users into a switch or router will generally be independent of the previous calls or the previous packets, therefore, telecommunications systems exhibit random service characteristics or close enough to a random pattern to make the math, as described below, easier to develop.
Randomness may seem to be a difficult thing to get your arms around when trying to predict events; however, mathematicians have found that certain distribution models relate quite well to random patterns. For example, the Poison distribution model and the exponential distribution model map well with actual random arrival patterns and random service patterns respectively, or close enough. I will not go into a description of these distributions models in this newsletter, but it is important to highlight this fact as having known distribution patterns for these events leads to the creation of formulas that describe waiting line systems. For example, there is a formula that will predict the average time you will wait in a grocery checkout line if everyone waited in a single line that fed multiple cashiers - like the model used in banks. The answer is; a lot less than the average time you wait standing in individual lines for each cashier. These formulas are based on the assumption that the arrivals are coming from a large population – there should be many, many people in the grocery store at the time - so that the “averages” and distributions are representative of a typical system.
These types of formulas are defined by an area of mathematics called Queuing Theory, or waiting line theory. Queuing theory formulas can be rigorously applied to the analysis and prediction of the performance of a variety of waiting line situations in the telecommunications field. I want to take one of the basic level formulas and apply it specifically to data communications systems in a very simple, back-of-the envelope manner to show important performance relationships in the operations of such systems. The delay formula presented can be used as a simple tool to understand telecommunications performance issues at a high level. This formula says that delay (D), the wait time for service or the time in queue, is a function of the utilization and the speed of the servicing entity.
The underlying theory of this formula, which is related to similar network engineering formulas, such as the Erlang B and C ones, is simple: When a resource you want to use is busy, you must wait for it to become free. The busier the resource the more you have to wait for it to become free. If the resource is 100% busy, you have to wait forever. A similar reasoning applies when a data packet is ready to be sent over a transmission link, such as an Ethernet LAN; it must wait its turn to be transmitted. Likewise, if a packet enters a busy router, it must wait its turn to be handled. The reason to show this formula is to demonstrate how the magnitude of this wait relates to the busyness of the resource and the fact that this relationship is not linear.
Utilization (U) is another way to describe how busy a resource is over a time period. For example, if a router is being used 1800 seconds out of an hour, it would be 50% utilized during that time. However, usually we are concerned more about a peak period, such as a 15 minute period. Also in this formula is a factor called TS, or service time. This is the amount of time a packet takes, on average, to use the resource once its turn comes up. For example, a 10,000 bit packet being transmitted across a 56k bps dial-up link would occupy that link 10,000/56,000 seconds, or .185 seconds, plus a little electrical propagation delay time. During this time, no other packets can use the link. This is the link's service time for that packet. Another, less technical, example involves a bank. After a customer has waited in line for a bank teller to become free, the amount of time the customer spends at the teller's position is the teller’s service time for one customer, which is then averaged over many customers to come up with TS for the bank’s tellers. Also in this case, (U) would represent the utilization of the teller over a measured period. Utilization is usually measured over a peak period, such as an hour or 15 minutes, to provide a worst case prediction.
Importance of Device Utilization
The real key to understanding this delay formula is the utilization relationship piece: U divided by 1 – U. For example, when the utilization is low, say 10%, this factor becomes 1/9th. When the utilization of the resource is high, say 90%, this factor becomes the whole number 9. And when the utilization of the resource is 50%, this multiplier factor becomes the whole number 1. In general, regardless of the service time, when the utilization is less than 50%, the wait for the resource is the service time multiplied by a fraction. When the utilization is greater than 50%, the wait for the resource is the service time multiplied by a whole number – a big difference. This utilization-based multiplier increases exponentially and does so rapidly after the 50% utilization point, as shown in the graph below. The translation is: if a resource’s utilization is much beyond 50%, there is a higher probability that congestion will occur. Keep in mind that at 100% utilization, the delay goes to infinity, which is the direction of these curves.
The service time factor has a modifying affect as shown with these approximate graphs of the delay formula. A very, very fast resource, like a multi-megabit per second transmission link, will have a corresponding very, very low service time, which will keep the delay low even when the utilization is high. This is shown by plot “C” in the diagram. In other words, a fast resource will have a flatter curve and a slow resource, like plot “A”, will have a steeper curve - a line will take longer to form with fast bank tellers compare to slow bank tellers.
The average service time per transaction will be dependent upon two factors – the speed of the servicing entity and the type of transaction being serviced. For example, a very fast cashier at a grocery store in the 10-items-or-less aisle will produce a fast moving, low-delay line (curve C) but that same cashier processing large carts filled with groceries will produce a slow moving, high-delay line (curve A) at the same cashier utilizations. A fast router processing small packets will cause less delay than a fast router processing large packets (one of the reasons Asynchronous Transfer Mode systems use small, 53-byte cells instead of large packets). This idea could be a consideration in system design or trouble shooting - small transactions are handled by service entities faster reducing the probability for transaction delay. In other words, large (delay) transactions will jam up a system faster than small (delay) transactions.
Another circumstance is that all systems will go to infinite wait time as the service element’s utilization approaches 100%; for example, if the bank teller is always busy the customer in line will never get served. This is another interesting observation, especially when applied to curve “C”, which represents a system with a very, very fast service element – like an Ethernet switch. In this case, the service element can run to a utilization level of 95%, 96%, or 97% without imparting very much delay when handling the packets. However, this only leaves 5%, 4%, or 3% of utilization before the service element gets to 100%, which we all now know what happens to the wait time at 100% utilization.
A slight increase in utilization may occur by simply adding another packet generating source to the network, such as another computer or application, and often leads to some level of confusion as to what’s wrong – "it was working fine this morning"! I have also witnessed a situation where the service entity, a mainframe front-end processor (dating myself), was running fine at 95% utilization when an unanticipated circumstance (a shift in application type use during lunch hour) caused the average transaction size to increase slightly, causing the average service time of the front-end processor to slightly increase, causing the utilization of the processor to increase 3%, causing a huge jump in delay through the processor and basically stopping all network activity.
This general delay formula concept is a useful one to describe the circumstances found in data communications networks. There are some underlying assumptions worth noting: the rate of arrivals to the serving resource is random and the way the packets are handled at the serving resource follows a random pattern also. This means that there are no controlling forces that direct the sending or handling of the packets, which is generally the case in most networks today. So, determining the delay of a packet through a switch or router at a good level of accuracy using this formula is possible.
Implications for VOIP Systems
However, for real-time-content-based packets, like VOIP packets, treating all packets the same by routers and switches during a congested period is not recommended. Some designers suggest applying priority handling of VOIP packets over other data packets through switches and routers as voice is more sensitive to delay than most data applications. The problem with this approach is predicting the performance of all packets through switches and routers that implement a priority scheme. There are more exotic queuing formulas that can approach a solution to this complex queuing situation but they are not easy to apply. Simulation of such a system would be a more accurate prediction method, but there are issues as well using simulation-based forecasting tools.
Another solution is simply to avoid congestion, as without congestion there is no queue. If the service rate is always greater than the arrival rate, even for peak random times, there will be a very low probability of any packets waiting. The phrase “throw bandwidth at the problem” captures the essence of this thought. A very, very fast switch; a very, very fast router; or a very, very fast transmission link will greatly reduce queuing delays for all packets and the use of complex priority mechanisms can be avoided. A serving entity that has a very low service time will minimize delays for all arrivals. Also, watch utilizations; when they get high reengineer the network by adding more switches, routers, bandwidth, etc.
One final comment: Queuing theory can be applied to any shared resource situation, which would include components of a device such as its memory, CPU, hard drive, internal data bus, etc. Here again, the busier the CPU the longer the wait for service. Another useful application is the rough prediction of a file server’s performance, which is the topic of another newsletter - see SIP Server Capacity Planning.
These types of tips and insights can be found in all training classes provided by McGuire Consulting. The high quality nature of these courses is based on the many years of work and training experience of the author, Jay D. McGuire.