
Why Disaster Happens at the Edges: An Introduction to Queue Theory



  • When it comes to IT performance, amateurs look at averages. Professionals look at distributions.
  • The greater the variance in a system, the more of those outlier experiences there will be — and the more expensive it is to handle or avoid them.
  • Queues are everywhere in digital systems: executors, sockets, locks. Any process that operates asynchronously probably depends on a queue.

A queue’s performance depends on several factors, including:

  1. Arrival rate: how many jobs arrive at the queue in a certain amount of time
  2. Service rate: how many jobs can be served in a certain amount of time
  3. Service time: how long it takes to process each job
  4. Service discipline: how jobs are prioritized (FIFO/LIFO/Priority)

This suggests an important rule of thumb: for decent performance, keep utilization below 75%.
This means provisioning not for typical loads but for extreme ones. Without overcapacity, queues will form and latency will increase.
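A minimal sketch shows why the rule of thumb holds. Assuming a single-server queue with Poisson arrivals and exponential service (the classic M/M/1 model; the rates below are illustrative, not from the text), average time in the system is 1/(μ − λ), which blows up as utilization λ/μ approaches 100%:

```python
# Average time in system for an M/M/1 queue: W = 1 / (mu - lambda).
# The service rate here is a hypothetical example value.

def time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Average time a job spends queued plus in service (M/M/1)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: utilization >= 100%")
    return 1.0 / (service_rate - arrival_rate)

service_rate = 100.0  # jobs per second
for utilization in (0.50, 0.75, 0.90, 0.99):
    w = time_in_system(utilization * service_rate, service_rate)
    print(f"utilization {utilization:.0%}: avg time in system {w * 1000:.1f} ms")
```

Under these rates, latency merely doubles between 50% and 75% utilization, but grows 50-fold by 99% — which is why headroom, not average load, drives provisioning.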


The lesson: whatever its source, variance is the enemy of performance.
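A small simulation makes the lesson concrete (the parameters are hypothetical): two single-server FIFO queues see the same arrival stream and the same *mean* service time, differing only in service-time variance. The high-variance queue waits far longer:

```python
import random

def simulate_waits(service_times, mean_interarrival, rng):
    """Single-server FIFO queue: return each job's wait before service starts."""
    waits, clock, server_free_at = [], 0.0, 0.0
    for svc in service_times:
        clock += rng.expovariate(1.0 / mean_interarrival)  # Poisson arrivals
        start = max(clock, server_free_at)                 # wait if server busy
        waits.append(start - clock)
        server_free_at = start + svc
    return waits

rng = random.Random(42)
n, mean_svc = 100_000, 1.0     # both runs share the same mean service time
arrivals_every = 1.25          # roughly 80% utilization

constant = [mean_svc] * n                                      # zero variance
bursty = [rng.expovariate(1.0 / mean_svc) for _ in range(n)]   # high variance

mean_wait = {}
for label, svc in (("constant service", constant), ("bursty service", bursty)):
    waits = simulate_waits(svc, arrivals_every, rng)
    mean_wait[label] = sum(waits) / n
    print(f"{label}: mean wait {mean_wait[label]:.2f}")
```

Same mean, same utilization — only the variance differs, and the bursty queue's average wait is roughly double. That gap is the cost of variance.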


The question then is what to do about it. It’s basically a trade-off between latency and error rate.
If we don’t return errors, or find an alternative way to shed the excess load, then latency will inevitably increase.


  • One approach, as we’ve seen, is to cap queue size and shed any job over the limit. However it’s handled, the goal is to remove excess requests from the overloaded queue.
  • Another closely related approach is to throttle the arrival rate. Instead of regulating the absolute number of jobs in the queue, we can calculate how much work the queue can handle in a given amount of time and start shedding load when the arrival rate exceeds the target.
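Both approaches can be sketched in a few lines (the class names and limits here are hypothetical, for illustration only): a bounded queue that sheds any job over its cap, and a token bucket that sheds jobs once the arrival rate exceeds a target.

```python
from collections import deque

class BoundedQueue:
    """Cap queue size; shed (reject) anything over the limit."""
    def __init__(self, max_size: int):
        self.jobs, self.max_size, self.shed = deque(), max_size, 0

    def offer(self, job) -> bool:
        if len(self.jobs) >= self.max_size:
            self.shed += 1          # load shedding: drop the excess request
            return False
        self.jobs.append(job)
        return True

class TokenBucket:
    """Throttle the arrival rate: admit at most `rate` jobs/sec on average,
    with short bursts of up to `burst` jobs."""
    def __init__(self, rate: float, burst: int):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = float(burst), 0.0

    def allow(self, now: float) -> bool:
        # Refill tokens in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False        # arrival rate exceeds the target: shed this job
```

The difference in emphasis: the bounded queue limits the absolute backlog, while the token bucket limits the rate of admission regardless of how long the queue currently is.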

Architecture Takeaways

To sum up: Variance is the enemy of performance and the source of much of the latency we encounter when using software.

To keep latency to a minimum:

  • As a rule of thumb, target utilization below 75%
  • Steer slower workloads to paths with lower utilization
  • Limit variance as much as possible when utilization is high
  • Implement backpressure in systems where it is not built-in
  • Use throttling and load shedding to reduce pressure on downstream queues
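As one illustration of the backpressure point, a bounded queue is the simplest mechanism: when the consumer falls behind, a fast producer is forced to wait, fail fast, or shed, instead of piling up unbounded work. A sketch using Python's standard `queue` module (the queue size and job count are arbitrary):

```python
import queue

# A bounded queue applies backpressure: once it fills, the producer
# can no longer enqueue work for free.
buffer = queue.Queue(maxsize=3)

for job in range(5):
    try:
        # block=False: fail fast rather than queueing unboundedly
        buffer.put(job, block=False)
        print(f"accepted job {job}")
    except queue.Full:
        print(f"backpressure: job {job} must wait or be shed")
```

With a blocking `put` instead, the producer would simply pause until the consumer drains the queue — backpressure by waiting rather than by error.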

It follows that developers should aim to design and implement software that delivers not just high average performance but consistent performance, i.e., low variance.
Even an incremental reduction in variance can improve the user experience more than an equivalent increase in raw average speed.

The mean isn’t meaningless. It’s just deceptive.
To transform user experience, the place to put your effort isn’t in the middle of the curve but on the ends.
It’s the rare events — the outliers — that tell you what you need to know.