Introduction to Queue Theory

Why Disaster Happens at the Edges: An Introduction to Queue Theory

Sample Code


  • When it comes to IT performance, amateurs look at averages. Professionals look at distributions.
  • The greater the variance in a system, the more of those outlier experiences there will be — and the more expensive it is to handle or avoid them.
  • Queues are everywhere in digital systems: executors, sockets, locks. Any process that operates asynchronously probably depends on a queue.

A queue’s performance depends on several factors including:

  1. Arrival rate: how many jobs arrive at the queue in a certain amount of time
  2. Service rate: how many jobs can be served in a certain amount of time
  3. Service time: how long it takes to process each job
  4. Service discipline: how jobs are prioritized (FIFO/LIFO/Priority)

This suggests an important rule of thumb: for acceptable performance, keep utilization below 75%.
This means provisioning not for typical loads but for extreme ones. Without that overcapacity, queues will form and latency will increase.
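
To see why, consider the textbook M/M/1 model (Poisson arrivals, exponential service times, a single server), in which the mean response time is 1/(μ − λ). The following is a minimal sketch of that relationship; the service rate of 100 jobs per second is an illustrative assumption, not a figure from the text.

```python
# Minimal sketch: mean response time in an M/M/1 queue, W = 1 / (mu - lambda).
# The service rate below (100 jobs/sec) is an arbitrary illustrative value.

service_rate = 100.0  # jobs the server can complete per second (mu)

for utilization in (0.50, 0.75, 0.90, 0.95, 0.99):
    arrival_rate = utilization * service_rate            # lambda = rho * mu
    mean_response = 1.0 / (service_rate - arrival_rate)  # seconds per job
    print(f"utilization {utilization:.0%}: mean response {mean_response * 1000:.0f} ms")

# At 50% load the mean response is 20 ms, at 75% it is 40 ms, at 90% it is
# 100 ms, at 95% it is 200 ms, and at 99% a full second: latency explodes
# as the queue approaches saturation.
```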


The lesson: whatever its source, variance is the enemy of performance.


The question then is what to do about it. It’s basically a trade-off between latency and error rate.
If we don’t return errors, or find an alternative way to shed the excess load, then latency will inevitably increase.


  • One approach, as we’ve seen, is to cap queue size and shed any job over the limit. However it’s handled, the goal is to remove excess requests from the overloaded queue.
  • Another closely related approach is to throttle the arrival rate. Instead of regulating the absolute number of jobs in the queue, we can calculate how much work the queue can handle in a given amount of time and start shedding load when the arrival rate exceeds the target. Both approaches are sketched below.
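
Here is a rough Python sketch of both ideas, assuming an in-process queue drained by some worker. The names and limits (the 100-job cap, the TokenBucket class, the 50-requests-per-second target) are illustrative assumptions, not anything prescribed by the text.

```python
import queue
import time

# --- Approach 1: cap the queue size and shed any job over the limit ------
jobs = queue.Queue(maxsize=100)  # illustrative cap

def submit(job):
    """Enqueue a job, or shed it with a fast error if the queue is full."""
    try:
        jobs.put_nowait(job)
        return True   # accepted
    except queue.Full:
        return False  # shed: the caller gets an immediate error or fallback

# --- Approach 2: throttle the arrival rate with a token bucket -----------
class TokenBucket:
    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec   # target arrival rate the queue can absorb
        self.capacity = burst      # short bursts above the rate are tolerated
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, up to the burst cap.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # within the target arrival rate
        return False      # over the target: shed before the queue builds up

limiter = TokenBucket(rate_per_sec=50, burst=10)  # illustrative target
```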

Architecture Takeaways

To sum up: Variance is the enemy of performance and the source of much of the latency we encounter when using software.

To keep latency to a minimum:

  • As a rule of thumb, target utilization below 75%
  • Steer slower workloads to paths with lower utilization
  • Limit variance as much as possible when utilization is high
  • Implement backpressure in systems where it is not built-in (a minimal sketch follows this list)
  • Use throttling and load shedding to reduce pressure on downstream queues
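
As a rough illustration of the backpressure point above: when producers feed a bounded queue through a blocking put, a fast producer is automatically slowed to the consumer's pace instead of piling up unbounded work. This is a minimal sketch under that assumption; the queue bound and the simulated service time are arbitrary.

```python
import queue
import threading
import time

tasks = queue.Queue(maxsize=10)  # small bound; illustrative

def producer():
    for i in range(100):
        # put() blocks when the queue is full, so the producer is held back
        # to the consumer's pace: this is backpressure.
        tasks.put(i)
    tasks.put(None)  # sentinel: no more work

def consumer():
    while True:
        item = tasks.get()
        if item is None:
            break
        time.sleep(0.01)  # simulate slow downstream work

threading.Thread(target=producer).start()
consumer()
```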

It follows that developers should aim to design and implement software that delivers not just high average performance but consistent performance, i.e., low variance.
Even an incremental reduction in variance can improve the user experience more than an equivalent increase in raw average speed.

The mean isn’t meaningless. It’s just deceptive.
To transform user experience, the place to put your effort isn’t in the middle of the curve but on the ends.
It’s the rare events — the outliers — that tell you what you need to know.

Microservice Orchestration Frameworks

I came across an excellent microservice orchestration framework: Zeebe.

Core microservice research: orchestration

When a system adopts a microservice architecture, it gets split into many new microservices, but the underlying business processes usually stay the same. How do we implement those original business processes in a microservice architecture? Compared with a traditional architecture, a microservice architecture depends far more on collaboration among services to complete an end-to-end business flow, so service orchestration is an essential skill when working with microservices. Orchestration, however, involves RPC, distributed transactions, and more, so its quality cannot rest solely on the craftsmanship of a seasoned engineer; a solid orchestration framework is needed to support it.

On composing (coordinating) microservices:
Orchestration, oriented around an executable process: internal and external service interactions are coordinated through a single executable process. A central process controls the overall goal, the operations involved, and the order of the service calls.
Choreography, oriented around collaboration: the interaction is driven by the sequence of messages exchanged among the parts. The participating resources are peers; there is no centralized control.
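
As a minimal Python sketch of the contrast, with plain functions standing in for hypothetical services (the names reserve_stock, charge_payment, ship_goods and the tiny publish/subscribe helpers are made up for illustration and do not come from Zeebe or any other framework): orchestration drives the flow from one central coordinator, while choreography lets each peer react to events published by the previous step.

```python
# Hypothetical services, represented as plain functions for illustration.
def reserve_stock(order):   print("stock reserved for", order)
def charge_payment(order):  print("payment charged for", order)
def ship_goods(order):      print("shipment created for", order)

# --- Orchestration: one central process controls the order of the calls --
def place_order_orchestrated(order):
    reserve_stock(order)
    charge_payment(order)
    ship_goods(order)  # the orchestrator owns the overall flow

# --- Choreography: peers react to events; there is no central controller -
subscribers = {}       # event name -> list of handlers

def subscribe(event, handler):
    subscribers.setdefault(event, []).append(handler)

def publish(event, order):
    for handler in subscribers.get(event, []):
        handler(order)

subscribe("order.placed",    lambda o: (reserve_stock(o), publish("stock.reserved", o)))
subscribe("stock.reserved",  lambda o: (charge_payment(o), publish("payment.charged", o)))
subscribe("payment.charged", lambda o: ship_goods(o))

place_order_orchestrated("A-1")  # central, explicit flow
publish("order.placed", "B-2")   # event-driven flow, no central coordinator
```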

service mesh

A service mesh is a dedicated infrastructure layer that controls service-to-service communication over a network. It provides a way for separate parts of an application to communicate with each other. Service meshes commonly appear in concert with cloud-based applications, containers and microservices.

A service mesh is in control of delivering service requests in an application. Common features provided by a service mesh include service discovery, load balancing, encryption and failure recovery. High availability is also common through utilizing software controlled by APIs rather than utilizing hardware. Service meshes can make service-to-service communication fast, reliable and secure.

As an example, an application structured in a microservices architecture might be composed of hundreds of services, all with their own instances operating in a live environment. This can make it challenging for developers to keep track of which components must interact and to make changes to their application when something goes wrong. Including communication logic in each service, rather than in a separate and dedicated layer, makes keeping track of and changing an application fairly complex. Utilizing a service mesh allows developers to separate service-to-service communication into a dedicated layer.

An organization may choose to utilize an API gateway, which handles protocol transactions, over a service mesh. However, developers must update the API gateway every time a microservice is added or removed.

How a service mesh works
A service mesh architecture uses a proxy instance called a sidecar in whichever development paradigm is in use, commonly containers and/or microservices. In a microservice application, a sidecar attaches to each service. In a containerized deployment, the sidecar is attached to each application container, VM or container orchestration unit, such as a Kubernetes pod.

Sidecars can handle tasks abstracted from the service itself, such as monitoring and security.

Service instances, sidecars and their interactions make up what is called the data plane in a service mesh. A layer called the control plane manages tasks such as creating instances, monitoring, and implementing policies such as network management or network security policies. Control planes can connect to a CLI or a GUI for application management.

Service mesh benefits and drawbacks
A service mesh addresses some large issues with managing service-to-service communication, but not all. Some advantages of a service mesh include:

  • Simplifies communication between services in both microservices and containers.
  • Makes it easier to diagnose communication errors, since they occur on their own infrastructure layer.
  • Supports security features such as encryption, authentication and authorization.
  • Allows for faster development, testing and deployment of an application.
  • Sidecars placed next to a container cluster are effective in managing network services.

Some downsides to service meshes include:

  • Using a service mesh increases the number of runtime instances.
  • Each service call must first run through the sidecar proxy, adding an extra hop.
  • Service meshes do not address issues such as integrating with other services or systems, or routing-type and transformation mapping.

The service mesh market
A service mesh is commonly available as an open source technology from diverse creators. It can also be consumed as a service from major cloud providers.

Istio is an open source service mesh provided by Google, IBM and Lyft. Istio is designed as a universal control plane first targeted for Kubernetes deployments, but can be used on multiple platforms. Its data plane relies on proxies called Envoy sidecars. This service mesh features security measures such as identity and key management. It also supports fault injection and hybrid deployment.

The Istio service mesh architecture is one of the major designs available.

Linkerd is another open source, multiplatform service mesh. Linkerd was developed by Buoyant and is built on Twitter’s Finagle library. This service mesh supports platforms such as Kubernetes, Docker and Amazon ECS. Features include built-in service discovery and a control plane called Namerd.