Friday, 2 June 2023

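Doorman: a global distributed client-side rate limiting tool

Doorman is a global distributed client-side rate limiting tool open-sourced by YouTube. It is written in Go and uses gRPC as its communication protocol.

Doorman is a solution for global distributed client-side rate limiting. Clients that talk to a shared resource (such as a database, a gRPC service, a RESTful API, or something else) can use Doorman to voluntarily limit their use of that resource (in requests per second). Some of Doorman's high-availability features depend on a distributed lock manager. Doorman currently supports etcd, and it should be relatively simple to make it use Zookeeper instead.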

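Doorman is currently Alpha quality software. The server and the Go client have received a good deal of testing at Google (both functional and load testing), so they are usable. However, open-sourcing the code meant switching from internal Google technologies to open source equivalents, and that needs more testing.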

Doorman short term plans:

  • C++ client
  • Python client
  • Docker image

Doorman longer term plans:

  • Ruby client
  • Proper semantic versioning

Note: Doorman is not an official Google product.

Doorman is licensed under the Apache 2.0 license.

GitHub: https://github.com/youtube/doorman

--------------------------------------------------------------

 Doorman: Global Distributed Client Side Rate Limiting.

Doorman

Doorman is a solution for Global Distributed Client Side Rate Limiting. Clients that talk to a shared resource (such as a database, a gRPC service, a RESTful API, or whatever) can use Doorman to voluntarily limit their use (usually in requests per second) of the resource. Doorman is written in Go and uses gRPC as its communication protocol. For some high-availability features it needs a distributed lock manager. We currently support etcd, but it should be relatively simple to make it use Zookeeper instead.

Getting started

The purpose of Doorman is to apportion and distribute capacity to clients based on some definition of fairness. The capacity a client gets for a resource depends on four things:

  • The configured maximum capacity for the resource (world-wide).
  • The capacity need (wants) of this client.
  • The capacity needs (wants) of all other clients on the planet.
  • The exact algorithm used (as defined by the configuration) to apportion the capacity among all the clients.

The Doorman master server remembers all clients that currently have capacity. Whenever a client asks for capacity, the server records the client's request in memory and runs the algorithm to determine what this client should get.
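To make the four inputs concrete, here is a minimal sketch of one possible apportionment rule: a plain proportional split. It is an illustration only and is not claimed to match any of Doorman's configured algorithms; the function name proportionalShare and the client IDs are hypothetical.

```go
package main

import "fmt"

// proportionalShare is a hypothetical helper, not Doorman's actual code.
// If the sum of all wants fits within the capacity, every client gets
// what it asked for. Otherwise each client gets a share of the capacity
// proportional to its wants.
func proportionalShare(capacity float64, wants map[string]float64) map[string]float64 {
	total := 0.0
	for _, w := range wants {
		total += w
	}
	grants := make(map[string]float64, len(wants))
	for id, w := range wants {
		if total <= capacity {
			grants[id] = w
		} else {
			grants[id] = capacity * w / total
		}
	}
	return grants
}

func main() {
	// Three clients want 120 in total, but only 100 is available.
	wants := map[string]float64{"a": 60, "b": 30, "c": 30}
	grants := proportionalShare(100, wants)
	for _, id := range []string{"a", "b", "c"} {
		fmt.Printf("%s wants %.0f, gets %.0f\n", id, wants[id], grants[id])
	}
}
```

Note how the grant for one client depends on what every other client wants, which is why the server has to keep all current requests in memory.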

Lease length and refresh interval

Doorman only gives out capacity for a limited amount of time, in the form of leases. Each capacity grant comes with a lease length: The client is guaranteed that amount of capacity for the duration of the lease. A typical lease length is five minutes. On top of the lease length the Doorman server also returns a refresh interval. This is the interval after which the client is expected to check back in to get a new lease. A typical refresh interval is five seconds.

Note: The Doorman system is cooperative. The clients are expected to honour the capacity grant, the lease length, and the refresh interval. The system provides no protection against misbehaving clients.

In the normal operation of the system the clients all check in regularly with the server to refresh their capacity. The server knows of all clients and their resource needs, and on every request makes the best possible apportionment of the capacity. As an optimization (to reduce the QPS on the Doorman server), the client code refreshes all of its resources in bulk whenever it sends a request to the Doorman server. This means that under specific circumstances (for instance when registering a new resource) a resource might get its capacity refreshed a bit sooner than expected.

The Doorman configuration specifies which algorithm should be used to distribute capacity among all clients. The page on algorithms explains which algorithms currently are available and how they apportion capacity to each client.

The two parameters lease_length and refresh_interval together tune a number of different behaviors of the system:

  • The load on the Doorman server.
  • The speed with which the system converges as resource needs change and clients appear and disappear.
  • How the system deals with the Doorman server being unreachable or slow.

When the Doorman server goes unreachable and comes back

When a client cannot reach the Doorman server the following happens:

  • The client misses one or more refresh intervals. This does not matter much for the client other than that the capacity the client has is not adjusted for potentially changed resource needs.
  • When the Doorman server is unavailable for a longer period of time, leases expire and the resources revert to their configured safe capacity. This can be one of:
    • -1, meaning an unbounded (infinite) rate limit, or
    • 0, meaning that all access to the resource is blocked, or
    • a positive number.
  • As soon as the Doorman server becomes reachable again the clients will resume requesting capacity.

Note: Doorman uses multiple servers in different clusters and a master election procedure to determine the current master.

Doorman does not share or store its internal database. That means that when a Doorman server becomes the master it starts with an empty repository of clients and outstanding leases. However, this is not as problematic as it seems, because once the server is available all clients will start calling it to refresh their leases. Since the Doorman server knows that it does not have enough information to run its algorithms, it will simply return the currently assigned capacity, or zero if the request comes from a client that currently does not have capacity for the resource. The server knows the currently assigned capacity because clients helpfully include it in every GetCapacity RPC. This phase of the server is known as learning mode.

During the learning mode of a resource every request will be answered with a new lease for the same capacity the client currently has. Practically speaking, after a couple of refresh intervals the server can be reasonably sure that it has been contacted by every existing client out there. However, for reasons of safety the default learning mode duration is the same as the lease length. This decision ensures that when the learning mode duration expires we can be sure that there are no leases out there that we don't know about (because these would have expired by then). If you want the system to converge faster after a Doorman master election you can explicitly configure a learning_mode_duration in the resource template (see the page on the Configuration of the system for more information).

Who wants what?

Doorman requires the clients to inform it of the desired capacity (the so-called wants). If you are using the low-level Doorman client, you need to figure out your capacity need and call the appropriate methods to make sure that the client library requests that amount of capacity during its refresh cycle. However, if you use the rate limiter objects provided by the Doorman clients, the desired capacity is determined automatically by observing the behavior of the threads that want to access the resource. This automatic wants determination uses a moving average to smooth out any spikes.
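As an illustration of how a moving average damps spikes, here is a minimal sketch using a fixed-size window. The text does not say which kind of moving average the Doorman clients use, so the windowed form, the wantsEstimator type, and its method names are all assumptions.

```go
package main

import "fmt"

// wantsEstimator is a hypothetical estimator that averages the most
// recent observed request rates. It is not Doorman's implementation.
type wantsEstimator struct {
	samples []float64 // ring buffer of recent per-interval request rates
	next    int       // index of the slot to overwrite next
	count   int       // number of slots filled so far
}

// newWantsEstimator creates an estimator; window must be greater than zero.
func newWantsEstimator(window int) *wantsEstimator {
	return &wantsEstimator{samples: make([]float64, window)}
}

// observe records the request rate seen during the latest interval,
// overwriting the oldest sample once the window is full.
func (e *wantsEstimator) observe(rate float64) {
	e.samples[e.next] = rate
	e.next = (e.next + 1) % len(e.samples)
	if e.count < len(e.samples) {
		e.count++
	}
}

// wants returns the average of the recorded samples, or 0 if there are none.
func (e *wantsEstimator) wants() float64 {
	if e.count == 0 {
		return 0
	}
	sum := 0.0
	for _, s := range e.samples[:e.count] {
		sum += s
	}
	return sum / float64(e.count)
}

func main() {
	e := newWantsEstimator(4)
	for _, rate := range []float64{10, 10, 10, 50} {
		e.observe(rate)
		fmt.Printf("observed %.0f, wants %.1f\n", rate, e.wants())
	}
}
```

In this sketch a single interval at 50 requests per second raises the reported wants only to 20, and the spike drops out entirely once four newer samples have replaced it.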

Next steps

Status and Plans

Doorman should currently be considered Alpha quality software. The server and Go client received a decent amount of testing at Google (both functional and load testing), so we are pretty confident they do what they are supposed to do. However, in the process of open-sourcing the code we switched from internal Google technologies to their open source equivalents, and this needs more testing. Finally, there's no proper versioning scheme at the moment.

Short term plans:

  • C++ client;
  • Python client;
  • Docker image;

Longer term plans:

  • Ruby client;
  • Proper semantic versioning.

Installation

First, you need to have Go installed. You can either follow the official installation instructions or, on OS X, just do

brew install go

As part of the initial setup, you have to set GOPATH, which is the location where Go keeps all its sources and binary artifacts.

export GOPATH=...

With this out of the way, Doorman is just one go get away:

go get github.com/youtube/doorman/go/cmd/doorman

If you are interested in a checkout of Doorman that you can modify, you can do:

mkdir -p $GOPATH/src/github.com/youtube
cd $GOPATH/src/github.com/youtube
git clone git@github.com:youtube/doorman.git

Go version <= 1.5

If you are using a version of Go earlier than 1.6, you will need to set an environment variable to enable vendoring (see https://golang.org/s/go15vendor):

export GO15VENDOREXPERIMENT=1

From https://github.com/youtube/doorman

 
