etcd v3 认证设计

每连接认证，而不是每请求
- 基于用户ID + 密码的认证，实现为 gRPC API
- 在认证政策修改之后，认证必须刷新
功能应该和v2一样简单
- v3 提供扁平键空间，和v2的目录结构不同。提供权限检查,如间隔匹配(as interval matching)

主要需要更改

客户端必须在发送被验证的请求之前创建仅用于认证的专用连接
添加权限信息(用户 ID 和合法 revision) 到 Raft 命令 (etcdserverpb.InternalRaftRequest)
在状态机层做每个请求的权限检查，而不是在 API 层

认证的元数据也应该在存储中存储和管理，该存储被etcd的Raft协议控制，和其他在etcd中的数据一样。要求不牺牲整个etcd集群的可用性和一致性。如果读取或写入元数据（例如权限信息）需要每个节点（超过法定人数）的同意，则单节点故障会让整个集群停止。要求所有节点立即同意意味着，如果任意集群成员宕机，即使群集具有可用的法定人数，检查普通的读/写请求也无法完成。这种全场一致方案最终会降低集群的可用性; 从raft而来的基于法定人数的共识就足够了，因为合约遵循一致的顺序。

The authentication mechanism in the etcd v2 protocol has a tricky part because the metadata consistency should work as in the above, but does not: each permission check is processed by the etcd member that receives the client request (etcdserver/api/v2http/client.go), including follower members. Therefore, it’s possible the check may be based on stale metadata.

This staleness means that auth configuration cannot be reflected as soon as operators execute etcdctl. Therefore there is no way to know how long the stale metadata is active. Practically, the configuration change is reflected immediately after the command execution. However, in some cases of heavy load, the inconsistent state can be prolonged and it might result in counter-intuitive situations for users and developers. It requires a workaround like this:

Inconsistent permissions are unsafe for linearized requests

Inconsistent authentication state is most serious for writes. Even if an operator disables write on a user, if the write is only ordered with respect to the key value store but not the authentication system, it’s possible the write will complete successfully. Without ordering on both the auth store and the key-value store, the system will be susceptible to stale permission attacks.

At first, a client must create a gRPC connection only to authenticate its user ID and password. An etcd server will respond with an authentication reply. The reponse will be an authentication token on success or an error on failure. The client can use its authentication token to present its credentials to etcd when making API requests.

The client connection used to request the authentication token is typically thrown away; it cannot carry the new token’s credentials. This is because gRPC doesn’t provide a way for adding per RPC credential after creation of the connection (calling grpc.Dial()). Therefore, a client cannot assign a token to its connection that is obtained through the connection. The client needs a new connection for using the token.

Notes on the implementation of `Authenticate()` RPC

Authenticate() RPC generates an authentication token based on a given user name and password. etcd saves and checks a configured password and a given password using Go’s bcrypt package. By design, ‘s password checking mechanism is computationally expensive, taking nearly 100ms on an ordinary x64 server. Therefore, performing this check in the state machine apply phase would cause performance trouble: the entire etcd cluster can only serve almost 10 Authenticate() requests per second.

For good performance, the v3 auth mechanism checks passwords in etcd’s API layer, where it can be parallelized outside of raft. However, this can lead to potential time-of-check/time-of-use (TOCTOU) permission lapses:

client A sends a request Authenticate()
the API layer processes the password checking part of Authenticate()
another client B sends a request of ChangePassword() and the server completes it
the state machine layer processes the part of getting a revision number for the Authenticate() from A
the server returns a success to A

Resolving a token in the API layer

After authenticating with , a client can create a gRPC connection as it would without auth. In addition to the existing initialization process, the client must associate the token with the newly created connection. grpc.WithPerRPCCredentials() provides the functionality for this purpose.

Every authenticated request from the client has a token. The token can be obtained with grpc.metadata.FromIncomingContext() in the server side. The server can obtain who is issuing the request and when the user was authorized. The information will be filled by the API layer in the header (etcdserverpb.RequestHeader.Username and etcdserverpb.RequestHeader.AuthRevision) of a raft log entry (etcdserverpb.InternalRaftRequest).

The auth info in is checked in the apply phase of the state machine. This step checks the user is granted permission to requested keys on the latest revision of auth store.

Two types of tokens: simple and JWT

There are two kinds of token types: simple and JWT. The simple token isn’t designed for production use cases. Its tokens aren’t cryptographically signed and servers must statefully track token-user correspondence; it is meant for development testing. JWT tokens should be used for production deployments since it is cryptographically signed and verified. From the implementation perspective, JWT is stateless. Its token can include metadata including username and revision, so servers don’t need to remember correspondence between tokens and the metadata.

The etcd v3 model requires multiple lookup of the metadata unlike the file system like systems. The worst case lookup cost will be sum the user’s total granted keys and intervals. The cost cannot be avoided because v3’s flat key space is completely different from Unix’s file system model (every inode includes permission metadata). Practically the cost won’t be a serious problem because the metadata is small enough to benefit from caching.