LINUX &&|| PROGRAMMING

Zanzibar: Google Global Authorization System

Finally I had a chance to go into details about Zanzibar - Google global authorization system. I already mentioned it in OpenFGA overview where authors said that they based their solution on Zanzibar architecture principles.

Let's check how the system that performed millions of authorization checks per minute is organized:

✏️ Any authorization rule takes a form of a tuple `user U has relation R to object O`. For example, User 15 is an owner of doc:readme. This unification helps to support efficient reads and incremental updates.

✏️ Zanzibar stores ACLs and their metadata in Google Spanner database. Zanzibar logic strongly relies on Spanner external consistency guarantees. So each ACL update gets a timestamp that reflects its order. If update x happens before y, then x has an earlier timestamp.

✏️ Each ACL is identified by shard ID, object ID, relation, user, and commit timestamp. Multiple tuple versions are stored in different rows, that helps to evaluate checks and reads at any timestamp within the garbage collection window (7 days).

✏️ Each Zanzibar client gets a special consistency token called zookie. Zookie contains the current global timestamp. Client uses zookie to ensure that authorization check is based on ACL data at least as fresh as the change.

✏️ Zookies are also used in read requests to guarantee that clients get a data snapshot not earlier than a previous write.

✏️ Incoming requests are handled by aclservers clusters. Each server in the cluster can delegate intermediate results computation to other servers.

✏️ To provide performance isolation Zanzibar measures how much CPU each RPC uses in cpu-seconds. Each client has a global CPU usage limit, and if it goes over, its requests may be slowed down. Each aclserver also limits the total number of active RPCs to manage memory usage.

✏️ Request hedging with 99th percentile threshold is used to reduce tail-latency.

According to the whitepaper, authorization checks are performed for each object independently. It means that each search request for service like Drive or Youtube can trigger from tens to hundreds of authorization checks. That's why the overall architecture is heavily focused on keeping authorization request latency as low as possible.

Implementation results are impressive: Zanzibar handles over 2 trillion relation tuples, that occupy more than 100 terabytes of storage. The load is spread across 10,000+ servers in dozens of clusters worldwide. Despite that scale, it keeps the 95th percentile latency at ~9 ms for in-zone requests and ~60 ms other requests.

#systemdesign #usecase #architecture

research.google

Zanzibar: Google’s Consistent, Global Authorization System

28 views10:24

About

Blog

Apps

Platform