LINUX &&|| PROGRAMMING
Linux is a dream system for programmers. After all, they created it for themselves 😃 It's easy to program in...
But among Telegram users it seems to be less popular than in the world at large, so for now this channel is mostly memes 😃
Forwarded from TechLead Bits
Hashicorp Plugin Ecosystem

When Go didn't yet have a plugin package, Hashicorp implemented its own plugin architecture. The main difference from other plugin systems is that it works over RPC. At first that might sound a bit unusual, but the approach shows really good results and is actively used in many popular products such as Hashicorp Vault, Terraform, Nomad, and Velero.

Key concepts:
✏️ Plugin is a binary that runs an RPC (or gRPC) server.
✏️ A main application loads plugins from a specified directory and runs them as OS child processes.
✏️ A single connection is made between each plugin and the host process.
✏️ The connection is bidirectional, so the plugin can also call application APIs.
✏️ The plugin and the application must run on the same host and communicate only over the local network; no remote calls are allowed.
✏️ Each plugin provides a protocol version that can be used as its API version.
✏️ A special handshake is used to establish the connection. The plugin writes its protocol version, network type, address, and protocol to stdout, and the main app uses this information to connect (see the sketch below this list).
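
To make the moving parts concrete, here is a minimal sketch of the host side built on the hashicorp/go-plugin library. The Greeter interface, the "greeter" plugin name, and the ./plugins/greeter binary path are made-up examples, not anything Hashicorp ships:

```go
package main

import (
	"fmt"
	"log"
	"net/rpc"
	"os/exec"

	"github.com/hashicorp/go-plugin"
)

// Greeter is the interface the host expects plugins to implement (hypothetical).
type Greeter interface {
	Greet() string
}

// GreeterRPC runs in the host process and forwards calls to the plugin over net/rpc.
type GreeterRPC struct{ client *rpc.Client }

func (g *GreeterRPC) Greet() string {
	var resp string
	if err := g.client.Call("Plugin.Greet", new(interface{}), &resp); err != nil {
		return "error: " + err.Error()
	}
	return resp
}

// GreeterRPCServer runs inside the plugin process and serves the real implementation.
type GreeterRPCServer struct{ Impl Greeter }

func (s *GreeterRPCServer) Greet(_ interface{}, resp *string) error {
	*resp = s.Impl.Greet()
	return nil
}

// GreeterPlugin wires both sides into go-plugin.
type GreeterPlugin struct{ Impl Greeter }

func (p *GreeterPlugin) Server(*plugin.MuxBroker) (interface{}, error) {
	return &GreeterRPCServer{Impl: p.Impl}, nil
}

func (p *GreeterPlugin) Client(_ *plugin.MuxBroker, c *rpc.Client) (interface{}, error) {
	return &GreeterRPC{client: c}, nil
}

// The handshake must match on both sides; ProtocolVersion plays the role of the plugin API version.
var handshake = plugin.HandshakeConfig{
	ProtocolVersion:  1,
	MagicCookieKey:   "EXAMPLE_PLUGIN",
	MagicCookieValue: "hello",
}

func main() {
	// Launch the plugin binary as a child process and connect to it over RPC.
	client := plugin.NewClient(&plugin.ClientConfig{
		HandshakeConfig: handshake,
		Plugins:         map[string]plugin.Plugin{"greeter": &GreeterPlugin{}},
		Cmd:             exec.Command("./plugins/greeter"), // hypothetical plugin binary
	})
	defer client.Kill()

	rpcClient, err := client.Client()
	if err != nil {
		log.Fatal(err)
	}
	raw, err := rpcClient.Dispense("greeter")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(raw.(Greeter).Greet())
}
```

The plugin binary is symmetric: it provides a real Greeter implementation and calls plugin.Serve with the same handshake and plugin map, which is what performs the stdout handshake described above.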

Benefits of the approach:
✏️ Plugins can't crash the main process
✏️ Plugins can be written in different languages
✏️ Easy installation - just put a binary into the folder
✏️ Stdout/stderr syncing. Although plugins are subprocesses, they can keep writing to stdout/stderr as usual, and their output is mirrored to the host app.
✏️ Host upgrade while a plugin is running. Plugins can be "reattached" so that the host process can be upgraded while the plugin keeps running.
✏️ Plugins are secure: they have access only to the interfaces and arguments given to them, not to the entire memory space of the host process.

In a cloud ecosystem, plugins can be delivered as init containers. During startup, the plugin binary from the init container is copied into the main application container (typically via a shared volume), as in the sketch below.
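
As a rough illustration of that delivery pattern, here is a hypothetical Kubernetes manifest (all image names and paths are made up): an init container copies the plugin binary into a shared volume that the main application container mounts as its plugin directory.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-plugin
spec:
  volumes:
    - name: plugins
      emptyDir: {}            # shared scratch volume for plugin binaries
  initContainers:
    - name: copy-plugin
      image: registry.example.com/greeter-plugin:1.0   # hypothetical plugin image
      command: ["cp", "/greeter-plugin", "/plugins/greeter-plugin"]
      volumeMounts:
        - name: plugins
          mountPath: /plugins
  containers:
    - name: app
      image: registry.example.com/main-app:1.0         # hypothetical app image
      volumeMounts:
        - name: plugins
          mountPath: /plugins  # the app loads plugin binaries from this directory
```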

If you're designing a pluggable architecture, Hashicorp's RPC plugins are definitely an approach worth looking at.

#systemdesign #engineering
Forwarded from TechLead Bits
Latency Insurance: Request Hedging

One more interesting concept I came across recently is request hedging. I haven't seen it actively used in enterprise software, but it can be useful in scenarios where tail latency is critical.

Imagine that service A calls service B, and service B has multiple instances. These instances can have different response times: some are fast, some are slow. There are a number of potential reasons for such behavior, but we'll skip them for simplicity.

Request hedging is a technique where the client sends the same request to multiple instances, uses the first successful response, and cancels the other requests.

Obviously, if you do this for all requests, the system load will increase and the overall performance will degrade.

That's why hedging is usually applied only to a subset of requests.

The following strategies are used for request selection:
✏️ Token Buckets. Use a token bucket that refills every N operations and send a hedged sub-request only if there is an available token (rate limiting).
✏️ Slow Responses Only. Send hedged requests only if the first request takes longer than a specific latency threshold (95th percentile, 99th percentile).
✏️ Threshold. Send hedged requests only if the Nth percentile latency is exceeded. For example, if the threshold is the 99th percentile, only 1% of requests will be duplicated.
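
Here is a minimal Go sketch of the "slow responses only" strategy. The instance URLs, the 50 ms hedge delay standing in for the p95 latency, and the use of plain net/http are illustrative assumptions, not taken from any particular framework:

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

type result struct {
	idx  int
	resp *http.Response
	err  error
}

// hedgedGet sends the request to instances[0]; if no response arrives within
// hedgeDelay (e.g. the observed p95 latency), it sends the same request to the
// next instance. The first successful response wins and the remaining
// in-flight requests are cancelled through their contexts.
func hedgedGet(ctx context.Context, instances []string, hedgeDelay time.Duration) (*http.Response, error) {
	results := make(chan result, len(instances))
	cancels := make([]context.CancelFunc, len(instances))

	fire := func(i int) {
		reqCtx, cancel := context.WithCancel(ctx)
		cancels[i] = cancel
		go func() {
			req, err := http.NewRequestWithContext(reqCtx, http.MethodGet, instances[i], nil)
			if err != nil {
				results <- result{i, nil, err}
				return
			}
			resp, err := http.DefaultClient.Do(req)
			results <- result{i, resp, err}
		}()
	}

	fire(0)
	sent := 1

	timer := time.NewTimer(hedgeDelay)
	defer timer.Stop()

	var firstErr error
	received := 0
	for {
		select {
		case <-timer.C:
			// The primary is slow: hedge the request to the next instance, if any.
			if sent < len(instances) {
				fire(sent)
				sent++
				timer.Reset(hedgeDelay)
			}
		case r := <-results:
			received++
			if r.err == nil {
				// Winner found: cancel every other in-flight request.
				// (For brevity, the winner's own context is left to the parent ctx.)
				for j := 0; j < sent; j++ {
					if j != r.idx {
						cancels[j]()
					}
				}
				return r.resp, nil
			}
			if firstErr == nil {
				firstErr = r.err
			}
			if received == len(instances) {
				return nil, firstErr // every instance was tried and failed
			}
		}
	}
}

func main() {
	// Hypothetical instances of service B; 50 ms stands in for the p95 latency.
	instances := []string{"http://b-1.internal:8080/items", "http://b-2.internal:8080/items"}
	resp, err := hedgedGet(context.Background(), instances, 50*time.Millisecond)
	if err != nil {
		fmt.Println("all attempts failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("winner status:", resp.Status)
}
```

The key design point is that the hedge is sent only after the primary has already exceeded the delay, so in the common case only one request goes out, and the loser is cancelled as soon as a winner arrives.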

Request hedging is an efficient approach to reducing tail latency. It prevents occasional slow operations from slowing down the overall user interaction. But if the latency variance in the system is already small, request hedging will not provide any improvement.

#systemdesign #patterns
Forwarded from TechLead Bits
Zanzibar: Google Global Authorization System

Finally I had a chance to go into the details of Zanzibar, Google's global authorization system. I already mentioned it in the OpenFGA overview, where the authors said they based their solution on Zanzibar's architecture principles.

Let's look at how a system that performs millions of authorization checks per minute is organized:

✏️ Every authorization rule takes the form of a tuple: `user U has relation R to object O`. For example, user 15 is an owner of doc:readme. This unified representation helps support efficient reads and incremental updates (illustrative types for this model are sketched after the list).

✏️ Zanzibar stores ACLs and their metadata in the Google Spanner database. Zanzibar's logic relies heavily on Spanner's external consistency guarantees: each ACL update gets a timestamp that reflects its order, so if update x happens before y, then x has an earlier timestamp.

✏️ Each ACL is identified by shard ID, object ID, relation, user, and commit timestamp. Multiple tuple versions are stored in different rows, which makes it possible to evaluate checks and reads at any timestamp within the garbage collection window (7 days).

✏️ Each Zanzibar client gets a special consistency token called a zookie, which contains the current global timestamp. The client uses the zookie to ensure that an authorization check is based on ACL data at least as fresh as the change being checked.

✏️ Zookies are also used in read requests to guarantee that clients get a data snapshot no older than a previous write.

✏️ Incoming requests are handled by clusters of aclservers. Each server in a cluster can delegate the computation of intermediate results to other servers.

✏️ To provide performance isolation, Zanzibar measures how much CPU each RPC uses in cpu-seconds. Each client has a global CPU usage limit, and if it goes over that limit, its requests may be slowed down. Each aclserver also limits the total number of active RPCs to manage memory usage.

✏️ Request hedging with a 99th percentile threshold is used to reduce tail latency.
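
To make the data model and the zookie flow more tangible, here is an illustrative Go sketch. All type and field names are hypothetical; this is not the actual Zanzibar or OpenFGA API:

```go
package main

import "fmt"

// RelationTuple is the unified form of every ACL rule:
// "user U has relation R to object O".
type RelationTuple struct {
	Object   string // e.g. "doc:readme"
	Relation string // e.g. "owner"
	User     string // e.g. "user:15"
}

// Zookie is the opaque consistency token handed out to clients; it encodes a
// global (Spanner) timestamp.
type Zookie struct {
	TimestampMicros int64
}

// CheckRequest asks whether Tuple holds, evaluated on an ACL snapshot at least
// as fresh as the zookie obtained from a previous content change.
type CheckRequest struct {
	Tuple            RelationTuple
	AtLeastAsFreshAs Zookie
}

func main() {
	req := CheckRequest{
		Tuple:            RelationTuple{Object: "doc:readme", Relation: "owner", User: "user:15"},
		AtLeastAsFreshAs: Zookie{TimestampMicros: 1700000000000000}, // made-up timestamp
	}
	fmt.Printf("check %s#%s@%s at snapshot >= %d\n",
		req.Tuple.Object, req.Tuple.Relation, req.Tuple.User, req.AtLeastAsFreshAs.TimestampMicros)
}
```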

According to the whitepaper, authorization checks are performed for each object independently. This means that a single search request in a service like Drive or YouTube can trigger tens to hundreds of authorization checks. That's why the overall architecture is heavily focused on keeping authorization request latency as low as possible.

The implementation results are impressive: Zanzibar handles over 2 trillion relation tuples that occupy more than 100 terabytes of storage. The load is spread across 10,000+ servers in dozens of clusters worldwide. Despite that scale, it keeps the 95th percentile latency at ~9 ms for in-zone requests and ~60 ms for other requests.

#systemdesign #usecase #architecture