Forwarded from TechLead Bits
Latency Insurance: Request Hedging
One more interesting concept I came across recently is request hedging. I haven't seen it used much in enterprise software, but it can be useful in scenarios where tail latency is critical.
Imagine that service A calls service B, and service B has multiple instances. These instances can have different response times—some are fast, some are slow. There are a number of potential reasons for such behavior, but we'll skip them for simplicity.
Request hedging is a technique where the client sends the same request to multiple instances, uses the first successful response, and cancels the other requests.
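To make the idea concrete, here is a minimal sketch in Python's asyncio. It assumes a hypothetical per-instance coroutine fetch_from(instance, request), and the error handling is simplified:

```python
import asyncio

async def hedged_request(request, instances, fetch_from):
    """Send the same request to every instance, return the first
    successful response, and cancel the in-flight duplicates."""
    tasks = [asyncio.create_task(fetch_from(inst, request)) for inst in instances]
    try:
        while tasks:
            done, pending = await asyncio.wait(
                tasks, return_when=asyncio.FIRST_COMPLETED)
            tasks = list(pending)
            for task in done:
                if task.exception() is None:
                    return task.result()  # first success wins
        raise RuntimeError("all hedged requests failed")
    finally:
        for task in tasks:
            task.cancel()  # stop the losing duplicates
```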
Obviously, if you do this for all requests, the system load will increase and the overall performance will degrade.
That's why hedging is usually applied only to a subset of requests.
The following strategies are commonly used to select which requests to hedge:
✏️ Token Buckets. Use a token bucket that refills every N operations and send a hedged request only if there is an available token (rate-limiting); see the budget sketch after this list.
✏️ Slow Responses Only. Send a hedged request only if the first request takes longer than a specific latency threshold (e.g., the 95th or 99th percentile).
✏️ Threshold. Set the hedging delay to the Nth percentile of observed latency, so only requests slower than that get duplicated. For example, with a 99th percentile threshold, only 1% of requests are hedged (see the deferred-hedging sketch below).
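A sketch of the token-bucket budget from the first strategy. The names (HedgeBudget, refill_every_n, burst) are hypothetical: the bucket earns one hedge token per N primary operations, and a hedge is sent only if a token can be spent.

```python
class HedgeBudget:
    """Token bucket capping hedges at roughly one per N primary requests.
    A sketch: a production version would also need thread-safety."""

    def __init__(self, refill_every_n: int, burst: int = 10):
        self.refill_every_n = refill_every_n  # primary ops per earned token
        self.burst = burst                    # max tokens held at once
        self.tokens = burst
        self.ops = 0

    def on_primary_request(self) -> None:
        # Every N primary operations add one token, capped at `burst`.
        self.ops += 1
        if self.ops >= self.refill_every_n:
            self.ops = 0
            self.tokens = min(self.burst, self.tokens + 1)

    def try_hedge(self) -> bool:
        # Spend a token if one is available; otherwise skip the hedge.
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False
```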
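The second and third strategies can share one mechanism: fire the backup only after the primary has been outstanding longer than a chosen percentile latency. Again a sketch with hypothetical names (primary and backup are per-instance coroutines, delay_s is e.g. the observed p95):

```python
import asyncio

async def deferred_hedge(request, primary, backup, delay_s):
    """Give the primary instance `delay_s` seconds; if it hasn't
    answered by then, race a backup request against it."""
    first = asyncio.create_task(primary(request))
    try:
        # `shield` keeps the primary alive if the timeout fires.
        return await asyncio.wait_for(asyncio.shield(first), timeout=delay_s)
    except asyncio.TimeoutError:
        second = asyncio.create_task(backup(request))
        done, pending = await asyncio.wait(
            {first, second}, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()  # cancel the slower duplicate
        return done.pop().result()  # may re-raise if the winner failed
```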
Request hedging is an effective approach to reducing tail latency: it prevents occasional slow operations from dragging down the overall user interaction. But if latency variance in the system is already small, request hedging will not provide any improvement.
#systemdesign #patterns