Clark's Curious Corollary of Concurrency*

In electrical engineering, there's a number used for characterizing amplifiers called the gain-bandwidth product. We can collapse two variables (gain and bandwidth) into one because it’s possible to trade them off for each other with a simple, generic circuit. It’s a mental shortcut that lets us turn a 2D definition of better" into a 1D definition.

Ideally, a similar shortcut should be possible for software performance. We have two similar metrics that should be interchangeable: throughput and latency.

To trade latency off for throughput, there are countless generic tricks. Most of these tricks exploit parallelism, which is getting more and more "free" as time moves on. One such trick is queueing at the input and output, and using more than one thread to read off the queue and write to the output queue. This can let you double your throughput, with only a slight impact on latency.

Another trick is pipelining. Every non-trivial computation is composed of several steps. Just do each step on a different core, and pass the intermediate results from core to core. This is so powerful it's done on every level, from your web browser down to your CPU. This affects latency more that queueing. But, you get some parallelism out of inherently sequential tasks.

Trading latency off for throughput is a breeze! Now how about trading throughput off for latency?

...

...

Shit.

We can't.

Well, to be fair, we can. But it's always either reversing one of the tricks above, or exploiting some domain-specific parallelism. There's no generic trade-off, no simple feedback loop we can surround our "circuit" with. You can have the most throughput in the world, but if you're taking a full second to respond to something, there's no simple fix. Note that this doesn't mean there are no fixes! There's always "make your software better" or "think really hard of some parallelism to exploit".

What a curious corollary of concurrency! If you lose throughput, but have low latency, you can easily tune it to your needs. But if you have high latency and high throughput, it's difficult and domain-specific to fix. This implies that high performance software is developed with a focus on low latency first and foremost. This should be what's measured. Stop talking about how many requests per second your web server can take. Tell me the 99th percentile latency. Track it. Improve it. When needed, throughput comes easy.

Performance is hard, but we should all be working together towards the things that matter the most. Latency is more important to optimize than throughput. Let's just focus on that.

*Admittedly, the title should read: "Clark's Curious Corollary of Parallelism". That just didn't read as nice.