The abstraction-optimization tradeoff
There are several well-known tradeoffs in computer science. For example, the time-space tradeoff, where you can often use more memory to compute an answer faster. Or the time-accuracy tradeoff, where you can compute an approximate answer quickly, and a precise one given more time.
But one tradeoff that I think is widely known in folklore but not widely taught is the abstraction-optimization tradeoff.
Before I try to explain what it is, let me give two examples.
- Compilation to a portable format, and its execution: Let’s take the concrete case of the Java language, and the Java Virtual Machine. The two are distinct. A compiler transforms your pretty high-level Java source code into Java bytecode, which is essentially portable assembly language[1]. Now bytecode is still at a higher level of abstraction than the real assembly language of your hardware architecture, which is why you need a Java Virtual Machine to further compile that bytecode into fast, efficient machine code.
Java source has rich structure. It can be represented by an abstract syntax tree, which is the first thing the compiler does with the source code. The process of conversion to bytecode is the process of whittling down that structure to something simple and flat. Bytecode is simple, flat and structureless compared to the source that it came from.
The rules bytecode must adhere to are much laxer than those for Java source. In fact, you can write bytecode that has no corresponding legal Java source[2]. This is the reason why the JVM needs to verify bytecode before executing it[3].
When it’s time to execute bytecode and further compile it down to the machine code for the real hardware it’s running on, the very first thing the JVM does is build a control flow graph[4], which is a feeble attempt at recovering some of that high-level structure of the original source code that was lost in the process of compiling down to bytecode. This is a prerequisite for almost every compiler optimization. Most optimizations depend on teasing out high-level structure from the bytecode.
We started with something with rich structure (source code), compiled it down to something flat and structureless (bytecode), and then had to infer some of that lost structure when it was time to execute and optimize bytecode.
The astute reader might suggest at this point: why not simply make the portable format, the bytecode, something that is higher-level and encodes more structure? I spent a couple of years in grad school exploring that route[5][6]. It had some compelling advantages over bytecode – it was safe, much more compact, and had the high-level structure necessary for optimization. But it was a non-starter for the “real” world because such a high-level encoding was too coupled to the Java language. If you wanted a portable format for another language, you would have to build a new encoding.
The upside of having a low-level bytecode format is that it decouples the source from the bytecode. This is what allows multiple languages to compile down into bytecode and inter-operate (Jython, JRuby etc).
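The structure-recovery step described above — finding the basic-block boundaries in a flat instruction stream, the first move in building a control flow graph — can be sketched in a few lines. This is a minimal illustration with a hypothetical instruction encoding (plain instructions, plus `JMP n` and `IF n` branching to index `n`), not the JVM's actual bytecode format:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch: recover basic-block boundaries ("leaders") from a flat
// instruction list. A leader is the first instruction of a basic block:
// the entry point, any branch target, and any instruction that follows
// a branch. The encoding here is hypothetical, for illustration only.
class Leaders {
    static List<Integer> leaders(String[] code) {
        TreeSet<Integer> leaders = new TreeSet<>();
        leaders.add(0); // the first instruction always starts a block
        for (int i = 0; i < code.length; i++) {
            String[] parts = code[i].split(" ");
            if (parts[0].equals("JMP") || parts[0].equals("IF")) {
                leaders.add(Integer.parseInt(parts[1])); // branch target
                if (i + 1 < code.length) {
                    leaders.add(i + 1); // fall-through after a branch
                }
            }
        }
        return new ArrayList<>(leaders);
    }
}
```

For the instruction stream `{"LOAD", "IF 4", "ADD", "JMP 5", "SUB", "RET"}` this yields leaders at indices 0, 2, 4 and 5, carving the flat stream into four basic blocks — structure that was implicit in the original source's `if`/`else`, and that the JVM must rediscover before it can optimize.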
And that is exactly the tradeoff I’m getting at.
But first, one more example. Because two points are all I need to fit my curve.
- Effectively caching IO in an operating system: A similar effect takes place in the caching and IO sub-systems of operating system kernels. Operating systems cache frequently accessed data in memory to avoid hitting disk as much as possible. This cache is critical for achieving anything like reasonable performance. But the operating system has very little information to go on. It doesn’t know which data is going to get accessed next. So most kernels fall back on a least-recently-used (LRU) strategy as a next-best guess. The longer data in the cache remains un-accessed, the likelier it is to get pushed out. And this usually works pretty well.
So who does know more about data access patterns? The application! And it can tell the OS what kinds of patterns to expect, using “hinting” mechanisms like the POSIX posix_fadvise[7] call.
Again we see the same pattern. Higher levels in the stack have richer and more structured knowledge, most of which is lost as it travels down to the lower layers of the stack, but that very knowledge turns out to be critical when optimizing for performance. In this case, the application has knowledge of IO access patterns, which is lost in the OS kernel, but is critical for achieving decent IO performance.
Exokernels[8] were an attempt to solve this problem. As they put it, “An exokernel eliminates the notion that an operating system should provide abstractions on which applications are built. Instead, it concentrates solely on securely multiplexing the raw hardware: from basic hardware primitives, application-level libraries and servers can directly implement traditional operating system abstractions, specialized for appropriateness and speed.” But with great performance comes a great price, and that price is that they have done away with abstraction. Every time you write an application, you have to pick and choose the OS primitives you want, and think about how you want to wire them together. That’s a big burden.
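The kernel's next-best-guess policy from the start of this example — evict whatever has gone longest without being touched — is simple enough to sketch. Here is a minimal LRU cache in Java built on `LinkedHashMap` with access-order iteration; it is an illustration of the eviction policy, not of an actual kernel page cache:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU cache: LinkedHashMap with accessOrder = true keeps
// entries ordered by recency of access, so the eldest entry is always
// the least-recently-used one, which we evict once capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: gets refresh recency
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least-recently-used entry
    }
}
```

With capacity 2, inserting pages 1 and 2, reading page 1, and then inserting page 3 evicts page 2 — exactly the guess the kernel makes, blind to what the application actually plans to read next.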
With the examples behind us, let me make another attempt at phrasing the tradeoff:
The abstraction-optimization tradeoff is between building upon layers of abstraction and achieving optimal performance. Building upon abstractions is often quicker and yields easier-to-understand solutions, but those solutions may have sub-optimal performance. Sacrificing abstraction leads to near-optimal performance, but at the cost of a more complex solution. The underlying assumption is that richer, more structured information about the thing being optimized (code, time, space) exists at higher layers in the abstraction stack, is lost in the lower layers, yet is critical for achieving optimal performance.
[1] Strictly speaking, it is stack-based and not register-based, like most common architectures (x86, PowerPC, ARM etc.) in use today.
[2] Security Threats from Deviant Java Byte Code