Joachim Breitner's Homepage
Faster Winter 7: The Zipper
(This is the seventh optimization presented in the “faster winter” series, please see that post for background information.)
The last bit of performance optimization could be considered a domain-specific optimization, as one might describe it as “introducing a control stack to the interpreter”. But in a different light, one could consider it the introduction of a more appropriate data structure, by applying a “zipper”-like transformation to the existing data structure.
The problem is that the state of the system (datatype Code
) is a stack of values and a stack of instructions. Simplifying this for this post, we might have
data Code = Code [Value] [Instr]
data Instr
= Const Value | Add | Sub | Return
| Labeled Int Code
The interpreter gets such a Code
, but does not always just execute the top-most instruction: If that is a Labeled
instruction, it has to execute the next instruction in the argument of that Labeled
. (It is not very important at this point to understand what a Labeled
is used for). So we might have a Code
that looks like the following:
= Code [] [Labeled 1 (Code [2] [Labeled 2 (Code [3,4] [Add]), Sub]), Return] c1
The next instruction to execute is actually the Add
. But in order to find that, the function step
has to look under the first Labeled
, look under the second Labeled
, then execute step (Code [3,4] [Add]) = Code [7] []
, and finally wrap it again in the two Labeled
, to produce:
= Code [] [Labeled 1 (Code [2] [Labeled 2 (Code [7] []), Sub]), Return] c2
Then eval
calls step
again, and step
has to look inside the Labeled
again to find the next instruction to execute.
It would be easier if the next instruction to execute would be presented to step
more conveniently, right as a field of Code
. But in order to do this, we have to move the Labeled
entries “out of the way”. I do that by adding a new, first parameter to Code
where I keep a stack of all the Label
constructor that were in the way, in reverse order. So the c1
above would now be
data Code = Code Control [Value] [Instr]
data Control = Empty | Labeled Int Code
= Code (Labeled 2 (Code (Labeled 1 (Code Empty [] [Return])) [2] [Sub]) [3,4] [Add] c1'
Can you see how this relates to c1
above? The important piece here is that the interpreter finds the next instruction to execute always at the head of the instruction list right of the outermost code, and as long as there is something to execute there, it doesn’t have to touch the Control
stack at all.
This required touching some more lines of code than the previous optimizations, but doing so didn’t require much creativity, as the old and new Code
types are in clear correspondance, and that guides me in how to use adjust the code. And it’s certainly worth it:
Improvement: Allocations: -46.17% Time: -36.88% (Commit e66f1e0…57f3e9d)
Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.