Joachim Breitner's Homepage
Faster Winter 6: Simpler Code
(This is the sixth optimization presented in the “faster winter” series, please see that post for background information.)
The Wasm reference interpreter has a function step
that performs one single one step of evaluation, by taking the state before (in a type called code
), and returning the state after the step. The function eval
calls step
, checks if the result is “done” (no instructions left to do), and if it is not done, recursively calls eval
again. This way, evaluation progresses step by step until it is done.
The Haskell port follows the same pattern of a single-step step
and a driver eval
, but it chose to write the code continuation passing style: Instead of returning, the function step
takes a function that it passes the result to. So the code looks like this (slightly simplified):
type CEvalT m a = ReaderT Config (ExceptT EvalError m) a
step :: PrimMonad m => Code -> (Code -> CEvalT m r) -> CEvalT m r
= … k new_c …
step c k
eval :: PrimMonad m => Code -> CEvalT m (Stack Value)
=
eval c if is_done c
then stack c
else step c eval
There must have been a good reason to prefer this style over the plain style, but I was curious if it was really helpful. So I changed it to the following, more straight-forward code:
type CEvalT m a = ReaderT Config (ExceptT EvalError m) a
step :: PrimMonad => Code -> CEvalT m Code
= … new_c …
step c k
eval :: PrimMonad => Code f m -> CEvalT f m (Stack Value)
=
eval c if is_done c
then stack c
else step c >>= eval
And indeed, the simpler code worked better:
Improvement: Allocations: -9.6 Time: -16.91% (Commit eb8add3…e66f1e0)
Again, avoiding function values (as we construct them as the contination closure) might have helped here.
Or maybe the simpler code allowed GHC to notice that the Code
value, which is simply a tuple with a different name, is constructed by step
but immediatelly deconstructed by eval
, and thus GHC could optimize that away (by “unboxing” the argument and/or result of step
and simply passing the components of the tuple).
And finally, independent of all the performance questions, it also makes the code easier to understand.
Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.