Joachim Breitner's Homepage
Faster Winter 5: Eta-Expanding ReaderT
(This is the fifth optimization presented in the “faster winter” series; please see that post for background information.)
Another good approach to performance tuning is to look at the code after GHC has optimized it. So I planted a
{-# OPTIONS_GHC -ddump-simpl -dsuppress-coercions -dsuppress-unfoldings -dsuppress-module-prefixes #-}
at the top of Eval.hs, and looked through the code. This is not Haskell, but rather “GHC Core”, a simpler functional programming language that GHC uses internally. It is more verbose and less pretty, so it takes a bit of getting used to, but it’s nothing to be afraid of when you are a somewhat experienced Haskell developer.
There I found (much simplified) the following pattern:
step :: Instr -> Config -> IO Result
step e = case e of
  Add -> \c -> … do stuff …
  Del -> \c -> … do stuff …
  …
That’s bad! Yes, Haskell is a functional language, and passing around anonymous functions is a very nice way to write expressive code, and for most purposes it is not too slow … but in an inner loop, you really don’t want any such closures. So where did this come from? As expected, the Haskell source did not have those inner lambdas. Instead, it was using a very innocent looking monad transformer:
step :: Instr -> ReaderT Config IO Result
step e = case e of
  Add -> do stuff …
  Del -> do stuff …
  …
A ReaderT r m a is just a different way of writing r -> m a that allows us to use do-notation or the monad combinators without having to pass the r around explicitly, and as such it is indeed very convenient. But it is not as efficient as if we had written
step :: Instr -> Config -> IO Result
step e c = case e of
  Add -> … do stuff …
  Del -> … do stuff …
  …
where the step function takes two arguments right away, and no anonymous functions are created.
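To make the correspondence between the two signatures concrete, here is a minimal sketch. The types Instr, Config and Result below are hypothetical stand-ins, not the interpreter's real definitions, and the names stepR, stepF and toExplicit are made up for this illustration.

```haskell
import Control.Monad.Trans.Reader (ReaderT (..), asks)

-- Hypothetical stand-ins for the interpreter's real types.
data Instr = Add | Del
newtype Config = Config { amount :: Int }
type Result = Int

-- ReaderT version: the Config is threaded implicitly.
stepR :: Instr -> ReaderT Config IO Result
stepR e = case e of
  Add -> asks amount
  Del -> asks (negate . amount)

-- Explicit version: the Config is an ordinary second argument.
stepF :: Instr -> Config -> IO Result
stepF e c = case e of
  Add -> pure (amount c)
  Del -> pure (negate (amount c))

-- Going from one form to the other is just unwrapping the newtype.
toExplicit :: (Instr -> ReaderT Config IO Result) -> Instr -> Config -> IO Result
toExplicit f e = runReaderT (f e)
```

Both versions compute the same results; the difference discussed here is only in how GHC compiles them.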
Why doesn’t our amazing Haskell compiler figure out that this would be better? Because it is not better in all situations: If we store step e :: ReaderT Config IO Result somewhere and use it many times, with the same e but passing many different c :: Config, then we have to do the case e analysis only once. This can sometimes be better, so the compiler has to leave it in that form, in case we did it intentionally.
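The situation in which the un-expanded form can pay off might look roughly like the sketch below. The toy Instr type, the use of Int as the configuration, and the name runMany are assumptions for illustration only; whether the case analysis is really shared at run time depends on what GHC's optimizer does with the code.

```haskell
import Control.Monad.Trans.Reader (ReaderT (..))

-- Hypothetical toy instruction set; the Int plays the role of the Config.
data Instr = Add Int | Del Int

step :: Instr -> ReaderT Int IO Int
step e = case e of
  Add n -> ReaderT $ \c -> pure (c + n)
  Del n -> ReaderT $ \c -> pure (c - n)

-- Store `step e` once, then run it against many configurations.
-- In the un-expanded form, the `case e` is performed when `act` is
-- first forced, not once per element of `cs`.
runMany :: Instr -> [Int] -> IO [Int]
runMany e cs =
  let act = step e
  in mapM (runReaderT act) cs
```

If step were compiled as a two-argument function, every element of cs would go through the case e again, which is why the compiler cannot blindly eta-expand.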
(Incidentally, the question of how to allow the compiler to eta-expand more functions seems to eternally haunt me, and its pursuit even led to a PhD thesis.)
So how can we fix it? One relatively crude way is to shove it into the compiler’s face that we really want step to be a function with two parameters by wrapping the whole body in, well, a lambda. But we still want to use the Reader monad in the body of step…
So I came up with this:
step :: Instr -> ReaderT Config IO Result
step e = ReaderT $ \c -> ($ c) $ runReaderT $ case e of
  Add -> … do stuff …
  Del -> … do stuff …
  …
Now that the \c -> is outside the case, the compiler adds it to the arguments of step, and we get the code that we want (confirmed by a quick peek at the Core).
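The ($ c) $ runReaderT $ … chain can be a little hard to read. The sketch below only spells out what the wrapper means, namely unwrap the action, apply it to c, and wrap the result again; it does nothing else to the action. The names wrapped, wrapped' and double are made up for this illustration, and these helper functions are meant for reading the expression, not as a replacement for writing the lambda in place.

```haskell
import Control.Monad.Trans.Reader (ReaderT (..), ask)

-- The wrapper from the text, applied to an arbitrary action m.
wrapped :: ReaderT r m a -> ReaderT r m a
wrapped m = ReaderT $ \c -> ($ c) $ runReaderT m

-- The same thing with the operator section spelled out:
-- ($ c) $ runReaderT m  is  runReaderT m c.
wrapped' :: ReaderT r m a -> ReaderT r m a
wrapped' m = ReaderT (\c -> runReaderT m c)

-- Hypothetical example action that reads the environment.
double :: ReaderT Int IO Int
double = do
  c <- ask
  pure (2 * c)
```

Semantically, both wrappers behave like the identity on ReaderT actions; their only purpose is to change where the lambda sits in the generated code.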
Improvement: Allocations: -23.20% Time: -23.00% (Commit f5a0dd2…894070f)
I used this pattern in more than one place, so I wanted to abstract it into a little helper definition. But that’s not so easy: If I just write
etaReaderT :: ReaderT r m a -> ReaderT r m a
etaReaderT m = ReaderT $ \c -> ($ c) $ runReaderT m

step :: Instr -> ReaderT Config IO Result
step e = etaReaderT $ case e of
  Add -> … do stuff …
  Del -> … do stuff …
  …
then the whole thing doesn’t work any more! Because now, the case e is again “outside” the \c ->.
I whined on Twitter about this, and Sebastian Graf helpfully reminded me of GHC.Exts.oneShot, a little magic function that was added to GHC 5 years ago … by some forgetful person: me.
If we use this in the right place inside etaReaderT it tells GHC in a soothing voice “hey! it’s ok! you can move this lambda out of cases. believe me. it’s gonna be ok”. And with this, it works:
etaReaderT :: ReaderT r m a -> ReaderT r m a
etaReaderT = ReaderT . oneShot . runReaderT
I wonder if this function would make a good addition to Control.Monad.Trans.Reader.
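For completeness, here is a self-contained sketch of the helper together with the imports it needs. The toy Instr type and the use of Int as both configuration and result are hypothetical, and the INLINE pragma is an assumption on my part rather than something stated above: the idea is that the helper should be inlined at its use sites so that GHC sees the one-shot lambda next to the case.

```haskell
import Control.Monad.Trans.Reader (ReaderT (..), ask)
import GHC.Exts (oneShot)

-- oneShot marks the lambda as "called at most once", which allows GHC
-- to float it out of the case below.
etaReaderT :: ReaderT r m a -> ReaderT r m a
etaReaderT = ReaderT . oneShot . runReaderT
{-# INLINE etaReaderT #-}  -- assumption: not part of the definition in the text

-- Hypothetical toy instruction set standing in for the real one.
data Instr = Add Int | Del Int

step :: Instr -> ReaderT Int IO Int
step e = etaReaderT $ case e of
  Add n -> do
    c <- ask
    pure (c + n)
  Del n -> do
    c <- ask
    pure (c - n)
```

The results of step are the same with or without etaReaderT; to see whether the transformation actually happened, one has to look at the Core output produced by -ddump-simpl, as described at the start of the post.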
Incidentally, if you look at the code at the end of all my optimizations, there is no mention of etaReaderT any more: Subsequent optimizations simplified the code so much that eventually GHC was able to do this transformation without my help.