Joachim Breitner's Homepage
Faster Winter 1: Vectors
(This is the first optimization presented in the “faster winter” series, please see that post for background information.)
Immutable vectors
The first very low hanging fruit was to be found in the Wasm.Runtime.Memory
module. The memory of the Wasm module was defined as follows:
import Data.Vector (Vector)
data MemoryInst m = MemoryInst
_miContent :: Mutable m (Vector Word8)
{ _miMax :: Maybe Size
, }
An immutable vector in a mutable box (Mutable m
is something like IORef
or STRef
, whatever is appropriate for the monad m
).
This means that every write to the memory has to copy the whole memory into a new vector. This is nice from a semantic point of view (e.g. you can take a snapshot of the memory at a specific point in time, evaluate the code some more, and the immutable vector is untouched), but obviously very inefficient.
Mutable vectors
So I changed it to a mutable vector (MVector
) that allows in-place updates:
import Data.Vector (MVector)
data MemoryInst m = MemoryInst
_miContent :: Mutable m (MVector (PrimState m) Word8)
{ _miMax :: Maybe Size
, }
I still need to place that mutable vector in a mutable box, because an MVector
still has fixed size but Wasm programs can resize their memory. This mean we still copy the whole memory when the memory grows, but not during normal writes.
Improvement: Allocations: -97.71% Memory: +16.87% Time: -98.88% (Commit af6adfc…254d700)
This is much more … usable!
Oddly, memory consumption increases,and I am not entirely sure why: My guess is that because it runs faster, the garbage collector runs less often, the program can allocate more stuff in between garbage collector runs.
Unboxed vectors
In a second step I also replaced the vector with an unboxed vector: Above we have the normal Data.Vector
vector, which stores pointers (8 bytes!) to elements, even if the element is a single byte like here. By importing Data.Vector.Unboxed
(no other change!) I get a nicely packed array of bytes.
Interestingly, while it improved maximum memory usage quite a little bit, it looks like it made the program actually a bit slower – potentially because it now had to box and unbox these Word8
values.
Improvement: Memory: -27.04% Time: +15.82% (4061fe6)
There may be more improvements possible here (e.g. when writing a Word32
, maybe that can be done in one go, instead of writing the four bytes separately), but I did not follow this clue any further for now.
Instead, the next thing that looked like it was worth investigating was overly polymorphic code.
Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.