Constructing a list in a Monad
In Haskell, lists are ubiquitous. In real-world Haskell, monads are ubiquitous. So it should be an easy task to fill a list with values obtained in a monad, e.g. a list of n random numbers (in the Rand monad), or input obtained from the user and stored in a list until the user chooses to stop.
The task
For now, assume a function getInput :: IO Int; we want to run it 10000 times and then have the list of the returned values. This sounds simple and straightforward, but it is actually really hard to get the run-time behaviour that we naively and optimistically expect.
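The definition of getInput itself does not matter in what follows; it is part of the linked complete code. If you want to play along, any effectful producer will do, for example this stand-in based on random numbers (my choice, not necessarily the original definition):

import System.Random (randomRIO)

getInput :: IO Int
getInput = randomRIO (0, 1000)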
The expectation
In an imperative setting, we would have a simple counting loop. In the loop, we run getInput
, allocate a new cell for the linked list, and append it to the list. Clearly, this
- consumes no stack space and
- at the end of the loop, the list is already fully evaluated in memory, ready to be used.
Can we achieve that with Haskell? Let’s look at a few alternatives (complete code available).
Attempt 1: The accumulator
I’ll start with what is probably the worst attempt. Unfortunately, from what I have seen when correcting Haskell exams, this is what beginners tend to write, especially if they have the above imperative algorithm in mind and know about the “convert loops to recursion with accumulators” approach to solving Haskell problems:
accumAppend :: Int -> IO [Int]
accumAppend n = go n []
  where
    go 0 l = return l
    go n l = do {x <- getInput; go (n-1) (l++[x])}
To check the space usage of this, I pass -Kn to the GHC runtime and see what limit is just enough to let this program calculate accumAppend 10000 and fully evaluate the resulting list. It turns out that we need 328336 bytes of stack space.
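Concretely, this means running the compiled program with an invocation along these lines (the binary name here is just a placeholder):

./listcons +RTS -K328336 -RTS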
So what about the second requirement, that the list is fully evaluated? Experienced Haskellers see it immediately: the accumulator is never forced inside the loop, so each iteration merely wraps the previous value in a thunk that would append the next entry to the end. Only when the list is actually used are these thunks evaluated, each traversing the whole list built so far, causing quadratic cost. We can verify this using ghc-vis.
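Schematically, with x1, x2, x3 standing for the first few inputs, the accumulator after three iterations is the unevaluated expression

(([] ++ [x1]) ++ [x2]) ++ [x3]

and it is this left-nested pile of ++ thunks that has to be walked again and again when the list is finally consumed.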
Clearly (and not surprisingly) this is not the best solution.
Attempt 2: The accumulator (forced)
Let’s try to fix the second issue by making sure that the list is always fully evaluated. We import Control.DeepSeq and write
accumAppendSeq :: Int -> IO [Int]
accumAppendSeq n = go n []
  where
    go 0 l = return l
    go n l = do {x <- getInput; go (n-1) $!! (l++[x])}
As the accumulator is fully evaluated before being passed to the next invocation of go, the result will not contain thunks. But note that this has also fixed the stack consumption: the code now runs with the smallest setting for -K, 8 bytes. This works because both go and deepseq are tail-recursive.
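For reference, $!! is just deepseq in operator form; it is essentially defined in Control.DeepSeq as

($!!) :: NFData a => (a -> b) -> a -> b
f $!! x = x `deepseq` f x

so the accumulator is forced completely before the tail call to go happens.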
But this is still not the solution we are looking for, as the quadratic cost caused by using ++ is still there.
Attempt 3: The accumulator (reversed)
Since we know that l++[x] is expensive but x:l is cheap, we can simply use the latter to update the accumulator. This has the slightly annoying consequence that the entries of the list end up in reverse order, but we can fix that in the base case:
accumReverse :: Int -> IO [Int]
accumReverse n = go n []
  where
    go 0 l = return (reverse l)
    go n l = do {x <- getInput; go (n-1) (x:l)}
And indeed, we get noticeably faster code that still runs in 8 bytes of stack. Unfortunately, we still don’t construct the final list directly.
Attempt 4: Naive recursion
Maybe we can do without an accumulator. The most naive such code would be
listRecurse :: Int -> IO [Int]
listRecurse n = go n
  where
    go 0 = return []
    go n = do {x <- getInput; r <- go (n-1); return (x:r)}
and I’d expect that most experienced Haskellers would write this code (if asked not to use a library function). And indeed, this code does create the final list directly. But it requires a large stack of 164616 bytes. The reason is that this is no longer tail-recursive: after the recursive call to go returns, we still need to take its result and prepend x to it, so every iteration leaves a frame on the stack.
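Schematically (x1, x2 are just illustration names), the recursion unfolds like this, and every level can only build its cons cell after the inner call has returned:

go 3 = do {x1 <- getInput; r1 <- go 2; return (x1:r1)}
go 2 = do {x2 <- getInput; r2 <- go 1; return (x2:r2)}
...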
Attempt 5: Library functions
Maybe we should simply not worry about the implementation, and use library functions? Clearly, the authors of such libraries have found the best solution. So how does
listReplicateM :: Int -> IO [Int]
listReplicateM n = Control.Monad.replicateM n getInput
fare? Unfortunately, not better than the naive recursion. In fact, the stack space requirement is the same: 164616 bytes. The same holds when using special libraries for generating and consuming lists, e.g. the Streams module from the vector package:
listStreams :: Int -> IO [Int]
listStreams n = MStream.toList $ MStream.replicateM n getInput
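This is not too surprising: replicateM is, at its core, the naive recursion in disguise. Ignoring the details of the actual library code, it can be sketched as follows (myReplicateM is just an illustrative name):

myReplicateM :: Monad m => Int -> m a -> m [a]
myReplicateM n act = go n
  where
    go 0 = return []
    go k = do {x <- act; xs <- go (k-1); return (x:xs)}

which has exactly the shape of listRecurse, and the stream-based variant ends up doing the same work.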
Attempt 6: Difference lists
It is time to dig deeper into our box of tricks. What if we had lists with constant-time snoc (i.e. appending a single element at the end)? Then Attempt 3 would not require reversing the list. Such lists are provided by difference lists; there are libraries for that, but we can simply implement them using lambdas:
accumDList :: Int -> IO [Int]
accumDList n = go n id
  where
    go 0 l = return (l [])
    go n l = do {x <- getInput; go (n-1) (l . (x:))}
Indeed, no stack is required (8 bytes are enough). But we are cheating here: we are still not constructing the final list; rather, we combine functions that append elements to lists. So when accumDList returns, what we have in memory is a thunk and a sequence of partial function applications in the wrong order, and evaluating the thunk does basically the same thing as reverse in Attempt 3. It might be a bit faster, but it is still not satisfying.
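For completeness, the same idea with the dlist package instead of hand-rolled lambdas would look roughly like this (a sketch assuming the Data.DList functions empty, snoc and toList, and the same getInput as above; it shares the same caveat):

import qualified Data.DList as DL

accumDList' :: Int -> IO [Int]
accumDList' n = go n DL.empty
  where
    go 0 l = return (DL.toList l)
    go k l = do {x <- getInput; go (k-1) (l `DL.snoc` x)}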
Attempt 7: Unsafe functions
Even deeper in the box of tricks, somewhere in the grit and dust where one usually should not reach, there is a function that can help us out here: unsafeInterleaveIO. This allows us to modify the naive recursion so that it first creates the cons cell and only then continues the loop:
listUnsafeInterleave :: Int -> IO [Int]
listUnsafeInterleave n = do {l <- go n; return $!! l}
  where
    go 0 = return []
    go n = do
        x <- getInput
        r <- unsafeInterleaveIO (go (n-1))
        return (x:r)
The stack consumption is 8 bytes. It is worth noting that the call to deepseq (in $!!) will actually drive the evaluation of go. It also guarantees that the unsafe effects of unsafeInterleaveIO are confined to the execution of listUnsafeInterleave, i.e. they are actually safe.
Run time statistics
I ran criterion over the functions (generating lists of length 1000), and found these results:
- accumAppend: 6.5 ms
- accumAppendSeq: 8.2 ms
- accumReverse: 14.0 us
- listRecurse: 8.0 us (same as listReplicateM and listStreams)
- accumDList: 12.0 us
- listUnsafeInterleave: 174.0 us
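For reference, a criterion harness for numbers like these can be as simple as the following sketch (the benchmark names and the use of nfIO are my choice, not necessarily the original setup; the functions are assumed to be in scope):

import Criterion.Main (defaultMain, bench, nfIO)

main :: IO ()
main = defaultMain
  [ bench "accumAppend"  (nfIO (accumAppend 1000))
  , bench "accumReverse" (nfIO (accumReverse 1000))
  , bench "listRecurse"  (nfIO (listRecurse 1000))
  , bench "accumDList"   (nfIO (accumDList 1000))
  ]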
Conclusion
There are many ways to write a list-constructing loop in Haskell, and none are perfect: they either fill the heap with thunks, do not construct the list directly (usually requiring a second pass through it), or rely on IO-specific unsafe functions. The naive recursion performs best, but may cause a stack overflow (note, though, that from GHC 7.8 on the stack will be unlimited). The next best option seems to be the difference lists.
I’m still wondering: What would be required from Haskell, GHC or the monads in question to have a fully satisfactory solution here, i.e. one that is as fast as the naive recursion, as stack efficient as the difference list solution, and allocates no thunks, only list cells?
Comments
(Note that I noticed that the numbering was broken and I fixed that; I believe you are referring to the difference list solution.)
AFAIK, there is no way around the stack-space versus heap-space trade-off when building lists, at least for non-mutating code. You have to pick one or the other. I don’t think this is a Haskell-specific problem.
I fail to see what the difficulty is for a (Haskell) compiler to recognize that the written algorithm is best compiled to a (possibly partly unrolled) loop, and that the thunks are immediately evaluated (not even built), as the data is used afterwards anyway.
> go n l = do {x <- getInput ; go (n-1) $!! (x : l)}
what about
liftList :: Int -> IO [Int]
liftList n = go n (return [])
where
go 0 = id
go n = liftM2 (:) getInput . go (n-1)
?
It seems to execute in reasonable time but I'm not sure about the stack consumption... (by the way, how exactly do you do that?)
You can check the stack by limiting it (+RTS -K...), and see if it uses a non-trivial amount of it.
If I just write it
go 0 = return []
go n = liftM2 (:) getInput $ go $ n-1
it is exactly the same as listRecurse... I guess I'm the naive kind!
Small talk explaining that: https://www.youtube.com/watch?v=EiIZlX_k89Y&ab_channel=ErlangSolutions or https://www.youtube.com/watch?v=VDA1kyWZYHc&ab_channel=LambdaWorld
(same talk on 2 conferences, slides linked in the comments).
At least that would be my 2 cents :)
Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.