Joachim Breitner's Homepage
rec-def: Program analysis case study
At this week’s International Conference on Functional Programming I showed my rec-def
Haskell library to a few people. As this crowd appreciates writing compilers, and example from the realm of program analysis is quite compelling.
To Throw or not to throw
Here is our little toy language to analyze: It has variables, lambdas and applications, non-recursive (lazy) let bindings and, so that we have something to analyze, a way to throw and to catch exceptions:
type Var = String
data Exp
= Var Var
| Throw
| Catch Exp
| Lam Var Exp
| App Exp Exp
| Let Var Exp Exp
Given such an expression, we would like to know whether it might throw an exception. Such an analysis is easy to write: We traverse the syntax tree, remembering in the env
which of the variables may throw an exception:
canThrow1 :: Exp -> Bool
= go M.empty
canThrow1 where
go :: M.Map Var Bool -> Exp -> Bool
Var v) = env M.! v
go env (Throw = True
go env Catch e) = False
go env (Lam v e) = go (M.insert v False env) e
go env (App e1 e2) = go env e1 || go env e2
go env (Let v e1 e2) = go env' e2
go env (where
= M.singleton v (go env e1)
env_bind = M.union env_bind env env'
The most interesting case is the one for Let
, where we extend the environment env
with the information about the additional variable env_bind
, which is calculated from analyzing the right-hand side e1
.
So far so good:
ghci> someVal = Lam "y" (Var "y")
ghci> canThrow1 $ Throw
True
ghci> canThrow1 $ Let "x" Throw someVal
False
ghci> canThrow1 $ Let "x" Throw (App (Var "x") someVal)
True
Let it rec
To spice things up, let us add a recursive let to the language:
data Exp
…| LetRec [(Var, Exp)] Exp
How can we support this new constructor in canThrow1
? Let use naively follow the pattern used for Let
: Calculate the analysis information for the variables in env_bind
, extend the environment with that, and pass it down:
LetRec binds e) = go env' e
go env (where
= M.fromList [ (v, go env' e) | (v,e) <- binds ]
env_bind = M.union env_bind env env'
Note that, crucially, we use env'
, and not just env
, when analyzing the right-hand sides. It has to be that way, as all the variables are in scope in all the right-hand sides.
In a strict language, such a mutually recursive definition, where env_bind
uses env'
which uses env_bind
is basically unthinkable. But in a lazy language like Haskell, it might just work.
Unfortunately, it works only as long as the recursive bindings are not actually recursive, or if they are recursive, they are not used:
ghci> canThrow1 $ LetRec [("x", Throw)] (Var "x")
True
ghci> canThrow1 $ LetRec [("x", App (Var "y") someVal), ("y", Throw)] (Var "x")
True
ghci> canThrow1 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "y")
True
But with genuine recursion, it does not work, and simply goes into a recursive cycle:
ghci> canThrow1 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "x")
^CInterrupted.
That is disappointing! Do we really have to toss that code and somehow do an explicit fixed-point calculation here? Obscuring our nice declarative code? And possibly having to repeat work (such as traversing the syntax tree) many times that we should only have to do once?
rec-def to the rescue
Not with rec-def! Using RBool
from Data.Recursive.Bool
instead of Bool
, we can write the exact same code, as follows:
import qualified Data.Recursive.Bool as RB
canThrow2 :: Exp -> Bool
= RB.get . go M.empty
canThrow2 where
go :: M.Map Var RBool -> Exp -> R Bool
Var v) = env M.! v
go env (Throw = RB.true
go env Catch e) = RB.false
go env (Lam v e) = go (M.insert v RB.false env) e
go env (App e1 e2) = go env e1 RB.|| go env e2
go env (Let v e1 e2) = go env' e2
go env (where
= M.singleton v (go env e1)
env_bind = M.union env_bind env
env' LetRec binds e) = go env' e
go env (where
= M.fromList [ (v, go env' e) | (v,e) <- binds ]
env_bind = M.union env_bind env env'
And it works!
ghci> canThrow2 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "x")
False
ghci> canThrow2 $ LetRec [("x", App (Var "x") Throw), ("y", Throw)] (Var "x")
True
I find this much more pleasing than the explicit naive fix-pointing you might do otherwise, where you stabilize the result at each LetRec
independently: Not only is all that extra work hidden from the programmer, but now also a single traversal of the syntax tree creates, thanks to the laziness, a graph of RBool
values, which are then solved “under the hood”.
The issue with x=x
There is one downside worth mentioning: canThrow2
fails to produce a result in case we hit x=x
:
ghci> canThrow2 $ LetRec [("x", Var "x")] (Var "x")
^CInterrupted.
This is, after all the syntax tree has been processed and all the map lookups have been resolved, equivalent to
ghci> let x = x in RB.get (x :: RBool)
^CInterrupted.
which also does not work. The rec-def
machinery can only kick in if at least one of its function is used on any such cycle, even if it is just a form of identity (which I ~ought to add to the library~ since have added to the library):
ghci> idR x = RB.false ||| x
ghci> let x = idR x in getR (x :: R Bool)
False
And indeed, if I insert a call to idR
in the line
= M.fromList [ (v, idR (go env' e)) | (v,e) <- binds ] env_bind
then our analyzer will no longer stumble over these nasty recursive equations:
ghci> canThrow2 $ LetRec [("x", Var "x")] (Var "x")
False
It is a bit disappointing to have to do that, but I do not see a better way yet. I guess the def-rec
library expects the programmer to have a similar level of sophistication as other tie-the-know tricks with laziness (where you also have to ensure that your definitions are productive and that the sharing is not accidentally lost).
Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.