Joachim Breitner's Homepage
Handling explicit and implicit recursion in Haskell data
For a while I have been pondering over a problem that arises when your functionally written program has some state with cross references – for example a list of users, each of which uses a number of computers, and a list of computers, each having an owner.
Implicit referencing
For doing queries on such data, it would be convenient if every reference is just the referenced object itself. Although we would visualize this as a graph, semantically, it is more like an infinite tree. This is possible in Haskell, due to laziness, and if you create the data structure cleverly, it even uses constant memory, no matter how “deep” you enter this infinite tree (a recent post of mine talks about this). A possible data definition would be:
data User = User { userName :: String, uses :: [Computer] } data Computer = Computer { computerName :: String, owner :: User -- references the Users } data State = State [User] [Computer] testState = let user = User "Conrad" [computer] computer = Computer "Z3" user in State [user] [computer]
Explicit referencing
But such a representation is very unsuitable for updates (at least I can’t think if a nice way of updating such a graph without breaking the internal cross-references) and serialization, which is a must for a HAppS based application. So what one would probably start with is this data structure:
data User = User { userId :: Int, userName :: String, uses :: [Int] -- references the Computers } data Computer = Computer { computerId :: Int, computerName :: String, owner :: Int -- references the Users } data State = State [User] [Computer] testState = State [User 0 "Conrad" [1]] [Computer 1 "Z3" 0]
I think the semantics of this are clear. Note that the referencing is currently not type-safe, but this can be provided by phantom types. Maybe I’ll write more about that later.
Now imaging you want to display the information about the first computer with your web application. You extract the information with
let State _ cs = testState in head cs
and pass that to your templating engine. But what if your template wants to display the name of the owner? It only has access to his userId
. You would either need to know what information the template will ever need, extract that from the state beforehand and pass it along, or give the template access to the whole state. In that case, though, there has to be lookup-logic in your output code, which is also not nice.
Woudln’t it be nice if you could, in your application logic, work with the explicit references, which are easy to modify and store, but somehow turn that into the implicit referencing?
Duplicated representation
One way would be to have two unrelated sets of data structures, ExplicitState
, ExplicitUser
, ExplicitComputer
, which use explicit identifiers to reference each other, and ImplicitState
,... which are defined as the first representation of our state. It is then mostly trivial to write a function that converts ExplicitState
to ImplicitState
.
The big disadvantage of this is that you have to maintain these two different hierarchies. It also means that every function on the state has to be defined twice, which often almost identical code. Clearly, this is not desirable.
Annotated representation
It would be more elegant if the state is stored in one data type that, controlled by a type parameter, comes in the one or the other representation. To do that, we need two types: One that contains a value, and one that contains just a reference:
newtype Id v = Id v deriving (Show, Typeable, Data) newtype Ref v = Ref Int deriving (Show, Typeable, Data)
Then we need to adjust our data definitions, to make use of these. (I’ll leave out the names, to keep the code smaller)
data User ref = User { userId :: Int, uses :: [ref (Computer ref)] } data Computer ref = Computer { computerId :: Int, owner :: ref (User ref) } data State ref = State [User ref] [Computer ref]
Here we introduce a type parameter “ref”, which will later be either Id
or Ref
. Note that now a reference also states the object it is a reference for, which greatly increases type safety. Functions on these data types that don’t work with the references will be polymorphic in the “ref” type parameter, so only need to be written once. A User Id
is a complete user with all related data, while User Ref
is a user with only references. And a Ref (User Ref)
is reference to a user, which contains references...
Not so kind kinds
Did you notice the lack of a deriving clause? Our data structures have the relatively peculiar kind ((* -> *) -> *
), which makes it hard for the compiler to derive instances for things like Show
. But we already know that we will only use Id
or Ref
for the type variable, so we can use ghc’s StandaloneDeriving
language extension and have these instances created:
deriving instance Show (User Id) deriving instance Show (User Ref) deriving instance Show (Computer Id) deriving instance Show (Computer Ref) deriving instance Show (State Id) deriving instance Show (State Ref)
Toggling a type parameter
The next step is to write the conversion function. It will have type
unrefState :: State Ref -> State Id
For that, and for later, we need lookup functions:
unrefUserRef :: State Id -> Ref (User Ref) -> Id (User Id) unrefUserRef (State l _) (Ref i) = Id $ fromJust $ find (\u@(User i' _) -> i == i') l unrefComputerRef :: State Id -> Ref (Computer Ref) -> Id (Computer Id) unrefComputerRef (State _ l) (Ref i) = Id $ fromJust $ find (\u@(Computer i' _) -> i == i') l
These expect a State (with implicit referencing) and a reference, and look up this reference. The function unrefState
then looks like this:
unrefState :: State Ref -> State Id unrefState (State us cs) = let unrefState = State (map (unrefUser unrefState) us) (map (unrefComp unrefState) cs) in unrefState where unrefUser :: State Id -> User Ref -> User Id unrefUser s (User i refs) = User i (map (unrefComputerRef s) refs) unrefComp :: State Id -> Computer Ref -> Computer Id unrefComp s (Computer i ref) = Computer i (unrefUserRef s ref)
Note how we “tie the knot” in the let
expression. This is the trick that ensures constant memory consumption, because every reference points back to the same place in memory.
Satisfied already?
So what do we have? We have no duplication of data types, we can write general functions, and we can resolve the explicit referencing. We can also easily write functions like unrefUser :: State Ref -> User Ref -> User Id
, which transform just a part of the state.
But writing unrefState
is very tedious when the State becomes more complex. Each of the other unrefSomething
functions are very similar, but need to be written anyways. This is unsatisfactory. What we want, is a generic function, something like
gunref :: State Ref -> a Ref -> a Id
which, given a state with explicit references, replaces all explicit references in the first argument (which could be State Ref
or User Ref
or anything like that) with implicit ones. This function can not exist, because we would not know anything about a
and b
. But maybe we can do this:
gunref :: (Data (a Id), Data (a Ref)) => State Ref -> a Ref -> a Id
Typeable and Data
Before being able to do so, we need to derive Data
for our types. We can start with
deriving instance Data (User Id) deriving instance Data (User Ref) deriving instance Data (Computer Id) deriving instance Data (Computer Ref) deriving instance Data (State Id) deriving instance Data (State Ref
but that will complain about missing Typeable
instances. Unfortunately, ghc’s deriver for Typeable
(even the stand-alone-one), does not handle our peculiar kind, so we need to do it by hand. With some help from quicksilver on #haskell, I got it to work:
instance Typeable1 ref => Typeable (User ref) where typeOf _ = mkTyConApp (mkTyCon "User") [typeOf1 (undefined :: ref ())] instance Typeable1 ref => Typeable (Computer ref) where typeOf _ = mkTyConApp (mkTyCon "Computer") [typeOf1 (undefined :: ref ())] instance Typeable1 ref => Typeable (State ref) where typeOf _ = mkTyConApp (mkTyCon "State") [typeOf1 (undefined :: ref ())]
everywhere
is not enough
Turning to the documentation of Data.Generics
, I notice with some disappointment that there is no function that is able to change a type – they all seem to replace a value by another value of the same type. But the functions gfoldl
and gunfold
sounded like they could be used for this.
Warning: What comes now is a very non-haskellish hack that subverts the type system, just to get the job done. Please read it with a healthy portion of distrust. If you know of a cleaner way of doing that, please tell me!
Wrapped Data
I want to do some heavy type hackery, so I need to disable haskell’s type system. There is Data.Dynamic
, but not even that is enough, as we need to carry a type’s Data
instance around as well. So let’s wrap that up:
data AData where AData :: Data a => a -> AData instance Show AData where show (AData a) = "<" ++ show (typeOf a) ++ ">" fromADataE :: forall b. Data b => AData -> b fromADataE (AData d) = case cast d of Just v -> v Nothing -> error $ "Type error, trying to convert " ++ show (typeOf d) ++ " to " ++ show (typeOf (undefined :: b)) ++ "."
There is also a function that converts an AData
back to a normal type, if possible. If it’s not possible, then there is a bug in our code, so we give an informative error message.
AData transformers
Similar to everywhere
, we want to have transformers that combinable. They need to have the change to convert an arbitrary value, but also signal that they could not convert something (and this something has to be recursed into). I came up with this:
type ADataT = AData -> Maybe AData extADT :: forall a b. (Data a, Data b) => ADataT -> (a -> b) -> ADataT extADT at t a@(AData v) = case cast v of Just x -> Just (AData (t x)) Nothing -> at a doNothing :: ADataT doNothing = const Nothing
ADataT
is the type for such a transformer. doNothing
will not transform anything, and extADT
can be used to add any function to the list of tried transformers, in the spirit of extT
.
The ugly part
To apply such a transformer, I want to use this function, which I’ll describe in the code comments:
everywhereADT :: forall a b. (Data a, Data b) => ADataT -> a -> b -- first check if we can transform this value already everywhereADT f v = case f (AData v) of -- if so, coerce it to the users’ requested type, which hopefully goes well Just r -> fromADataE r -- if not, we need to recurse into the data structure Nothing -> recursed -- for that, we first need to figure out the arguments to the -- constructor. We store them in the untyped list where args :: [AData] (Const args) = gfoldl k z v -- gfoldl lets us have a look at each argument. We wrap it in AData -- and append it to the list k (Const args) arg = Const (AData arg : args) z start = Const [] -- We need the data constructor of the input data type. If the user did not want -- it to be transformed, it will be the same c = toConstr v -- To give better error messages, we make sure that the outmost type constructor -- of the requested type actually has the data constructor we were given. Otherwise -- gunfold will complain in a non-helpful way input_type = dataTypeRep (constrType c) output_type = dataTypeRep (dataTypeOf (undefined :: b)) recursed = if input_type /= output_type then error $ "Can not convert <" ++ show input_type ++ ">"++ " to <" ++ show output_type ++ ">." -- the types match, so we assemble the output type, using gunfold else snd (gunfold k' z' c) k' :: forall b r . Data b => ([AData], b -> r) -> ([AData],r) -- we start by reversing the input list z' start = (reverse args,start) -- then we call us recursively on the argument and feed the result -- to the output constructor k' ((AData a) : args, append) = (args, append (everywhereADT f a)) -- Used for folding (we don’t need to retain the original constructor) data Const a b = Const a
What a beast! But surprisingly, it works. Here are some examples. Note that I always have to specify the requested output type:
bool2Int :: Bool -> Int bool2Int False = 0 bool2Int True = 1 *Main> everywhereADT (doNothing `extADT` bool2Int) True :: Int 1 *Main> everywhereADT (doNothing `extADT` bool2Int) True :: () *** Exception: Type error, trying to convert Int to (). *Main> everywhereADT (doNothing `extADT` bool2Int) (True,False) :: (Int,Int) (1,0) *Main> everywhereADT (doNothing `extADT` bool2Int) ([True,False],True,()) :: ([Int],Int,()) ([1,0],1,()) *Main> everywhereADT (doNothing `extADT` bool2Int) ([True,False],True,()) :: [()] *** Exception: Can not convertto . *Main> everywhereADT (doNothing `extADT` bool2Int) [True] :: [Bool] [*** Exception: Type error, trying to convert Int to Bool.
I hope this code does not inflict too much pain on any Haskell-loving reader. I know I violated the language, but I didn’t know how to do it better (at least not without using Template Haskell). I also know that this is not very good performance wide: Every single value in the input will be deconstructed, type-compared several times and re-assembled. If that is an issue, then this function should only be used for prototyping.
Almost done
To apply this to our state, we only need to glue it to our lookup functions from above:
gunref :: (Data (a Id), Data (a Ref)) => State Ref -> a Ref -> a Id gunref s w = let unrefState = gunref' unrefState s in gunref' unrefState w gunref' :: (Data (a Id), Data (a Ref)) => State Id -> a Ref -> a Id gunref' unrefState = everywhereADT unref' where unref' = doNothing `extADT` unrefUserRef unrefState `extADT` unrefComputerRef unrefState
Now we have a generic unreferencer. I set the type a bit more specific than necessary, to make it safe to use (under the assumption that the list of lookup functions is complete and will not leave any Ref
in the output).
*Main> testState State [User {userId = 0, uses = [Ref 1]}] [Computer {computerId = 1, owner = Ref 0}] *Main> gunref testState testState State [User {userId = 0, uses = [Id (Computer {computerId = 1, owner = Id (User {userId = 0, uses = [Id (Computer {computerId = 1, owner = Id (User {userId = 0, uses = ..
Oh, and by the way, if you want to test this code, you’ll need at least:
{-# LANGUAGE DeriveDataTypeable, FlexibleInstances, GADTs, FlexibleContexts, StandaloneDeriving, ScopedTypeVariables #-}
Comments
thanks for forging ahead.
It strikes me that the problem of cross-referencing may require a lot of work in Haskell, where as in an OO environment it is trivial.
Any comments?
Thanks again,
Paul
Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.