Towards monadic bidirectional serialization
This is written in Literate Haskell.
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE InstanceSigs #-}
module Bidirectional.Serialization where
import Control.Applicative
import Data.Bool (bool)
import Data.Binary (Binary(..), Get)
import Data.Binary.Get (runGet)
import Data.Binary.Put (runPut, PutM)
import Data.ByteString.Lazy (ByteString)
import Data.Profunctor
-- Intentional synonyms of undefined.
...) :: omittedForBrevity
(...) = (...)
(
???) :: can'tSolveThis
(???) = (???) (
Recently I came across the codec_ package. It is a library to write a serializer and a deserializer as a single bidirectional artefact.
It extends the functional pearl Pickler Combinators, an earlier elementary solution by Andrew J. Kennedy (2004).
I’ve been trying to push further the ideas of these two.
The basics: Pickler Combinators
In this section I summarize the Pickler Combinators paper.
The UP
(un)pickler type (“PU
” in the pearl) consists of an
Unpickler (deserializer) and a Pickler (serializer).
The two components are parameterized over contexts r
and w
for unpickling
(reading) and pickling (writing) respectively.
data UP r w a = UP
unpickle :: r a
{ pickle :: a -> w ()
, }
The types Get
and PutM
from the binary package are examples of contexts for
UP
. The Binary
typeclass implies an (un)pickler for every instance.
type BinaryUP = UP Get PutM
binaryUP :: Binary a => BinaryUP a
binaryUP = UP get put
-- Deserialize
-- > runGet :: Get a -> ByteString -> a
binaryUnpickle :: BinaryUP a -> ByteString -> a
binaryUnpickle = runGet . unpickle
-- Serialize
-- > runPut :: PutM () -> ByteString
binaryPickle :: BinaryUP a -> a -> ByteString
binaryPickle up = runPut . pickle up
There are combinators, to (un)pickle products by concatenation:
pairUP
:: (Applicative r, Applicative w)
=> UP r w a -> UP r w b -> UP r w (a, b)
pairUP aUP bUP = UP
unpickle = liftA2 (,) (unpickle aUP) (unpickle bUP)
{ pickle = \(a, b) -> pickle aUP a *> pickle bUP b
,
}
-- Infix synonym.
>|)
(:: (Applicative r, Applicative w)
=> UP r w a -> UP r w b -> UP r w (a, b)
>|) = pairUP (
To (un)pickle sums:
altUP
:: Alternative r
=> UP r w a -> UP r w b -> UP r w (Either a b)
altUP aUP bUP = UP
unpickle =
{ Left <$> unpickle aUP <|>
Right <$> unpickle bUP
pickle = either (pickle aUP) (pickle bUP)
, }
The one above assumes that a
and b
have disjoint picklings,
so that they can be distinguished by an unpickler failing.
A more straightforward way to pickle sums is to precede their picklings with a
tag:
eitherUP
:: (Monad r, Applicative w)
=> UP r w Bool -> UP r w a -> UP r w b
-> UP r w (Either a b)
eitherUP boolUP aUP bUP = UP
unpickle = unpickle boolUP >>= bool
{ Right <$> unpickle bUP)
(Left <$> unpickle aUP)
(pickle = either
, a -> pickle boolUP True *> pickle aUP a)
(\b -> pickle boolUP False *> pickle bUP b)
(\ }
Finally, we can map over (un)picklers with isomorphisms (bijections):
in other words, UP
is a functor between the category of types and
isomorphisms and the category of types and functions, Hask
.
-- For (to, from) :: Iso a b, we assume:
--
-- > to . from = id :: b -> b
-- > from . to = id :: a -> a
type Iso a b = (a -> b, b -> a)
mapUP :: Functor r => Iso a b -> UP r w a -> UP r w b
mapUP (to, from) aUP = UP
unpickle = fmap to (unpickle aUP)
{ pickle = pickle aUP . from
, }
Using the above, we can program (un)picklers, but it is not as convenient
as it might seem.
Every operation involved must be invertible (obviously for mapUP
,
while pairUP
, altUP
, and eitherUP
rely on pattern matching).
UP
definitions for large records are rather tedious as
one has to write explicitly how to construct and destruct every record.
-- Assume for the sake of example that this type exists...
data Date
-- ... with an UP.
dateUP :: BinaryUP Date
dateUP = (...)
data User = User
userId :: Int
{ userName :: String
, userCreated :: Date
, userEmail :: String
,
}
userUP :: BinaryUP User
userUP =
mapUP
userId, userName), userCreated), userEmail) ->
( (\(((User{..})
User{..} ->
, (\userId, userName), userCreated), userEmail))
(((binaryUP >| binaryUP >| dateUP >| binaryUP) ) (
Half the definition of userUP
is boilerplate
for restructuring a tuple into/out of a User
.
A possible improvement is to derive the isomorphism generically, or with meta-programming.
However, we can design a much nicer interface by spending some effort to fit common abstractions in Haskell: applicative functors and monads.
I found something that works but I can really see that it looks good a posteriori,
whereas I have trouble giving an a priori motivation to work in that direction.
One is that that functional programmers are already familiar with these
abstractions, and that we can reasonably expect the r
and w
context to be
instances of Applicative
or even Monad
, so it might make sense that
a “product” of those inherits of such structure.
Applicative Codec
UP r w
is not a Haskell Functor
(endofunctor of Hask
), because
pickling is contravariant (of type a -> w ()
).
The Trick
The codec package dissociates the types being parsed (i.e., unpickled, deserialized) and produced (i.e., pickled, serialized).
data Codec r w x a = Codec
parse :: r a
{ produce :: x -> w ()
, }
We easily get Functor
, Applicative
and even Alternative
.
instance Functor r => Functor (Codec r w x) where
fmap f codec = codec { parse = fmap f (parse codec) }
instance (Applicative r, Applicative w)
=> Applicative (Codec r w x) where
pure a = Codec (pure a) (\_ -> pure ())
f <*> a = Codec
parse = parse f <*> parse a
{ produce = \x -> produce f x *> produce a x
,
}
instance (Alternative r, Alternative w)
=> Alternative (Codec r w x) where
empty = Codec empty (\_ -> empty)
a <|> a' = Codec
parse = parse a <|> parse a'
{ produce = \x ->
, produce a x <|> produce a' x
}
UP r w a
is isomorphic to Codec r w a a
;
we’re indeed working with a generalization of (un)picklers.
upToCodec :: UP r w a -> Codec r w a a
upToCodec (UP parse produce) = Codec parse produce
codecToUP :: Codec r w a a -> UP r w a
codecToUP (Codec unpickle pickle) = UP unpickle pickle
However if we work only with Codec r w a a
, we cannot use Applicative
,
because the context Codec r w a :: * -> *
is related to the content a :: *
.
To modify the context,
we note that Codec r w x a
is contravariant with respect to x
.
In fact, we have a Profunctor
.
instance Functor r => Profunctor (Codec r w) where
lmap :: (y -> x) -> Codec r w x a -> Codec r w y a
lmap from = liftA2 Codec parse ((. from) . produce)
rmap :: (a -> b) -> Codec r w x a -> Codec r w x b
rmap = fmap
In the produce
direction, lmap
makes the Codec
accept a
larger structure, a y
containing an x
that can be extracted with
from
.
As an aside, notice that
-- dimap
-- :: (y -> x) -> (a -> b)
-- -> Codec r w x a -> Codec r w y b
-- dimap from to = lmap from . rmap to
generalizes, with (y -> x) ~ (b -> a)
,
-- mapUP
-- :: (a -> b, b -> a) -- Iso a b
-- -> UP r w a -> UP r w b
An example of from :: y -> x
function is a field getter;
we can now easily define a Codec
for a record.
Assume we have a Codec
for each field of User
:
type BinaryCodec a = Codec Get PutM a a
-- For Int, String, etc.
binaryCodec :: Binary a => BinaryCodec a
binaryCodec = Codec get put
dateCodec :: BinaryCodec Date
dateCodec = (...)
Define an infix synonym for niceness:
=.)
(:: Functor r
=> (y -> x) -> Codec r w x a -> Codec r w y a
=.) = lmap (
The following definition looks much nicer than the one using mapUP
.
userCodec :: BinaryCodec User
userCodec = User
<$> userId =. binaryCodec
<*> userName =. binaryCodec
<*> userCreated =. dateCodec
<*> userEmail =. binaryCodec
We can move fields around, (de)serializing them in a different order, with one
less location to modify compared to an UP
definition (the to
component
being mostly implicit here), though it still looks unwieldly.
userReversedCodec :: BinaryCodec User
userReversedCodec =
email created name id ->
(\User id name created email)
<$> userEmail =. binaryCodec
<*> userCreated =. dateCodec
<*> userName =. binaryCodec
<*> userId =. binaryCodec
Magic record construction
The codec package actually does not work in the way I just presented.
It provides an Applicative
instance, but is missing the Profunctor
instance, or more specifically an (=.)
(lmap
), to work with
Applicative
.
In fact, codec takes another approach.
With some boilerplate generated via Template Haskell,
it allows to define Codec
s with a syntax very similar to the above.
It has the additional feature that permuting the fields does not require
rewriting the constructor as I did in userReversedCodec
.
All you need to do is provide a de/serializer for every record field in any order you like, and you get a de/serializer for the whole structure. The type system ensures that you provide every field exactly once.
Going monad
After getting an Applicative
, one is naturally led to wonder whether
there is a Monad
as well.
If we try to implement it, we realize Codec
is unfortunately not endowed
with such a structure. parse
is fine, but there is no way
to obtain a produce
from the second operand.
-- Failed
instance (Monad r, Applicative w)
=> Monad (Codec r w x) where
a >>= f = Codec
parse = parse a >>= parse . f
{ produce = \x ->
, produce a x *>
???) -- Can't apply f
( }
From here on, I have gone through a succession of choices, that I haven’t considered in detail individually, but I’m seeing something promising at the end.
Carry a projection
A simple fix is to make explicit the intent that in Codec r w x a
,
the x
should contain an a
.
data Codec0 r w x a = Codec0
parse0 :: r a
{ produce0 :: x -> w ()
, project0 :: x -> a
,
}
instance Functor (Codec0 r w x) where fmap = (...)
instance Applicative (Codec0 r w x) where
pure = (...) ; (<*>) = (...)
-- No Alternative?
instance Profunctor (Codec0 r w) where dimap = (...)
instance (Monad r, Applicative w)
=> Monad (Codec0 r w x) where
a >>= f = Codec0
parse0 = parse0 a >>= parse0 . f
{ produce0 = \x ->
, produce0 a x *>
produce0 (f (project0 a x)) x
project0 = \x ->
, project0 (f (project0 a x)) x
}
The issue with that definition is that there is a duplication of code between
produce0 :: x -> w ()
and project0 :: x -> a
, made evident if we
unroll a composition of (>>=)
and lmap
:
-- lmap g a >>= f
-- =
-- Codec0
-- { produce0 = \x ->
-- produce0 a (g x) *>
-- produce0 (f (project0 a (g x))) x
-- , ..
-- }
(g x)
occurs twice, and we would like to factor it,
but the compiler won’t see it.
Factor the projection
That duplication might be avoided by factoring project
out of produce
:
data Codec1 r w x a = Codec1
parse1 :: r a
{ produce1 :: a -> w ()
, project1 :: x -> a
, }
But that is just UP
with a new field, and we face again contravariance
with respect to a
, and lose so much niceness (though how much of an
inconvenience it causes is still unclear).
The Trick (bis)
I would try to apply again the trick that led from UP
to Codec
in the
first place, splitting the covariant and contravariant occurences of a
:
-- (Maybe come up with another name?)
-- The ordering here is chosen to be compatible
-- with the Profunctor typeclass...
data Codec3 r w k x a = Codec3
parse3 :: r a
{ produce3 :: k -> w ()
, project3 :: x -> k
,
}
-- ... but I really have some diagram x -> k -> a in mind.
type Codec' r w x k = Codec3 r w k x
instance Functor (Codec3 r w k x) where fmap = (...)
instance Applicative (Codec3 r w k x) where
pure = (...) ; (<*>) = (...)
instance Profunctor (Codec3 r w k) where dimap = (...)
Monad
is unfortunately still out of reach.
It now seems quite foolish, I erased the link that I made just earlier between
x
and a
.
Parameterized monad
After spending some time with this puzzle, I would generalize the Haskell
Monad
as follows:
bindCodec'
:: (Monad r, Applicative w)
=> Codec' r w k a a
-> (a -> Codec' r w b k b)
-> Codec' r w b k b
bindCodec' = (...)