Surgery for data types
In this post I will show an example of data type surgery. Surgeries are operations that change a data type piece by piece: modify its fields and constructors, and you get another type.
A couple of simple surgeries using GHC.Generics can be found in my freshly released Haskell library: generic-data-surgery.
Suggestions for more surgeries are welcome on the issue tracker.
The general motivation is to improve the applicability of various generic[1] definitions, such as aeson’s generic instances for ToJSON and FromJSON.
Such a library often offers several options to customize the generic
implementations, but it can still happen that none of them quite fit your
external requirements and you have to resort to manual implementations, even
with only small mismatches with the generic implementations.
Surgeries are a new way to adapt generic implementations to such conditions
outside of your control.
A toy example
Consider the task of deserializing a simple record from JSON:
data Rec = Rec
  { iden    :: Int
  , header1 :: Int
  , header2 :: Int
  , payload :: String
  } deriving (Eq, Generic, Show)
The JSON we want to parse will follow the same structure (an object with keys "iden", "header1", "header2", "payload"), but the "payload" key may also be missing from the JSON object, and that should be interpreted as an empty payload (payload = ""). Two examples of well-formed objects:
{"iden":1,"header1":2,"header2":3}
{"iden":1,"header1":2,"header2":3,"payload":"Hello"}
The aeson library’s generic implementation of parseJSON is close to doing the right thing, except for that extra requirement about the missing "payload" key. Is this an all-or-nothing situation, where you would give up the existing automation entirely over the smallest of issues?
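To make the mismatch concrete, here is a minimal sketch of what the default generic parser does with the two example objects. It assumes the aeson and bytestring packages are available; the names withPayload, withoutPayload, parsedWith and parsedWithout are made up for this illustration.

```haskell
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON, decode)
import Data.ByteString.Lazy (ByteString)
import GHC.Generics (Generic)

data Rec = Rec
  { iden    :: Int
  , header1 :: Int
  , header2 :: Int
  , payload :: String
  } deriving (Eq, Generic, Show)

-- Empty instance body: parseJSON falls back to the generic implementation.
instance FromJSON Rec

-- The two well-formed objects from the text (hypothetical names).
withPayload, withoutPayload :: ByteString
withPayload    = "{\"iden\":1,\"header1\":2,\"header2\":3,\"payload\":\"Hello\"}"
withoutPayload = "{\"iden\":1,\"header1\":2,\"header2\":3}"

-- The full object parses; the one without "payload" is rejected,
-- because the generic parser requires every non-Maybe field to be present.
parsedWith, parsedWithout :: Maybe Rec
parsedWith    = decode withPayload
parsedWithout = decode withoutPayload
```

Loading this in GHCi and evaluating parsedWith and parsedWithout shows the gap the rest of the post tries to close: the second object is well-formed by our requirements, yet it does not parse.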
Solution 1: a parameterized type
There is a simple solution that doesn’t require anything as fancy as surgeries.
aeson can handle missing keys in one specific case[2]: with the option omitNothingFields=True, missing keys corresponding to fields of type Maybe Something will be parsed as Nothing. So if we change the type of the field payload from String to Maybe String…
data Rec' = Rec'
  { iden    :: Int
  , header1 :: Int
  , header2 :: Int
  , payload :: Maybe String
  } deriving (Eq, Generic, Show)
… then aeson’s generic parser can accept a missing payload.
instance FromJSON Rec' where
  parseJSON = genericParseJSON defaultOptions{omitNothingFields=True}
But of course that breaks any existing function that uses payload as a plain String. To avoid that, we can generalize over the difference between Rec and Rec' with a parameterized type:
data Rec_ string = Rec
  { iden    :: Int
  , header1 :: Int
  , header2 :: Int
  , payload :: string
  } deriving (Eq, Functor, Generic, Show)

type Rec = Rec_ String
type Rec' = Rec_ (Maybe String)
We get a Functor instance for free (thanks to the DeriveFunctor extension) to transform the payload.
Now we can use genericParseJSON to first parse a Rec_ (Maybe String), and use fmap to massage it into a proper Rec.
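-- Rec is a type synonym, so this instance needs FlexibleInstances, and the
-- signature inside it needs InstanceSigs (unless your GHC enables them by default).
instance FromJSON Rec where
  parseJSON :: Value -> Parser Rec
  parseJSON = (fmap . fmap) defString
    . genericParseJSON defaultOptions{omitNothingFields=True}

-- Helper function to turn a missing string into the empty string.
defString :: Maybe String -> String
defString Nothing = ""
defString (Just s) = s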
Here is a little diagram detailing the intermediate types in the definition of parseJSON:
-- Value
--   genericParseJSON opts ->
-- Parser (Rec_ (Maybe String))
--   (fmap . fmap) defString ->
-- Parser (Rec_ String)
-- = Parser Rec
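The double fmap can be tried out without aeson. In the sketch below, Either String stands in for aeson’s Parser (an assumption made only to keep the example self-contained), and FakeParser and fixPayload are hypothetical names. The outer fmap goes under the parser, the inner one goes under Rec_ and touches only the payload field.

```haskell
{-# LANGUAGE DeriveFunctor #-}

data Rec_ string = Rec
  { iden    :: Int
  , header1 :: Int
  , header2 :: Int
  , payload :: string
  } deriving (Eq, Functor, Show)

-- Helper function to turn a missing string into the empty string.
defString :: Maybe String -> String
defString Nothing  = ""
defString (Just s) = s

-- Stand-in for aeson's Parser: either an error message or a result.
type FakeParser = Either String

-- (fmap . fmap) f is the same as fmap (fmap f):
-- map under the parser, then under the record's type parameter.
fixPayload :: FakeParser (Rec_ (Maybe String)) -> FakeParser (Rec_ String)
fixPayload = (fmap . fmap) defString
```

For instance, fixPayload (Right (Rec 1 2 3 Nothing)) evaluates to Right (Rec 1 2 3 ""), while a failed parse such as Left "bad input" passes through unchanged.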
This is a nice solution, but still less than ideal. Mangling our type may make compilation errors in unrelated places harder to understand, and it may even introduce new ambiguity errors. These global costs might not be worth a benefit as local as automating the implementation of a serializer.
Thus, we would like both:
- to keep the original simple Rec type, where payload has type String;
- to use genericParseJSON with a different record type like Rec', where payload has type Maybe String.
And let’s keep it DRY: that second record type must somehow be derived from Rec, rather than redeclared explicitly.
Solution 2: type surgery
Conceptually, we’re looking for a very simple operation: change the type of the payload field. So the code should look just as simple on the surface (or at least, not much more complicated than the previous solution):
instance FromJSON Rec where
  parseJSON :: Value -> Parser Rec
  parseJSON
    = fmap (fromOR . modifyRField @"payload" defString . toOR')
    . genericParseJSON defaultOptions{omitNothingFields=True}

-- Defined previously
defString :: Maybe String -> String
The surgery modifyRField @"payload" defString is a function that takes some record with a field payload of type Maybe String and applies defString to that field, producing a new record where payload has type String instead.
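For comparison, here is roughly what that surgery amounts to if you write it by hand, without the library. This is only a sketch of the idea, not how generic-data-surgery is implemented: RecMaybe and modifyPayloadByHand are hypothetical names, and the second record has to be declared explicitly, which is exactly the duplication the surgery avoids.

```haskell
-- The record we actually want to work with.
data Rec = Rec Int Int Int String
  deriving (Eq, Show)

-- Hand-declared counterpart where the payload may be missing.
-- With the surgery, no such declaration is needed.
data RecMaybe = RecMaybe Int Int Int (Maybe String)
  deriving (Eq, Show)

-- Helper function to turn a missing string into the empty string.
defString :: Maybe String -> String
defString Nothing  = ""
defString (Just s) = s

-- Apply defString to the payload and copy every other field unchanged.
modifyPayloadByHand :: RecMaybe -> Rec
modifyPayloadByHand (RecMaybe i h1 h2 p) = Rec i h1 h2 (defString p)
```

Every field added to Rec later would have to be mirrored in RecMaybe and in the conversion function; the generic version keeps that bookkeeping out of your code.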
Actually, surgeries don’t operate directly on records; these need to be converted to a generic representation via fromOR and toOR'.
Here is another little diagram of what the types look like in the argument of fmap above. The record types are “expanded” to illustrate what is going on informally.
The runtime data flows top-down, but the “surgery” on types might be more easily read bottom-up:
{ ..., payload :: Maybe String } (a synthetic type)
toOR' ->
OR { ..., payload :: Maybe String }
modifyRField ->
OR { ..., payload :: String }
fromOR ->
{ ..., payload :: String } (Rec, a natural type)
On the bottom end, we know a Rec must be getting out (from the expected type of parseJSON).
Moving up in the diagram, fromOR puts the record type in an “operating room” (OR) (that’s the metaphor; concretely this is a mapping between a type and a generic representation like GHC.Generics.Rep).
Inside the operating room, we can apply the surgery modifyRField to change the type of the payload field.
At the top of the diagram, going into toOR' is a synthetic record type, call it SRec, created by the surgery, and for which there is no data declaration,[3] as opposed to a natural type like Rec.
If that seems magical, all you need to know is that this synthetic type SRec has the same generic representation as the previous altered Rec' type where payload is given type Maybe String (Rep SRec = Rep Rec'), and that is also all genericParseJSON needs to see to do its thing.
No type annotations are necessary: the synthetic generic type that comes out of genericParseJSON and goes into toOR' can be inferred from the fact that the original Rec type is expected on the other end of the operating room.
Closing notes
- generic-data-surgery is still very much in the proof-of-concept stage, but any issue encountered while trying to use it would help move the project forward. I haven’t looked at performance; compile times are expected to be abysmal, but at run time there should be no overhead with proper inlining. Error messages are another aspect to improve: a typical error currently prints an unfolded Rep spanning a hundred lines with stuck type families. If you are interested in these matters, whatever your level of expertise, please get in touch (e.g., via GitHub or via e-mail)! I would be happy to help you navigate the code.
- I can hear mumbles about row polymorphism, extensible types, etc. Indeed, generic-data-surgery contains pretty much yet another record library, but it is only meant to be used as an intermediate representation with which to instantiate various generic implementations such as genericParseJSON above. In particular, it carefully preserves the type metadata found in the M1 constructor of GHC.Generics. A better story for row polymorphism in Haskell may very well overtake this library.
- Modifying the contents of fields is also a good fit for lenses, and in fact the modifyRField surgery presented here can be implemented as a “type-changing lens” from generic-lens (to be released with generic-lens-1.1.0.0) plus a unification constraint on the type metadata carried by GHC.Generics.M1.[4] Other surgeries that add or remove fields and constructors seem less easy to express that way though.
- I got the idea of first-class-families while doing type-level acrobatics in generic-data-surgery.
- A compilable version of this example can be found in the library repository.
[1] By “generic” I mean both GHC.Generics, which is what the rest of the post will be using, and the more general idea of “data-type generic programming” for which GHC.Generics is one approach among many, alongside “Scrap your boilerplate” and Template Haskell (in some respects).
[2] Arguably too specific.
[3] Of course, you’ll find some “naturally declared” types if you look at the definition of the synthetic type, but those are really implementation details. Abstract away from them, and look at that synthetic type as a type with a similar structure to Rec.
[4] EDIT 2018-12-31: generic-data now has a bit of documentation about using generic-lens for surgeries.