Surgery for data types

Posted on November 26, 2018

In this post I will show an example of data type surgery. Surgeries are operations that change a data type piece by piece: modify its fields and constructors, and you get another type.

A couple of simple surgeries using GHC.Generics can be found in my freshly released Haskell library: generic-data-surgery. Suggestions for more surgeries are welcome on the issue tracker.

The general motivation is to improve the applicability of various generic[1] definitions, such as aeson’s generic instances for ToJSON and FromJSON. Such a library often offers several options to customize its generic implementations, but it can still happen that none of them quite fits your external requirements. Even when the mismatch with the generic implementation is small, you then have to fall back to a manual implementation. Surgeries are a new way to adapt generic implementations to such conditions outside of your control.

A toy example

Consider the task of deserializing a simple record from JSON:

{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics (Generic)

data Rec = Rec
  { iden    :: Int
  , header1 :: Int
  , header2 :: Int
  , payload :: String
  } deriving (Eq, Generic, Show)

The JSON we want to parse will follow the same structure (an object with keys "iden", "header1", "header2", "payload"), but the "payload" key may also be missing from the JSON object, and that should be interpreted as an empty payload (payload = ""). Two examples of well-formed objects:

{"iden":1,"header1":2,"header2":3}
{"iden":1,"header1":2,"header2":3,"payload":"Hello"}

The aeson library’s generic implementation of parseJSON is close to doing the right thing, except for that extra requirement about the missing "payload" key. Is this an all-or-nothing situation, where you would give up entirely the existing automation for the smallest of issues?
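For comparison, here is a sketch of the manual instance this situation pushes you towards, written with aeson’s withObject, (.:), (.:?) and (.!=). It assumes the OverloadedStrings extension for the key literals. Every key has to be spelled out by hand, even though only "payload" deviates from what the generic implementation would do.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON (..), decode, withObject, (.!=), (.:), (.:?))

data Rec = Rec
  { iden    :: Int
  , header1 :: Int
  , header2 :: Int
  , payload :: String
  } deriving (Eq, Show)

-- Hand-written parser: every key is listed explicitly. The only departure
-- from the generic behaviour is that a missing "payload" defaults to "".
instance FromJSON Rec where
  parseJSON = withObject "Rec" $ \o ->
    Rec <$> o .:  "iden"
        <*> o .:  "header1"
        <*> o .:  "header2"
        <*> o .:? "payload" .!= ""
```

Decoding either of the two example objects with decode then yields a Rec, with payload = "" when the key is absent.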

Solution 1: a parameterized type

There is a simple solution that doesn’t require anything as fancy as surgeries.

aeson can handle missing keys in one specific case[2]: with the option omitNothingFields=True, missing keys corresponding to fields of type Maybe Something will be parsed as Nothing. So if we change the type of the field payload from String to Maybe String

data Rec' = Rec'
  { iden    :: Int
  , header1 :: Int
  , header2 :: Int
  , payload :: Maybe String
  } deriving (Eq, Generic, Show)

… then aeson’s generic parser can accept a missing payload.

instance FromJSON Rec' where
  parseJSON = genericParseJSON defaultOptions{omitNothingFields=True}

But of course that breaks any existing function that uses payload as a plain String. To avoid that, we can generalize over the difference between Rec and Rec' with a parameterized type:

data Rec_ string = Rec
  { iden    :: Int
  , header1 :: Int
  , header2 :: Int
  , payload :: string
  } deriving (Eq, Functor, Generic, Show)

type Rec  = Rec_ String
type Rec' = Rec_ (Maybe String)

We get a Functor instance for free (thanks to the DeriveFunctor extension) to transform the payload.
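As a quick standalone sketch of what that derived instance does: fmap rewrites only the payload field and may change its type. The name fillPayload is hypothetical, and fromMaybe "" from Data.Maybe stands in for the defString helper defined below.

```haskell
{-# LANGUAGE DeriveFunctor #-}

import Data.Maybe (fromMaybe)

data Rec_ string = Rec
  { iden    :: Int
  , header1 :: Int
  , header2 :: Int
  , payload :: string
  } deriving (Eq, Functor, Show)

type Rec  = Rec_ String
type Rec' = Rec_ (Maybe String)

-- Hypothetical helper: the derived fmap maps over the only field whose
-- type is the parameter (payload) and copies the Int fields unchanged.
fillPayload :: Rec' -> Rec
fillPayload = fmap (fromMaybe "")
```

For instance, fillPayload (Rec 1 2 3 Nothing) is Rec 1 2 3 "", and fmap length turns a Rec into a Rec_ Int.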

Now we can use genericParseJSON to first parse a Rec_ (Maybe String), and use fmap to massage it into a proper Rec.

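-- Rec = Rec_ String is a type synonym, so this instance needs FlexibleInstances;
-- the method signature needs InstanceSigs.
instance FromJSON Rec where
  parseJSON :: Value -> Parser Rec
  parseJSON = (fmap . fmap) defString
            . genericParseJSON defaultOptions{omitNothingFields=True}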

-- Helper function to turn a missing string into the empty string.
defString :: Maybe String -> String
defString Nothing  = ""
defString (Just s) = s
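Put together, Solution 1 fits in one module. The sketch below assumes the aeson package without pinning a particular version. Besides DeriveFunctor and DeriveGeneric, it enables FlexibleInstances (the instance is on the synonym Rec = Rec_ String), InstanceSigs (for the method signature) and OverloadedStrings (so JSON input can be written as string literals when trying it out).

```haskell
{-# LANGUAGE DeriveFunctor, DeriveGeneric, FlexibleInstances,
             InstanceSigs, OverloadedStrings #-}

import Data.Aeson
import Data.Aeson.Types (Parser)
import GHC.Generics (Generic)

data Rec_ string = Rec
  { iden    :: Int
  , header1 :: Int
  , header2 :: Int
  , payload :: string
  } deriving (Eq, Functor, Generic, Show)

type Rec  = Rec_ String
type Rec' = Rec_ (Maybe String)

-- Parse a Rec' generically (a missing "payload" becomes Nothing),
-- then replace the Maybe String payload with a plain String.
instance FromJSON Rec where
  parseJSON :: Value -> Parser Rec
  parseJSON = (fmap . fmap) defString
            . genericParseJSON defaultOptions{omitNothingFields=True}

-- Helper function to turn a missing string into the empty string.
defString :: Maybe String -> String
defString Nothing  = ""
defString (Just s) = s
```

Loaded in GHCi, decode applied to either example object from the beginning of the post should produce a Rec, with an empty payload when the key is missing.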

Here is a little diagram detailing the intermediate types in the definition of parseJSON:

--                            Value
-- genericParseJSON opts      ->
--                            Parser (Rec_ (Maybe String))
-- (fmap . fmap) defString    ->
--                            Parser (Rec_ String)
--                          = Parser Rec

This is a nice solution, but still less than ideal. Mangling our type may make compilation errors in unrelated places harder to understand, and it may even introduce new ambiguity errors. These global costs might not be worth a benefit as local as automating the implementation of a serializer.

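Thus, we would like both to leave the original type Rec untouched, and to derive its parser from a second record type in which payload has type Maybe String.

And let’s keep it DRY: that second record type must somehow be derived from Rec, rather than redeclared explicitly.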

Solution 2: type surgery

Conceptually, we’re looking for a very simple operation: change the type of the payload field. So the code should look just as simple on the surface (or at least, not much more complicated than the previous solution):

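-- Needs DataKinds and TypeApplications (for @"payload") and InstanceSigs;
-- fromOR, toOR' and modifyRField come from Generic.Data.Surgery.
instance FromJSON Rec where
  parseJSON :: Value -> Parser Rec
  parseJSON
    = fmap (fromOR . modifyRField @"payload" defString . toOR')
    . genericParseJSON defaultOptions{omitNothingFields=True}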

-- Defined previously
defString :: Maybe String -> String

The surgery modifyRField @"payload" defString is a function that takes some record with a field payload of type Maybe String and applies defString to that field, producing a new record where payload has type String instead. Strictly speaking, surgeries don’t operate on records directly: a record must first be converted into a generic representation (with toOR') and converted back afterwards (with fromOR).

Here is another little diagram of what the types look like in the argument of fmap above. The record types are “expanded” to illustrate what is going on informally. The runtime data flows top-down, but the “surgery” on types might be more easily read bottom-up:

                    { ..., payload :: Maybe String }  (a synthetic type)
toOR'            ->
                 OR { ..., payload :: Maybe String }
modifyRField     ->
                 OR { ..., payload :: String }
fromOR           ->
                    { ..., payload :: String }     (Rec, a natural type)

At the bottom, we know a Rec must be coming out (from the expected type of parseJSON). Moving up in the diagram, fromOR puts the record type in an “operating room” (OR). That’s the metaphor; concretely, this is a mapping between a type and a generic representation like GHC.Generics.Rep. Inside the operating room, we can apply the surgery modifyRField to change the type of the payload field. At the top of the diagram, going into toOR' is a synthetic record type, call it SRec, created by the surgery, for which there is no data declaration[3], as opposed to a natural type like Rec.

If that seems magical, all you need to know is that this synthetic type SRec has the same generic representation as the altered Rec' type from earlier, where payload has type Maybe String (Rep SRec = Rep Rec'), and that representation is all genericParseJSON needs to see. No type annotations are necessary: the synthetic type that comes out of genericParseJSON and goes into toOR' is inferred from the fact that the original Rec type is expected on the other end of the operating room.
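One way to see why the representation is all that matters: a GHC.Generics function cannot observe anything about a type beyond its Rep. The sketch below is not part of aeson or generic-data-surgery; GFieldNames and fieldNames are hypothetical names. It collects record field names from Rep (the selector metadata that aeson’s generic code turns into object keys), and two independently declared records with the same field names yield the same list, even though their payload fields have different types. DuplicateRecordFields only serves to let both declarations share one module.

```haskell
{-# LANGUAGE DeriveGeneric, DuplicateRecordFields, FlexibleContexts,
             FlexibleInstances, KindSignatures, ScopedTypeVariables,
             TypeOperators #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.Generics

-- Two independently declared records with the same field names.
data Rec = Rec
  { iden :: Int, header1 :: Int, header2 :: Int, payload :: String }
  deriving Generic

data Rec' = Rec'
  { iden :: Int, header1 :: Int, header2 :: Int, payload :: Maybe String }
  deriving Generic

-- Hypothetical class: walk a generic representation and collect the
-- selector names, in declaration order.
class GFieldNames (f :: Type -> Type) where
  gfieldNames :: Proxy f -> [String]

instance GFieldNames f => GFieldNames (M1 D d f) where
  gfieldNames _ = gfieldNames (Proxy :: Proxy f)

instance GFieldNames f => GFieldNames (M1 C c f) where
  gfieldNames _ = gfieldNames (Proxy :: Proxy f)

instance (GFieldNames f, GFieldNames g) => GFieldNames (f :*: g) where
  gfieldNames _ = gfieldNames (Proxy :: Proxy f) ++ gfieldNames (Proxy :: Proxy g)

instance Selector s => GFieldNames (M1 S s f) where
  gfieldNames _ = [selName (undefined :: M1 S s f ())]

-- Only Rep a is inspected; nothing else about a is visible here.
fieldNames :: forall a. (Generic a, GFieldNames (Rep a)) => Proxy a -> [String]
fieldNames _ = gfieldNames (Proxy :: Proxy (Rep a))
```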

Closing notes


  1. By “generic” I mean both GHC.Generics, which the rest of this post uses, and the more general idea of “data-type generic programming”, for which GHC.Generics is one approach among many, alongside “Scrap your boilerplate” and (in some respects) Template Haskell.

  2. Arguably too specific.

  3. Of course, you’ll find some “naturally declared” types if you look at the definition of the synthetic type, but those are really implementation details. Abstract away from them, and think of that synthetic type as a type with a similar structure to Rec.