In the past few weeks I’ve made some improvements to *hs-to-coq*. In particular, I wanted to verify the `Data.Sequence` module from the *containers* library. I’ve managed to translate most of the module to Coq so I can start proving stuff.

In this post, I will present some of the changes made in *hs-to-coq* to be able to translate `Data.Sequence`.

*hs-to-coq* had already been used to verify `Data.Set` and `Data.IntSet`, and their map analogues, which are the most commonly used modules of the *containers* library.^{1} The main feature distinguishing `Data.Sequence` from those is polymorphic recursion. There were a couple of smaller issues to solve beyond that, and some usability improvements made in the process.

As its name implies, `Data.Sequence` offers a data structure to represent sequences. The type `Seq a` has a meaning similar to the type of lists `[a]`, but `Seq a` supports faster operations such as indexing and concatenation (logarithmic time instead of linear time). The implementation is actually in `Data.Sequence.Internal`, while `Data.Sequence` reexports from it.

The type `Seq` is a thin wrapper around the type `FingerTree`, which is where the fun happens. `FingerTree` is what one might call an *irregular recursive type*. In the type declaration of `FingerTree`, the recursive occurrence of the `FingerTree` type constructor is applied to an argument which is not the variable that appears on the left-hand side of the definition. The right-hand side of the type declaration mentions `FingerTree (Node a)`, rather than `FingerTree a` itself:

```
-- An irregular type. (Definitions of Digit and Node omitted.)
data FingerTree a
  = EmptyT
  | Single a
  | Deep Int (Digit a) (FingerTree (Node a)) (Digit a)

newtype Elem a = Elem a
newtype Seq a = Seq (FingerTree (Elem a))
```

*Regular recursive types*^{2} are much more common. For example, the type of lists, `List a` below, is indeed defined in terms of the same `List a` as appears on the left-hand side:

```
-- A regular type
data List a = Nil | Cons a (List a)
```

*hs-to-coq* has no trouble translating irregular recursive types such as `FingerTree`; do the naive thing and it just works. Problems start once we look at functions involving them. For example, consider a naive recursive size function, `sizeFT`:

```
sizeFT :: FingerTree a -> Int
sizeFT EmptyT = 0
sizeFT (Single _) = 1
sizeFT (Deep _ l m r) = sizeDigit l + sizeFT m + sizeDigit r
-- This is wrong.
```

We want to count the number of `a` in a given `FingerTree a`, but the function above is wrong. In the recursive call, `m` has type `FingerTree (Node a)`, so we are counting the number of `Node a` in the subtree `m`, when we should actually count the number of `a` in every `Node a` and sum them up. The function above actually computes the sum of all “digits” in a `FingerTree`, which isn’t a meaningful quantity when trees are viewed as sequences.

While it may seem roundabout, probably the most straightforward way to fix this function is to first define `foldMap`:^{3}

```
foldMapFT :: Monoid m => (a -> m) -> FingerTree a -> m
foldMapFT _ EmptyT = mempty
foldMapFT f (Single x) = f x
foldMapFT f (Deep _ l m r) = foldMap f l <> foldMapFT (foldMap f) m <> foldMap f r

sizeFT :: FingerTree a -> Int
sizeFT = getSum . foldMapFT (\_ -> Sum 1) -- Data.Monoid.Sum
```
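To make the polymorphic recursion concrete, here is a self-contained sketch (my own addition; the `Digit` and `Node` stand-ins below are simplified assumptions, not the real *containers* definitions, which have more constructors and cache sizes):

```haskell
{-# LANGUAGE DeriveFoldable #-}

import Data.Monoid (Sum (..))

-- Simplified stand-ins for the omitted Digit and Node types.
data Node a = Node2 a a | Node3 a a a deriving Foldable
data Digit a = One a | Two a a deriving Foldable

data FingerTree a
  = EmptyT
  | Single a
  | Deep Int (Digit a) (FingerTree (Node a)) (Digit a)

-- The polymorphic-recursive foldMap: the recursive call is at type Node a.
foldMapFT :: Monoid m => (a -> m) -> FingerTree a -> m
foldMapFT _ EmptyT = mempty
foldMapFT f (Single x) = f x
foldMapFT f (Deep _ l m r) =
  foldMap f l <> foldMapFT (foldMap f) m <> foldMap f r

sizeFT :: FingerTree a -> Int
sizeFT = getSum . foldMapFT (\_ -> Sum 1)

-- Five elements: 'a', 'b' in the left digit, 'c', 'd' inside a Node one
-- level down, and 'e' in the right digit.
example :: FingerTree Char
example = Deep 0 (Two 'a' 'b') (Single (Node2 'c' 'd')) (One 'e')
```

Here `sizeFT example` counts every `Char`, including those buried inside the `Node` one level down, which is exactly what the recursive call at type `Node a` buys us; it evaluates to 5.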

What makes `foldMapFT` unusual (and also `sizeFT`, even though its behavior is unexpected) is that its recursive occurrence has a different type than its signature. On the left-hand side, `foldMapFT` is applied to `f :: a -> m`; in its body on the right-hand side, it is applied to `foldMap f :: Node a -> m`. This is what it means for `foldMapFT` to be *polymorphic recursive*: its own definition relies on the polymorphism of `foldMapFT` in order to specialize it to a different type than its type parameter `a`.

In Haskell, type parameters are often implicit; a lot of details are inferred, so we don’t think about them. In Coq, type parameters are plain function parameters. Whenever we write a lambda, if it is supposed to be polymorphic, it will take one or more extra arguments. And now, because of polymorphic recursion, it matters where type parameters are introduced relative to the fixpoint operator.

```
(* A polymorphic recursive foldMapFT *)
fix foldMapFT (a : Type) (m : Type) (_ : Monoid m) (f : a -> m) (t : FingerTree a) : m :=
  ...
(* Here, foldMapFT : forall a m `(Monoid m), (a -> m) -> FingerTree a -> m *)

(* A non-polymorphic recursive foldMapFT, won't typecheck *)
fun (a : Type) (m : Type) (_ : Monoid m) =>
  fix foldMapFT (f : a -> m) (t : FingerTree a) : m :=
    ...
(* Here, foldMapFT : (a -> m) -> FingerTree a -> m *)
```

In the body of the first function, `foldMapFT` is polymorphic. In the body of the second function, `foldMapFT` is not polymorphic.

As you might have guessed, *hs-to-coq* picked the wrong version. I created an edit to make the other choice:

```
polyrec foldMapFT
# Make foldMapFT polymorphic recursive
```

The funny thing is that *hs-to-coq* internally goes out of its way to factor out the type parameters of recursive definitions, thus preventing polymorphic recursion. This new edit simply skips that step. One could consider just removing that code path, but I didn’t want that change to affect existing code. My gut feeling is that it might still be useful. It’s unlikely that there is one single rule that will work for translating all definitions to Coq, so “hey it works” is good enough for now, and things will improve as more counterexamples show up.

In Coq, functions are total. To define a recursive function, one must provide a *termination annotation* justifying that the function terminates. There are a couple of variants, but the general idea is that some quantity must “decrease” at every recursive call (and it cannot decrease indefinitely). The most basic annotation (`struct`) names one of the arguments as “the decreasing argument”.

*hs-to-coq* already allowed more advanced annotations to be specified as edits, but not this most basic variant—until I implemented it. It can be inferred in simple situations, but at some point it is still necessary to make it explicit.

When we write a recursive function, we refer to its decreasing argument by its name, but what really matters is its position in the list of arguments. For example, here is a recursive function `f` with two arguments `x` and `y`:

```
fix f x y {struct y} := ...
```

The annotation `{struct y}` indicates that `y`, the second argument of `f`, is the “decreasing argument”. The function is well-defined only if all occurrences of `f` in its body are applied to a second argument which is “smaller” than `y` in a certain sense. Otherwise the compiler throws an error.

That the argument is *named* is a problem when it comes to *hs-to-coq*: in Haskell, some arguments don’t have names because we immediately pattern-match on them. When translated to Coq, all arguments are given generated names, and they are renamed/decomposed in the body of every function.

```
-- A recursive function whose second argument is decreasing,
-- [] or (x : xs) depending on the branch, but there is no variable to refer to it.
map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (x : xs) = f x : map f xs
```
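For illustration, here is a hand-written Coq sketch (not actual *hs-to-coq* output) of how such a translation can end up naming an argument in a `struct` annotation; the name `arg_2__` is a hypothetical generated name:

```
(* Sketch: the unnamed list argument gets a generated name, which the
   struct annotation must then refer to. *)
Fixpoint map {a b : Type} (f : a -> b) (arg_2__ : list a)
    {struct arg_2__} : list b :=
  match arg_2__ with
  | nil => nil
  | cons x xs => cons (f x) (map f xs)
  end.
```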

*hs-to-coq* now allows specifying the decreasing argument by its position in the Haskell definition, i.e., ignoring type parameters. To implement that feature, we have to be a little careful since type parameters in Coq are parameters like any other, so they shift the positions of arguments. That turned out to be a negligible concern because, in the code of *hs-to-coq*, type parameters are kept separate from “value” parameters until a very late phase.

```
termination f {struct 2}
# The second argument of f is decreasing
```

Another potential solution is to fix the name generation to be more predictable. The arguments of top-level functions are numbered sequentially `arg_1__`, `arg_2__`, etc., which may be fine, but local functions just keep counting from wherever that left off (going up to `arg_38__` in one case). Maybe they should also start counting from 1.

More complex termination annotations than `struct` involve arbitrary terms mentioning those variables. For those, there is currently no workaround: one must use those fragile generated names to refer to a function’s arguments.

I initially expected that some functions in `Data.Sequence` would have to be shown terminating based on the size of a tree as a decreasing measure, which involves more sophisticated techniques than justifications based on depth. In fact, only one function needs such sophistication (`thin`, an internal function used by `liftA2`). As mentioned earlier, the “size” of a `FingerTree` is actually a little tricky to formalize, and that makes it even harder to use as part of such a termination annotation. Surprisingly, the naive and “wrong” version of `sizeFT` shown earlier also works as a simpler decreasing measure for this function.

With the above two changes, *hs-to-coq* is now able to process quite a satisfactory fragment of `Data.Sequence.Internal`. A few parts are not handled yet; they require either whole new features or more invasive edits than I have experience with at the moment.

There remains another issue with the `thin` function we just mentioned: it is mutually recursive with another function. *hs-to-coq* currently does not support the combination of mutually recursive functions with termination annotations other than the basic one (`struct`).

At the very beginning, *hs-to-coq* simply refused to process `Data.Sequence` because it doesn’t handle pattern synonyms. Now it at least skips pattern synonyms with a warning instead of failing. One still has to manually add edits to ignore declarations that use pattern synonyms, since it’s not too easy to tell whether that’s the case without a more involved analysis than is currently done.

The remaining bits either are partial functions, internally use partial functions, or are defined by recursion on `Int`; I haven’t looked into how to handle those yet.

Some changes weren’t strictly necessary to get the job done, but they made my life a little easier.

In Haskell, declarations can be written in any order (except when Template Haskell is involved) and they can refer to each other just fine.

In Coq, declarations must be ordered because of the restrictions on recursion. Type classes further complicate this story because of their implicitness: we cannot know whether an instance is used in an expression without type checking, and *hs-to-coq* currently stops at renaming.

For now, we have a “best guess” implementation using a “stable topological sort”, trying to preserve an *a priori* order as much as possible, putting instances before top-level values, and otherwise ordering value declarations as they appear in the Haskell source. Of course that doesn’t always work, so there are edits to create artificial dependencies between declarations.

It took me a while to notice something wrong with the implementation: independent definitions were sorted in reverse order, which is the opposite of what a “stable sort” should do. The sort algorithm itself was fine: the obvious dependencies were satisfied. And you expect to have things to fix by hand because of the underspecified nature of the problem at that point. So any single discrepancy was easily dismissed as “just what the algorithm does”. But after getting annoyed enough that nothing was where I expected it to be, I went to investigate. The culprit was GHC^{4}: renaming produces a list of declarations in reverse order! This is usually not a problem since the order of declarations should not matter in Haskell^{5}, but in our case we have to sort the declarations in source order before applying the stable topological sort. That ensures that the order in our Coq output is similar to the order in the Haskell input.

In edits files, identifiers must be fully qualified. This prevents ambiguities since edits don’t belong to any one module.

Module names can get quite long. It was tedious to repeat `Data.Sequence.Internal` over and over. There was already an edit to *rename* a module, but that changes the name of the file itself and affects other modules using that module. I added a new edit to *abbreviate* a module, without those side effects. In fact, that edit only affects the edits file it is in. The parser expands the abbreviation on the fly whenever it encounters an identifier, and after the parser is done, the abbreviation is completely forgotten.

```
module alias Seq Data.Sequence.Internal
# "Seq" is now an abbreviation of "Data.Sequence.Internal"
```

Ready, Set, Verify!, ICFP 2018.↩︎

I don’t know whether *irregular*/*regular* is conventional terminology, but my intuition to justify those names is that they generalize regular expressions. A regular recursive type defines a set of trees which can be recognized by a finite state machine (a *tree automaton*; Tree Automata, Techniques and Applications is a comprehensive book on the topic).↩︎

Link to source, which looks a bit different for performance reasons.↩︎

Tested with GHC 8.4.↩︎

And the AST is annotated with source locations so we don’t get lost.↩︎

Can we prove `Monad` instances lawful using *inspection-testing*? In this very simple experiment, I’ve tried to make it work for the common monads found in *base* and *transformers*.

Main takeaways:

- Associativity almost holds for all of the considered monads, with the main constraint being that the transformers must be applied to a concrete monad such as `Identity` rather than an abstract `m`.
- The identity laws were relaxed to hold “up to eta-expansion”.
- `[]` cheats using rewrite rules.
- This is a job for CPP.

The source code is available in this gist.

Let’s see how to use inspection testing through the first example of the associativity law. It works similarly for the other two laws.

Here’s the associativity law we are going to test. I prefer this formulation since it makes the connection with monoids and categories obvious:

`((f >=> g) >=> h) = (f >=> (g >=> h))`

To use inspection testing, turn the two sides of the equation into functions:

```
assoc1, assoc2 :: Monad m => (a -> m b) -> (b -> m c) -> (c -> m d) -> (a -> m d)
assoc1 f g h = (f >=> g) >=> h
assoc2 f g h = f >=> (g >=> h)
```

These two functions are not the same if we don’t know anything about `m` and `(>=>)`. So choose a concrete monad `m`. For example, `Identity`:

```
assoc1, assoc2 :: (a -> Identity b) -> (b -> Identity c) -> (c -> Identity d) -> (a -> Identity d)
assoc1 f g h = (f >=> g) >=> h
assoc2 f g h = f >=> (g >=> h)
```

GHC will be able to inline the definition of `(>=>)` for `Identity` and simplify both functions.

Using the *inspection-testing* library, we can now assert that the simplified functions in GHC Core are in fact equal:

```
{-# LANGUAGE TemplateHaskell #-}
inspect $ 'assoc1 ==- 'assoc2
```

This test is executed at compile time. The quoted identifiers `'assoc1` and `'assoc2` are the names of the functions as values (different things from the functions themselves), which the function `inspect` uses to look up their simplified definitions in GHC Core. The `(==-)` operator asserts that they must be the same, while ignoring coercions and type lambdas (constructs of the GHC Core language which will be erased in later compilation stages).

These tests can be tedious to adapt for each monad. The main change is the monad name; another concern is to use different function names for each test case. The result is a fair amount of code duplication:

```
assoc1Identity, assoc2Identity
  :: (a -> Identity b) -> (b -> Identity c) -> (c -> Identity d) -> (a -> Identity d)
assoc1Identity f g h = (f >=> g) >=> h
assoc2Identity f g h = f >=> (g >=> h)
inspect $ 'assoc1Identity ==- 'assoc2Identity

assoc1IO, assoc2IO
  :: (a -> IO b) -> (b -> IO c) -> (c -> IO d) -> (a -> IO d)
assoc1IO f g h = (f >=> g) >=> h
assoc2IO f g h = f >=> (g >=> h)
inspect $ 'assoc1IO ==- 'assoc2IO
```

The best way I found to handle the boilerplate is a CPP macro:

```
{-# LANGUAGE CPP #-}
#define TEST_ASSOC(NAME,M,FFF) \
assoc1'NAME, assoc2'NAME :: (a -> M b) -> (b -> M c) -> (c -> M d) -> a -> M d ; \
assoc1'NAME = assoc1 ; \
assoc2'NAME = assoc2 ; \
inspect $ 'assoc1'NAME FFF 'assoc2'NAME
```

It can be used as follows:

```
TEST_ASSOC(Identity,Identity,==-)
TEST_ASSOC(Maybe,Maybe,==-)
TEST_ASSOC(IO,IO,==-)
TEST_ASSOC(Select,Select,=/=)
```

Template Haskell is the other obvious candidate, but it is not as convenient:

- There’s no syntax to parameterize quotes by function names; at best, they can be wrapped in a pattern or expression quote, but type declarations require raw names, and I object to explicitly constructing the AST.
- The `inspect` function must execute after the two given functions are defined; these two steps cannot be done in a single splice.

The inspection tests pass for almost all of the monads under test. Three tests fail. One (`Writer`) could be fixed with a little tweak. The other two (`Select` and `Product`) can probably be fixed too, but I’m not sure.

Nevertheless, thinking through why the other tests succeed can also be an instructive exercise.

The writer monad consists of pairs, where one component can be thought of as a “log” produced by the computation. All we really need is a way to concatenate logs, so logs can formally be elements of an arbitrary monoid:

```
newtype Writer log a = Writer (a, log)

instance Monoid log => Monad (Writer log) where
  return a = Writer (a, mempty)
  Writer (a, log) >>= k =
    let Writer (b, log') = k a in
    Writer (b, log <> log')
```

The writer monad does not pass any of the three inspection tests out-of-the-box (associativity, left identity, right identity), because the order of composition using `(>=>)` is reflected after inlining in the order of composition using `(<>)`,^{1} which GHC cannot reassociate in general.

A simple fix is to instantiate the monoid `log` with a concrete one whose operations do get reassociated, such as `Endo e`. While that technically makes the test less general, it can also be argued that this is such a localized change that we should still be able to derive from it a fair amount of confidence that the law holds in the general case.

The fact that `Maybe` passes the test is a good illustration of one extremely useful simplification rule applied by GHC: the “case-of-case” transformation.

Expand both sides of the equation:

`((f >=> g) >=> h) = (f >=> (g >=> h))`

The left-hand side is a `case` expression whose scrutinee is another `case` expression. The right-hand side is a `case` expression containing a `case` expression in one of its branches.
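The post originally showed the two shapes as figures, which are not reproduced here; below is a hand-written reconstruction for the `Maybe` monad, following the prose description (labeled Figure A and Figure B as in the surrounding text):

```haskell
-- Figure A (reconstruction): ((f >=> g) >=> h) after inlining.
-- The scrutinee of the outer case is itself a case expression.
figureA :: (a -> Maybe b) -> (b -> Maybe c) -> (c -> Maybe d) -> a -> Maybe d
figureA f g h a =
  case (case f a of
          Nothing -> Nothing
          Just b  -> g b) of
    Nothing -> Nothing
    Just c  -> h c

-- Figure B (reconstruction): (f >=> (g >=> h)) after inlining.
-- The inner case sits inside a branch of the outer one.
figureB :: (a -> Maybe b) -> (b -> Maybe c) -> (c -> Maybe d) -> a -> Maybe d
figureB f g h a =
  case f a of
    Nothing -> Nothing
    Just b ->
      case g b of
        Nothing -> Nothing
        Just c  -> h c
```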

The code in the latter Figure B tends to execute faster. One simple reason for that is that if `f a` evaluates to `Nothing`, the whole expression will then immediately reduce to `Nothing`, whereas Figure A will take one more step to reduce the inner `case` before the outer `case`. Computations nested in `case` scrutinees also tend to require additional bookkeeping when compiled naively.

The key rule, named “case-of-case”, stems from remarking that eventually, a case expression reduces to one of its branches. Therefore, when it is surrounded by some context (an outer `case` expression), we might as well apply the context to the branches directly. Applying this to Figure A pushes the outer `case` into each branch of the inner `case`.

And the first branch reduces to `Nothing`.

This transformation is not always a good idea to apply, because it duplicates the context, once for each branch of the inner `case`. The rule pays off when some of those branches are constructors and the context is a `case`, so the transformation turns them into “case of constructor”, which can be simplified away.

The representation of `IO` in GHC Core looks like a strict state monad.

`data IO a = IO# (State# RealWorld -> (# a, State# RealWorld #))`

However, the resemblance between `IO` and `State` is purely syntactic, viewing Haskell programs only as terms to be rewritten, rather than mathematical functions “from states to states”. The token that is being passed around as the “state” in `IO` has no meaning other than as a gadget to maintain the evaluation order required by the semantics of `IO`. It is merely an elegant coincidence that the implementation of `IO` matches perfectly the mechanics of the state monad.

Out of all the examples considered in this experiment, the continuation monad is the only example of a monad transformer applied to an abstract monad `m`. All the other transformers are specialized to the identity monad.

That is because the other monad transformers use the underlying monad’s `(>>=)` in their own definition of `(>>=)`, and that blocks simplification. `ContT` is special: its `Monad (ContT r m)` instance does not even use a `Monad m` instance. That allows it to compute where other monad transformers cannot.

This observation also suggests only using concrete monads as a strategy for optimizations to take place. The main downside is the lack of modularity. Some computations are common to many monads (e.g., traversals), and it also seems desirable to not have to redefine and recompile them from scratch for every new monad we come up with.

For the list monad, `(>>=)` is `flip concatMap`:

`concatMap :: (a -> [b]) -> [a] -> [b]`

`concatMap` is a recursive function, and GHC does not inline those. Given that, it may be surprising that it passes the inspection test. This is thanks to bespoke rewrite rules in the standard library that implement list fusion.

You can confirm that by defining your own copy of the list monad and see that it fails the test.

Another idea was to disable rewrite rules (`-fno-enable-rewrite-rules`), but this breaks even things unrelated to lists, for mysterious reasons.

`pure` to the right of `(>>=)` cancels out.

`(u >>= pure) = u`

The right-hand side is very easy to simplify: there is nothing to do.

The problem is that on the left-hand side, we need to do some work to combine `u` and `pure`, and almost always some of that work remains visible after simplification. Sadly, the main culprit is laziness.

For example, in the `Reader` monad, `u >>= pure` reduces to the following:

`Reader (\r -> runReader u r)`

If we ignore the coercions `Reader` and `runReader`, then we have:

`\r -> u r`

That is the eta-expanded^{2} form of `u`. In Haskell, where `u` might be undefined but a lambda is not undefined, `\r -> u r` is not equivalent to `u`. To me, the root of the issue is that we can use `seq` on everything, including functions, and that allows us to distinguish `undefined` (blows up) from `\x -> undefined x` (which is equivalent to `\x -> undefined`; it does not blow up until it’s applied). A perhaps nicer alternative is to put `seq` in a type class which can only be implemented by data types, excluding various functions and computations. That would add extra constraints on functions that do use strictness on abstract types, such as `foldl'`. It’s unclear whether that would be a flaw or a feature.

So `u` and `\r -> u r` are not always the same, but really only because of a single exception, when `u` is undefined. So they are still *kinda* the same. Eta-expansion can only make an undefined term somewhat less undefined, but arguably not in any meaningful way.

This suggests relaxing the equality relation to allow terms to be equal “up to eta-expansion”:

`f = g if (\x -> f x) = (\x -> g x)`

Furthermore, eta-expansion is an idempotent operation:

`\r -> (\r1 -> u r1) r = \r -> u r`

So to compare two functions, we can expand both sides, and if one side was already eta-expanded, it will reduce back to itself.

We can write the test case as follows:

```
lid1, lid2 :: Reader r a -> Reader r a
lid1 x = eta x
lid2 x = eta (x >>= pure)

eta :: Reader r a -> Reader r a
eta (Reader u) = Reader (\r -> u r)

inspect $ 'lid1 ==- 'lid2
```

The notion of “eta-expansion” can be generalized to types other than function types, notably pairs:

```
eta :: (a, b) -> (a, b)
eta xy = (fst xy, snd xy)
```

The situation is similar to functions: `xy` may be undefined, but `eta xy` is never undefined.^{3}

This suggests the definition of a type class for generalized eta-expansion:

```
class Eta a where
  -- Law: eta x = x modulo laziness
  eta :: a -> a

instance Eta (a, b) where
  eta ~(x, y) = (x, y) -- The lazy pattern is equivalent to using projections

instance Eta (Reader r a) where
  eta (Reader f) = Reader (\r -> f r)
```

The handling of type parameters here is somewhat arbitrary: one could also try to eta-expand the components of the pair for instance.
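As a small self-contained demonstration (my addition) of why the lazy pattern matters: `eta` applied to an undefined pair still produces a pair constructor, so matching on the result succeeds where matching on the original argument would diverge.

```haskell
-- A pair-only fragment of the Eta class above, for demonstration.
class Eta a where
  eta :: a -> a

instance Eta (a, b) where
  eta ~(x, y) = (x, y) -- the lazy pattern never forces the argument

-- eta rebuilds the (,) constructor without inspecting its argument,
-- so this pattern match succeeds even though the pair is undefined.
survivesEta :: Bool
survivesEta = case eta (undefined :: (Int, Bool)) of (_, _) -> True
```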

Two more interesting cases are `ContT` and `IO`.

For `ContT`, we not only expand `u` to `\k -> u k`, but we also expand the continuation to get `\k -> u (\x -> k x)`.

```
instance Eta (ContT r m a) where
  eta (ContT u) = ContT (\k -> u (\x -> k x))
```

It is also possible, and necessary, to eta-expand `IO`, whatever that means.

```
instance Eta (IO a) where
  eta u = IO (\s -> case u of IO f -> f s)
  -- Note: eta is lazier than id.
  -- eta (undefined :: IO a) /= (undefined :: IO a)
```

`pure` on the left of `(>>=)` cancels out.

`(pure x >>= k) = k x`

The left identity has the same issue with eta-expansion that we just described for the right identity. It also has another problem with sharing.

In the `Reader` monad for example, `(pure x >>= k)` first expands to the following (ignoring the coercions for clarity):

`\r -> k x r`

However, GHC also decides to extrude the `k x`, because it doesn’t depend on `r`:

`let u = k x in \r -> u r`

The details go a little over my head, but I found a cunning workaround: using the magical function `GHC.Exts.inline` in the `Eta` instance for `Reader`:

```
instance Eta (ReaderT e m a) where
  eta u = ReaderT (\x -> runReaderT (inline u) x)
```

When these inspection tests pass, that is proof that the monad laws hold.

If we reduce what the compiler does to inlining and simplification, then on the one hand, not all monads can be verified that way (e.g., lists that don’t cheat with rewrite rules); on the other hand, when the proof works, it proves a property stronger than “lawfulness”.

Let’s call it “definitional lawfulness”: we say that the laws hold “by definition”, with trivial simplification steps only. There is some subjectivity about what qualifies as a “trivial” simplification; it boils down to how dumb the compiler/proof-checker can be. Nevertheless, what makes definitional lawfulness interesting is that:

- it is immediately inspection-testable, and the test is actually a proof, unlike with random property testing (QuickCheck) for example;
- if the compiler can recognize the monad laws by mere simplification, that very likely implies that it can simplify the overhead of more complex monadic expressions.

That implication is not obviously true; it’s actually false in practice without some manual help, but definitional lawfulness gets us some of the way there. A sufficient condition is for inlining and simplification to be confluent (“the order of simplification does not matter”), but inlining being limited by heuristics jeopardizes that property, because those heuristics depend on the order of simplifications.

Custom rewrite rules also make the story more complicated, which is why I just consider it cheating, and prefer structures that enable fusion by simplification, such as difference lists and other continuation-passing tricks.

`(<>)` is also called `mappend`, and at the level of Core there is an unfortunately visible difference, which is why the source code uses `mappend`.↩︎

Paradoxically, it is sometimes called “eta-reduction” even if it makes the term look “bigger”, because it also makes terms look more “regular”.↩︎

There is in fact a deeper analogy. Pairs can be seen as (dependent) functions with domain `Bool`. Pairs and functions can also be viewed in terms of a more general notion of “negative types”, “codata”.↩︎

The Haskell library *generic-data* provides generic implementations of standard type classes. One of the goals of *generic-data* is to generate code which performs (at least) as well as something you would write manually. Even better, we can make sure that the compiled code is identical to what one would obtain from hand-written code, using inspection testing^{1}.

During the exercise of building some inspection tests for *generic-data*, the most notable discrepancy to resolve was with the `Traversable` class.

To improve the traversals generated by *generic-data*, a useful data structure is *applicative difference lists* (it has also been called `Boggle` before). It is a type-indexed variant of difference lists which simplifies applicative operations at compile time. This data structure is available as a library on Hackage: *ap-normalize*.

The `Traversable` type class describes type constructors `t` which can be “mapped over” similarly to `Functor`, but using an effectful function `a -> f b`:

```
class (Functor t, Foldable t) => Traversable t where
  traverse :: forall f a b. Applicative f => (a -> f b) -> t a -> f (t b)
```

(We will not discuss the `Functor` and `Foldable` superclasses.)

Throughout this post, fixing an applicative functor `f`, *actions* are what we call values of type `f x`, to evoke the idea that they are first-class computations.

Intuitively, a traversal walks over a data structure `t a` which contains “leaves” of type `a`, and performs some action (given by the `a -> f b` function) to transform them into “leaves” of type `b`, producing a new data structure `t b`.

There is a straightforward recipe to define `traverse` for many data types. This is best illustrated by an example. We will call this the “naive” definition because it’s just the obvious thing to write if one were to write a traversal; that is not meant to convey that it’s bad in any way.

- Using applicative laws, `Example <$> pure a` can fuse into `pure (Example a)`, which in turn can fuse with the following `(<*>)` into `Example a <$> ...`.
- Going the other way, we can expand `Example <$>` into `pure Example <*>` for a more uniform look.
- `liftA2` can also be used to fuse the first `(<$>)` and `(<*>)`.

For the sake of completeness, the recipe is roughly as follows, for a data type with a type parameter `a`:

- traverse each field individually, in one of the following ways depending on its type:
  - if its type does not depend on `a` (e.g., `Int`), then the field is kept intact and returned purely (using `pure`);
  - if its type is equal to `a`, then we can apply the function `a -> f b` to it;
  - if its type is of the form `t a`, where `t` is a traversable type, we `traverse` it recursively;
- combine the field traversals using the `Applicative` combinator `(<*>)`.

Noticing that the only case where we need another type to be traversable is to traverse fields whose types depend on `a`, we can define `traverse` for all types which don’t involve non-traversable primitive types such as `IO` or `(->)`.

This is quite formulaic, and it can be automated in many ways. The most practical solution is to use the GHC extension `DeriveTraversable` (which implies `DeriveFunctor` and `DeriveFoldable`):

```
{-# LANGUAGE DeriveTraversable #-}

data Example a = Example Int a (Maybe a) (Example a)
  deriving (Functor, Foldable, Traversable)
```

You may be wondering: if it’s built into GHC, why would I bother with generic deriving? There isn’t a substantial difference from a user’s perspective (so you should just use the extension). But from the point of view of implementing a language, deriving instances for a particular type class is a pretty ad hoc feature to include in a language specification. Generic deriving subsumes it, turning that feature into a library that regular people, other than compiler writers, can understand and improve independently from the language itself.

Well, that’s the theory. Generic metaprogramming in Haskell has a ways to go before it can fully replace an integrated solution like `DeriveTraversable`. The biggest issue currently is that GHC does not perform as much inlining as one might want. A coarse but effective answer to overcome this obstacle might be an annotation to explicitly indicate that some pieces of source code must be gone after compilation.

So the other way to derive `traverse` that I want to talk about is to use `GHC.Generics`. *generic-data* provides a function `gtraverse` which can be used as the definition of `traverse` for many types with a `Generic1` instance. Although it does not use the `deriving` keyword,^{2} it is still a form of “deriving”, since the syntax of the instance does not depend on the particular shape of `Example`.

```
{-# LANGUAGE DeriveGeneric #-}

data Example a = Example Int a (Maybe a) (Example a)
  deriving Generic1

instance Traversable Example where
  traverse = gtraverse
```

All three instances above *behave* the same (the naive one, the `DeriveTraversable` one, and the generic one). However, if we look not only at the *behavior* but at the generated *code* itself, the optimized GHC Core produced by the compiler is not the same in all cases. The definition of `gtraverse` up to *generic-data* version 0.8.3.0 results in code which looks like the following (various details were simplified for clarity’s sake^{3}):
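The code figure that belongs here did not survive; based on the description below, the optimized Core has roughly the following shape (a reconstruction with illustrative names, a sketch rather than the compiler’s exact output):

```
-- Figure B (sketch): optimized gtraverse, generic-data <= 0.8.3.0
traverseExample update (Example a b c d) =
  (\((K1 a' :*: K1 b') :*: (K1 c' :*: K1 d')) -> Example a' b' c' d')
    <$> (((:*:) <$> (K1 <$> pure a) <*> (K1 <$> update b))
          <*> ((:*:) <$> (K1 <$> traverse update c) <*> (K1 <$> traverse update d)))
```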

That function traverses each field (using `pure`, `update`, `traverse update`), wraps the results in the newtype `K1`, collects them in a pair of pairs `(_ :*: _) :*: (_ :*: _)`, and then replaces those pairs with the `Example` constructor.

Clearly, this does not look the same as the naive version shown earlier. Let’s enumerate the differences:

- There are many uses of `(<$>)`, which can be fused together.
- It constructs and immediately destructs intermediate pairs `(:*:)`. It would be more direct to wrap the fields in `Example`.
- The actions are associated differently (`(a <*> b) <*> (c <*> d)`), whereas the previous two implementations associate actions to the left (`((a <*> b) <*> c) <*> d`).

This definition cannot actually be simplified, because the applicative functor `f` and its operations `(<$>)` and `(<*>)` are abstract: their definitions are not available for inlining. This definition (Figure B) is only equivalent to the naive one (Figure A) if we assume the laws of the `Applicative` class (notably associativity), but the compiler has no knowledge of those. And so the simplifier is stuck there.

To be fair, it’s actually not so clear that these differences lead to performance problems in practice. Here are some mitigating factors to consider:

- For many concrete applicative functors, inlining `(<$>)` and `(<*>)` does end up simplifying away all of the noise.
- Even if we didn’t build up pairs explicitly using `(:*:)`, `(<*>)` may allocate closures which are about as costly as pairs anyway.
- Tree-like (`Free`) monads are more performant when associating actions to the right (`a <*> (b <*> (c <*> d))`).

Nevertheless, it seems valuable to explore alternative implementations. To echo the three points just above:

- The new definition of `gtraverse` will simplify to the naive version even while the applicative functor is still abstract.
- Properly measuring the subtle difference between pairs and closures sounds like a pain. Knowing exactly what code eventually runs allows one to switch from one system (`DeriveTraversable`) to another (*generic-data*) without risk of regressions, modulo all the transient caveats that make this ideal story not true today.
- If actions must be associated another way, this is just another library function to be written.

The main idea in this solution is that the definition of `gtraverse` should explicitly reassociate and simplify the traversal.

An obvious approach is thus to represent the syntax of the traversal explicitly, as an algebraic data type where a constructor encodes the applicative combinator `(<*>)`, possibly in a normalized form. This is a free applicative functor:

```
data Free f a where
  Pure :: a -> Free f a
  Ap :: Free f (b -> a) -> f b -> Free f a
-- This is actually a regular data type, just using GADTSyntax for clarity.
```

However, this is a recursive structure: that blocks compiler optimizations because GHC does not inline recursive functions (if it did, this could be a viable approach).
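For concreteness, here is one possible eliminator for this free applicative (`retract` is an assumed name, not from the post); note that it is recursive, which is exactly what the inliner refuses to unfold:

```
{-# LANGUAGE GADTs #-}

data Free f a where
  Pure :: a -> Free f a
  Ap :: Free f (b -> a) -> f b -> Free f a

-- Folding the syntax back into a single action: the recursion here
-- is precisely what GHC will not inline.
retract :: Applicative f => Free f a -> f a
retract (Pure a) = pure a
retract (Ap t u) = retract t <*> u
```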

Notice that this free applicative functor is basically a list, a heterogeneous list of `f b` values where `b` varies from element to element. If recursion is the problem, maybe we should find another representation of lists which is not recursive. As it turns out, difference lists will be the answer.

Let us digress with a quick recap of difference lists, so we’re on the same page, and as a reference to explain by analogy the fancier version that’s to come.

Here’s a list.

`1 : 2 : 3 : []`

A difference list is a list with a hole `_` in place of its end `[]`. The word “difference” means that it is the result of “subtracting” the end from the list:

`1 : 2 : 3 : _`

In Haskell, a difference list is represented as a function, whose input is a list to fill the hole with, and whose output is the whole list after adding the difference around the hole. A function “from holes to wholes”.

```
type DList a = [a] -> [a]
example :: DList Int
example hole = 1 : 2 : 3 : hole
```

Difference lists are interesting because they are super easy to concatenate: just fill the hole in one difference list with the other list. In Haskell, this is function composition, `(.)`. For instance, the list above is the composition of the two lists below:

```
ex1, ex2 :: DList Int
ex1 hole = 1 : hole
ex2 hole = 2 : 3 : hole
example = ex1 . ex2
```

Difference lists are an alternative representation of lists with a performant concatenation operation, which doesn’t allocate any transient intermediate structure. The trade-off is that other list operations are more difficult to provide, notably because it’s expensive to inspect the list to know whether it is empty or not.

The following functions complete the picture with a constructor of difference lists and an eliminator into regular lists. Internally they involve the list constructors in very simple ways, which is actually key to the purpose of difference lists. `singleton` is literally the list constructor `(:)` (thus representing the singleton list `[x]` as a list with a hole `x : _`), and `toList` applies a difference list to the empty list (filling the hole with `[]`).

```
singleton :: a -> DList a
singleton = (:)
toList :: DList a -> [a]
toList u = u []
```
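As a quick sanity check (a usage sketch, not from the original post), composing singletons builds the expected list:

```
type DList a = [a] -> [a]

singleton :: a -> DList a
singleton = (:)

toList :: DList a -> [a]
toList u = u []

-- Concatenation is function composition: no intermediate list is built.
example :: DList Int
example = singleton 1 . singleton 2 . singleton 3
```

Evaluating `toList example` yields `[1,2,3]`, and prepending another singleton is just one more composition.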

Why were we talking about lists? Applicative programming essentially consists in describing lists of actions (values of type `f x` for some `x`) separated by `(<*>)` and terminated (on the left end) by `pure _` (we’ll come back later to putting `(<$>)` back).

```
pure Example
<*> pure a
<*> update b
<*> traverse update c
<*> traverse update d
```

Once we have a notion of lists, a difference list is a list with a hole `_` in place of its end:

```
_
<*> pure a
<*> update b
<*> traverse update c
<*> traverse update d
```

So constructing a traversal as a difference list of actions would allow us to maintain this structure of left-associated `(<*>)`. In particular, this will guarantee that there are no extra `(<$>)` in the middle. Once we’ve completed the list, we top it off with a term `pure _`, where the remaining hole expects a pure function without any of the applicative functor nonsense which was blocking compile-time evaluation.

Let’s see how it looks in Haskell. This is where things ramp up steeply. While the types may look daunting, I want to show that we can ignore 90% of it to see that, under the hood, this is the same as plain difference lists `[a] -> [a]`.

You’re not missing much from ignoring 90% of the code: it is entirely constrained by the types. That’s the magic of parametricity.

The first thing to note is that since we’ve replaced the list constructor `(:)` with `(<*>)`, actions `f x` represent both individual list elements `a` and whole lists `[a]`.

We will draw the analogy between simple difference lists and applicative difference lists by *erasure*. Erase the type parameter of `f`, and anything that has something to do with that parameter. What is left is basically the code of `DList`, carefully replacing `f` with `a` or `[a]` as appropriate.^{4}

A first example is that `(<*>)` thus corresponds to both `flip (:)` and `(++)` (`flip (:)` because, as we will soon see, we are really building snoc lists here).

```
-- (<*>) vs (:) and (++)
(<*>) :: f (x -> y) -> f x -> f y
:: f -> f -> f -- erased
flip (:) :: [a] -> a -> [a]
(++) :: [a] -> [a] -> [a]
```

For another warm-up example, `fmap`, which acts on the type parameter of `f`, erases to the identity function.

```
-- fmap vs id
fmap :: (x -> y) -> f x -> f y
:: f -> f -- erased
id :: [a] -> [a]
```

The applicative difference lists described above are given by the following type `ApDList`. Similar to simple difference lists, they are also functions “from holes to wholes”, where both “holes” and “wholes” (complete things without holes) are actions in this case, `f (x -> r)` and `f r`, if we ignore `Yoneda`. We will not show the definition of `Yoneda`, but for the purpose of extending the metaphor with `DList`, `Yoneda f` is the same as `f`:

```
newtype ApDList f x = ApDList (forall r. Yoneda f (x -> r) -> f r)
type ApDList f = ( f -> f ) -- erased
type DList a = ( [a] -> [a])
```

While simple difference lists define a monoid, applicative difference lists similarly define an applicative functor, with a product `(<*>)` and an identity `pure` which indeed erase to the concatenation of difference lists (function composition) and the empty difference list (the identity function).

An important fact about this instance is that it has no constraints whatsoever; the type constructor `f` can be anything. The lack of constraints restricts the possible constructs used in this instance. It’s all lambdas and applications: that’s how we can tell, without even looking at the definitions, that `pure` and `(<*>)` can only be some flavor of the identity function and function composition.

```
instance Applicative (ApDList f) where
  pure x = ApDList (\t -> lowerYoneda (fmap ($ x) t))
  ApDList uf <*> ApDList ux =
    ApDList (\t -> ux (Yoneda (\c -> uf (fmap (\d e -> c (d . e)) t))))
```

Empty difference lists: erasure of `pure`, which corresponds to `id`.

```
-- pure vs id, signature
pure :: x -> ApDList f x -- Empty ApDList
id :: DList a -- Empty DList
-- pure vs id, definition
pure x = ApDList (\t -> lowerYoneda (fmap ($ x) t))
id = (\t -> t )
```

Where `lowerYoneda` is also analogous to the identity function.

`lowerYoneda :: Yoneda f x -> f x`

Concatenation of difference lists: erasure of `(<*>)`, which corresponds to `(.)`.

```
-- (<*>) vs (.), signature
(<*>) :: ApDList f (x -> y) -> ApDList f x -> ApDList f y -- Concatenate ApDList
(.) :: DList a -> DList a -> DList a -- Concatenate DList
-- (<*>) vs (.), definition
ApDList uf <*> ApDList ux = ApDList (\t -> ux (Yoneda (\c -> uf (fmap (\d e -> c (d . e)) t))))
uf . ux = (\t -> ux ( uf t ))
```

Remark: this composition operator is actually flipped. The standard definition goes `uf . ux = (\t -> uf (ux t))`. This is fine here because applicative lists are actually snoc lists—the reverse of “cons”—where elements are added to the right of lists, so the “holes” of the corresponding difference lists are on the left:

```
uf = ((_ <*> a) <*> b) <*> c -- A snoc list, separated by (<*>)
ux = (_ <*> x) <*> y
uf <*> ux = ((((_ <*> a) <*> b) <*> c) <*> x) <*> y
```

To concatenate `uf` and `ux`, we put `uf` in the hole on the left end of `ux`, rather than the other way around; this is why, in the definition above, `uf` is inside and `ux` is outside.
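This flipped composition can be checked with plain snoc difference lists (a small sketch; `snocAppend` and `toSnocList` are illustrative names, not from the post):

```
type SnocDList a = [a] -> [a] -- the hole is on the left end

snocSingleton :: a -> SnocDList a
snocSingleton u = \t -> t ++ [u]

-- To append, put uf inside the hole of ux: hence the flipped order.
snocAppend :: SnocDList a -> SnocDList a -> SnocDList a
snocAppend uf ux = \t -> ux (uf t)

toSnocList :: SnocDList a -> [a]
toSnocList u = u []
```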

We have defined *concatenation* to combine applicative difference lists. We also need ways to construct and eliminate them. We *lift* elements as singleton lists and *lower* lists into simple actions.

Lifting creates a new `ApDList`, with the invariant that it represents a left-associated list of actions separated by `(<*>)` (the left-associativity is why it needs to be a snoc list). That invariant is preserved by the concatenation operation we defined just earlier. One can easily check that `liftApDList` is the only function in this little `ApDList` library (4 functions) where `(<*>)` from the `Applicative f` instance is used.

```
-- Singleton ApDList
liftApDList :: Applicative f => f x -> ApDList f x
liftApDList u = ApDList (\t -> lowerYoneda t <*> u)

-- Singleton DList (snoc version)
snocSingleton :: a -> DList a
snocSingleton u = (\t -> t ++ [u])
```

Lowering consumes an `ApDList` by filling the hole with an action, producing a whole action. This is the only function about `ApDList` where `pure` from `Applicative f` is used. We use `pure` to terminate lists of actions in the same way `[]` terminates regular lists. (This is oversimplified from the real version.)

```
lowerApDList :: Applicative f => ApDList f x -> f x
lowerApDList (ApDList u) = u (Yoneda (\f -> pure (f id)))

-- By analogy
toList :: DList a -> [a]
toList u = u []

-- lowerApDList vs toList
lowerApDList (ApDList u) = u (Yoneda (\f -> pure (f id)))
             u           = u (pure _)  -- erased
toList u = u []
```
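To make this concrete, the pieces above assemble into a self-contained module that can be exercised with an ordinary applicative functor (`Maybe` below, an arbitrary choice); the continuation passed to `Yoneda` in `lowerApDList` is spelled out as `\f -> pure (f id)` so that everything typechecks:

```
{-# LANGUAGE RankNTypes #-}

newtype Yoneda f a = Yoneda (forall r. (a -> r) -> f r)

instance Functor (Yoneda f) where
  fmap g (Yoneda m) = Yoneda (\k -> m (k . g))

lowerYoneda :: Yoneda f a -> f a
lowerYoneda (Yoneda m) = m id

newtype ApDList f x = ApDList (forall r. Yoneda f (x -> r) -> f r)

-- fmap is recovered from the Applicative instance below.
instance Functor (ApDList f) where
  fmap g u = pure g <*> u

instance Applicative (ApDList f) where
  pure x = ApDList (\t -> lowerYoneda (fmap ($ x) t))
  ApDList uf <*> ApDList ux =
    ApDList (\t -> ux (Yoneda (\c -> uf (fmap (\d e -> c (d . e)) t))))

liftApDList :: Applicative f => f x -> ApDList f x
liftApDList u = ApDList (\t -> lowerYoneda t <*> u)

lowerApDList :: Applicative f => ApDList f x -> f x
lowerApDList (ApDList u) = u (Yoneda (\f -> pure (f id)))
```

With that, `lowerApDList ((,) <$> liftApDList (Just 1) <*> liftApDList (Just 'x'))` evaluates to `Just (1, 'x')`, going through the difference-list representation and back.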

Having defined difference lists and their basic operations, we can pretend that they are really just lists. Similarly, we can pretend that applicative difference lists `ApDList f x` are just actions `f x`, thanks to there being an instance of the same `Applicative` interface. With that setup, fixing the generic `gtraverse` function is actually a very small change, which will be explained through an example, starting from the result shown earlier (Figure B).

After the patch, we get the following (Figure C), where the only textual difference is that we inserted `lowerApDList` at the root and `liftApDList` at the leaves of the traversal. That changes the types of things in the middle from `f` to `ApDList f`. In the source code behind `gtraverse`, this type change appears as a single substitution in one type signature.^{5}
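The corresponding code figure is also missing here; following the description just given, it would look roughly like this (a reconstruction with illustrative names, a sketch rather than the library’s exact output):

```
-- Figure C (sketch): gtraverse after the patch, before simplification
traverseExample update (Example a b c d) =
  lowerApDList
    ((\((K1 a' :*: K1 b') :*: (K1 c' :*: K1 d')) -> Example a' b' c' d')
      <$> (((:*:) <$> (K1 <$> pure a) <*> (K1 <$> liftApDList (update b)))
            <*> ((:*:) <$> (K1 <$> liftApDList (traverse update c))
                       <*> (K1 <$> liftApDList (traverse update d)))))
```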

Again leveraging parametricity, we don’t need to work out the simplification in detail. Just from looking at the `traverseExample` definition here and the `ApDList` library outlined above, we can tell the following:

- the resulting term will have three occurrences of `(<*>)` from `Applicative f`, since there are three uses of `liftApDList`;
- all `Applicative` combinators for `ApDList` (`pure`, `(<*>)` and `(<$>)`) maintain a list structure as an invariant, so that in the end these three `(<*>)` will be chained together in the shape of a list;
- finally, `lowerApDList` puts a `pure` at the end of that list.

Provided everything does get inlined, the resulting term is going to be of this form.

```
traverseExample :: Applicative f => (a -> f b) -> Example a -> f (Example b)
traverseExample update (Example a b c d) =
  pure _
    <*> update b
    <*> traverse update c
    <*> traverse update d
```

There is just one remaining unknown, which is the argument of `pure`.

- Whatever it is, it contains only stuff that was between `lowerApDList` and `liftApDList` and that was not an `ApDList` combinator (`pure`, `(<*>)`, `(<$>)`), which we expect to have been reduced. That leaves us with `a`, the constructors `(:*:)` and `K1`, and the lambda at the top.
- The only constructs allowed to combine those are function applications and lambdas, because the combinators where they could have come from, `pure`, `(<*>)`, and `(<$>)`, are pure lambda terms.

With the constraint that it must all be well-typed, that doesn’t leave a lot of room. I’d be hard-pressed to find another way to put these together:

```
\b0 c0 d0 ->
  (\((K1 a' :*: K1 b') :*: (K1 c' :*: K1 d')) -> Example a' b' c' d')
    ((K1 a :*: K1 b0) :*: (K1 c0 :*: K1 d0))
```

Which beta-reduces to:

```
\b0 c0 d0 -> Example a b0 c0 d0
-- equal to --
Example a
```

With all of that, we’ve managed to make `gtraverse` simplify to roughly the naive definition, using only simplification rules from the pure Core calculus, and no external laws as rewrite rules.

There are a few more details to discuss, which explain the remaining differences between the naive definition, this latest definition, and what’s actually implemented in the latest version of *generic-data* (0.9.0.0).

**`pure` on the right**

The naive version of `traverse` starts with `Example <$> pure a`. By an `Applicative` law, it is equivalent to `pure (Example a)`. The naive version (Figure A) is indeed too naive: for fields of constant types, which don’t depend on the type parameter of the applicative functor `f`, we can pass them directly as arguments to the constructor instead of wrapping them in `pure` and unwrapping them at run time.

The new version of `gtraverse` achieves that by *not* wrapping `pure a` under `liftApDList` (Figure C). So this `pure` does not come from `Applicative f`, but from `Applicative (ApDList f)`, where it is defined as the identity function (more or less). Notably, this “fusion” happens even if `pure` is used in the middle of the list, not only at the end.

**`pure` on the left**

While `pure` on the right of `(<*>)` got simplified, there is a remaining `pure` on the left, which could be turned into `(<$>)` (Figure D).

Applicative difference lists alone won’t allow us to do that, because all actions are immediately put into arguments of `(<*>)` by `liftApDList`. We cannot inspect applicative difference lists and change how the first element is handled, because it is an argument to an abstract `(<*>)`. We can make another construction on top of applicative difference lists to postpone wrapping the first element in a difference list, so we can then use `(<$>)` instead of `(<*>)` (and an extra `pure`) to its left. There is an extra constructor to represent an empty list of actions.

```
data ApCons f c where
  PureAC :: c -> ApCons f c
  LiftAC :: (a -> b -> c) -> f a -> ApDList f b -> ApCons f c

instance Applicative f => Applicative (ApCons f) where {- ... -}
```

In fact, we can extend that idea to use another applicative combinator, `liftA2`, which can do the job of one `(<$>)` and one `(<*>)` at the same time. We take off the first two elements of the list, using two extra constructors to represent empty and singleton lists.

```
data ApCons2 f c where
  PureAC2 :: c -> ApCons2 f c
  FmapAC2 :: (b -> c) -> f b -> ApCons2 f c
  LiftApAC2 :: (a -> b -> c -> d) -> f a -> f b -> ApDList f c -> ApCons2 f d
  -- Encodes liftA2 _ _ _ <*> _

instance Applicative f => Applicative (ApCons2 f) where {- ... -}
```

The complete implementation can be found in the library *ap-normalize*.

We’ve described a generic implementation of `traverse` which can be simplified to the same Core term as a semi-naively handwritten version, using only pure lambda-calculus transformations—a bit more than beta-reduction, but nothing as dangerous as custom rewrite rules.

The `Traversable` instances generated using the *generic-data* library are now identical to instances derived by GHC using the well-oiled `DeriveTraversable` extension—provided sufficient inlining. This is a small step towards turning anything that has to do with deriving into a generic affair.

The heavy lifting is performed by *applicative difference lists*, an adaptation of difference lists from lists (really, monoids) to applicative functors. This idea is far from new, depending on how you look at it:

- the adaptation can also be seen in terms of heterogeneous lists first, with applicative functors adding a small twist to managing type-level lists (entirely existentially quantified);
- this is an instance of Cayley’s generalized representation theorem in category theory. (More on this below.)

Difference lists are well-known as a technique for improving asymptotic (“big-O”) run time; it is less well known that they can often be optimized away entirely using only inlining and simplification. (I’ve written about this before too.) Inlining and simplification are arguably lightweight compiler optimizations (“peephole optimization” of lambda terms), as opposed to, for instance, custom rewrite rules, which are unsafe, and other transformations that rely on non-trivial program analyses.

While a sufficiently smart compiler could be hypothesized to optimize anything, surely it is more practical to invest in data structures that can be handled by even today’s really dumb compilers.

I like expressive static type systems. Here, the rich types allow us to give applicative difference lists the interface of an applicative functor, so that the original implementation of `gtraverse` barely has to change, mostly swapping one implementation of `(<*>)` for another. As I’ve tried to show here, while the types can appear a little crazy, we can rely on the fact that they must be erased during compilation, so the behavior of the program (what we care about) can still be understood in terms of more elementary structures: if we ignore coercions that have no run-time presence and a few minor details, an applicative difference list is just a difference list.

Parametric polymorphism constrains the rest of the implementation; typed holes are a pretty fun way to take advantage of that, but if a program is uniquely constrained, maybe it shouldn’t appear in source code in the first place.

This technique is a form of staged metaprogramming: we use difference lists as a compile-time structure that should be eliminated by the compiler. This is actually quite similar to Typed Template Haskell,^{6} where the main difference is that the separation between compile-time and run-time computation is explicit. The advantage of that separation is that it allows arbitrary recursion and IO at compile-time, and there is a strong guarantee that none of it is left at run-time.

However, there are limitations on the expressiveness of Typed Template Haskell: only expressions can be quoted (whereas (Untyped) Template Haskell also has representations for patterns, types, and declarations), and we can’t inspect quoted expressions without giving up types (using `unType`).

Aside from compile-time IO, which is extremely niche anyway, I believe the other advantages have more to do with `GHC.Generics` being held back by temporary implementation issues than with some fundamental limitation of that kind of framework. In contrast, what makes `GHC.Generics` interesting is that it is minimalist: all it gives us is a `Generic` instance for each data type; the rest of the language stays the same. No need to quote things if it’s only to unquote them afterwards. Rather, a functional programming language can be its own metalanguage.

This is a **purely theoretical note** for curious readers. You do not need to understand this to understand the rest of the library. You do not need to know category theory to be an expert in Haskell.

`ApDList f a` is exactly `Curry (Yoneda f) f a`, where `Curry` and `Yoneda` are defined in the *kan-extensions* library as:

```
newtype Curry f g a = Curry (forall r. f (a -> r) -> g r)
newtype Yoneda f a = Yoneda (forall r. (a -> r) -> f r)
```

`Curry` is particularly relevant to understand the theoretical connection between applicative difference lists and plain difference lists `[a] -> [a]` via Cayley’s theorem:

A monoid `m` is a submonoid of `Endo m = (m -> m)`: there is an injective monoid morphism `liftEndo :: m -> Endo m`.

(More precisely, `liftEndo = (<>)`.)

`Endo m` corresponds to difference lists if we take the list monoid `m = [a]`.

This theorem can be generalized to other categories, by generalizing the notion of monoid to *monoid objects*, and by generalizing the definition of `Endo` to an *exponential object* `Endo m = Exp(m, m)`.

As it turns out, applicative functors are monoid objects, and the notion of exponential is given by `Curry` above.

An applicative functor `f` is a substructure of `Endo f = Curry f f`: there is an injective transformation `liftEndo :: f a -> Endo f a`.

(“Sub-applicative-functor” is a mouthful.)

However, if we take that naive definition `Endo f = Curry f f`, that is different from `ApDList` (missing a `Yoneda`), and it is not what we want here. The instance `Applicative (Endo f)` inserts an undesirable application of `(<$>)` in its definition of `pure`.

The mismatch is due to the fact that the syntactic concerns discussed here (all the worrying about where `(<$>)` and `(<*>)` get inserted) are not visible at the level of that categorical formalization. Everything is semantic, with no difference between `pure f <*> u` and `f <$> u`, for instance.

Anyway, if one really wanted to reuse `Curry`, `Curry (Yoneda f) (Yoneda f)` should work fine as an alternative definition of `ApDList f`.

Cayley’s theorem also applies to monads, so these three things are, in a sense, the same:

- Difference lists (`Endo` monoid)
- Applicative difference lists (`ApDList` applicative functor)
- Continuation transformers (continuation/codensity monad)

For more details about that connection, read the paper *Notions of computation as monoids*, by Exequiel Rivas and Mauro Jaskelioff (JFP 2017).

- The same structure is already used in the *lens* library by the `confusing` combinator to optimize traversals. For recursive traversals, recursive calls are made in continuation-passing style, which makes these traversals stricter than what you get with *generic-data*.
- Eric Mertens also wrote about this before; there, `ApDList` was called `Boggle`.

- *A monad is just a submonad of the continuation monad, what’s the problem?*
- *Making Haskell run fast: the many faces of `reverse`*
- *Free applicative functors in Coq*

1. The *inspection-testing* plugin; see also the paper *A Promise checked is a promise kept: Inspection testing*, by Joachim Breitner (Haskell Symposium 2017).↩︎
2. We also can’t use `DerivingVia` for `Traversable`.↩︎
3. Some of those details: only used `(<$>)` and `(<*>)`, ignoring the existence of `liftA2`; dropped the `M1` constructor; kept `K1` (which is basically `Identity`) around because it makes things typecheck if we look at it closely enough; `K1` is actually used by only one of the fields in the derived `Generic1` instance.↩︎
4. This analogy would be tighter with an abstract monoid instead of lists.↩︎
5. Except for the fact that I first needed to copy over the `Traversable` instances from *base* in a fresh class before modifying them.↩︎
6. A subset of Template Haskell, using the `TExp` expression type instead of `Exp`. This is the approach used in the recent paper *Staged sums of products*, by Matthew Pickering, Andres Löh, and Nicolas Wu (Haskell Symposium 2020).↩︎

As a programming language enthusiast, I find lots of interesting news and discussions on a multitude of social media platforms. I made two sites to keep track of everything new online related to Coq and Haskell:

- **Planet Coq**: https://coq.pl-a.net
- **Haskell Planetarium**:^{1} https://haskell.pl-a.net

If you were familiar with Haskell News, and missed it since it closed down, Haskell Planetarium is a clone of it.

While the inspiration came from Haskell News, this particular project started with the creation of Planet Coq. Since the Coq community is much smaller, posts and announcements are rarer while also more likely to be relevant to any one member, so there is more value in merging communication channels.

I’m told “planets”, historically, were more specifically about aggregating blogs of community members. In light of the evolution of social media, it is hopefully not too far-fetched to generalize the word to encompass posts on the various discussion platforms now available to us. Haskell Planetarium includes the blog feed of Planet Haskell; Planet Coq is still missing a blog feed, but that should only be temporary.

Under the hood, the link aggregators consist of a special-purpose static site generator, written in OCaml (source code). The hope was maybe to write some module of it in Coq, but I didn’t find an obvious candidate with a property worth proving formally. Some of the required libraries, in particular those for parsing (gzip, HTML, JSON, mustache templates), are clearer targets to be rewritten and verified in Coq.

I love pun domains. This one certainly makes me look for new projects related to programming languages (PL) just so that I could host them under that name.

An obvious idea is to spin up new instances of the link aggregator for other programming languages. If someone wants to see that happen, the best way is to clone the source code and submit a merge request with a new configuration containing links relevant to your favorite programming language (guide).

Questions and suggestions about the project are welcome, feel free to open a new issue on Gitlab or send me an email.


1. Thus named to not conflict with the already existing Planet Haskell.↩︎

How can we turn the infamous `head` and `tail` partial functions into total functions? You may already be acquainted with two common solutions. Today, we will investigate a more exotic answer using dependent types.

The meat of this post will be written in Agda, but should look familiar enough to Haskellers to be an accessible illustration of dependent types.

The list functions `head` and `tail` are frowned upon because they are partial functions: if they are applied to the empty list, they will blow up and break your program.

```
head :: [a] -> a
head (x : _) = x
head [] = error "empty list"

tail :: [a] -> [a]
tail (_ : xs) = xs
tail [] = error "empty list"
```

Sometimes we know that a certain list is never empty. For example, if two lists have the same length, then after pattern-matching on one, we also know the constructor at the head of the other. Or the list is hard-coded in the source for some reason, so we can see right there that it’s not empty. In those cases, isn’t it safe to use `head` and `tail`?

Rather than argue that unsafe functions are safe to use in a particular situation (and sometimes getting it wrong), it is easier to side-step the question altogether and replace `head` and `tail` with safer idioms.

To start, directly pattern-matching on the list is certainly a fine alternative.

Just short of that, one variant of `head` and `tail` wraps the result in `Maybe` so we can return `Nothing` in error cases, to be unpacked with whatever error-handling mechanism is available at call sites.

```
headMaybe :: [a] -> Maybe a
tailMaybe :: [a] -> Maybe [a]
```
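Possible definitions of these two (not spelled out in the post, but entirely determined by the signatures):

```
headMaybe :: [a] -> Maybe a
headMaybe (x : _) = Just x
headMaybe [] = Nothing

tailMaybe :: [a] -> Maybe [a]
tailMaybe (_ : xs) = Just xs
tailMaybe [] = Nothing
```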

Another variant changes the argument type to be the type of non-empty lists, thus requiring callers to give static evidence that a list is not empty.

```
-- Data.List.NonEmpty
data NonEmpty a = a :| [a]
headNonEmpty :: NonEmpty a -> a
tailNonEmpty :: NonEmpty a -> [a]
```
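Likewise, the `NonEmpty` versions each have only one sensible implementation (a sketch restating the type from above, so the snippet is self-contained):

```
data NonEmpty a = a :| [a]

headNonEmpty :: NonEmpty a -> a
headNonEmpty (x :| _) = x

tailNonEmpty :: NonEmpty a -> [a]
tailNonEmpty (_ :| xs) = xs
```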

In this post, I’d like to talk about one more total version of `head` and `tail`.

**`headTotal` and `tailTotal`**

From now on, let us surreptitiously switch languages to Agda (syntactically speaking, the most disruptive change is swapping the roles of `:` and `::`). The functions `headTotal` and `tailTotal` are funny because they make the following examples well-typed:

```
headTotal (1 ∷ 2 ∷ 3 ∷ []) : Nat
tailTotal (1 ∷ 2 ∷ 3 ∷ []) : List Nat
```

- Unlike `headMaybe`, the result has type `Nat`, not `Maybe Nat`.
- Unlike `headNonEmpty`, the input list `1 ∷ 2 ∷ 3 ∷ []` has type `List Nat`, a plain list, not `NonEmpty`—or `List⁺` as it is cutely named in Agda.

`headTotal` and `tailTotal` will be defined in Agda, so they are most definitely total. And yet they appear to be as convenient to use as the partial `head` and `tail`, where they can just be applied to a non-empty list to access its head and tail.

As you might have noticed, this post is an advertisement for dependent types, which are the key ingredients in the making of `headTotal` and `tailTotal`.

Naturally, this example only demonstrates the good points of these functions; we’ll get to the less good ones in time.

Let’s find the type and the body of `headTotal`. We put question marks as placeholders to be filled in incrementally.

```
headTotal : ?
headTotal = ?
```

Obviously the type is going to depend on the input list. To define that dependent type, we will declare one more function to be refined simultaneously.

`headTotal` is a function parameterized by a type `a` and a list `xs : List a`, with return type `headTotalType xs`, which is another function of `xs`. That tells us to add some quantifiers and arrows to the type annotations.

```
headTotalType : ∀ {a : Set} (xs : List a) → ?
headTotalType = ?

headTotal : ∀ {a : Set} (xs : List a) → headTotalType xs
headTotal = ?
```

(Note: `Set` is the “type of types” in Agda, called `Type` in Haskell.)

`headTotalType` must return a type, i.e., a `Set`. Put that to the right of `headTotalType`’s arrow. A function producing a type is also called a *type family*: a family of types indexed by lists `xs : List a`.

```
headTotalType : ∀ {a : Set} (xs : List a) → Set
headTotalType = ?

headTotal : ∀ {a : Set} (xs : List a) → headTotalType xs
headTotal = ?
```

Pattern-match on the list `xs`, splitting both functions into two cases.

```
headTotalType : ∀ {a : Set} (xs : List a) → Set
headTotalType (x ∷ xs) = ?
headTotalType [] = ?

headTotal : ∀ {a : Set} (xs : List a) → headTotalType xs
headTotal (x ∷ xs) = ?
headTotal [] = ?
```

In the non-empty case (`x ∷ xs`), we know the head of the list is `x`, of type `a`. Therefore that case is solved.

```
headTotalType : ∀ {a : Set} (xs : List a) → Set
headTotalType (_ ∷ _) = a
headTotalType [] = ?

headTotal : ∀ {a : Set} (xs : List a) → headTotalType xs
headTotal (x ∷ _) = x
headTotal [] = ?
```

What about the empty case? We are looking for two values `headTotalType []` and `headTotal []` such that the former is the type of the latter:

`headTotal [] : headTotalType []`

That tells us that the type `headTotalType []` is inhabited.

What else can we say about those unknowns?

…

After much thought, there doesn’t appear to be any requirement besides the inhabitation of `headTotalType []`. A noncommittal solution, then, is to instantiate it with the unit type, which avoids any arbitrariness in subsequently choosing its inhabitant, since there is only one. The unit type and its unique inhabitant are denoted `tt : ⊤` in Agda.

```
headTotalType : ∀ {a : Set} (xs : List a) → Set
headTotalType (_ ∷ _) = a
headTotalType [] = ⊤ -- unit type

headTotal : ∀ {a : Set} (xs : List a) → headTotalType xs
headTotal (x ∷ _) = x
headTotal [] = tt -- unit value
```

To recapitulate that last case: when the list is empty, there is no head to take, but we must still produce *something*. Having no more requirements, we produce a boring thing, which is `tt`.

The definition of `headTotal` is now complete. Following similar steps, we can also define `tailTotal`.

```
tailTotalType : ∀ {a : Set} (xs : List a) → Set
tailTotalType (_ ∷ _) = List a
tailTotalType [] = ⊤

tailTotal : ∀ {a : Set} (xs : List a) → tailTotalType xs
tailTotal (_ ∷ xs) = xs
tailTotal [] = tt
```

And with that, we can finally build the examples above!

```
some_number : Nat
some_number = headTotal (1 ∷ 2 ∷ 3 ∷ [])

some_list : List Nat
some_list = tailTotal (1 ∷ 2 ∷ 3 ∷ [])
```

We’re pretty much done, but we can still refactor a little to make this nicer to look at.

First, notice that the two type families `headTotalType` and `tailTotalType` are extremely similar, differing only in whether the `∷` case equals `a` or `List a`. Let’s merge them into a single function: we define a type ``b `ifNotEmpty` xs``, equal to `b` if `xs` is not empty, otherwise equal to `⊤`.

```
_`ifNotEmpty`_ : ∀ {a : Set} (b : Set) (xs : List a) → Set
(_ ∷ _) = b
b `ifNotEmpty` _ `ifNotEmpty` [] = ⊤
: ∀ {a : Set} (xs : List a) → a `ifNotEmpty` xs
headTotal : ∀ {a : Set} (xs : List a) → List a `ifNotEmpty` xs tailTotal
```

The infix notation reflects the intuition that `headTotal` has a meaning close to a function `List a → a`, and similarly for `tailTotal`.

Finally, one last improvement is to reconsider the intention behind the unit type `⊤` in this definition. If `headTotal` or `tailTotal` is applied to an empty list, we probably messed up somewhere. Such mistakes are made easier to spot by replacing `⊤` with an isomorphic but more appropriately named type. If an empty list causes an error, we will either see a `Failure` to unify, or some `ERROR` screaming at us.

```
data Failure : Set where
  ERROR : Failure

_`ifNotEmpty`_ : ∀ {a : Set} (b : Set) (xs : List a) → Set
b `ifNotEmpty` (_ ∷ _) = b
b `ifNotEmpty` [] = Failure

headTotal : ∀ {a} (xs : List a) → a `ifNotEmpty` xs
headTotal (x ∷ _) = x
headTotal [] = ERROR

tailTotal : ∀ {a} (xs : List a) → List a `ifNotEmpty` xs
tailTotal (_ ∷ xs) = xs
tailTotal [] = ERROR
```

We’ve now come full circle. The bodies of `headTotal` and `tailTotal` closely resemble those of the partial `head` and `tail` functions at the beginning of this post. The difference is that dependent types keep track of the erroneous cases.

A working Agda module with these functions can be found in the source repository of this blog. There is also a version in Coq.

(This was my first time programming in Agda. This language is super smooth.)

One might question how useful `headTotal` and `tailTotal` really are. They may not be so different from `headNonEmpty` and `tailNonEmpty`, because they’re all only meaningful with non-empty lists: the burden of proof is the same. Even if we added `ERROR` values to cover the `[]` case, the point is really to never run into that case.

Moreover, to actually get the head out, `headTotal` requires its argument to be *definitionally* non-empty; otherwise the ergonomics are not much better than `headMaybe`’s. In other words, for `headTotal e` to have type `a` rather than ``a `ifNotEmpty` e``, the argument `e` must actually be an expression which reduces to a non-empty list `e1 ∷ e2`, but that literally gives us an expression `e1` for the head of the list. Why not use it directly?

The catch is that the expression for the head might be significantly more complex than the expression for the list itself, so we’d still rather write `headTotal e` than whatever that reduces to.

For example, I’ve used a variation of this technique in a type-safe implementation of `printf`.^{1} The function `printf` takes a *format string* as its first argument, basically a string with holes. For instance, `"%s du %s"` is a format string with two placeholders for strings. Then, `printf` expects more arguments to fill in the holes. Once supplied, the result is a string with the holes correspondingly filled. Importantly, format strings may vary in number and types of holes.

```
printf "%s du %s" "omelette" "fromage" ≡ "omelette du fromage"
printf "%d * %d = %d" 6 9 42 ≡ "6 * 9 = 42"
```

Intuitively, that means the type of `printf` depends on the format string:

```
printf : ∀ (fmt : string) → printfType fmt
printfType : string → Set
```

However, not all strings are valid format strings. If a special character is misused, for example, `printf` may evaluate to `ERROR`.^{2}

```
printf "%m" = ERROR -- "%m" makes no sense
printfType "%m" = Failure
```

In all “correct” programs, `printf` is meant to be used with valid and statically known format strings, so the `ERROR` case doesn’t happen. Nevertheless, `printf "%d * %d = %d"` is a simpler expression to write than whatever it evaluates to, which would be some lambda that serializes its three arguments according to that format string.
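For comparison, Haskell’s `Text.Printf` achieves a similar variadic interface through type-class trickery rather than dependent types, so a malformed format string only fails at runtime:

```haskell
import Text.Printf (printf)

-- printf's return type is overloaded; here we ask for a String.
greeting :: String
greeting = printf "%s du %s" "omelette" "fromage"

equation :: String
equation = printf "%d * %d = %d" (6 :: Int) (9 :: Int) (42 :: Int)
```

Here `greeting` is `"omelette du fromage"` and `equation` is `"6 * 9 = 42"`; a bad directive such as `"%m"` would instead raise an exception when the result is forced.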

I don’t have more examples right now, but this *dependently typed validation* technique seems well-suited to more general kinds of compile-time configurations, where it would not be practical to define a type encoding the necessary invariants.

Another hypothetical use case would be to extract the output of some parameterized search algorithm. Let’s imagine that it may not find a solution in general, so its return type should be a possibly empty `List a`. If you know that it does output something for some hard-coded parameters, then `headTotal` allows you to access it with little ceremony.

On a related note, `ifNotEmpty` seems generalizable to a dependently typed variant of the `Maybe` monad, keeping track at the type level of all the conditions for it to not be a `Failure`.

- `head` is partial, undefined at `[]`.
- `headTotal` maps into two types, `Failure` and `a`, depending on the value of the input.
- `headMaybe` maps into `Maybe a`, a bigger type than `a`, and cutting `a` out of it would take a bit of work.
- `headNonEmpty` has the cleanest-looking diagram, from putting the problem of `[]` out of scope.

What other variations are there?

1. *coq-printf*. This trick is no longer used since version 2.0.0 though, a better alternative having been found in Coq’s new system for string notations.↩︎
2. It would also be reasonable to ignore the “error” and accept all strings as valid.↩︎

I have just released two libraries to enhance QuickCheck for testing higher-order properties: *quickcheck-higherorder* and *test-fun*.

This is a summary of their purpose and main features. For more details, refer to the README and the implementations of the respective packages.

This project started from experiments to design laws for the *mtl* library. What makes a good law? I still don’t know the answer, but there is at least one sure sign of a bad law: find a counterexample! That’s precisely what property-based testing is useful for. As a byproduct, if you can’t find a counterexample after looking for it, that is some empirical evidence that the property is valid, especially if you expect counterexamples to be easy to find.

Ideally we would write down a property, and get some feedback from running it. Of course, complex applications will require extra effort for worthwhile results. But I believe that, once we have our property, the cost of entry to just start running test cases can be reduced to zero, and that many applications may benefit from it.

QuickCheck already offers a smooth user experience for testing simple “first-order properties”. *quickcheck-higherorder* extends that experience to *higher-order properties*.

A *higher-order property* is a property quantifying over functions. For example:

```
prop_bool :: (Bool -> Bool) -> Bool -> Property
prop_bool f x = f (f (f x)) === f x
```
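As a side note, the property is in fact true: there are only four functions `Bool -> Bool`, and iterating any of them three times agrees with applying it once. A brute-force check, independent of QuickCheck:

```haskell
-- All four functions Bool -> Bool, written out as truth tables.
allFuns :: [Bool -> Bool]
allFuns =
  [ \b -> if b then onTrue else onFalse
  | onTrue <- [False, True], onFalse <- [False, True] ]

-- f . f . f agrees with f on every input, for every f.
propHolds :: Bool
propHolds = and [ f (f (f x)) == f x | f <- allFuns, x <- [False, True] ]
```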

Vanilla QuickCheck is sufficient to test such properties, provided you know where to find the necessary utilities. Indeed, simply passing the above property to the `quickCheck` runner results in a type error:

```
main :: IO ()
main = quickCheck prop_bool -- Type error!
```

`quickCheck` tries to convert `prop_bool` to a `Property`, but that requires `Bool -> Bool` to be an instance of `Show`, which is of course absurd.^{1}

Instead, functions must be wrapped in the `Fun` type:

```
prop_bool' :: Fun Bool Bool -> Bool -> Property
prop_bool' (Fn f) x = f (f (f x)) === f x
main :: IO ()
main = quickCheck prop_bool' -- OK!
```

Compounded over many properties, this `Fun`/`Fn` boilerplate is repetitive. It becomes especially cumbersome when the functions are contained inside other data types.

*quickcheck-higherorder* moves that cruft out of sight. The `quickCheck'` runner replaces the original `quickCheck`, and infers that `(->)` should be replaced with `Fun`.

```
-- The first version
prop_bool :: (Bool -> Bool) -> Bool -> Property
prop_bool f x = f (f (f x)) === f x
main :: IO ()
main = quickCheck' prop_bool -- OK!
```

The general idea behind this is to distinguish the *data* that your application manipulates, from its *representation* that QuickCheck manipulates. The data can take any form, whatever is most convenient for the application, but its representation must be concrete enough so QuickCheck can randomly generate it, shrink it, and print it in the case of failure.

Vanilla QuickCheck handles the simplest case, where the data is identical to its representation, and gives up as soon as the representation has a different type, requiring us to manually modify the property to make the representation of its input data explicit. This is certainly not a problem that can generally be automated away, but the UX here still has room for improvement. *quickcheck-higherorder* provides a new way to associate data to its representation, via a type class `Constructible`, which `quickCheck'` uses implicitly.

```
class (Arbitrary (Repr a), Show (Repr a)) => Constructible a where
type Repr a :: Type
fromRepr :: Repr a -> a
```
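To make the idea concrete, here is a self-contained sketch of such a class, re-declared locally with the `Arbitrary` superclass dropped so it runs without QuickCheck (the real class in *quickcheck-higherorder* differs); the function instance is a toy stand-in for QuickCheck’s `Fun`:

```haskell
{-# LANGUAGE TypeFamilies, FlexibleContexts #-}
import Data.Kind (Type)

-- Simplified Constructible: data is built from a showable representation.
class Show (Repr a) => Constructible a where
  type Repr a :: Type
  fromRepr :: Repr a -> a

-- First-order data can be its own representation.
instance Constructible Integer where
  type Repr Integer = Integer
  fromRepr = id

-- A function is represented by a finite table plus a default output.
instance (Eq a, Show a, Show b) => Constructible (a -> b) where
  type Repr (a -> b) = ([(a, b)], b)
  fromRepr (table, def) = \x -> maybe def id (lookup x table)
```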

Notably, we no longer require `a` itself to be an instance of `Arbitrary` and `Show`. Instead, we put those constraints on an associated type `Repr a`, which is thus inferred implicitly whenever values of type `a` are quantified over.

Aiming to make properties higher-level and more declarative, the `prop_bool` property above can also be written like this:

```
prop_bool :: (Bool -> Bool) -> Equation (Bool -> Bool)
prop_bool f = (f . f . f) :=: f
```

where `(:=:)` is a simple constructor. That defers the choice of how to interpret the equation to the caller of `prop_bool`, leaving the above specification free of such operational details.

Behind the scenes, this exercises a new type class for testable equality,^{2} `TestEq`, turning equality into a first-class concept even for higher-order data (the main examples being functions and infinite lists).

```
class TestEq a where
(=?) :: a -> a -> Property
```
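A toy model of this class shows why function equality becomes testable: over an enumerable domain, `(=?)` can compare two functions pointwise. (`Property` is simplified to `Bool` here; the real class produces a QuickCheck `Property`.)

```haskell
-- Simplified: a Property is just a Bool here.
type Property = Bool

class TestEq a where
  (=?) :: a -> a -> Property

instance TestEq Bool where
  (=?) = (==)

instance TestEq Int where
  (=?) = (==)

-- Extensional equality, checked by enumerating the whole domain.
instance (Bounded a, Enum a, TestEq b) => TestEq (a -> b) where
  f =? g = and [ f x =? g x | x <- [minBound .. maxBound] ]
```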

For more details, see the README of *quickcheck-higherorder*.

QuickCheck offers a `Fun` type to express properties of arbitrary functions.^{3} However, `Fun` is limited to first-order functions. An example of a type that cannot be represented is `Cont`.

The library *test-fun* implements a generalization of `Fun` which can represent higher-order functions. Any order!

It’s a very simple idea at its core, but it took quite a few iterations to get the design right. The end result is a lot of fun. The implementation exhibits the following characteristics, which are not obvious a priori:

- Like in QuickCheck’s version, the type of those *testable functions* is a single GADT, i.e., a closed type, whereas an open design might seem more natural to account for user-defined types of inputs.
- The core functions to apply, shrink, and print testable functions impose no constraints on their domains.
- *test-fun* doesn’t explicitly make use of randomness; in fact, it doesn’t even depend on QuickCheck! The library is parameterized by a functor `gen`, and almost all of the code only depends on it being an `Applicative` functor. There is (basically) just one function (`cogenFun`) with a `Monad` constraint and with a random generator as an argument.

As a consequence, *test-fun* can be reused entirely to work with Hedgehog. However, unlike with QuickCheck, some significant plumbing is required, which is work in progress. *test-fun* cannot just be specialized to Hedgehog’s `Gen` monad; it will only work with QuickCheck’s `Gen`,^{4} so we currently have to break into Hedgehog’s internals to build a compatible version of the “right” `Gen`.

*test-fun* implements core functionality for the internals of libraries like *quickcheck-higherorder*. Users are thus expected to only depend directly on *quickcheck-higherorder* (or the WIP *hedgehog-higherorder* linked above).

*test-fun* only requires an `Applicative` constraint in most cases because, intuitively, a testable function has a fixed “shape”: we represent a function by a big table mapping every input to an output. To generate a random function, we can generate one output independently for each input, collect them together using `(<*>)`, and build a table purely using `(<$>)`.
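The table-building trick can be sketched for a tiny domain with nothing but `Applicative` glue (a toy version; *test-fun* builds a proper GADT of testable functions):

```haskell
-- Generate a random function out of Bool by generating one output
-- per input and assembling the lookup with pure Applicative code.
genFunBool :: Applicative g => g a -> g (Bool -> a)
genFunBool gen =
  (\onFalse onTrue b -> if b then onTrue else onFalse)
    <$> gen
    <*> gen
```

Instantiating `g` to the list functor makes the “table of all tables” visible: `genFunBool [0, 1]` enumerates all four functions `Bool -> Int`.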

However, this view of “functions as tables” does not extend to higher-order functions, which may only make finite observations of their infinite inputs. A more general approach is to represent functions as decision trees over their inputs. “Functions as tables” is the special case where those trees are *maximal*, such that there is a one-to-one correspondence between leaves and inputs. However, maximal trees don’t always exist. Then a random generator must preemptively terminate trees, and that requires stronger constraints such as `Monad` (intermediate ones like `Alternative` or `Selective` might be worth considering too).

For more details, see the README of *test-fun*.

These libraries are already used extensively in my project *checkers-mtl*, which is where most of the code originated from.

One future direction on my mind is to port this to Coq, as part of the QuickChick project. I’m curious about the challenges involved in making the implementation provably total, and in formalizing the correctness of testing higher-order properties.

I’m always looking for opportunities to make testing as easy as possible. I’d love to hear use cases for these libraries you can come up with!

- You could hack something in this case because `Bool` is a small type, but that does not scale to arbitrary types.↩︎
- *Shrinking and showing functions (functional pearl)*, by Koen Claessen, in Haskell Symposium 2012.↩︎
- It must be lazy, in the right way. A random monad built on top of lazy `State` is no good either. As of now, QuickCheck’s `Gen` is the only monad I know which is useful for *test-fun*.↩︎

The previous post showed off the flexibility of the continuation monad to represent various effects. As it turns out, it has a deeper relationship with monads in general.

Disclaimer: this is not a monad tutorial. It will not be enlightening if you’re not already familiar with monads. Or even if you are, probably. That’s the joke.

The starting point is the remark that `lift` for the `ContT` monad transformer is `(>>=)`, and `ContT` is really `Cont`.^{1} To make that identity most obvious, we define `Cont` as a type synonym here.

```
type Cont r a = (a -> r) -> r
lift :: Monad m => m a -> Cont (m r) a
-- Monad m => m a -> (a -> m r) -> m r
lift = (>>=)
```
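To see that identity at work, we can specialize it to `Maybe` (a quick demonstration, not part of the post’s development):

```haskell
type Cont r a = (a -> r) -> r

-- lift = (>>=) at m = Maybe: a Maybe computation becomes a CPS
-- computation whose continuation may itself fail.
liftMaybe :: Maybe a -> Cont (Maybe r) a
liftMaybe = (>>=)

demo1 :: Maybe Int
demo1 = liftMaybe (Just 3) (\x -> Just (x + 1))

demo2 :: Maybe Int
demo2 = liftMaybe (Nothing :: Maybe Int) (\x -> Just (x + 1))
```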

As a monad transformer, it is certainly an odd one. On the one hand, `Cont (m r)` is a monad which doesn’t really care whether `m` is a monad, or anything at all. On the other hand, `lift` is `(>>=)`: it directly depends on the full power of a `Monad`. That contrasts with `StateT` for example, whose `Monad` instance uses the transformed monad’s `Monad` instance, while `lift` only needs a `Functor`.

If `lift` is `(>>=)`, we can also say that `(>>=)` is `lift`, suggesting an alternative definition of monads as types that can be “lifted” into `Cont`, and “unlifted” back, by passing `pure` as a continuation.

```
class Monad m where
  lift :: m a -> Cont (m r) a
  pure :: a -> m a

unlift :: Monad m => Cont (m a) a -> m a
unlift u = u pure
```

We simply renamed `(>>=)` in the `Monad` class; nothing changed otherwise.^{2} The new monad laws below are also simple reformulations of the usual monad laws, primarily in terms of `lift` and `unlift`. There’s a bit of work to fix the third law, but no serious difficulties in the process.^{3}

Nevertheless, such renaming opens the door to another point of view, where monads are merely “subsets” of the `Cont` monad, and we can reframe the monad laws accordingly. They are the same, and yet, they look completely different.

```
-- Laws for the lift-pure definition of Monad
unlift . lift = id

(lift . unlift) (pureCont x) = pureCont x

(lift . unlift) (lift u >>=? \x -> lift (k x))
  = (lift u >>=? \x -> lift (k x))
```

where the `pure` and `(>>=)` of `Cont` are called `pureCont` and `(>>=?)`, clarifying that they are defined once and for all, independently of the `Monad` class. That is the key to resolving the apparent circularity in the title.

```
pureCont :: a -> Cont r a
pureCont a = (\k -> k a)
(>>=?) :: Cont r a -> (a -> Cont r b) -> Cont r b
c >>=? d = (\k -> c (\a -> d a k))
```
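These two operations are enough to run small computations; here is a usage sketch with the post’s type synonym, finishing with the identity continuation:

```haskell
type Cont r a = (a -> r) -> r

pureCont :: a -> Cont r a
pureCont a = \k -> k a

(>>=?) :: Cont r a -> (a -> Cont r b) -> Cont r b
c >>=? d = \k -> c (\a -> d a k)

-- Chain two steps, then extract the result with id.
result :: Int
result = (pureCont 3 >>=? \x -> pureCont (x * 2)) id
```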

The second and third laws have a common structure. An equation `(lift . unlift) y = y` expresses the fact that `y` is in the image of `lift`. If we also assume the first law `unlift . lift = id`, that says nothing more.

Another interpretation of the monad laws is now apparent: they say that a monad `m` is defined by an injection `lift` into a subset of `Cont (m r)` closed under `pureCont` and `(>>=?)`. That’s why we can say that, by definition, `m` is a “submonad” of `Cont (m r)`.^{4}

But with that fact alone, it wouldn’t matter that the codomain of `lift` is `Cont (m r)`; any monad `n` would do, as we could `unlift` the `(>>=)` of `n` down to a `(>>=)` for `m`. The special thing about `Cont` here is that `(>>=)` for `m` is literally `lift`.

To push that idea further, one might propose a more symmetric redefinition of `Monad` as a pair `(lift, unlift)`:

```
class Monad m where
  lift :: m a -> Cont (m r) a
  unlift :: Cont (m a) a -> m a
```

The remaining asymmetry in the first type parameter of `Cont` can also be removed by using the `CodensityT` monad transformer:

```
type CodensityT m a = forall r. Cont (m r) a

class Monad m where
  lift :: m a -> CodensityT m a
  unlift :: CodensityT m a -> m a
```

That’s certainly fine. I just prefer the simplicity of `Cont` over `CodensityT` where we can get away with it.^{5}

In any case, we can then define `pure` by “unlifting” `pureCont`:

```
pure :: Monad m => a -> m a
pure = unlift . pureCont
```

A small wrinkle with taking `unlift` as a primitive is that the new laws don’t quite match up with the old laws anymore. For example, for these two laws to be equivalent (remember that `lift` is `(>>=)`)…

```
unlift . lift = id
-- Corresponding classical monad law
u >>= pure = u
```

… we really want an extra law to “unfold” `unlift`, which is its definition in the previous version of `Monad`.

```
unlift u = u pure
-- or, without pure
unlift u = u (unlift . pureCont)
```

It’s also the only sensible implementation: `unlift` has to apply its argument `u`, which is a function, to some continuation. The only good choice is `pure`, and we have to write that into law to prevent other not-so-good choices.^{6} `pure` is arguably still a simpler primitive than `unlift` in practice, because one has to implement `pure` explicitly anyway.

To sum up, the `(lift, unlift)` presentation of `Monad` comes with an extra fourth law to keep `unlift` in check.

```
unlift . lift = id

(lift . unlift) (pureCont x) = pureCont x

(lift . unlift) (lift u >>=? \x -> lift (k x))
  = (lift u >>=? \x -> lift (k x))

unlift u = u (unlift . pureCont)
```

The title seems to be making a circular claim, defining monads in terms of monads. But it can really be read backwards in a well-founded manner.

The “continuation monad” is a concrete thing, consisting of a function on types `(_ -> m r) -> m r`, and two operations `pureCont` and `(>>=?)` (which turn out to be essentially function application and function composition, respectively).

A “submonad of the continuation monad” is a subset^{7} of the continuation monad closed under `pureCont` and `(>>=?)`.

Although “monad” appears in those terms, we are defining them as individual concepts independently of the general notion of “monad”, which can in turn be defined in those terms. Although confusing, the naming is meant to make sense a posteriori, after everything is defined.

That is an example of a representation theorem, where some general structure is reduced to another seemingly more specific one.

Cayley’s theorem says that every group on a carrier `a` is a subgroup of the group of permutations (bijective functions) `a -> a`, and the associated injection `a -> (a -> a)` is exactly the binary operation of the group on `a`.

The Yoneda lemma says that `fmap` is an isomorphism between `m a` and `forall r. (a -> r) -> m r` for any functor `m` (into Set).

Here we said that `(>>=)` is a (split mono) morphism from `m a` to `forall r. (a -> m r) -> m r` for any monad `m`.

As was pointed out to me on reddit, this is indeed an application of the generalized Cayley representation theorem. This connection is studied in detail in the paper *Notions of Computations as Monoids*, by Exequiel Rivas and Mauro Jaskelioff, JFP 2017. (PDF, extended version)

The paper shows how to view applicative functors, arrows and monads as monoids in different categories, and how useful constructions arise from common abstract concepts such as exponentials, Cayley’s theorem, free monoids. Below is the shortest summary I could make of Cayley’s theorem applied to monads.

Cayley’s theorem generalizes straightforwardly from groups to monoids (omitted), and then from monoids (in the category of sets) to monoids in any category with a tensor `×` (i.e., a monoidal category) and with exponentials.^{8}

A (generalized) *monoid* `m` consists of a pair of morphisms `mult : m × m -> m` and `one : 1 -> m`, satisfying some conditions. Cayley’s theorem constructs an injection from `m` into the exponential object `(m -> m)`, by currying the morphism `mult` as `m -> (m -> m)`. Said informally, a monoid `m` is a submonoid of the monoid of endomorphisms `m -> m`.
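The monoid case of Cayley’s theorem is easy to write down in Haskell, using `Endo` from `Data.Monoid` as the monoid of endofunctions (a standard construction):

```haskell
import Data.Monoid (Endo (..))

-- The Cayley injection: an element acts on the monoid by
-- left multiplication.
toCayley :: Monoid m => m -> Endo m
toCayley x = Endo (x <>)

-- Recover the element by applying the endofunction to the identity.
fromCayley :: Monoid m => Endo m -> m
fromCayley (Endo f) = f mempty
```

Composition of `Endo` values mirrors `(<>)`: `fromCayley (toCayley "ab" <> toCayley "cd")` gives back `"abcd"`.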

Then consider that statement in the category of endofunctors on *Set*, where the tensor `×` is functor composition. In this category:

- a monoid is a monad, i.e., a pair `join : m × m -> m` and `pure : 1 -> m` (where `1` is the identity functor);
- the exponential object `(m -> m)` is the *codensity monad* on `m` (which we’ve been deliberately confusing with `Cont` throughout the post): `CodensityT m a` is the set of natural transformations^{9} between the functor `a -> m _` and `m`.

`type CodensityT m a = forall r. (a -> m r) -> m r`

Now, Cayley’s theorem translates directly to: a monad is a submonad of the codensity monad.

1. As before, there will be quite some blur on the distinction between `Cont`, `ContT`, and `CodensityT`.↩︎
2. Assuming we’ve already ditched `return` for `pure`.↩︎
3. A proof in Coq that these new laws imply the old ones, just to be sure.↩︎
4. You may be more familiar with notions of “substructure” being refinements of the notion of “subset”, and strictly speaking, `m` is not a subset of `Cont (m r)`. But it is convenient to generalize “substructure” directly to “anything that injects into a structure”, especially for working in category theory or formalizing those ideas in proof assistants based on type theory, where the set-theoretic notion of “subset” is awkward to express literally.↩︎
5. By defining `CodensityT` as a type synonym instead of a newtype, we would also run into minor problems with impredicativity and type inference.↩︎
6. I’m not actually sure whether the other laws entail this one.↩︎
7. “Subset” is not defined but I hope you get the idea.↩︎
8. Or rather, it is sufficient for `m` alone to be an exponent, so `(m -> m)` is defined as an object.↩︎
9. Which is not always a set, but we care when it is.↩︎

There are common monads associated with common effects: `Maybe` for failure, `[]` (list) for nondeterminism, `State` for state… What about the continuation monad? We shall see why the answer is all of the above, but better. Indeed, many effects can be understood and implemented in a simple and uniform fashion in terms of first-class continuations.

```
{-# LANGUAGE
InstanceSigs,
RankNTypes #-}
module Continuations where
import Control.Applicative ((<|>))
import Control.Monad (replicateM, when)
import Data.Foldable (for_)
```

The key insight behind continuations is that producing a result in a function is equivalent to calling another function which does the rest of the computation with that result.

In this small starting example, we apply some function `timesThree` and compare the result to 10. We will transform this code into continuation-passing style.

```
example1 :: Int -> Bool
example1 x = 10 < timesThree x where
  timesThree :: Int -> Int
  timesThree x = 3 * x
```

As our first step, following the train of thought above, instead of taking the result of `timesThree` and doing something (`10 < _`) with it, let `timesThree` do that operation directly.

```
example2 :: Int -> Bool
example2 x = timesThree x where
  timesThree :: Int -> Bool
  timesThree x = 10 < 3 * x
```

Of course, that’s not much of a `timesThree` function anymore. Moreover, we know how `3 * x` is going to be used in this case, but that’s quite counter to modularity. Let us generalize `timesThree`: instead of hard-coding `10 < _`, we parameterize `timesThree` by the context in which the result `3 * x` will be used. That context is called the *continuation* `k`.

```
example3 :: Int -> Bool
example3 x = timesThree x (\ y -> 10 < y) where
  timesThree :: Int -> (Int -> Bool) -> Bool
  timesThree x k = k (3 * x)
```

Furthermore, the result of the continuation doesn’t have to be of type `Bool`; we can generalize the type of `timesThree` further to also be parameterized by the result type `r` of the continuation. In the main body where we apply `timesThree`, `r` is specialized to the type of the final result, which is `Bool`.

```
example4 :: Int -> Bool
example4 x = timesThree x (\ y -> 10 < y) where
  timesThree :: Int -> (Int -> r) -> r
  timesThree x k = k (3 * x)
```

That was continuation-passing style (CPS) in a nutshell.

Functions written in CPS can be composed as follows. Let us refactor the comparison `10 < _` into another CPS function, `greaterThanTen`. Once the program is entirely written in CPS, the identity function (here `\ z -> z`) is commonly used as the last continuation, which receives the final result.

```
example5 :: Int -> Bool
example5 x =
  timesThree x (\ y ->
  greaterThanTen y (\ z ->
  z))
  where
    timesThree :: Int -> (Int -> r) -> r
    timesThree x k = k (3 * x)
    greaterThanTen :: Int -> (Bool -> r) -> r
    greaterThanTen y k = k (10 < y)
```

Hey, this example looks a lot like `do` notation… Indeed, note how we changed the result type of `timesThree` from `Int` to `(Int -> r) -> r`; that mapping between types, `(_ -> r) -> r`, defines a monad.

## The `Cont` monad

(The descriptions in this section are principally meant to provide context if you’ve never seen the implementation of `Cont` before, but they may be quite dense. It’s not necessary to follow every single detail to catch the rest, so skipping forward is an option.)

A function of type `((a -> r) -> r)` takes a continuation `(a -> r)` and is expected to produce a result `r`. The obvious way to do that is to apply the continuation to a value `a`, which is exactly the idea behind continuations given at the beginning. In fact, that is also what it means to “return” a value in this monad (`pureCont` below; the instances are collapsed at the end of this section). As we will soon see, the power of the continuation monad hides in the myriad other ways of using that continuation.

```
newtype Cont r a = Cont ((a -> r) -> r)
-- Eliminate Cont
runCont :: Cont r a -> (a -> r) -> r
runCont (Cont m) = m
-- Use the identity continuation to extract the final result.
evalCont :: Cont a a -> a
evalCont (Cont m) = m id
pureCont :: a -> Cont r a
pureCont a = Cont (\ k -> k a)
```

The *bind* `(>>=)` of the monad captures the pattern in `example5` above to compose two CPS functions. We start with a continuation `(k :: b -> r)` for the whole computation (`Cont r b`). We first apply `ma`, with a continuation which takes the result `a` of `ma` and passes it to `mc`, which in turn produces a `b` that is just what `k` wants.

```
bindCont :: Cont r a -> (a -> Cont r b) -> Cont r b
bindCont (Cont ma) mc_ =
  Cont (\ k ->
    ma (\ a ->
      mc a (\ b ->
        k b)))
  where
    mc = runCont . mc_
```

`Functor`, `Applicative`, `Monad` for `Cont`.

```
instance Functor (Cont r) where
  fmap :: (a -> b) -> Cont r a -> Cont r b
  fmap f (Cont m) = Cont (\ k -> m (k . f))

instance Applicative (Cont r) where
  pure :: a -> Cont r a
  pure = pureCont

  (<*>) :: Cont r (a -> b) -> Cont r a -> Cont r b
  Cont mf <*> Cont ma = Cont (\ k ->
    mf (\ f ->
      ma (\ a ->
        k (f a))))

instance Monad (Cont r) where
  (>>=) :: Cont r a -> (a -> Cont r b) -> Cont r b
  (>>=) = bindCont
```

We can thus rewrite the example using `do`-notation for `Cont`.

```
example6 :: Int -> Bool
example6 x = evalCont $ do
    y <- timesThree x
    z <- greaterThanTen y
    pure z
  where
    timesThree :: Int -> Cont r Int
    timesThree x = Cont (\ k -> k (3 * x))
    greaterThanTen :: Int -> Cont r Bool
    greaterThanTen y = Cont (\ k -> k (10 < y))
```

Here is another way to look at monadic composition of `Cont`. If we unfold the definition of `Cont`, a continuation in the continuation monad, `a -> Cont r b`, is really a function mapping continuations to continuations; we shall call that a *continuation transformer*: `(b -> r) -> (a -> r)`. They map “future” continuations to “present” continuations.^{1}

This suggests taking a look at the fish operator, which composes monadic continuations.

```
(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
(>=>) :: (a -> Cont r b) -> (b -> Cont r c) -> (a -> Cont r c)
```

Looking at the type of `(>=>)`:

- unfold the definition of `Cont r b` to `(b -> r) -> r`,
- swap the arguments of each function, turning `a -> (b -> r) -> r` into `(b -> r) -> (a -> r)`.

The result shows that sequencing in the `Cont` monad (with `(>=>)`) is basically function composition. The function `f >=> g :: a -> Cont r c` takes a continuation `c -> r`, passes it to the function `g` to produce a continuation `b -> r`, which goes into `f` to produce a continuation `a -> r` (note that continuations do flow from right to left in `f >=> g`).

```
(>=>)   :: (a -> Cont r b)      -> (b -> Cont r c)      -> (a -> Cont r c)
{- 1 -}    (a -> (b -> r) -> r) -> (b -> (c -> r) -> r) -> (a -> (c -> r) -> r)
{- 2 -}    ((b -> r) -> a -> r) -> ((c -> r) -> b -> r) -> ((c -> r) -> a -> r)
           (    y     ->     x) -> (    z     ->     y) -> (    z     ->     x)
(.)     :: (y -> x)             -> (z -> y)             -> (z -> x)
```

In spite of (or thanks to) its simplicity, the `Cont` monad is quite versatile. Many kinds of effects can be represented in `Cont`, all with only the one `Monad` instance given above, which knows nothing about effects.

In contrast with free monads, which are just waiting to be interpreted, we can define an effect directly by its operations in `Cont`.

The main idea is to consider what operations we allow on continuations. Here we describe various restrictions through the result type `r` in `(a -> r) -> r`, but there may be other ways.

`Identity`

Our starting point was that producing a result is equivalent to calling the continuation. If we add the constraint that the result type `r` is abstract, so that there are no operations possible on it, then calling the continuation with some argument `a` is the only option, i.e., we must produce a result `a`, nothing else. In that case, the continuation monad is isomorphic to the identity monad.

To express the restriction that `r` is abstract, we can use the `forall` quantifier. If no operations are possible on `r`, then `r` could actually be any type. So `Cont` computations defined under that restriction are polymorphic in `r`. We name `Done` the resulting “specialization” of `Cont` (if we may call it one), with an isomorphism given by `pure :: a -> Done a` from the `Applicative` instance above, and `runDone :: Done a -> a` below.

```
type Done a = forall r. Cont r a
-- forall r. (a -> r) -> r
runDone :: Done a -> a
runDone (Cont m) = m id
```

`Maybe`

The next interesting case to consider is that the continuation may be dropped. But `(a -> r) -> r` must still somehow produce a result of type `r`. We thus replace `r` with `Maybe r`, so that a computation can produce `Nothing` instead of calling the continuation (`abort` below). As you might expect, the result is a monad which models computations that can exit early with no output, i.e., a variant of the `Maybe` monad.

Although the type `Maybe` appears in this definition, the fact that it is a monad is not used anywhere. In fact, whereas monadic composition `(>>=)` for `Maybe` is defined with pattern-matching, there is not a single `case` in the operations for `Abortable` defined here. Each constructor is only used once: `Nothing` when aborting the computation, `Just` as the final continuation to indicate success. So `Abortable` does not just imitate `Maybe`, it is even more efficient!

```
type Abortable a = forall r. Cont (Maybe r) a
-- forall r. (a -> Maybe r) -> Maybe r
abort :: Abortable x
abort = Cont (\ _k -> Nothing)
runAbortable :: Abortable a -> Maybe a
runAbortable (Cont m) = m Just
```
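To see it in use, here is a hedged sketch (the `safeDiv` helper is mine, not from the post; the core definitions are repeated so the snippet stands alone): a chain of divisions that aborts as soon as a denominator is zero.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Repeated from the post so this sketch compiles on its own.
newtype Cont r a = Cont ((a -> r) -> r)

runCont :: Cont r a -> (a -> r) -> r
runCont (Cont m) = m

instance Functor (Cont r) where
  fmap f (Cont m) = Cont (\ k -> m (k . f))
instance Applicative (Cont r) where
  pure a = Cont (\ k -> k a)
  Cont mf <*> Cont ma = Cont (\ k -> mf (\ f -> ma (k . f)))
instance Monad (Cont r) where
  Cont ma >>= f = Cont (\ k -> ma (\ a -> runCont (f a) k))

type Abortable a = forall r. Cont (Maybe r) a

abort :: Abortable x
abort = Cont (\ _k -> Nothing)

runAbortable :: Abortable a -> Maybe a
runAbortable (Cont m) = m Just

-- Hypothetical helper: division that drops the continuation on zero.
safeDiv :: Int -> Int -> Abortable Int
safeDiv _ 0 = abort
safeDiv n d = pure (n `div` d)
```

`runAbortable (safeDiv 12 3 >>= safeDiv 100)` yields `Just 25`, while replacing the `3` with `0` short-circuits the whole chain to `Nothing`, with no `case` in sight.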

Contrary to `Done` and `Identity`, `Abortable` is not isomorphic to `Maybe`. Whereas a `Maybe` computation must decide to be `Just` or `Nothing` on the spot, an `Abortable` is a function `(a -> Maybe r) -> Maybe r`, which may inspect the continuation before making a decision, even though “intuitively” it’s not supposed to.

Thus we can construct a computation `secondGuess` which is expected to return a `Bool`: it calls the continuation with `True` (like `pure True` would) but backtracks to `False` if that fails.

```
secondGuess :: Abortable Bool
secondGuess = Cont (\k -> k True <|> k False)
pureTrue :: Abortable Bool
pureTrue = pure True
```

`runAbortable` maps both `secondGuess` and `pureTrue` to `Just True`, but they behave differently with a continuation which fails on `True` and succeeds on `False`.

Nevertheless, it is not possible to construct examples such as `secondGuess` with only the monad operations and `abort`; you have to break the `Abortable` abstraction. In that sense, `Abortable` is still a practical alternative to the `Maybe` monad.

`Either`

Naturally, a slight variant of “exit early” is to “exit early with an explicit error”, obtained by replacing `Maybe` with `Either e`.

```
type Except e a = forall r. Cont (Either e r) a
-- forall r. (a -> Either e r) -> Either e r
throw :: e -> Except e a
throw e = Cont (\ _k -> Left e)
runExcept :: Except e a -> Either e a
runExcept (Cont m) = m Right
```

`State`

What if the continuation takes an extra parameter, with result type `s -> r`? Then we may want to call it with different parameters, resulting in a notion of stateful computation.

Remember that the result type `s -> r` is both the result type of the continuation and of the whole computation (`(a -> s -> r) -> s -> r`). The whole computation can just call the continuation (with some value `a`) to produce a result `s -> r`, or it can first take the parameter `s`, and obtain an `r` by calling the continuation with a different state.

Thus `get` takes that parameter `s` and feeds it twice to the continuation `k :: s -> s -> r`, keeping the state (second argument) unchanged, but also giving it as the main (first) argument of the subsequent computation. The other function, `put`, ignores that parameter and calls the continuation `k :: () -> s -> r` with another state given externally.

```
type State s a = forall r. Cont (s -> r) a
-- forall r. (a -> s -> r) -> s -> r
get :: State s s
get = Cont (\ k s -> k s s)
put :: s -> State s ()
put s = Cont (\ k _s -> k () s)
runState :: State s a -> s -> (a, s)
runState (Cont m) = m (,)
```
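As a small worked example (the `tick` name is mine; definitions are repeated so the snippet stands alone), here is a counter that returns the old state while storing its successor:

```haskell
{-# LANGUAGE RankNTypes #-}

-- Repeated from the post so this sketch compiles on its own.
newtype Cont r a = Cont ((a -> r) -> r)

runCont :: Cont r a -> (a -> r) -> r
runCont (Cont m) = m

instance Functor (Cont r) where
  fmap f (Cont m) = Cont (\ k -> m (k . f))
instance Applicative (Cont r) where
  pure a = Cont (\ k -> k a)
  Cont mf <*> Cont ma = Cont (\ k -> mf (\ f -> ma (k . f)))
instance Monad (Cont r) where
  Cont ma >>= f = Cont (\ k -> ma (\ a -> runCont (f a) k))

type State s a = forall r. Cont (s -> r) a

get :: State s s
get = Cont (\ k s -> k s s)

put :: s -> State s ()
put s = Cont (\ k _s -> k () s)

runState :: State s a -> s -> (a, s)
runState (Cont m) = m (,)

-- Return the current counter and store its successor.
tick :: State Int Int
tick = get >>= \ n -> put (n + 1) >>= \ _ -> pure n
```

`runState tick 0` gives `(0, 1)`: along the way the continuation is simply called with different state arguments, and the pair only appears in the final continuation `(,)`.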

That `State` is isomorphic to the standard definition, `s -> (s, a)`. Indeed, contrary to `Abortable`, there is no observation to be made about the continuation `a -> s -> r` when `r` is abstract.

As with `Maybe`/`Either`, there is no pattern-matching on pairs going on. The `s` and the `a` are always just two arguments to the continuation, and a pair gets built up only in the final continuation in `runState`.

`Writer`

Are we running out of ideas for what to put in `Cont _ a`?

Above we tried `r`, `Either e r` (sums!), and `s -> r` (exponentials!). Surely we should also try products. The result is not quite as nice, because to do anything with the pair we have to break the property that was maintained until now: that the continuation `k` is the last thing we call.

We can `tell` an element of a monoid by appending it in front of whatever the rest of the computation outputs.

```
type Writer w a = forall r. Cont (w, r) a
-- forall r. (a -> (w, r)) -> (w, r)
tell :: Monoid w => w -> Writer w ()
tell w = Cont (\ k ->
let (w0, r) = k () in (w <> w0, r))
runWriter :: Monoid w => Writer w a -> (w, a)
runWriter (Cont m) = m (\ a -> (mempty, a))
```
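A quick sketch of `tell` at work (definitions repeated so the snippet stands alone); the log accumulates left to right because each `tell` prepends to whatever the rest of the computation outputs:

```haskell
{-# LANGUAGE RankNTypes #-}

-- Repeated from the post so this sketch compiles on its own.
newtype Cont r a = Cont ((a -> r) -> r)

runCont :: Cont r a -> (a -> r) -> r
runCont (Cont m) = m

instance Functor (Cont r) where
  fmap f (Cont m) = Cont (\ k -> m (k . f))
instance Applicative (Cont r) where
  pure a = Cont (\ k -> k a)
  Cont mf <*> Cont ma = Cont (\ k -> mf (\ f -> ma (k . f)))
instance Monad (Cont r) where
  Cont ma >>= f = Cont (\ k -> ma (\ a -> runCont (f a) k))

type Writer w a = forall r. Cont (w, r) a

tell :: Monoid w => w -> Writer w ()
tell w = Cont (\ k -> let (w0, r) = k () in (w <> w0, r))

runWriter :: Monoid w => Writer w a -> (w, a)
runWriter (Cont m) = m (\ a -> (mempty, a))

-- Log two messages, then return a value.
logged :: (String, Int)
logged = runWriter (tell "hello, " >>= \ _ -> tell "world" >>= \ _ -> pure 42)
```

`logged` evaluates to `("hello, world", 42)`: the second `tell` runs first against the empty log of the final continuation, and the first `tell` prepends to its output.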

`State`, reversed

`Cont (w, r)` can also be viewed as a variant of `State`. Instead of treating `w` as a monoid, we can let the user update it however they want. However, that update happens *after* the rest of the computation is done, so the last update (in the order they would appear in a `do` block, for example) is applied first to the initial state. This is the *reverse state monad*, where modifications map the future state to the past state.

Getting the current state in the `RState` monad requires recursion: the current state comes from the future (the continuation), which is asking for the current state itself. With this `rget` operation, you have to be careful not to introduce any causality loop and accidentally tear down the fabric of reality.

Compare our `RState` with a more conventional definition of it as `s -> (a, s)`. There, recursion is used in the definition of `(>>=)`, while `get` is trivial: the opposite situation to our `RState`.

```
type RState s a = forall r. Cont (s, r) a
-- forall r. (a -> (s, r)) -> (s, r)
rmodify :: (s -> s) -> RState s ()
rmodify f = Cont (\ k ->
let (s, r) = k () in (f s, r))
rget :: RState s s
rget = Cont (\ k ->
let (s, r) = k s in (s, r))
runRState :: RState s a -> s -> (s, a)
runRState (Cont m) s = m (\a -> (s, a))
```
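A tiny sketch to make the time travel concrete (definitions repeated so the snippet stands alone): `rget` is written *after* `rmodify`, yet it observes the untouched initial state, because the modification is applied on the way back out.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Repeated from the post so this sketch compiles on its own.
newtype Cont r a = Cont ((a -> r) -> r)

runCont :: Cont r a -> (a -> r) -> r
runCont (Cont m) = m

instance Functor (Cont r) where
  fmap f (Cont m) = Cont (\ k -> m (k . f))
instance Applicative (Cont r) where
  pure a = Cont (\ k -> k a)
  Cont mf <*> Cont ma = Cont (\ k -> mf (\ f -> ma (k . f)))
instance Monad (Cont r) where
  Cont ma >>= f = Cont (\ k -> ma (\ a -> runCont (f a) k))

type RState s a = forall r. Cont (s, r) a

rmodify :: (s -> s) -> RState s ()
rmodify f = Cont (\ k -> let (s, r) = k () in (f s, r))

rget :: RState s s
rget = Cont (\ k -> let (s, r) = k s in (s, r))

runRState :: RState s a -> s -> (s, a)
runRState (Cont m) s = m (\ a -> (s, a))

-- rget, although written after rmodify, observes the initial state:
-- state flows backwards, and (+ 1) is applied on the way out.
backwards :: (Int, Int)
backwards = runRState (rmodify (+ 1) >>= \ _ -> rget) 0
```

`backwards` evaluates to `(1, 0)`: the final state is `1`, while the value returned by `rget` is the initial `0`.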

`Tardis`

We can combine `State`, given by `s -> r`, and `RState`, given by `(s, r)`: if we make the continuation result type `s -> (s, r)`, we obtain a Tardis monad, with one state going forward in time and one going backwards.

The forward and backward states don’t actually have to be the same, so we can also generalize `s -> (s, r)` into `fw -> (bw, r)`.

```
type Tardis bw fw a = forall r. Cont (fw -> (bw, r)) a
-- forall r. (a -> fw -> (bw, r)) -> fw -> (bw, r)
```

`List`

One last standard type we haven’t tried for `r` is the type of lists. In our previous examples, computations called the continuation only once (or at least they should; we can exclude `secondGuess` as a degenerate example). Equipping the result type with the structure of lists, we can call a continuation multiple times and return a combination of all the results.

This provides a model of nondeterministic computations, keeping track of all possible executions, which is the same interpretation as the standard list `[]` monad.

`decide` chooses both `True` and `False`, i.e., it calls the continuation on both booleans and concatenates the results together. `vanish` chooses nothing; it drops the continuation like `abort`.

```
type List a = forall r. Cont [r] a
-- forall r. (a -> [r]) -> [r]
decide :: List Bool
decide = Cont (\ k -> k True ++ k False)
vanish :: forall a. List a
vanish = Cont (\ _k -> [])
runList :: List a -> [a]
runList (Cont m) = m (\ a -> [a])
```
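For instance (a standalone sketch repeating the definitions above), nondeterministically choosing two booleans enumerates all four pairs, in the same order as the list monad would:

```haskell
{-# LANGUAGE RankNTypes #-}

-- Repeated from the post so this sketch compiles on its own.
newtype Cont r a = Cont ((a -> r) -> r)

runCont :: Cont r a -> (a -> r) -> r
runCont (Cont m) = m

instance Functor (Cont r) where
  fmap f (Cont m) = Cont (\ k -> m (k . f))
instance Applicative (Cont r) where
  pure a = Cont (\ k -> k a)
  Cont mf <*> Cont ma = Cont (\ k -> mf (\ f -> ma (k . f)))
instance Monad (Cont r) where
  Cont ma >>= f = Cont (\ k -> ma (\ a -> runCont (f a) k))

type List a = forall r. Cont [r] a

decide :: List Bool
decide = Cont (\ k -> k True ++ k False)

runList :: List a -> [a]
runList (Cont m) = m (\ a -> [a])

-- Each call to decide forks the rest of the computation.
allPairs :: [(Bool, Bool)]
allPairs = runList (decide >>= \ b1 -> decide >>= \ b2 -> pure (b1, b2))
```

`allPairs` is `[(True,True),(True,False),(False,True),(False,False)]`.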

There’s a handful of variations for that one. Use `NonEmpty r` to rule out `vanish`; generalize over an abstract monoid or semigroup `r` to prevent inspection of the continuation; or use a `Tree r` to keep track of the order of choices.

```
type List1 a = forall r. Cont (NonEmpty r) a
type List' a = forall r. Monoid r => Cont r a
type List1' a = forall r. Semigroup r => Cont r a
type Tree0 a = forall r. Cont (Tree r) a
```

`ContT`

There is also a continuation monad transformer, which is simply the continuation monad with a monadic result type `m r`. The *transformers* library defines `ContT` as a newtype, mostly so that it has the right kind to be an instance of `MonadTrans`. All instances stay the same, so here we will prefer a type synonym to keep our `Monad` instance count at 1. We will refer to `ContT` and `Cont` interchangeably, as we’re not too concerned about kinds in this post, using whichever looks better in context.

```
type ContT r m a = Cont (m r) a
-- (a -> m r) -> m r
```

What does it mean that `ContT` is a monad transformer? There is a `lift` function, which commutes with monadic operations (that’s called a *monad morphism*). For `ContT`, `lift` is simply `(>>=)`:

```
lift :: Monad m => m a -> ContT r m a
-- Monad m => m a -> (a -> m r) -> m r
lift u = Cont (\ k -> u >>= k)
-- Monad morphism laws:
-- lift (pure a) = pure a
-- lift (u >>= \ a -> k a) = lift u >>= \ a -> lift (k a)
```

`CodensityT`

A closely related sibling is the “codensity” monad transformer, where `r` is universally quantified, like it is in previous examples. Both `ContT` and `CodensityT` can be used to optimize monads^{2} that have expensive *bind* `(>>=)` operations. We won’t say anything here about the actual differences between `ContT` and `CodensityT`.

```
type CodensityT m a = forall r. Cont (m r) a
-- forall r. (a -> m r) -> m r
```

In the examples above, the types we used instead of `r` happen to be monads, even if we did not rely on that fact. Here’s a quick summary, with the names of the resulting variant of `Cont` on the left, an equivalent definition in terms of `CodensityT` in the middle, and their more-or-less standard counterparts on the right, as they can be found on Hackage (*base*, *transformers*, *rev-state* and *tardis*). The words “retracts to” mean that there is a surjective but not injective mapping from the left to the right.

```
Done = CodensityT Identity isomorphic to Identity
Abortable = CodensityT Maybe retracts to Maybe
Except e = CodensityT (Either e) retracts to Either e
State s = CodensityT (Reader s) isomorphic to State s
Writer w = CodensityT (Writer w) retracts to Writer w, or (reverse) State w
Tardis s = CodensityT (State s) retracts to Tardis s
List = CodensityT [] retracts to []
```

The monad transformers corresponding to the above monads also find their equivalents in terms of `Cont`. They are not exactly isomorphic, but a noteworthy feature, as before, is that they still use the same old `Monad` instance for `Cont`. Operations do rely on a `Monad` constraint for the transformed monad `m`.

`ListT`

Turning the previous examples into monad transformers is left as an exercise for the reader.

Here we will focus on `List`; it is an interesting case because a monad transformer corresponding to lists is notoriously non-obvious. The obvious candidate `m [a]` is not a monad (unless `m` is commutative).

Curiously, we have the “monad” part down for free, and we only need to solve “list” and “transformer”.

We briefly saw earlier that we can get a “list” monad by using any monoid instead of `[r]` as the result type. We also saw that a monadic result type `m r` makes a monad transformer. In addition, any monad defines a monoid `m ()` if we ignore the result (we could also use a different monoid instead of `()`, but that doesn’t seem as interesting), with `pure ()` as the unit and `(*>)` (or `(>>)`) for composition. In fact, we only need an `Applicative` constraint for the “list” operations, but `lift` still requires `Monad`.

We already had all the ingredients to make a *list monad transformer*!

Reading the definition of `ListT` slowly: it takes a continuation `(a -> m ())` and produces a computation `m ()`. What can it actually do? Mostly, call the continuation with various values of `a` in some order.

```
type ListT m a = Cont (m ()) a
-- (a -> m ()) -> m ()
decideM :: Applicative m => ListT m Bool
decideM = Cont (\ k -> k True *> k False)
vanishM :: Applicative m => ListT m a
vanishM = Cont (\ _k -> pure ())
runListT :: Applicative m => (a -> m ()) -> ListT m a -> m ()
runListT k (Cont m) = m k
```

The list transformer is a nice pattern for the deeply nested loops common in enumeration/search algorithms.

Here are three nested `for_` loops:

```
-- All 3-bit patterns
threebit :: IO ()
threebit =
  for_ [0, 1] $ \ i ->
    for_ [0, 1] $ \ j ->
      for_ [0, 1] $ \ k ->
        printDigits [i, j, k]

printDigits :: [Int] -> IO ()
printDigits ds = do
  for_ ds (\ i -> putStr (show i))
  putStrLn ""
```

Here they are again, where each value is bound using `do` notation thanks to the list transformer (this combination is really neat: `Cont $ for_ [ ... ]`).

```
-- All 3-bit patterns
threebit' :: IO ()
threebit' = runListT printDigits $ do
  i <- Cont $ for_ [0, 1]
  j <- Cont $ for_ [0, 1]
  k <- Cont $ for_ [0, 1]
  pure [i, j, k]
```

Once iteration is captured in a monad, we can iterate across dimensions:

```
-- All 8-bit patterns
eightbit :: IO ()
eightbit = runListT printDigits $
  replicateM 8 (Cont (for_ [0, 1]))
-- 00000000
-- 00000001
-- 00000010
-- 00000011
-- 00000100
-- 00000101
-- 00000110
-- ...
```

All of that is technically possible with just the list monad. The transformer really adds the ability to interleave enumeration and computation.

```
-- All 8-bit patterns, but show only the suffix that changed at every step.
eightbit' :: IO ()
eightbit' = runListT pure $ do
  for_ [0 .. 7] $ \ n -> do
    i <- Cont $ for_ [0, 1]
    lift $ when (i == 1) $ putStr (replicate n ' ')
    lift $ putStr (show (i :: Int))
  lift $ putStrLn ""
-- 00000000
-- 1
-- 10
-- 1
-- 100
-- 1
-- 10
-- ...
```

This “list monad transformer” is actually different from another incarnation which may be found on Hackage. The more common version of a “list monad transformer” is an “effectful list”, where the list constructors are interleaved with computations.^{3}

`newtype ListT m a = ListT (m (Maybe (a, ListT m a)))`

The biggest difference is that the “effectful list” transformer naturally supports an *uncons* operation, which evaluates the effectful list and pauses after producing the first element (or the empty list).

The trade-off is that *uncons* has a cost in usability. The paused computation must be resumed explicitly: it may be dropped, or resumed more than once. The continuation transformer, by not allowing such “interruptions”, may offer stronger guarantees for resource management.

The continuation monad can thus serve as a uniform foundation for many kinds of monadic effects, and is often even a more efficient replacement for “standard” monads.

“Control operations” might cause some difficulties; those are operations parameterized by computations, such as `catch` and `bracket`. They weren’t discussed here, but I think the problems can be overcome.^{4}

- Oleg Kiselyov’s page on Continuations.
- *The Mother of all Monads*, by Dan Piponi.
- *The best refactoring you’ve never heard of* (aka. *Defunctionalize the continuation*), by James Koppel.

1. This is intimately related to *predicate transformer semantics*. There were two relevant papers at ICFP this year where continuations play a great role: *A predicate transformer semantics for effects*, by Wouter Swierstra and Tim Baanen, ICFP 2019 (PDF); and *Dijkstra monads for all*, by Kenji Maillard et al., ICFP 2019 (arxiv).↩︎

2. *Asymptotic improvement of computations over free monads*, by Janis Voigtländer, MPC 2008. (PDF)↩︎

3. In more than one place: *logict*, *pipes*, *list-t*, *list-transformer*. In particular, *logict* provides a Church-encoded version of the “effectful list”, which brings it close to the continuation transformer, but there’s still a gap.↩︎

4. For example, there is an instance of the class `MonadReader` for `ContT`, with a problematic operation `local :: MonadReader r m => (r -> r) -> m a -> m a`. It can be implemented by explicitly restoring the environment of the continuation following `local`. We also have to restrict the `ContT` values under consideration to a subset of “well-behaved” ones. That would most likely forbid use of `callCC` or `shift`/`reset`, but as we’ve seen throughout this post, there is a lot we can do without those: avoiding insane control operations keeps the `Cont` monad *reasonable* (*cf.* the title of this post).↩︎

Can we construct an infinite type in Haskell? I don’t mean “infinite” in the sense that it has infinitely many inhabitants, but that the type itself, its name, consists of an infinite tree of type constructors (an equirecursive type, more or less). For example, the following type synonym, if it were legal, would define a type which is an infinite tree of `(->)`.

`type T = T -> T`
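For contrast, the *iso-recursive* cousin of that equation is perfectly legal: wrapping the recursion in a `newtype` marks each unfolding of the type explicitly. This is a standard workaround, not from the original text; it even lets us type self-application:

```haskell
-- A legal, iso-recursive version of T = T -> T:
-- the constructor MkT marks each unfolding of the type.
newtype T = MkT (T -> T)

unT :: T -> (T -> T)
unT (MkT f) = f

-- Self-application, famously untypeable without the wrapper.
selfApply :: T
selfApply = MkT (\ t -> unT t t)
```

The type checker never has to unfold `T` on its own: every unfolding is requested by a `MkT` or `unT` in the term.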

Here’s a first attempt with type families. This compiles:

```
type family F where
F = F -> F
```

This still compiles:

```
x :: F
x = x
```

But surprisingly, this doesn’t:

```
-- Nope.
x :: F
x = undefined
```

GHC tries the impossible task of unfolding `F` completely to unify the type of `undefined` with `F`, and fails after spending too much fuel:

```
• Reduction stack overflow; size = 201
When simplifying the following type: F
Use -freduction-depth=0 to disable this check
(any upper bound you could choose might fail unpredictably with
minor updates to GHC, so disabling the check is recommended if
you're sure that type checking should terminate)
```

So anything beyond `x = x` has no hope of working.

```
-- Nope.
x :: F
x y = y
```

Another idea is to hide the infinite type in an existential type, which would avoid the problematic equation causing the loop in the type checker.

```
data Some where
  MkSome :: a -> Some
```

The goal is to somehow wrap an `x :: a` such that:

`MkSome x = MkSome (x, x)`

which would implicitly lead to this desired equation on the type of `x`:

`a ~ (a, a)`

We could try the following:

```
u :: Some
u = MkSome (x, x) where
  MkSome x = u
```

But that is rejected. It is also not quite clear where the type of `x` is bound. Maybe it means this (and that’s the closest to how GHC actually understands it):

```
u :: Some
u = MkSome (x, x) where
  x = case u of MkSome x -> x
```

where the type of `x` is bound by the pattern under the `case`, and would thus “escape its scope”.

Or maybe it means this, where we pattern-match first to make the existentially quantified type variable available, before wrapping it again in a pair:

Or maybe it means this, where we pattern-match first to make the existentially quantified type variable available, before wrapping it again in a pair:

```
u :: Some
u = case u of
  MkSome x -> MkSome (x, x)
```

Of course, that is not a productive definition.

For completeness, here’s yet another version, but again with the same scoping problem as the first one:

```
u :: Some
u = MkSome (case u of
  MkSome x -> (x, x))
```

It appears that *unpacking an existentially quantified type requires computation*. Do it too early and you’re too strict; try to delay it and type variables escape their scope.

This is puzzling, because “existential quantification” should be a matter of types, of abstraction, something to be erased at run time… like `newtype`!

If we could define an “existential newtype”, whose constructor has no run-time presence, then perhaps this would do the job:

```
newtype Some where -- Pretend this makes sense
  MkSome :: a -> Some

u :: Some
u = case u of
  MkSome x -> MkSome (x, x)
```

It is straightforward to understand in terms of type erasure: the newtype goes away, so we would be left with this (as we might expect):

`u = (u, u)`

However, it seems awfully difficult to formalize what is going on with explicit types instead, which would otherwise be the obvious way to guarantee type soundness (i.e., “well-typed programs do not go wrong”).

Indeed, newtypes are compiled using coercions. For example, a `case` expression on the `Identity` newtype becomes a coercion `runIdentity :: Identity a -> a` inside the term (you can check it on real examples with the compiler option `-ddump-ds` (desugarer output)):

```
case v of
  Identity y -> f y
-- desugars to --
f (runIdentity v)
```

But if we want to allow newtypes to use existential quantification, there is no obvious type we can give to an “`unMkSome`” coercion, especially because of the scoping issues we already ran into earlier.^{1}

Does that mean we should throw away the whole notion of “existential newtype”? Of course not. It can be useful not to pay the cost of an extra constructor at run time for existential quantification. The question comes up once in a while: here’s the relevant ticket on GHC’s issue tracker. Although `newtype` is currently the *de facto* way to do such things, a different solution that would also work for existential types, and which was proposed in that discussion, is to unpack `data` types with one strict single-field constructor. That proposal is especially nice as it doesn’t add anything new to the language itself; it is purely an optimization.

Because of that, it does not provide an answer to the above puzzle of constructing infinite types, and that’s a relief! That endeavor was *purposefully dubious*: a working solution would suggest that something extremely ~~wrong~~ *interesting* is going on with the type system.

Nevertheless, investigating how such a questionable requirement could be met led to some surprising interactions between existential types and recursion.

Potential, but clunky, alternatives to model that: dependent types, that can refer explicitly to the hidden type (`runMkSome :: foreach u : Some . case u of MkSome @a _ -> a`), or fancier scoping rules that would somehow allow type variables to “temporarily escape their scope” (no idea what that may look like).↩︎

These three versions of `reverse` are curiously related. The comparison will exemplify the role of laziness and purity in optimizations of functional programs, and the need for better tools to help performance tuning evolve from arcane art to science.

```
{-# LANGUAGE TemplateHaskell #-}
module Reverse where
import Prelude hiding (reverse)
import Test.Inspection (inspect, (===))
```

`reverse`

The *declarative* definition answers the question: *what* is the reverse of a list?

It depends on the list. If it is empty, then its reverse is also the empty list. If it is not empty, `x : xs`, then the last element of the reverse is the head `x`, and the rest is the reverse of the tail `xs`.

```
reverse :: [a] -> [a]
reverse [] = []
reverse (x : xs) = reverse xs ++ [x]
```

Step-by-step example:

```
reverse (1 : 2 : 3 : [])
= reverse (2 : 3 : []) ++ [1]
= (reverse (3 : []) ++ [2]) ++ [1]
= ((reverse [] ++ [3]) ++ [2]) ++ [1]
= (([] ++ [3]) ++ [2]) ++ [1]
... (simplifying (++))
= 3 : 2 : 1 : []
```

Remarking that this function (in particular, the steps skipped at the end of the example above) takes time quadratic in the length of the input list motivates a more operational point of view.

`reverse`

The question for the *imperative* definition is: *how* to construct the reverse of a list?

By answering the *how*, we can be careful to do only a constant amount of work for each element, so that overall the function works in time linear in the length of the list.

Given a list, we can take its elements off one by one to build up another list next to it, which will be the reverse of the original list.

This solution is also known as the “tail-recursive `reverse`”.

```
reverse' :: [a] -> [a]
reverse' xs = revApp xs []

-- revApp xs ys = reverse xs ++ ys
revApp :: [a] -> [a] -> [a]
revApp [] acc = acc
revApp (x : xs) acc = revApp xs (x : acc)
```

Step-by-step example:

```
revApp (1 : 2 : 3 : []) []
= revApp (2 : 3 : []) (1 : [])
= revApp (3 : []) (2 : 1 : [])
= revApp [] (3 : 2 : 1 : [])
= (3 : 2 : 1 : [])
```

Another way to arrive at a linear-time `reverse` is to find the cause behind the slowness of the first version and then fix it.

If `n` is the length of the list, there is one factor of `n` because `reverse` calls itself `n` times. For each of those times, we apply `(++)` once, which is the other factor of `n`.

But the cost of `(++)` is entirely up to the representation of lists. If we pick a representation with constant-time concatenation, then the declarative definition will give a linear-time function. One such representation is *difference lists*.

We first build a small library of difference lists.

```
type DList a = [a] -> [a]
empty :: DList a
singleton :: a -> DList a
(++.) :: DList a -> DList a -> DList a
toList :: DList a -> [a]
```

`DList` implementation

```
type DList a = [a] -> [a]

empty :: DList a
empty = id

singleton :: a -> DList a
singleton y = (y :)

(++.) :: DList a -> DList a -> DList a
(++.) = (.)

toList :: DList a -> [a]
toList ys = ys []
```
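As a quick sanity check of the library (a standalone sketch; definitions repeated so the snippet compiles on its own): concatenation is just composition, so appending costs one closure rather than a traversal, and the actual list is only assembled by `toList`.

```haskell
-- Repeated from the post so this sketch compiles on its own.
type DList a = [a] -> [a]

empty :: DList a
empty = id

singleton :: a -> DList a
singleton y = (y :)

(++.) :: DList a -> DList a -> DList a
(++.) = (.)

toList :: DList a -> [a]
toList ys = ys []

-- Association does not affect the cost: either way we build a
-- chain of closures and only traverse it once, in toList.
abc :: String
abc = toList ((singleton 'a' ++. singleton 'b') ++. singleton 'c')
```

`abc` evaluates to `"abc"`.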

Now, take the declarative definition, and replace list operations with those from that difference list library, and finally convert the result to the standard list representation.

```
-- reverse, where the result is a difference list
reversed :: [a] -> DList a
reversed [] = empty
reversed (x : xs) = reversed xs ++. singleton x
reverse'' :: [a] -> [a]
reverse'' = toList . reversed
```

Here we shall consider an implementation “correct” if it is equivalent to a “reference implementation”, which we’ll arbitrarily elect to be the “declarative” `reverse`.

The body of the function `reverse''` is essentially the same as the “reference” `reverse`, and in that sense, we could say that it is “obviously correct”.

More formally, to prove that `reverse` and `reverse''` are equivalent, we can walk through both definitions and relate every `DList` on one side with a list on the other side. Here’s a proof in Coq.

In contrast, to prove directly that the “imperative” `reverse'` is equivalent to the “declarative” `reverse`, the invariant is more ad hoc to come by, even though the same invariant turns out to be present in the `DList` version too, just hidden by the `DList` abstraction.

Although their destinations may be the same, it seems worth looking back at the different journeys behind `reverse'` and `reverse''`, to hopefully learn more general principles which can guide us to write correct and performant programs (I have yet to figure those out, so don’t expect to find any clear answers here).

Both `reverse'` and `reverse''` evaluate in time which grows linearly with the length of the input list; that was the whole point of the operation.

But there is more to performance than asymptotic complexity. The “imperative” `reverse'` relies on a tail-recursive auxiliary function `revApp`, so it can easily be compiled to a simple loop where most of the work goes directly into constructing the reversed list.

The situation with the difference-list based `reverse''` is less clear. Even if we admit that `(++.)` has a constant cost, there seems to be a fair amount of overhead compared to the “imperative” `reverse'`: it is not tail-recursive, and it builds up a long chain of functions which is applied only at the end. As cheap as function calls may be, it seems quite hopeless to shed all of that weight to catch up to `reverse'`.

And yet, simply by making explicit the second argument of `reversed`, with a bit of rewriting, `reversed` transforms into `revApp` (the meat of the “imperative” `reverse'`):

```
reversed :: [a] -> DList a
revApp :: [a] -> [a] -> [a] -- same type
-- same definition as revApp
reversed [] zs = zs
reversed (y : ys) zs = reversed ys (y : zs)
```

```
reversed [] zs
= empty zs
= zs
reversed (y : ys) zs
= (reversed ys ++. singleton y) zs
= reversed ys (singleton y zs)
= reversed ys (y : zs)
```

Thanks to that, the glorious GHC compiles both the “declarative” `reverse''`

and the “imperative” `reverse'`

to identical Core terms.

Haskell is actually in a quite privileged position here: to a certain extent, such an optimization is enabled by laziness and purity, the two main distinguishing features of Haskell.

To see why, take another look at this equation in the definition of `reversed`

(in `reverse''`

):

`reversed (y : ys) = reversed ys ++. singleton y`

`(++.)` is merely function composition `(.)`, so we would like to rewrite that as follows:

`reversed (y : ys) = \zs -> reversed ys (singleton y zs)`

But in an eagerly-evaluated language, that transformation delays the evaluation of `reversed ys`, which is valid only if it *terminates with no side effects*. That may be true in this case, but how many actual compilers infer that information?^{1}
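To make the role of laziness concrete, here is a small self-contained illustration (the names `partial` and `builtLazily` are hypothetical, and the local `reversed` has the same shape as in the post):

```haskell
reversed :: [a] -> ([a] -> [a])
reversed [] = id
reversed (y : ys) = reversed ys . (y :)

-- `partial` closes over `reversed (cycle [1])`, an application that would
-- loop forever if evaluated; laziness keeps it as an unevaluated thunk
-- inside the composition.
partial :: [Int] -> [Int]
partial = reversed (cycle [1]) . (0 :)

-- Forcing `partial` to weak head normal form succeeds without running the
-- recursion; an eager language would already loop while *building* it.
builtLazily :: Bool
builtLazily = partial `seq` True
```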

That reflects what the haskell.org site says about laziness:

> Functions don’t evaluate their arguments. This means that programs can compose together very well, with the ability to write control constructs (such as if/else) just by writing normal functions. The purity of Haskell code makes it easy to fuse chains of functions together, allowing for performance benefits.

Laziness and purity make a pretty broad double-edged sword for optimizing functional programs. They allow writing performant programs using high-level abstractions; however, the cost model is notoriously hard to grasp, especially beyond asymptotics. We *can* build fast programs in Haskell, but that alone is not good enough. What does it take to do so *reliably*?

Inlining and partial evaluation seem to inherently make the cost model non-compositional, so that we have to know what the code generated by the compiler looks like. But that’s quite tedious, so there should be tools to assist us in spotting patterns of efficient and inefficient code. One such tool (the only one I am aware of in my limited knowledge) is the *inspection-testing* library.

For example, here is a test that the fast “declarative” `reverse''` and the “imperative” `reverse'` are compiled to the same Core terms:

`inspect $ 'reverse'' === 'reverse'`

When we compile this file (with optimizations), we get the following output confirming our claim:

```
posts/2019-09-13-reverse.md:278:1: reverse'' === reverse' passed.
inspection testing successful
expected successes: 1
```

Of course, we can write that test here because we happen to have two functions which should compile to the same Core. It works well for unit-testing metaprograms (programs which generate programs), but it’s ill-suited to test the optimization of application-level code. *inspection-testing* offers a few other properties which are correlated with “well optimized” in appropriate situations, and there are definitely many more left to discover.

- *A novel representation of lists and its application to the function `reverse`*, by John Hughes, 1984. (PDF)
- *Why functional programming matters*, by John Hughes, 1990. (PDF)
- The many faces of `isOrderedTree`, talk by Joachim Breitner, MuniHac 2019.
- The *dlist* library on Hackage (the README contains many links about difference lists).

In OCaml we can also construct infinite, cyclic lists: `let rec xs = 1 :: xs`. That makes the applicability condition for this code transformation even more complicated to state.↩︎