Heterogeneous lists with dependent types in Haskell
With the help of GHC’s many extensions, we can encode simple forms of dependent types, allowing us to enforce more expressive invariants in our programs at compile time.
In this post, I will walk you through an implementation of heterogeneous lists to demonstrate the features and techniques commonly associated with the current flavor of “dependent Haskell”.
My Literate Haskell blog posts render the whole source, including the extensions. In this post, I will mention most of them when they are used, and explain them a bit.
Extensions and imports for this Literate Haskell file
{-# LANGUAGE
AllowAmbiguousTypes,
ConstraintKinds,
DataKinds,
FlexibleContexts,
FlexibleInstances,
GADTs,
InstanceSigs,
MultiParamTypeClasses,
PolyKinds,
RankNTypes,
ScopedTypeVariables,
TypeApplications,
TypeFamilies,
TypeOperators,
UndecidableInstances #-}
module HLists where
As for those that won’t be mentioned later, let’s take care of them now briefly.
InstanceSigs
allows type signatures to appear in instance declarations, so that we don’t need to go back to the type class to remember what is being defined. This is also consistent with the common practice of annotating all toplevel bindings.ConstraintKinds
,FlexibleContexts
,FlexibleInstances
,ScopedTypeVariables
andUndecidableInstances
lift restrictions that are mostly historical. They are quite benign extensions that might be unnoticeable when enabled if you don’t already know about them.
The GHC user guide also documents all the available extensions.
A type for heterogeneous lists
We will construct lists using the infix pair type below as a cons, and the unit
type ()
as a nil.
The infix type constructor (:*)
is allowed by the TypeOperators
extension
(not to be confused with the identically-named data constructor (:*)
,
which is allowed by the standard).
data a :* b = a :* b deriving Show
infixr 1 :*
Another common candidate for heterogeneous lists is a GADT indexed by a list of
types, but I prefer (:*)
for its simplicity.
Here is an example of a list of three elements:
hlistExample1 :: Int :* String :* String :* ()
hlistExample1 = 1 :* "One" :* "1" :* ()
You might wonder, why even have a nil, since there can be any type to the right
of (:*)
? This is clearly shorter:
hlistExample2 :: Int :* String :* String
hlistExample2 = 1 :* "One" :* "1"
The problem is that it introduces an odd corner case, where the last element of
a list can never be a heterogeneous list, since (:*)
will be interpreted as a
tail of the list.
Is the above list a list of three elements, or is it a list of two (one Int
and one String :* String
)?
Such a choice would be reflected in the rest of the implementation as an
unnecessary complication compared to having lists be clearly terminated by
()
.
Hence, it is possible to accidentally construct these ill-formed lists, but something will break at compile time anyway, so that’s okay.
Task: get an element by type
We will implement a function to extract an element of a given type in the list.
It is declared as part of the following two-parameter type class.
This requires the MultiParamTypeClasses
extension.
class Get a bs where
get :: bs -> a
For example, we can get the string out of this triple as follows:
getExample1 :: String
getExample1 =
get @String ((1 :: Int) :* "One" :* "1" :* ())
-- "One"
In the simple examples of this post, we can always infer the output type of
get
from the context where it is applied.
But in general, an explicit annotation will often be necessary, and that is
nicely done with the TypeApplications
extension as shown in the above example.
get
has two type arguments, in the order of the type class arguments.1
Here we explicitly apply it to its first type argument, which is the type
of the element we want to extract.
We’ll go through three progressively more elaborate implementations for Get
.
To avoid conflicts between these attempts, we will declare a new copy of that
class at the beginning of each section.
Conventional approach
class Get0 a bs where
get0 :: bs -> a
The simplest solution is to write two overlapping instances.
First, when the type of the head of the list doesn’t match, we keep looking in
the tail, calling get0
recursively.
instance Get0 a bs => Get0 a (b :* bs) where
get0 :: (b :* bs) -> a
get0 (_ :* ys) = (get0 :: bs -> a) ys
Second, when the types match, we return the head of the list.
instance {-# OVERLAPPING #-} Get0 a (a :* bs) where
get0 :: (a :* bs) -> a
get0 (x :* _) = x
By default, overlapping instances are forbidden: often enough, they are caused
by a mistake.
We indicate that the overlap is intentional using the OVERLAPPING
pragma on
the more specific instance.2
The instance (Get0 a (a :* bs))
is more specific than (Get0 a (b :* bs))
because we can substitute variables in the latter to obtain the former: replace
b
with a
.
Before going in more details, let’s give this implementation a try. We can reproduce the above example successfully:
get0Example1 :: String
get0Example1 =
get0 @String ((1 :: Int) :* "One" :* "1" :* ())
-- "One"
What about this more abstract operation: get the third element of any sufficiently long list.
getThird0 :: (a :* b :* c :* ds) -> c
getThird0 = get0
-- Type error!
The constraint solver gets stuck on solving Get0 c (a :* b :* c :* ds)
because it cannot determine whether a
is equal to c
,
so it can’t choose one of the above instances.
If instances are allowed to overlap, which instance should it select?
I won’t go into the rationale here, but the rule is to always prefer the most
specific instance, and only pick a less specific one when there is no way to
instantiate type variables in the environment to make the other more specific
instance(s) match.
In this case, since a
can be made equal to c
, the OVERLAPPING
instance
can still potentially match, hence the error.
The GHC user guide actually has a section on overlapping instances which is a pretty good reference about this algorithm if you can get past the jargon.
One way out is to add the unresolved constraint to the signature of
getThird0
, and let it be resolved at use sites with concrete types.
getThird0
:: Get0 c (a :* b :* c :* ds)
=> (a :* b :* c :* ds)
-> c
getThird0 = get0
However, if c
is equal to a
or b
, this function won’t be doing what its
name and type advertise!
-- a, b, c all equal to String, ds ~ (String :* ())
get0Example2 :: String
get0Example2 =
getThird0 ("One" :* "Two" :* "Three" :* "Four" :* ())
-- "One"
-- We would like "Three"
This problem is of the kind that has many solutions, and the best one would really
depend on the purpose of this get
function.
But the one I want to share here is to write the constraint that c
is not
equal to a
or b
. This makes the type of getThird
less general to allow
the Get
constraint above to be solved at the definition of getThird
.
Type disequality constraints
class Get1 a bs where
get1 :: bs -> a
Immediately, we face an obstacle: there is an equality constraint (~)
, but no
primitive disequality constraint “(/~)
”, or any general way to negate a
constraint (such a thing would be incompatible with the open-world principle
of type classes).
Fortunately, we can define equality differently: as a boolean-valued
closed TypeFamily
.
(Are we playing language extensions bingo?)
The type-level function (==)
has two clauses, it is equal to True
if the
operands are equal, False
otherwise.
type family a == b :: Bool where
a == a = 'True
a == b = 'False
Note that type families allow nonlinear pattern-matching: the first clause
matches twice on the same variable. This is not allowed at the term level,
because there is no universal way of comparing values, especially
infinite values, functions, and values of abstract types such as IO
.3
But at the type level, it is possible.
Data constructors such as True
and False
are allowed at (or promoted to)
the type level by the DataKinds
extension.
They are disambiguated from type constructors using the single-quote prefix,
since they can use the same names even in a single module.
DataKinds
is the kind4 of extension that is so intuitive one doesn’t
even notice when they’re using it.
Now we can match on the outcome of that comparison operation.
The constraint 'False ~ (a == b)
holds if and only if a
and b
are not
equal. Conversely, one could also write 'True ~ (a == b)
in the case of an
equality, but the more common a ~ b
works as well, if not better.5
Note that a ~ b
implies 'True ~ (a == b)
, but the type checker is
not aware of the converse! From the type checker’s point of view, (==)
is a
type-level function like any other, and it doesn’t make any particular effort
to reason about its output.
We introduce a new type class GetIf
, with one more parameter to carry the
comparison result.
The superclass constraint is a minor safety net, ensuring there’s only one way
to use that type class: we must set the boolean aeqb
to a == b
,
and instances must also satisfy that constraint.
class (aeqb ~ (a == b)) => GetIf aeqb a b bs where
getIf :: (b :* bs) -> a
We have two cases, depending on the value of the boolean aeqb
.
Either a
is equal to b
, then we get the head of the list.
instance (a ~ b) => GetIf 'True a b bs where
getIf :: (a :* bs) -> a
getIf (x :* _) = x
Or a
is not equal to b
, then we keep looking in the tail. Since instances
must satisfy their superclass constraints, we must add 'False ~ (a == b)
.
instance ('False ~ (a == b), Get1 a bs)
=> GetIf 'False a b bs where
getIf :: (b :* bs) -> a
getIf (_ :* ys) = get1 ys
Finally, Get1
and GetIf
are connected by this instance of Get1
which immediately defers to GetIf
.
instance GetIf (a == b) a b bs => Get1 a (b :* bs) where
get1 = getIf
The main benefit of this technique over the previous one is that it avoids
overlapping instances, which can be difficult to understand.
The overlap still exists, as part of the definition of (==)
, but that is
contained in a general and reusable construct, and the rule of matching clauses
sequentially for closed type families may be more intuitive.6
For clarity, we can define a “class synonym” for type disequality. For constraints, the advantage of this method over type synonyms is that type synonyms can’t be partially applied, while “class synonyms” can.
class ('False ~ (a == b)) => a /~ b
instance ('False ~ (a == b)) => a /~ b
We can thus give getThird
the following type, ensuring that the c
we
get is indeed the third element in the list.
getThird1 :: (c /~ a, c /~ b) => (a :* b :* c :* ds) -> c
getThird1 = get1
Now using getThird1
with types for the first or second element equal to the third
results in a type error.
Whether that is good or bad depends on what you actually want to do with those
heterogeneous lists.
getThird1Example :: String
getThird1Example =
getThird1 @String ("One" :* "Two" :* "Three" :* "Four" :* ())
-- Type error!
Type-level conditionals
One last issue to address about that solution is that it is quite cumbersome
to implement.
We have to define a new type class, and the original class does nothing but
call the new one immediately.
Conceptually, all that work is only to match on a boolean, one of the simplest
data types (or, data kinds?).
We’d like a similarly simple abstraction to do that.
How about if
? We can certainly make it a type family7:
type family If (b :: Bool) (c :: k) (d :: k) :: k where
If 'True c _ = c
If 'False _ d = d
This family is polykinded, which is just another name for polymorphism
of type-level constructs: If :: forall k. Bool -> k -> k -> k
.
(PolyKinds
extension; kinds are to types as types are to terms.)
It will take a bit of setup to be able to use If
, so let’s put it aside for
the moment.
We also need a type class to pattern match on the type-level boolean and decide what to do at run time, as opposed to type families, which only live at compile time.
Here’s a first attempt.
class IsBool0 b where
_If0 :: forall r. r -> r -> r
instance IsBool0 'True where
_If0 x _ = x
instance IsBool0 'False where
_If0 _ y = y
Note that b
does not appear in the type of _If0
, meaning that b
cannot be
inferred from the context where _If0
is applied.
Historically, this would always result in ambiguity errors, thus, such a type
was forbidden, forcing us to resolve the ambiguity by adding a trivial argument
or tagging the function somehow.
The AllowAmbiguousTypes
extension tells the compiler to accept definitions
like _If0
above, keeping signatures tidy;
the ambiguity can be resolved at use sites with the TypeApplications
extension, as it makes type arguments explicit (corresponding to those
variables under forall
).
Some examples of _If0
in action:
_If0Examples :: [String]
_If0Examples =
_If0 @'True "This" "That"
[ -- "This"
_If0 @'False "This" "That"
, -- "That"
_If0 @(Int == Bool) "Int is equal to Bool" "Int is not equal to Bool"
, -- "Int is not equal to Bool"
]
However we will have trouble implementing Get
.
To see why, below is what the final instance will roughly look like.
There are two cases as always.
If the type a
we are looking for is equal to b
, then we return the head y
of the list, otherwise we keep looking in the tail ys
.
instance (???) => Get a (b :* bs) where
get :: b :* bs -> a
get (y :* ys) =
_If @(a == b)
y -- When (a == b) ~ 'True, i.e., a ~ b
get ys) -- When (a == b) ~ 'False, i.e., a /~ b (
The two branches of _If
make assumptions that the type checker (currently)
rejects.
In the True
branch, we return y :: b
when the expected type is a
,
this assumes that a
is equal to b
.
In the False
branch, we call get
on the tail, which requires a
Get a bs
constraint.
We could add it to the context ((???)
above), but that would imply a
traversal of the whole type list, whereas the behavior of the previous
implementations was to stop as soon as we found the element we are looking for.
One way to view this problem is that the type of _If
doesn’t encode the
condition that the boolean is True
in one branch, and False
in the other.
It turns out we can apply that intuition quite literally in order to fix _If
:
class IsBool b where
_If
:: forall r
. (('True ~ b) => r)
-> (('False ~ b) => r)
-> r
The RankNTypes
extension allows quantifiers and constraints to occur
in a type signature in more locations than only the beginning.8
Thus, the two arguments of _If
are now parameterized by constraints.
The function _If
can only use its first argument if b
is True
, and its
second argument if b
is False
, whereas those arguments can use the
constraints to deduce other useful facts locally.
Interestingly, there is thus exactly one way to write each of the two IsBool
instances:
instance IsBool 'True where
_If :: forall r. r -> (('False ~ 'True) => r) -> r
_If x _ = x
instance IsBool 'False where
_If :: forall r. (('True ~ 'False) => r) -> r -> r
_If _ y = y
The last piece of the puzzle is to complete the (???)
context in the Get
instance sketched above.
_If
requires anIsBool
instance in any case.We shall use the
If
type family we defined previously to construct a context which depends on the value of(a == b)
.
In each branch of _If
, (a == b)
is assumed to be either True
or False
,
which allows the compiler to evaluate If (a == b) _ _
, uncovering the
constraint that each case exactly needs, (a ~ b)
in one, (Get a bs)
in the
other.
We define a type synonym If'
to collapse the IsBool
and If
constraints
together.
type If' b c d = (IsBool b, If b c d)
instance If' (a == b) (a ~ b) (Get a bs)
=> Get a (b :* bs) where
get :: b :* bs -> a
get (y :* ys) =
_If @(a == b)
y -- (If 'True (a ~ b) _) becomes (a ~ b)
get ys) -- (If 'False _ (Get a bs)) becomes (Get a bs) (
We’re now reaching the end of this heterogeneous list example.
From the outside, get
should behave identically to get1
,
but the definition of get
was refactored to a single instance.
Do you find that easier to read than the previous versions?
Appendix: Dependently typed programming with singletons
A central reference in Haskell’s dependently-typed landscape is
the singletons
library.
One common point between this blog post and singletons is that, in fact,
the class IsBool
is the Church encoding of the Bool
singleton.
A singleton is a type with a single inhabitant. A useful class of singletons
can be defined as Generalized Algebraic Data Types (GADTs
extension).
GADTs are types whose parameters depend on the constructors, and vice versa.
The parameter b
of SBool
below is tied to the constructor:
STrue
is the unique inhabitant of SBool 'True
and SFalse
is the unique
inhabitant of SBool 'False
. The point of such a type is that
by pattern matching on an SBool b
, we can refine the value of b
in each branch, similarly to the _If
combinator.
data SBool (b :: Bool) where
STrue :: SBool 'True
SFalse :: SBool 'False
There is an isomorphism between SBool
and IsBool
instances, that we can
witness by desugaring IsBool
instances to the polymorphic functions
that they contain.
-- Desugar IsBool by unwrapping _If
type IsBool_ b
= forall r
. (('True ~ b) => r)
-> (('False ~ b) => r)
-> r
_SBool_to_IsBool :: forall b. SBool b -> IsBool_ b
_SBool_to_IsBool sb = case sb of
STrue -> (\x _ -> x)
SFalse -> (\_ y -> y)
_IsBool_to_SBool :: forall b. IsBool_ b -> SBool b
_IsBool_to_SBool _If = _If STrue SFalse
It is easy to check that _IsBool_to_SBool . _SBool_to_IsBool = id
.
The converse _SBool_to_IsBool . _IsBool_to_SBool = id
is also true, but
the proof relies on a more involved argument (parametricity, free theorems).
The “Church encoding” claim above can be made more explicit with the
following equivalent9 reformulation of SBool
. This also shows another
characteristic of GADTs: constructors can be given arbitrary function types,
especially polymorphic functions with constraints, as long as the result
matches the type being defined. The function constraints and arguments
become constructor fields.
data SBool' (b :: Bool) where
STrue' :: ('True ~ b) => SBool' b
SFalse' :: ('False ~ b) => SBool' b
It should be compared to the Either
data type (which can be defined using
GADT notation to make explicit the types of its constructors)
and its Church encoding.
data Either a b where
Left :: a -> Either a b
Right :: b -> Either a b
type Either_ a b = forall r. (a -> r) -> (b -> r) -> r
On the same topic
- Dependent types in Haskell, talk by Stephanie Weirich.
- Practical dependent types in Haskell, blog post by Justin Le.
- Decidable propositional equality in Haskell,
blog post by Richard Eisenberg, about unequality constraints (
(a == b) ~ 'False
). - Dependently typed programming with singletons, by Richard Eisenberg and Stephanie Weirich (Haskell Symposium 2012).
- Strongly typed heterogeneous collections (PDF), by Oleg Kiselyov, Ralf Lämmel, and Keean Schupke (Haskell Workshop 2004, extended version).
I don’t know whether they can be reordered without changing the order of the class arguments.↩︎
There are two more variants,
OVERLAPPABLE
andOVERLAPS
. You can read more about them in the GHC user guide.↩︎We could imagine using an
Eq
constraint to implement nonlinear pattern-matching; I haven’t really considered the implications.↩︎The fact that “type”, “kind” and “sort” are commonly used in this area of programming languages makes it sometimes difficult to talk about things.↩︎
That type family is also defined in
base
:Data.Type.Equality.(==)
. However, its definition has recently changed (sincebase 4.11
, i.e., GHC 8.4). I don’t like that it is not reflexive anymore, but I haven’t gathered any concrete data about the pros and cons yet. (Previous version, changelog (Ctrl-F “Equality”).)↩︎It’s a bit more complex than that in general, for example
Data.Type.Bool
defines the type-level boolean(&&)
and(||)
in a way that simplifies ideally:True && a = a = a && True
.↩︎Data.Type.Bool.If
inbase
.↩︎Both
RankNTypes
andScopedTypeVariables
enable theforall
quantifier/keyword, that allows us to explicitly bind type variables. Otherwise, the quantifier gets inserted implicitly at the beginning of a type signature.↩︎Up to internal boxing shenanigans.↩︎