Where does the name "algebraic data type" come from?

Posted on July 26, 2024 — 2600 words (13 minutes)

“Algebraic data types” is a beloved feature of programming languages with such a mysterious name. Where does this name come from?

There are two main episodes in this saga: Hope and Miranda. The primary conclusion is that the name comes from universal algebra, whereas another common interpretation of “algebraic” as a reference to “sums of products” is not historically accurate. We drive the point home with Clear. CLU is extra.

Disclaimer: I’m no historian and I’m nowhere as old as these languages to have any first-hand perspective. Corrections and suggestions for additional information are welcome.

Hope (1980)
Miranda (1985)
Sums of products?
Clear (1979)
CLU (1977)

Hope (1980)

Algebraic data types were at first simply called “data types”. This programming language feature is commonly attributed to Hope, an experimental applicative language by Rod Burstall et al.. Here is the relevant excerpt from the paper, illustrating its concrete syntax:

A data declaration is used to introduce a new data type along with the data constructors which create elements of that type. For example, the data declaration for natural numbers would be:
data num == 0 ++ succ(num)
(…) To define a type ‘tree of numbers’, we could say
data numtree == empty ++ tip(num)
                      ++ node(numtree#numtree)
(The sign # gives the cartesian product of types). One of the elements of numtree is:
node(tip(succ(0)),
     node(tip(succ(succ(0))), tip(0)))
But we would like to have trees of lists and trees of trees as well, without having to redefine them all separately. So we declare a type variable
typevar alpha
which when used in a type expression denotes any type (including second- and higher-order types). A general definition of tree as a parametric type is now possible:
data tree(alpha) == empty ++ tip(alpha)
                          ++ node(tree(alpha)#tree(alpha))
Now tree is not a type but a unary type constructor – the type numtree can be dispensed with in favour of tree(num).

Pattern matching in Hope is done in multi-clause function declarations or multi-clause lambdas. There was no case expression.

reverse(nil) <= nil
reverse(a::l) <= reverse(l) <> [a]

lambda true, p => p
     | false, p => false

As far as I can tell, other early programming languages cite Hope or one of its descendants as their inspiration for data types. There is a slightly earlier appearance in NPL by Darlington and the same Burstall, but I couldn’t find a source describing the language or any samples of data type declarations. Given the proximity, it seems reasonable to consider them the same language to a large extent. This paper by Burstall and Darlington (1977) seems to be using NPL in its examples, but data types are only introduced informally; see on page 62 (page 19 of the PDF):

We need a data type atom, from which we derive a data type tree, using constructor functions tip to indicate a tip and tree to combine two subtrees
tip : atoms → trees
tree : trees x trees → trees
We also need lists of atoms and of trees, so for any type alpha let
nil : alpha-lists
cons : alphas x alpha-lists → alpha-lists

Hope inspired ML (OCaml’s grandpa) to adopt data types. In Standard ML:

datatype 'a option = Nothing | Some of 'a

Before it became Standard, ML started out as the “tactic language” of the LCF proof assistant by Robin Milner, and early versions did not feature data types (see the first version of Edinburgh LCF). it’s unclear when data types were added exactly, but The Definition of Standard ML by Milner et al. credits Hope for it (in Appendix F: The Development of ML):

Two movements led to the re-design of ML. One was the work of Rod Burstall and his group on specifications, crystallised in the specification language Clear and in the functional programming language Hope; the latter was for expressing executable specifications. The outcome of this work which is relevant here was twofold. First, there were elegant programming features in Hope, particularly pattern matching and clausal function definitions; second, there were ideas on modular construction of specifications, using signatures in the interfaces. A smaller but significant movement was by Luca Cardelli, who extended the data-type repertoire in ML by adding named records and variant types.

Miranda (1985)

“Data types” as a programming language feature appeared in Hope, but its first mention under the name “algebraic data types” that I could find is in Miranda: a non-strict functional language with polymorphic types by David Turner in 1985:

Algebraic data types

The basic method of introducing a new concrete data type, as in a number of other languages, is to declare a free algebra. In Miranda this is done by an equation using the symbol ::=,
tree ::= Niltree | Node num tree tree
being a typical example. (…) The idea of using free algebras to define data types has a long and respectable history [Landin 64], [Burstall 69], [Hoare 75]. We call it a free algebra, because there are no associated laws, such as a law equating a tree with its mirror image. Two trees are equal only if they are constructed in exactly the same way.

In case you aren’t aware, Miranda is a direct precursor of Haskell. A minor similarity with Haskell that we can see here is that data constructors are curried in Miranda, unlike in Hope and ML. Another distinguishing feature of Miranda is laziness. See also A History of Haskell: being lazy with class.

Below are links to the articles cited in the quote above. The first [Landin 64] doesn’t explicitly talk about algebra in this sense, while [Burstall 69] and [Hoare 75] refer to “word algebra” rather than “free algebra” to describe the same structure, without putting “algebra” in the same phrase as “type” yet.

The mechanical evaluation of expression, by Peter Landin (1964)
Proving properties by structural induction, by Rod Burstall (1969)
Recursive data structures by Tony Hoare (1975)

Hoare’s paper contains some futuristic pseudocode in particular:

A possible notation for such a type definition was suggested by Knuth; it is a mixture of BNF (the | symbol) and the PASCAL definition of a type by enumeration:
type proposition = (prop (letter) | neg (proposition) |
                   conj, disj (proposition, proposition));
(…) In defining operations on a data structure, it is usually necessary to enquire which of the various forms the structure takes, and what are its components. For this, I suggest an elegant notation which has been implemented by Fred McBride in his pattern-matching LISP. Consider for example a function intended to count the number of &s contained in a proposition. (…)
function andcount (p: proposition): integer;
  andcount := cases p of
    (prop(c) → 0|
     neg(q) → andcount(q)|
     conj(q,r) → andcount(q) + andcount(r)+1|
     disj(q,r) → andcount(q) + andcount(r));

Fred McBride’s pattern-matching LISP is the topic of his PhD dissertation. There is not enough room on this page to write about the groundbreaking history of LISP.

Unfree algebras in Miranda

If algebraic data types are “free algebras”, one may naturally wonder whether “unfree algebras” have a role to play. Miranda allows quotienting data type definitions by equations (“laws” or “rewrite rules”). You could then define the integers like this, with a constructor to decrement numbers, and equations to reduce integers to a canonical representation:

int ::= Zero | Suc int | Pred int
Suc (Pred n) => n
Pred (Suc n) => n

In hindsight this is superfluous, but it’s fun to see this kind of old experiments in programming languages. The modern equivalent in Haskell would be to hide the data constructors and expose smart constructors instead. There are uses for quotient types in proof assistants and dependently typed languages, but they work quite differently.

Sums of products?

There is another folklore interpretation of “algebraic” in “algebraic data types” as referring to “sums of products”.

It’s not an uncommon interpretation. In fact, trying to find a source for this folklore is what got me going on this whole adventure. The Wikipedia article on algebraic data types at the time of writing doesn’t outright say it, but it does refer to sums and products several times while making no mention of free algebras. Some [citation needed] tags should be sprinkled around. The Talk page of that article contains an unresolved discussion of this issue, with links to a highly upvoted SO answer and another one whose references don’t provide first-hand account of the origins of the term. For sure, following that idea leads to some fun combinatorics, like differentiation on data types, but that doesn’t seem to have been the original meaning of “algebraic data types”.

That interpretation might have been in some people’s mind in the 70s and 80s, even if only as a funny coincidence, but I haven’t found any written evidence of it except maybe this one sentence in a later paper, Some history of programming languages by David Turner (2012):

The ISWIM paper also has the first appearance of algebraic type definitions used to define structures. This is done in words, but the sum-of-products idea is clearly there.

It’s only a “maybe” because while the phrase “algebraic type” undeniably refers to sums of products, it’s not clear that the adjective “algebraic” specifically is meant to be associated with “sum-of-products” in that sentence. We could replace “algebraic type” with “data type” without changing the meaning of the sentence.

Clear (1979)

In contrast, free algebras—or initial algebras as one might prefer to call them—are a concept from the areas of universal algebra and category theory with a well-established history in programming language theory by the time algebraic data types came around, with influential contributions by a certain ADJ group; see for example Initial algebra semantics and continuous algebras.

Ironically, much related work focused on the other ADT, “abstract data types”. Using universal algebra as a foundation, a variety of “specification languages” have been designed for defining algebraic structures, notably the OBJ family of languages created by Joseph Goguen (a member of the aforementioned ADJ group) and others, and the Clear language by Rod Burstall (of Hope fame) and Joseph Goguen. Details of the latter can be found in The Semantics of Clear, a specification language. (You may remember seeing a mention of Clear earlier in the quote from The Definition of Standard ML.)

Example theories in Clear

Here is the theory of monoids in Clear. It consists of one sort named carrier, an element (a nullary operation) named empty and a binary operation append.

constant Monoid = theory
                      sorts carrier
                      opns empty : carrier
                           append : carrier,carrier -> carrier
                      eqns all x: carrier . append(x,empty) = x
                           all x: carrier . append(empty,x) = x
                           all x,y,z: carrier . append(append(x,y),z) = append(x,append(y,z))
                  endth

A theory is an interface. Its implementations are called algebras. In that example, the algebras of “the theory of monoids” are exactly monoids.

In every theory, there is an initial algebra obtained by turning the operations into constructors (or “uninterpreted operations”), equating elements (which are trees of constructors) modulo the equations of the theory. For the example above, the initial monoid is a singleton monoid, with only an empty element (all occurrences of append are simplified away by the two equations for empty), which is not very interesting. Better examples are those corresponding to the usual data types.

The booleans can be defined as the initial algebra of the theory with one sort (truthvalue) and two values of that sort, true and false.

constant Bool = theory data
                    sorts truthvalue
                    opns true,false: truthvalue
                endth

In Clear, the initial algebra is specified by adding the data keyword to a theory. In the semantics of Clear, rather than thinking in terms of a specific algebra, a “data theory” is still a theory (an interface), with additional constraints that encode “initiality”, so the only possible algebra (implementation) is the initial one. My guess as to why the concept of data theory is set up that way is that it allows plain theories and data theories to be combined seamlessly.

The natural numbers are the initial algebra of zero and succ:

constant Nat = theory data
                   sorts nat
                   opns zero: nat
                        succ: nat -> nat
               endth

At this point, the connection between “data theories” in Clear and data types in Hope and subsequent languages is hopefully clear.

More substantial examples in Clear

Theories can be extended into bigger theories with new sorts, operations, and equations. Here is an extended theory of booleans with two additional operations not, and, and their equations. This should demonstrate that, beyond the usual mathematical structures, we can define non-trivial operations in this language:

constant Bool1 = enrich Bool by
                     opns not: truthvalue -> truthvalue
                          and: truthvalue,truthvalue -> truthvalue
                     eqns all . not true = false
                          all . not false = true
                          all p: truthvalue . and(false, p) = false
                          all p: truthvalue . and(true, p) = p
                 enden

Initial algebras are also called free algebras, but that gets confusing because “free” is an overloaded word. Earlier for instance, you might have expected the initial monoid, or “free monoid”, to be the monoid of lists. The monoid of lists is the initial algebra in a slightly different theory: the theory of monoids with an embedding from a fixed set of elements A.

We might formalize it as follows in Clear. The theory List is parameterized by an algebra A of the theory Set, and its body is the same as Monoid, except that we renamed carrier to list, we added an embed operation, and we added the data keyword to restrict that theory to its initial algebra.

constant Set = theory sorts element endth
procedure List(A : Set) = theory data
                              sorts list
                              opns empty : list
                                   append : list,list -> list
                                   embed : element of A -> list
                              eqns all x: list . append(x,empty) = x
                                   all x: list . append(empty,x) = x
                                   all x,y,z: list . append(append(x,y),z) = append(x,append(y,z))
                          endth

One may certainly see a resemblance between theories in Clear, modules in ML, and object-oriented classes. It’s always funny to find overlaps between the worlds of functional and object-oriented programming.

CLU (1977)

CLU is a programming language created at MIT by Barbara Liskov and her students in the course of their work on data abstraction.

It features tagged union types, which are called “oneof types”. (Source: CLU Reference Manual by Barbara Liskov et al. (1979).)

T = oneof[empty:       null,
          integer:     int,
          real_num:    real,
          complex_num: complex]

Values are constructed by naming the oneof type (either as an identifier bound to it, or by spelling out the oneof construct) then the tag prefixed by make_:

T$make_integer(42)

The tagcase destructs “oneof” values.

x: oneof[pair: pair, empty: null]
...
tagcase x
    tag empty: return(false)
    tag pair(p: pair): if (p.car = i)
                       then return(true)
                       else x := down(p.cdr)
                       end
end

The main missing feature for parity with algebraic data types is recursive type definitions, which are not allowed directly. They can be achieved indirectly though inconveniently through multiple clusters (classes in modern terminology). (Source: A History of CLU by Barbara Liskov (1992).)

Burstall’s papers on Hope and Clear cite CLU, but beyond that it doesn’t seem easy to make precise claims about the influence of CLU, which is an object-oriented language, on the evolution of those other declarative languages developed across the pond.