A quick tour of generic-random

Posted on January 5, 2018

Metaprogramming with Generics in Haskell allows us to derive many functions and types directly from newly declared types. Here is a quick toy demonstration of using generic-random to derive arbitrary from the QuickCheck library. I won’t go into any implementation details; to learn about generics in general, check out this tutorial!

Starters

Below is a type MyType with a simple, handwritten Arbitrary instance.

{-# LANGUAGE InstanceSigs, TypeApplications #-}
import Test.QuickCheck

data MyType
  = OneThing Int
  | TwoThings Double String

instance Arbitrary MyType where
  arbitrary :: Gen MyType
  arbitrary = oneof [
    OneThing <$> arbitrary @Int,
    TwoThings <$> arbitrary @Double <*> arbitrary @String]

(Also showing off the InstanceSigs and TypeApplications extensions. These annotations are inferable here, but helpful! Especially the former.)

We generate either OneThing or TwoThings with probability 1/2 each, and use other existing Arbitrary instances to fill their respective fields.

Now, let us add a constructor to MyType:

data MyType
  = OneThing Int
  | TwoThings Double String
  | ThreeThings (Maybe Integer) [()] (Bool -> Word)

instance Arbitrary MyType where
  arbitrary :: Gen MyType
  arbitrary = oneof [
    OneThing <$> arbitrary @Int,
    TwoThings <$> arbitrary @Double <*> arbitrary @String]

That compiles, “therefore it’s correct”, but the new constructor is not generated by arbitrary yet! Of course, we must also remember to update any code involving the modified MyType.

data MyType
  = OneThing Int
  | TwoThings Double String
  | ThreeThings (Maybe Integer) [()] (Bool -> Word)

instance Arbitrary MyType where
  arbitrary :: Gen MyType
  arbitrary = oneof [
    OneThing <$> arbitrary @Int,
    TwoThings <$> arbitrary @Double <*> arbitrary @String,
    ThreeThings <$> arbitrary <*> arbitrary <*> arbitrary]
    -- N.B.: QuickCheck can generate functions

(The lazy programmer gives up spelling out all the field types of ThreeThings.)
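One cheap way to notice a forgotten constructor, sketched here as an aside: label every generated value with its constructor name and look at the distribution QuickCheck reports. The helper constructorName and the property prop_constructors are hypothetical names made up for this illustration; the instance is the outdated one from above, on purpose.

```haskell
import Test.QuickCheck

data MyType
  = OneThing Int
  | TwoThings Double String
  | ThreeThings (Maybe Integer) [()] (Bool -> Word)

-- The outdated instance: ThreeThings is never generated.
instance Arbitrary MyType where
  arbitrary = oneof
    [ OneThing <$> arbitrary
    , TwoThings <$> arbitrary <*> arbitrary
    ]

-- Hypothetical helper (not part of QuickCheck): the constructor's name.
constructorName :: MyType -> String
constructorName OneThing{}    = "OneThing"
constructorName TwoThings{}   = "TwoThings"
constructorName ThreeThings{} = "ThreeThings"

-- MyType contains a function and has no Show instance,
-- so we wrap it in QuickCheck's Blind modifier.
prop_constructors :: Blind MyType -> Property
prop_constructors (Blind x) = label (constructorName x) True
```

Running quickCheck prop_constructors in GHCi prints the share of test cases per label; with the instance above, no ThreeThings label should show up, which is the hint that something was forgotten.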

Main course

Typing arbitrary so often gets repetitive; here enters generic-random.

-- In addition to the first LANGUAGE/import header
-- (DataKinds is only needed by the later snippets using W "..." and GenList '[...])
{-# LANGUAGE DataKinds, DeriveGeneric #-}
import GHC.Generics
import Generic.Random

data MyType
  = OneThing Int
  | TwoThings Double String
  | ThreeThings (Maybe Integer) [()] (Bool -> Word)
  deriving Generic

instance Arbitrary MyType where
  arbitrary :: Gen MyType
  arbitrary = genericArbitraryU
  -- Uniform distribution of MyType constructors

In contrast to the previous snippets, genericArbitraryU automatically adapts to changes in the numbers of constructors and fields of MyType.

We may find OneThing a boring enough test case that we should generate it less often, here with probability 1/9.

instance Arbitrary MyType where
  arbitrary :: Gen MyType
  arbitrary = genericArbitrary (1 % 4 % 4 % ())
  -- 1/(1+4+4): OneThing
  -- 4/(1+4+4): TwoThings
  -- 4/(1+4+4): ThreeThings
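The arithmetic in those comments is just "weight divided by the sum of all weights". Here is a tiny base-only sketch of it; probabilities is a hypothetical helper written for this post, not something generic-random exports.

```haskell
-- Hypothetical helper: turn constructor weights into probabilities.
-- Each probability is the weight divided by the total weight.
-- Assumes the weights are non-negative and not all zero
-- (a zero total would be a division by zero).
probabilities :: [Integer] -> [Rational]
probabilities ws = [fromInteger w / fromInteger total | w <- ws]
  where
    total = sum ws
```

In GHCi, probabilities [1, 4, 4] shows as [1 % 9,4 % 9,4 % 9]. Beware that the % in that output is the fraction notation of Data.Ratio, unrelated to the % used above to chain weights.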

Now, forgetting to update the distribution when the number of constructors changes would result in a compile-time error. It’s also possible to statically enforce the correspondence between weights and constructor names (the declaration order must match too).

instance Arbitrary MyType where
  arbitrary :: Gen MyType
  arbitrary = genericArbitrary
    ((1 :: W "OneThing") %
     (4 :: W "TwoThings") %
     (4 :: W "ThreeThings") %
     ())

Suddenly, we realize Nothing is not a thing, so ThreeThings Nothing [()] (const 0) is not really “three things”.

To implement the requirement that no Nothing is generated, last year we would have had to go back to the fully handwritten generator (with frequency instead of oneof to preserve the distribution).

instance Arbitrary MyType where
  arbitrary :: Gen MyType
  arbitrary = frequency [
    (1, OneThing <$> arbitrary @Int),
    (4, TwoThings <$> arbitrary @Double <*> arbitrary @String),
    (4, ThreeThings <$> (Just <$> arbitrary) <*> arbitrary <*> arbitrary)]

But now, since generic-random-1.1, we can say: “for any field of type Maybe Integer, use this generator; otherwise use arbitrary, as before”.

-- Heterogeneous list of generators, of length 1, with cons (:@).
custom :: GenList '[Maybe Integer]
custom = (Just <$> arbitrary) :@ Nil

instance Arbitrary MyType where
  arbitrary :: Gen MyType
  arbitrary = genericArbitraryG custom (1 % 4 % 4 % ())

If that is too heavy handed, we can also mention specific fields by name, when they have one (there is an example at the end of this “tutorial module”).

We are reaching the end of this tour. There is also a compilable version of that last snippet.

N.B.

Random generation for testing is a largely open topic. generic-random implements a very simple and specific kind of random generator, and it is not always applicable: depending on the type and distribution of constructors, it may not terminate within a reasonable time, and many applications need much more structured generators to achieve the best coverage.
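To make the termination caveat concrete, here is a sketch with a recursive type, handwritten with plain QuickCheck rather than generic-random. Tree, naiveTree, boundedTree and nodes are hypothetical names for this illustration. Picking each constructor with probability 1/2 gives a critical branching process: generation stops with probability 1, but the expected size of the tree is infinite, so a run occasionally takes very long. The usual remedy is to make recursion depend on QuickCheck's size parameter.

```haskell
import Test.QuickCheck

data Tree = Leaf | Node Tree Tree

-- Naive generator: each constructor with probability 1/2, no size control.
-- Most trees are tiny, but the expected number of nodes is infinite.
naiveTree :: Gen Tree
naiveTree = oneof [pure Leaf, Node <$> naiveTree <*> naiveTree]

-- Size-bounded generator: halve the size budget at every Node,
-- and stop at Leaf once the budget is exhausted.
boundedTree :: Gen Tree
boundedTree = sized go
  where
    go n
      | n <= 0    = pure Leaf
      | otherwise = oneof
          [ pure Leaf
          , Node <$> go (n `div` 2) <*> go (n `div` 2)
          ]

-- Number of Node constructors in a tree.
nodes :: Tree -> Int
nodes Leaf       = 0
nodes (Node l r) = 1 + nodes l + nodes r
```

With boundedTree, the depth of recursion is logarithmic in the size parameter, so every generated tree is finite and small. This is only one of several ways to bound a recursive generator.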

Dessert (Conclusion)

Other than just indulging in our laziness when writing code, automating boilerplate-writing has benefits that may lighten the burden of maintenance:

Feel free to make a pull request or open an issue if you’d like to see some new option in generic-random or any other improvement!

P.S.

generic-random has changed a lot since its creation. The initial implementation derived Boltzmann samplers, which are heavier in complexity and dependencies; that can now be found in the boltzmann-samplers library (I’m slowly working on a GHC.Generics version instead of SYB). The now simpler generic-random doesn’t have probabilistic guarantees as nice as those of Boltzmann samplers, but it is actually not clear how a globally uniform-ish distribution improves random testing and whether that is worth the extra complexity. Even with a naive distribution of constructors:

Moreover, if you really need a uniform distribution, take a look at testing-feat! (So far I have found it to be much more efficient than Boltzmann samplers.)