A quick tour of generic-random
Metaprogramming with Generics in Haskell allows us to derive many functions and
types directly from newly declared types. Here is a quick toy demonstration of
using generic-random to derive arbitrary from the QuickCheck library.
I won’t go into any implementation details; to learn about generics in general,
check out this tutorial!
Starters
Below is a type MyType with a simple, handwritten Arbitrary instance.
{-# LANGUAGE InstanceSigs, TypeApplications #-}
import Test.QuickCheck
data MyType
  = OneThing Int
  | TwoThings Double String
instance Arbitrary MyType where
  arbitrary :: Gen MyType
  arbitrary = oneof [
    OneThing <$> arbitrary @Int,
    TwoThings <$> arbitrary @Double <*> arbitrary @String]
(Also showing off the InstanceSigs and TypeApplications extensions. These annotations are inferable here, but helpful! Especially the former.)
We generate either OneThing or TwoThings with probability 1/2 each, and use other existing Arbitrary instances to fill their respective fields.
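To see the 1/2 split in practice, here is a minimal sketch that samples the generator and measures how often OneThing comes up. The helpers isOneThing and oneThingRatio are hypothetical names introduced for this illustration; generate and vectorOf come from QuickCheck.

```haskell
{-# LANGUAGE TypeApplications #-}
import Test.QuickCheck

data MyType
  = OneThing Int
  | TwoThings Double String

instance Arbitrary MyType where
  arbitrary = oneof
    [ OneThing <$> arbitrary @Int
    , TwoThings <$> arbitrary @Double <*> arbitrary @String ]

-- Hypothetical helper: recognizes the first constructor.
isOneThing :: MyType -> Bool
isOneThing (OneThing _) = True
isOneThing _            = False

-- Hypothetical helper: fraction of OneThing values among n random samples.
oneThingRatio :: Int -> IO Double
oneThingRatio n = do
  xs <- generate (vectorOf n (arbitrary @MyType))
  pure (fromIntegral (length (filter isOneThing xs)) / fromIntegral n)
```

In GHCi, oneThingRatio 10000 should return a value close to 0.5; the exact number varies from run to run.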
Now, let us add a constructor to MyType:
data MyType
  = OneThing Int
  | TwoThings Double String
  | ThreeThings (Maybe Integer) [()] (Bool -> Word)
instance Arbitrary MyType where
  arbitrary :: Gen MyType
  arbitrary = oneof [
    OneThing <$> arbitrary @Int,
    TwoThings <$> arbitrary @Double <*> arbitrary @String]
That compiles, therefore it’s correct... except that the new constructor is not generated by arbitrary yet! The compiler does not complain here, so we must remember to update any code involving the modified MyType ourselves.
data MyType
  = OneThing Int
  | TwoThings Double String
  | ThreeThings (Maybe Integer) [()] (Bool -> Word)
instance Arbitrary MyType where
  arbitrary :: Gen MyType
  arbitrary = oneof [
    OneThing <$> arbitrary @Int,
    TwoThings <$> arbitrary @Double <*> arbitrary @String,
    ThreeThings <$> arbitrary <*> arbitrary <*> arbitrary]
    -- N.B.: QuickCheck can generate functions
(The lazy programmer gives up spelling out all the field types of ThreeThings.)
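One way to catch a forgotten constructor without relying on memory is a coverage check. The sketch below assumes QuickCheck 2.12 or later, where cover takes a percentage as its first argument and checkCoverage turns insufficient coverage into a test failure. conName and prop_allConstructors are hypothetical names for this illustration. forAllBlind is used because MyType contains a function and has no Show instance.

```haskell
import Test.QuickCheck

data MyType
  = OneThing Int
  | TwoThings Double String
  | ThreeThings (Maybe Integer) [()] (Bool -> Word)

instance Arbitrary MyType where
  arbitrary = oneof
    [ OneThing <$> arbitrary
    , TwoThings <$> arbitrary <*> arbitrary
    , ThreeThings <$> arbitrary <*> arbitrary <*> arbitrary ]

-- Hypothetical helper: the name of the constructor of a value.
-- With -Wincomplete-patterns, GHC warns if a constructor is missing here.
conName :: MyType -> String
conName OneThing{}    = "OneThing"
conName TwoThings{}   = "TwoThings"
conName ThreeThings{} = "ThreeThings"

-- Require every constructor to make up at least 10% of the test cases.
prop_allConstructors :: Property
prop_allConstructors =
  forAllBlind (arbitrary :: Gen MyType) $ \x ->
    cover 10 (conName x == "OneThing") "OneThing" $
    cover 10 (conName x == "TwoThings") "TwoThings" $
    cover 10 (conName x == "ThreeThings") "ThreeThings" True
```

Running quickCheck (checkCoverage prop_allConstructors) should pass with this instance, and should fail with the earlier instance that never produces ThreeThings.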
Main course
Typing arbitrary so often gets repetitive; this is where generic-random comes in.
-- In addition to the first LANGUAGE/import header
{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics
import Generic.Random
data MyType
  = OneThing Int
  | TwoThings Double String
  | ThreeThings (Maybe Integer) [()] (Bool -> Word)
  deriving Generic
instance Arbitrary MyType where
  arbitrary :: Gen MyType
  arbitrary = genericArbitraryU
    -- Uniform distribution of MyType constructors
In contrast to the previous snippets, genericArbitraryU automatically adapts to changes in the number of constructors and fields of MyType.
We may find OneThing a boring enough test case that we should generate it less often, here with probability 1/9.
instance Arbitrary MyType where
  arbitrary :: Gen MyType
  arbitrary = genericArbitrary (1 % 4 % 4 % ())
    -- 1/(1+4+4): OneThing
    -- 4/(1+4+4): TwoThings
    -- 4/(1+4+4): ThreeThings
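The arithmetic in those comments can be spelled out as a small pure function. This is only an illustration of how integer weights translate into probabilities, not part of generic-random; the name probabilities is hypothetical.

```haskell
-- Hypothetical helper: normalize integer weights into probabilities.
-- Each weight is divided by the sum of all weights.
-- (The sum must be nonzero for a non-empty list.)
probabilities :: [Integer] -> [Rational]
probabilities ws = [ fromInteger w / fromInteger total | w <- ws ]
  where
    total = sum ws
```

For the weights 1, 4, 4 used above, probabilities [1, 4, 4] equals [1/9, 4/9, 4/9] as a list of Rational values.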
Now, forgetting to update the distribution when the number of constructors changes would result in a compile-time error. It’s also possible to statically enforce the correspondence between weights and constructor names (the declaration order must match too).
-- Requires the DataKinds extension (for the type-level strings).
instance Arbitrary MyType where
  arbitrary :: Gen MyType
  arbitrary = genericArbitrary
    ((1 :: W "OneThing") %
     (4 :: W "TwoThings") %
     (4 :: W "ThreeThings") %
     ())
Suddenly, we realize Nothing is not a thing, so ThreeThings Nothing [()] (fromIntegral . fromEnum) is not really “three things”.
To implement the requirement that no Nothing is generated, last year we would have had to go back to the fully handwritten generator (with frequency instead of oneof to preserve the distribution).
instance Arbitrary MyType where
  arbitrary :: Gen MyType
  arbitrary = frequency [
    (1, OneThing <$> arbitrary @Int),
    (4, TwoThings <$> arbitrary @Double <*> arbitrary @String),
    (4, ThreeThings <$> (Just <$> arbitrary) <*> arbitrary <*> arbitrary)]
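However the generator ends up being written, the “no Nothing” requirement itself can be stated as a property and checked. The sketch below uses only QuickCheck and the handwritten frequency generator; hasNothing and prop_noNothing are hypothetical names for this illustration.

```haskell
import Test.QuickCheck

data MyType
  = OneThing Int
  | TwoThings Double String
  | ThreeThings (Maybe Integer) [()] (Bool -> Word)

instance Arbitrary MyType where
  arbitrary = frequency
    [ (1, OneThing <$> arbitrary)
    , (4, TwoThings <$> arbitrary <*> arbitrary)
    , (4, ThreeThings <$> (Just <$> arbitrary) <*> arbitrary <*> arbitrary) ]

-- Hypothetical helper: does the value hold a Nothing in its Maybe field?
hasNothing :: MyType -> Bool
hasNothing (ThreeThings Nothing _ _) = True
hasNothing _                         = False

-- The generator should never produce ThreeThings Nothing.
-- forAllBlind avoids needing a Show instance for the function field.
prop_noNothing :: Property
prop_noNothing = forAllBlind (arbitrary :: Gen MyType) (not . hasNothing)
```

Running quickCheck prop_noNothing exercises the generator; the same property can be reused unchanged if the instance is later replaced by a derived one.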
But now, since generic-random-1.1, we can say: “for any field of type Maybe Integer, use this generator; otherwise use arbitrary, as before”.
-- Heterogeneous list of generators, of length 1, with cons (:@).
-- The promoted list '[...] requires the DataKinds extension.
custom :: GenList '[Maybe Integer]
custom = (Just <$> arbitrary) :@ Nil
instance Arbitrary MyType where
  arbitrary :: Gen MyType
  arbitrary = genericArbitraryG custom (1 % 4 % 4 % ())
If that is too heavy-handed, we can also mention specific fields by name, when they have one (there is an example at the end of this “tutorial module”).
We are reaching the end of this tour. A compilable version of that last snippet is also available.
N.B.
Random generation for testing is a largely open topic. generic-random implements a very simple and specific kind of random generators, and it is not always applicable: depending on the type and distribution of constructors, it may not terminate within a reasonable time, and many applications need much more structured generators to achieve the best coverage.
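To illustrate the termination concern, here is a sketch of the usual handwritten remedy for a recursive type: use QuickCheck’s size parameter to bound the recursion. This is plain QuickCheck (sized, frequency), not generic-random’s API; the Tree type, genTree, and depth are hypothetical names for this illustration.

```haskell
import Test.QuickCheck

-- A hypothetical recursive type. A naive generator that picks Leaf or Node
-- at every level has no bound on the size of the trees it produces.
data Tree = Leaf | Node Tree Int Tree

-- Size-bounded generator: the size budget is halved at each recursive call,
-- and only Leaf is produced once the budget is exhausted.
genTree :: Gen Tree
genTree = sized go
  where
    go n
      | n <= 0    = pure Leaf
      | otherwise = frequency
          [ (1, pure Leaf)
          , (3, Node <$> go (n `div` 2) <*> arbitrary <*> go (n `div` 2)) ]

-- Depth of a tree, to observe the effect of the bound.
depth :: Tree -> Int
depth Leaf         = 0
depth (Node l _ r) = 1 + max (depth l) (depth r)
```

Because the budget is halved at each level, the depth grows at most logarithmically with the size parameter; for example, trees from resize 100 genTree have depth at most 7.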
Dessert (Conclusion)
Other than just indulging in our laziness when writing code, automating boilerplate-writing has benefits that may lighten the burden of maintenance:
we can’t get the boilerplate wrong if we don’t write it, and the boilerplate may rewrite itself when types change (e.g., we can’t forget to generate a constructor; that is admittedly hyperbolic, only certain kinds of mistakes are actually prevented);
not only that, it might not even be necessary to know how to write the boilerplate to get something working (here, a newcomer could get generators and play with the rest of QuickCheck without having to do any monadic programming with Gen, although more documentation seems necessary to put that into practice);
we can optimize the boilerplate by changing the one piece of code that generates it, instead of the many places where it would be duplicated (e.g., frequency and oneof are the easiest things to use but call recursive functions on mostly static lists, which are thus not optimized away by GHC; a generic library can transparently use a more efficient implementation for all users to benefit).
Feel free to make a pull request or open an issue if you’d like to see some new option in generic-random or any other improvement!
P.S.
generic-random has changed a lot since its creation. The initial implementation derived Boltzmann samplers, which are heavier in complexity and dependencies; that can now be found in the boltzmann-samplers library (I’m slowly working on a GHC.Generics version instead of SYB).
The now simpler generic-random doesn’t have probabilistic guarantees as nice as those of Boltzmann samplers, but it is actually not clear how a globally uniform-ish distribution improves random testing and whether that is worth the extra complexity. Even with a naive distribution of constructors:
small types (i.e., with few inhabitants) are quickly covered;
for large types, we still generate a good variety of test cases quickly;
anyway, what is the uniform (or actually, “sizewise uniform” for Boltzmann samplers) distribution for Double? For functions with infinite domain?
Moreover, if you really need a uniform distribution, take a look at testing-feat! (So far I have found it to be much more efficient than Boltzmann samplers.)