Lecture 22 — 2015-11-23

QuickCheck

This lecture is written in literate Haskell; you can download the raw source.

In this class, we looked at QuickCheck, a system for automatically testing Haskell programs.

import Prelude hiding (lookup)

import Control.Applicative

import Test.QuickCheck 
import Test.QuickCheck.Function

import System.Random

import qualified Data.Set as Set
import Data.Set (Set)

import Data.Maybe

Writing unit tests is easy. We can write a bunch of tests as a list of bools:

test1 = [5 + 5 == 10, 4 + 6 == 10, 4 + 7 == 10]

And then we can run and test1 to make sure every test succeeds. Unit tests are sometimes inadequate… testing your own code, it’s easy to miss the very cases that cause your code to break.

QuickCheck lets us pretend to have quantifiers in our tests. For example, the following test ensures that (+) is symmetric.

addSymmetric n m = n + m == m + n

Loading it up in GHCi, we can run quickCheck addSymmetric, which will chipperly retort that all tests were successful.

A failing test produces a counterxample. Suppose we wanted to test whether all numbers sum to ten (hint: they don’t).

addFailing n m = n + m == 10

Running quickCheck addFailing, we’ll be informed that our test is falsifiable:

GHCi, version 7.10.2: http://www.haskell.org/ghc/  :? for help
Prelude> :load "/Users/mgree/cs131/lec/lec22.lhs"
[1 of 1] Compiling Main             ( /Users/mgree/cs131/lec/lec22.lhs, interpreted )
Ok, modules loaded: Main.
*Main> quickCheck addFailing
*** Failed! Falsifiable (after 1 test): 
0
0

Here, 0 and 0 are the choices for n and m that QuickCheck randomly made.

QuickCheck can do more than make up numbers. Here we observe that lists always have a non-negative length.

checkLength l = length l >= 0

So: how does it work?

There are two critical type classes: Arbitrary and Testable. Instances of the Arbitrary type class know (among other things) how to come up with arbitrary values, by means of arbitrary :: Arbitrary a => Gen a.

A term of type Gen a can generate terms of type a. Conveniently, Gen is an instance of Functor, Applicative, and Monad, so we can use these standard interfaces to combine the wide variety of functions available in the QuickCheck library.

The Testable type class does some interesting work, as well. Generally, an instance of Testable is a Property, some abstract type in the library. Critically, Bool is Testable, so our unit tests are Testable. But we can see there is one very interesting instance of Testable:

instance (Arbitrary a, Show a, Testable prop) => Testable (a -> prop) where
  property f = forAllShrink arbitrary shrink f

That is, a function a -> prop can be tested when (1) prop is testable and (2) when the argument a can be randomly generated and printed to the screen. The general strategy is to come up with an arbitrary input and then testing the generated property.

Let’s play with it more and see. Let’s work with binary trees, aiming to test a bunch of binary-search tree properties.

data BST a = Empty | Node (BST a) a (BST a) deriving (Eq, Show)

First, we can generate arbitrary binary trees—without the search property!—using the oneof operator.

bt :: Arbitrary a => Gen (BST a)
bt = oneof [return Empty, -- oneof :: [Gen a] -> Gen a
            liftA3 Node bt arbitrary bt]

That is, a bt is either Empty or a Node with bts for children and an arbitrary value.

We can run sample :: Show a => Gen a -> IO () to see some random values:

*Main> sample bt
Empty
Empty
Node Empty () (Node Empty () (Node (Node (Node Empty () Empty) () (Node (Node (Node Empty () Empty) () Empty) () (Node (Node (Node Empty () (Node Empty () (Node (Node Empty () (Node Empty () Empty)) () (Node Empty () Empty)))) () (Node (Node (Node (Node Empty () (Node Empty () Empty)) () (Node (Node Empty () (Node (Node Empty () Empty) () Empty)) () (Node Empty () Empty))) () Empty) () (Node (Node (Node Empty () Empty) () Empty) () (Node Empty () (Node Empty () Empty))))) () (Node Empty () (Node (Node (Node (Node Empty () Empty) () (Node Empty () (Node Empty () Empty))) () Empty) () (Node (Node Empty () Empty) () Empty)))))) () Empty))
Empty
Node Empty () Empty
Node (Node (Node (Node (Node Empty () Empty) () (Node (Node Empty () Empty) () (Node Empty () Empty))) () Empty) () Empty) () (Node Empty () Empty)
Empty
Empty
Node (Node (Node (Node Empty () Empty) () Empty) () (Node Empty () (Node (Node Empty () Empty) () (Node Empty () Empty)))) () (Node Empty () (Node Empty () Empty))
Node (Node (Node Empty () Empty) () (Node (Node Empty () Empty) () (Node (Node Empty () (Node (Node Empty () (Node Empty () Empty)) () (Node (Node Empty () Empty) () (Node (Node (Node Empty () Empty) () Empty) () (Node Empty () (Node Empty () (Node (Node Empty () Empty) () Empty))))))) () Empty))) () Empty
Empty

Note that we didn’t have to specify a type, but QuickCheck just used () for us. Unfortunately this won’t always be the case—it’s often necessary to write type annotations for QuickCheck to figure things out.

Let’s get some canned BST functions—a validity checker and some basic operationrs—and test them using QuickCheck.

isBST :: Ord a => BST a -> Bool
isBST t = isBST' Nothing Nothing t
  where isBST' lower upper Empty = True
        isBST' lower upper (Node l x r) = 
            maybeBounded lower upper x &&
            isBST' lower (Just x) l && 
            isBST' (Just x) upper r
        maybeBounded Nothing Nothing x = True
        maybeBounded Nothing (Just upper) x = x < upper
        maybeBounded (Just lower) Nothing x = lower < x
        maybeBounded (Just lower) (Just upper) x = lower < x && x < upper

insert x Empty = Node Empty x Empty
insert x (Node l y r) | x < y  = Node (insert x l) y r
                      | x == y = Node l y r
                      | x > y  = Node l y (insert x r)

delete x Empty = Empty
delete x (Node l y r) 
  | x < y  = Node (delete x l) y r
  | x > y  = Node l y (delete x r)
  | x == y = case (l,r) of
               (Empty, _) -> r
               (l, Empty) -> l
               _          -> let (min, r') = deleteMin r in
                             Node l min r'
  where deleteMin Empty = error "nope"
        deleteMin (Node Empty x r) = (x,r)
        deleteMin (Node l x r) = 
          let (min, l') = deleteMin l in
          (min, Node l' x r)

lookup _ Empty = Nothing
lookup v (Node l x r)
  | v < x  = v `lookup` l
  | v == x = Just x
  | v > x  = v `lookup` r

To test these operations, we need to come with not just arbitrary binary trees, but arbitrary binary search trees. We had a few ways of doing this:

bst :: (Arbitrary a, Bounded a, Ord a, Random a, Enum a) => Gen (BST a)
bst = boundedBST minBound maxBound

boundedBST :: (Arbitrary a, Ord a, Random a, Enum a) => a -> a -> Gen (BST a)
boundedBST lo hi = 
  oneof [return Empty,
         do
           v <- choose (lo,hi)
           l <- boundedBST lo (pred v)
           r <- boundedBST (succ v) hi
           return $ Node l v r]

Notice how we use the monadic interface to generate a value and then use it to control bounds.

If we wanted to control size more:

sizedBST :: (Arbitrary a, Ord a, Random a, Enum a) => 
            a -> a -> Int -> Gen (BST a)
sizedBST lo hi 0 = return Empty
sizedBST lo hi n = 
  oneof [do
           v <- choose (lo,hi)
           l <- sizedBST lo (pred v) (n `div` 2)
           r <- sizedBST (succ v) hi (n `div` 2)
           return $ Node l v r]

We can then test that the trees are of (approximately) the right size.

size :: BST a -> Int
size Empty = 0
size (Node l _ r) = 1 + size l + size r

bigTrees (Positive n) = 
  forAll (sizedBST minBound maxBound n :: Gen (BST Int)) 
  $ \b -> 2 * n >= size b && size b >= n

We use the Positive newtype to make sure that QuickCheck only generates positive numbers for BST sizes.

instance (Arbitrary a, Enum a, Ord a, Bounded a, Random a) => 
         Arbitrary (BST a) where
  arbitrary = sized $ sizedBST minBound maxBound

The sized function is a way for QuickCheck to control the size of randomly generated values—it likes to explore from smaller to bigger, presuming that smaller values will trigger errors just as well as large ones.

There’s a slightly better way, though, if you have confidence in your operations: use them directly!

bsts' :: (Ord a, Arbitrary a) => Gen (BST a)
bsts' = listOf arbitrary >>= return . foldr insert Empty

instance (Ord a, Arbitrary a) => Arbitrary (BST a) where
  arbitrary = bsts'

We can first check to make sure our generators are correct.

prop_allValid :: BST Int -> Bool
prop_allValid = isBST

Next, we can make sure that insert does the right thing. First, we should be able to lookup whatever’s been put into the BST. Second, insert should maintain the search invariant.

insertLookup :: Int -> BST Int -> Bool
insertLookup x t = isJust $ lookup x $ insert x t

insertValid :: Int -> BST Int -> Bool
insertValid x t = isBST $ insert x t

Note that we don’t even need to have the generate for BSTs; we can instead use (==>) to filter our inputs:

insertValid' :: Int -> Property
insertValid' x = forAll bt $ \t -> isBST t ==> isBST $ insert x t

We can write other, similar tests for other operations. Here we use classify to see how well we cover the space of inputs. That is, how often do we insert, delete, or lookup an element that is already in the tree?

prop_deleteIsBST :: BST Int -> Int -> Property
prop_deleteIsBST b x = classify (isJust $ lookup x b) "in tree" $ 
                       isBST $ delete x b

prop_deleteWorks :: BST Int -> Int -> Property
prop_deleteWorks b x = classify (isJust $ lookup x b) "in tree" $ 
                       isNothing $ lookup x $ delete x b

Finally, we looked at randomly generating lambda calculus terms.

type Id = String

data LCExpr = 
    LCVar Id
  | LCApp LCExpr LCExpr
  | LCLam Id LCExpr
  deriving (Show,Eq)

vars :: Gen Id
vars = elements $ [[c] | c <- ['a'..'z']]

idAndBools :: Gen LCExpr
idAndBools = elements [LCLam "x" $ LCVar "x",
                       LCLam "t" $ LCLam "f" $ LCVar "t",
                       LCLam "t" $ LCLam "f" $ LCVar "f"]

lcTerms :: Gen LCExpr
lcTerms = sized terms
  where terms 0 = oneof [LCVar <$> vars, idAndBools]
        terms n = oneof [LCVar <$> vars,
                         let smaller = terms (n `div` 2) in
                         liftA2 LCApp smaller smaller,
                         liftA2 LCLam vars $ terms (n `div` 2)]

Sampling from the lambda calculus terms above, we see that many aren’t well scoped. Let’s generate well scoped terms.

scoped :: Set Id -> Gen LCExpr
scoped ctx = 
  oneof [elements $ (map LCVar $ Set.toList ctx) ++ 
         [LCLam "x" $ LCVar "x"],
         liftA2 LCApp (scoped ctx) (scoped ctx),
         do
           x <- vars
           e <- scoped (Set.insert x ctx)
           return $ LCLam x e]

fv :: LCExpr -> Set Id
fv (LCVar x) = Set.singleton x
fv (LCApp e1 e2) = fv e1 `Set.union` fv e2
fv (LCLam x e) = Set.delete x $ fv e

We can test and see: did it work?

wellScoped = forAll (scoped Set.empty) $ Set.null . fv