# Homework 5

This homework is written in literate Haskell; you can download the raw source to fill in yourself. You’re welcome to submit literate Haskell yourself, or to start fresh in a new file, literate or not.

Please submit homeworks via the new submission page.

This homework (and all the ones after it), you are free to work on your own or in groups of up to three. Please make sure that your whole group—and only your group—are listed as collaborators on any submission.

In this homework, you’ll be working with monads in a variety of ways: to speed up a shuffling algorithm; to create a generic framework for MapReduce; and to generate random values for testing.

I’ve imported the libraries you’ll need. Look at them before you start! You’ll want to use both the Haskell language documentation as well as Hackage and Hoogle.

If you’re running Haskell on your own computer and you installed the Haskell platform, you should be able to install the two libraries we need by running at the command line `stack install random QuickCheck`. These libraries should already be installed on the lab machines, but they may not be; the same `stack install random QuickCheck` command should work.

You are of course allowed to import other libraries. It may even make your solutions easier!

``````module Hw05 where

import Data.Array.IO

import qualified Data.Map as Map
import Data.Map (Map,(!))

import System.Environment
import System.Exit
import System.IO
import System.Random

import Test.QuickCheck hiding (shuffle)``````

Problem (1): shuffling

In this problem, we’re going to “shuffle” a list, reordering it randomly.

To start, let’s get familiar with `System.Random`.

Write a function that takes two numbers, `low` and `high`, and returns a random number `n` such that `low <= n <= high`, i.e., inclusively within the range.

``````rand :: Int -> Int -> IO Int
rand low high = undefined``````

Now write a function that takes a list and shuffles it. The straightforward algorithm is O(n2):

• Given a non-empty list `xs`,
• randomly split the list into an element `y` and the rest of the list `ys`,
• cons `y` onto the shuffling of `ys`.

Don’t worry, we’ll speed it up in a minute.

``````shuffleList :: [a] -> IO [a]
shuffleList xs = undefined``````

Don’t forget that you can run `:set +s` to get timing information in GHCi. My implementation on my computer runs `sum <\$> shuffleList [0..10000]` 3.26 seconds.

It turns out that there’s a much faster, O(n) algorithm for shuffling: the Fisher–Yates shuffle. It works on arrays, not linked lists, so we’ll have to use Haskell’s arrays.

Haskell’s arrays are a little funny: arrays are parameterized by two things: the type of their index and the monad in which they’ll be used. We’ll work with `IOArray`s. The `IOArray` type represents arrays that can be used in the `IO` monad. We’ll interact with these arrays using the `MArray` interface.

Let’s take a brief look at `IOArray`. It has kind `* -> * -> *`. The first type it needs is the type of its indices… we can just use `Int` for that, but it’s interesting that we can use any type in the `Ix` class. The second type it needs is the type of its contents. Shuffling won’t care about that, so we’ll end up working with `IOArray Int a`.

As a warmup, write a function that takes a list and generates a corresponding array. It’s worth noting that the bounds that Haskell uses in, e.g., `newListArray` are inclusive, per `Data.Ix`.

``````listToArray :: [a] -> IO (IOArray Int a)
listToArray x = undefined``````

Okay: let’s do it. Implement the Fisher–Yates shuffling algorithm that takes a given array and shuffles it.

``````shuffle :: IOArray Int a -> IO ()
shuffle arr = undefined``````

Now use your array-based function `shuffle` to work on lists. Be sure to test your code on a wide variety of inputs!

``````fastShuffle :: [a] -> IO [a]
fastShuffle l = undefined``````

My version of `fastShuffle` runs much more quickly than the naive one: `sum <\$> fastShuffle [0..10000]` runs in 0.04 seconds!

Finally, write a function that reads in a file and shuffles the order of its lines. You should ignore lines with no characters on them and your final output should not end in a newline.

``````shuffleFile :: FilePath -> IO String
shuffleFile f = undefined``````

Finally, this shuffling program is useful enough that we should make it a command. To do so, you’ll need to write a `main` function, which should have type `IO ()`.

First and foremost, to compile your program, you’ll run a command like: `stack ghc -- Hw05.lhs -main-is Hw05.main -o shuffle`. This will compile your program to an executable called `shuffle`. (The `-main-is` flag is necessary because the default in Haskell is to have a file named Main.hs.)

How should your program behave? It should take a single, optional argument indicating a filename to shuffle. Without an argument (or with an argument of `-`), it should read the content to be sorted from standard input. If more than one argument is given, you should print a “usage” message on standard error and exit with a non-zero exit code.

Either way, your program should read the lines of the input (file or standard input), shuffle them, and then print out the shuffled input.

Look at the various `System` modules to find out how to parse command-line arguments.

``````main :: IO ()
main = undefined``````

MapReduce is a model for data parallel computation. We’ll look at data parallel programming after Thanksgiving, but for now let’s try to understand MapReduce as is.

Our mappers will take an input of type `a` and produce a list of key-value pairs, where keys have type `k` and values have type `v`.

``type Mapper a k v = a -> [(k,v)]``

Our reducers take a key and list of values and produces a new (ideally shorter!) list of values.

``type Reducer k v = k -> [v] -> [v]``

The actual MapReduce implementaiton has three real phases: mapping, shuffling, and reducing. Here’s a simple implementation:

``````mapReduce :: Ord k => Mapper a k v -> Reducer k v -> [a] -> [(k,[v])]
mapReduce m r = reduce r . shuffleKeys . concatMap (map listifyVal . m)
where listifyVal (k,v) = (k,[v])
shuffleKeys = Map.fromListWith (++)
reduce r = Map.toList . Map.mapWithKey r``````

The canonical MapReduce example is word count. (Riveting, isn’t it?) Here’s how to implement word count in MapReduce: given a list of documents, a mapper breaks a given document into its consituent words, where appearance of a word maps to 1. After shuffling, identical words will be grouped together; the reducer sums up each token.

``````wordCount = mapReduce countWords sumCounts
where countWords = map (\w -> (w,1)) . words
sumCounts _ cs = [sum cs]``````

I’m not sure why Google is so proud of what amounts to eight lines of code and a slow way to count words.

Let’s modify the MapReduce paradigm to allow for mappers and reducers that generate monadic computations, like so:

``````type MapperM m a k v = a -> m [(k,v)]
type ReducerM m k v = k -> [v] -> m [v]``````

Note that a `MapperM` returns its list of key-value pairs inside of some monad `m`; `ReducerM` is similar.

Adapt `mapReduce` above to define `mapReduceM`:

``````mapReduceM :: (Ord k, Monad m) => MapperM m a k v -> ReducerM m k v -> [a] -> m [(k,[v])]
mapReduceM m r input = undefined``````

To test, here’s an adaptation of the `wordCount` example above.

``````wordCountM = mapReduceM countWords sumCounts
where countWords = return . map (\w -> (w,1)) . words
sumCounts w cs = do
when (length cs > 1) \$ putStrLn \$ "Lots of " ++ w ++ "!"
return [sum cs]``````

If you’re having trouble, I recommend piecing apart the function and relying on types to help you get through. If you’re truly stuck, a good place to start is refactoring `mapReduce` so it doesn’t use composition (`(.)`).

Problem (3): QuickCheck

We’ll be using QuickCheck to write some tests.

Write a QuickCheck property to check that `reverse` is involutive, i.e., that reversing a reversed list yields the original list.

``prop_rev_involutive l = undefined``

Write a QuickCheck property to check that checks the Collatz conjecture for a given number greater than 0.

``prop_Collatz = undefined``

Write a QuickCheck property that expresses the correctness of your `fastShuffle` function. No need to go for full correctness of every potential property of your shuffle, e.g., that it’s pseudorandom. You might need to write a type signature. Check out `Test.QuickCheck.Monadic`.

``````prop_fastShuffle_correct :: [Int] -> Property
prop_fastShuffle_correct s = undefined ``````
``````data ArithExp =
Num Int
| Plus ArithExp ArithExp
| Times ArithExp ArithExp
| Neg ArithExp
deriving Show

eval :: ArithExp -> Int
eval (Num i) = i
eval (Plus e1 e2) = eval e1 + eval e2
eval (Times e1 e2) = eval e1 * eval e2
eval (Neg e) = 0 - eval e``````

Write a generator that generates arbitrary `ArithExp`s. Use it to define an `Arbitrary` instance for `ArithExp`… keep in mind that we don’t want to generate giant data structures, so you may need to keep track of sizes.

``````instance Arbitrary ArithExp where
arbitrary = undefined``````

Write a test to ensure that `Plus e e` behaves the same as `Times 2 e` for all expressions `e`.

``prop_double = undefined``