There’s more than one way to shuffle cards, but the Knuth Shuffle is one of the better ways you can find. It’ll produce outputs that are uniformly distributed across all the possible shuffles of the deck, and it’ll do it in linear time. But to implement it efficiently you need a mutable array. If your implementation language is Haskell, that means using the
IO monad. That also means you’re doing computer science, so you have to call shuffles permutations. Here’s the code.
module Permutation ( permute ) where import Data.Array.MArray import Data.Array.IO import System.Random permute :: [a] -> IO [a] permute xs = let len = length xs in do a <- newListArray (0, len - 1) xs mapM_ (\i -> randomRIO (i, len - 1) >>= swap a i) [0..len-1] getElems a swap :: (Ix i) => IOArray i a -> i -> i -> IO () swap a i j = do e1 <- readArray a j e2 <- readArray a i writeArray a i e1 writeArray a j e2
If you’re like me when I first discovered
unsafePerformIO, you might wonder if you can use it to get
permute out of the
IO monad. Here’s what the documentation says.
unsafePerformIO :: IO a -> a
This is the “back door” into the
IOcomputation to be performed at any time. For this to be safe, the
IOcomputation should be free of side effects and independent of its environment.
If the I/O computation wrapped in
unsafePerformIOperforms side effects, then the relative order in which those side effects take place (relative to the main I/O trunk, or other calls to
unsafePerformIO) is indeterminate.
Haskell is a lazy language, and the second paragraph points out the problem of mixing laziness with side effects. You don’t have a guarantee of when an expression will be evaluated, so you don’t know in what order side effects will take place. Hence the
IO monad, which requires you to explicitly sequence computations that depend on or mutate some implicit state.
unsafePeformIO is a way to circumvent that requirement. When you use it, you’re telling Haskell that you don’t care when you program performs a particular computation relative to other side effects. But caveat emptor, the documentation claims this is only “safe” if “the
IO computation … [is] free of side effects and independent of its environment.” This seems like a sufficient condition for its safe use. If you can say that about a computation, then it certainly doesn’t matter when or in what relative order your program performs that computation. But is that a necessary condition? No, and I believe the
permute function above demonstrates that.
permute, every execution of a
randomRIO computation mutates and reads from
theStdGen, which is where Haskell stores the state of its random number generator. So clearly,
permute mutates implicit state and depends on its environment. But whether you get this random number or some other one doesn’t really matter. You still get a random number. The order in which your program mutates and reads from
theStdGen is irrelevant. So it seems you should be able to define
permute like this.
permute :: [a] -> [a] permute l = unsafePerformIO $ let len = length l in do a <- newListArray (0, len - 1) l mapM_ (\i -> randomRIO (i, len - 1) >>= swap a i) [0..len-1] getElems a
A Sequence of Precautions
That’s an easy change to make, but it’s not the only change that’s necessary. Expressions involving
unsafePerformIO violate one of GHC’s core assumptions: that all expressions are referentially transparent. Referential transparency allows GHC to perform optimizations by reasoning about program behavior equationally. If an expression
e is referentially transparent, and it evaluates to some value
v, then you can replace any occurrence of
v in the text of your program without changing its meaning. More generally if two expressions are referentially transparent and they evaluate to the same value, then you can use them interchangeably.
permute mutates the state of Haskell’s random number generator, substituting an application of
permute with its result would discard those mutations. That would change the result of subsequent calls to the random number generator, thus changing the result of your program. So it’s not referentially transparent. This doesn’t seem like a big deal—one random number’s as good as the next, after all —until you consider the sorts of optimizations that GHC may perform on your program.
Common Subexpression Elimination
Consider the following expression.
(permute [1..4], permute [1..4])
GHC assumes that the two applications of
permute are referentially transparent. Since they’re identical expressions, they should evaluate to the same result. So we should be able to just evaluate the expression once, and substitute the result in place of those two expressions. This is called common subexpression elimination, and it would transform the expression like this.
let e = permute [1..4] in (e, e)
While the original expression is likely to produce two distinct permutations, the transformed program will always produce two identical ones. That’s less work at runtime, but it doesn’t preserve the intent of the original program. You can turn off common subexpression elimination using GHC’s
-fno-cse compiler flag. If a module uses
permute, you should compile it with this flag.
let-binding an expression in a function that does’t depend on the function’s arguments…
compute x = let n = 1 + 1 in x / (x + n)
…GHC may avoid recomputing
n on every application by “floating the
let” outside of the function. It’ll turn your program into something like this.
n = 1 + 1 compute x = x / (x + n)
n is bound to an expression that involves
unsafePerformIO, then you likely intended for
n to be evaluated for each application of
compute. But if GHC applies this optimization, this won’t happen.
compute will always return the same result for the same inputs. You can turn this off by compiling with the
In the case of
permute’s module you don’t have to use that flag since all the expressions directly or indirectly depend on the argument
l. You may have to use it in other modules to prevent calls to
permute from floating out of your functions.
When the GHC inliner inlines a function, it may completely eliminate the application, substituting the syntactic arguments at the call site into the body of the function. So GHC might optimize a call to
compute (f 2) like this:
let n = 1 + 1 in (f 2) / ((f 2) + n)
It can even go a step further and inline
n, leaving nothing but
(f 2) / ((f 2) + (1 + 1))
…which it can then subject to further optimization passes. If the argument itself reads from or mutates implicit state—imagine if
f were were a function like
permute—then those operations may be duplicated by the inliner. What your program assumed would be identical permutations may in fact be distinct, which is roughly the opposite effect that common subexpression elimination might have. Mark
NOINLINE to avoid this.
Seeds of Doubt
Everything up until now has been an attempt to make the fact that
permute violates GHC’s assumption of referential transparency irrelevant. GHC’s optimizations may throw out necessary work—as in the case of common subexpression elimination—or it might duplicate work—as in the case of inlining. That may make random program too random or not random enough. But even when your program’s just random enough, there’s still the question of when it’s random.
You might want to seed
theStdGen manually using
setStdGen. If you use
setStdGen after you’ve called
permute, then calls to
permute might be floating around in unevaluated thunks. When your program forces those thunks,
permute will read from the new state of
theStdGen—not the old state, as you might have intended. This is a legitimate concern, so it’s best to either seed
theStdGen at the beginning of your program, or not at all. Otherwise, it may be hard to get the timing right.
Having to worry about the behavior of GHC optimizations when you use
unsafePerformIO is not pleasant. It’s fascinating, but it makes you question the correctness of your program. GHC is a complex system, and using
unsafePerformIO violates one of its core assumptions. That’s hard to recover from, but that’s no reason not to try. In the process, you acquire a better understanding and appreciation for the system you’re breaking.