
The Mathematics of Monkeys and Shakespeare

or “Monkey Claims Copyright on Hamlet: Film at 11.”

A constrained rant by The Famous Brett Watson, 13-Dec-1995.

Introduction

I don’t know who it was that first talked about the possibility of monkeys typing randomly on typewriters producing Hamlet entirely by chance, but it is an argument that I have often heard. “Sure it’s unlikely,” I’m told, “but given enough time and enough monkeys, it would happen.”

This argument is actually quite sound — given enough time and enough monkeys, one could eventually produce “Hamlet” by accident. The fact that it is intuitively sound is the argument’s greatest problem, because it means that people generally don’t bother checking the exact figures. This is a shame, because it is one of those rare areas of speculation where the exact figures can be calculated.

The other problem is that large numbers are rarely understood. Allow me to tell a little joke to demonstrate.

The Orders of Magnitude Gag

During the US government’s “Strategic Defense Initiative” program, better known as “Star Wars”, leading scientists on the project were asked to report their progress to the Minister of Defense. So they gathered their data and put together a presentation. In short, they had discovered that shooting down a nuclear missile at such a great distance was a very tricky problem indeed. They intended to explain to the minister what an impossible problem it was, well beyond the capabilities of current technology. During the course of their presentation, the following exchange took place.

SCIENTIST: …and so you can see, Mr Minister, that in order to achieve an acceptable hit-rate against the missiles, our instruments need to be accurate to one part in ten to the ninth. So far, the best we have been able to achieve is one part in ten to the fifth.

MINISTER: That’s tremendous! We’re over half way there!

Our poor politician completely failed to understand the meaning of the scientist’s statement. Ten to the fifth is not more than half of ten to the ninth. Ten to the fifth is 100,000 and ten to the ninth is 1,000,000,000, so ten to the fifth is only one ten-thousandth of ten to the ninth. Perhaps our scientist friend would have done better to let the politician see all those zeros, and then translate it into terms of a budget increase!

This brings me to the question of probability. Just as large numbers are widely misunderstood, there are very few people who appreciate just how unlikely things can be. The monkeys and typewriters scenario sounds possible until you examine the math. What follows is an in-depth discussion of the mathematics of the situation — it’s very long, but interesting, I think.

Monkeys Produce Hamlet: Feasibility Study

Let’s imagine a very simple typewriter that has only the 26 upper-case letters, a space bar and five punctuation characters (a total of 32 buttons). It doesn’t even have a carriage return: it does an automatic return when the required number of letters has been typed, and it has an infinite roll of paper being fed through it. We have a monkey that knows how to press the keys and will do so in a totally random manner indefinitely. All in all, we have a little bit of machinery, but no real intelligence in the system. We want our monkey to type the following snippet: “TO BE OR NOT TO BE, THAT IS THE QUESTION.”

The probability of this happening is quite simple to calculate, and this will in turn give us some idea of how many monkeys and typewriters we need for a reasonable chance of success. Place your bets now — our monkeys are fast typists and can type the required number of characters in a single second (there are 41 keystrokes)! On average, how long will it be before one of our monkeys produces a line matching the above sentence?

Well, there are 32 keys, so starting at any moment, the chances of our monkey getting the first keypress right are one in 32. Not good, but we have fast monkeys and lots of time. However, once it has got the first keystroke right, we also need the second keystroke to be right, otherwise we are back to square one. The chances of it getting the first and second keystrokes right are only one in (32*32 = 1024). Only one chance in 1024, but still lots of time to get it right. To get the first three characters right will be a one in (32*32*32 = 32768) chance. Each time it presses a key, there is a one in 32 chance that it will be correct. To get our little snippet of Hamlet, it will need a total of 41 consecutive “correct” keystrokes. This means that the chances are one in 32 to the power of 41. Let’s look at a table of values.

   Keys   Chances (one in...)
   ------------------------------------
    1     32
    2     32*32 = 1024
    3     32*32*32 = 32768
    4     32*32*32*32 = 1048576
    5     32^5 = 33554432
    6     32^6 = 1073741824
    7     32^7 = 34359738368
    8     32^8 = 1099511627776
    9     32^9 = 3.518437208883e+013
    10    32^10 = 1.125899906843e+015
    ...
    20    32^20 = 1.267650600228e+030
    ...
    30    32^30 = 1.427247692706e+045
    ...
    41    32^41 = 5.142201741629e+061
    ...
    204   32^204 = 1.123558209289e+307
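For readers who would rather not trust a pocket calculator, the rows of the table can be regenerated exactly in Python, whose integers have arbitrary precision. This is a minimal sketch; the names `ROWS` and `chances` are my own, and Python writes exponents as `e+13` rather than the calculator’s `e+013`.

```python
# Number of equally likely k-keystroke lines on a 32-key typewriter is 32**k.
# Python integers have arbitrary precision, so these values are exact.
ROWS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 41, 204]


def chances(k: int) -> int:
    """Return N such that a random k-keystroke line matches with chance 1 in N."""
    return 32 ** k


if __name__ == "__main__":
    for k in ROWS:
        n = chances(k)
        # Exact digits for small k; 12-decimal scientific notation for the rest,
        # matching the precision shown in the table.
        shown = str(n) if k <= 8 else f"{n:.12e}"
        print(f"{k:>4}   {shown}")
```

Only the displayed values for nine or more keys are rounded; the integers behind them are exact.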

The last figure is included only because it is the largest value that the MS Windows calculator can handle — it’s doing better than my hand-held Casio (old faithful!) which only goes up to 1e+99. Okay, so these figures are pretty vast, but we have a lot of monkeys and they can type fast. So how long will it take, on average, for one of my monkeys to type a line matching that sentence? Hard question. Let’s get an idea of how long we are talking here. How many lines can my monkey type in a year, given that it types at a rate of one line per second?

  1 line per second
  * 60 seconds per minute = 60 lines per minute
  * 60 minutes per hour = 3600 lines per hour
  * 24 hours per day = 86400 lines per day
  * 365.24 days per year = 31556736 lines per year
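The same arithmetic can be checked with exact fractions. This is a small Python sketch; `lines_per_year` is a hypothetical helper. The 365-day figure is included too, because the dc calculation that follows uses a plain 365-day year.

```python
from fractions import Fraction

LINES_PER_SECOND = 1
SECONDS_PER_DAY = 60 * 60 * 24  # 86,400


def lines_per_year(days_per_year: str) -> Fraction:
    """Lines typed in one year at one line per second, using exact arithmetic."""
    return LINES_PER_SECOND * SECONDS_PER_DAY * Fraction(days_per_year)


print(lines_per_year("365.24"))  # 31556736
print(lines_per_year("365"))     # 31536000
```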

Okay, now for the moment of truth. We know how many different lines can be produced, and hence how likely we are to get the right one at random (because only one is right). The easiest way to calculate the chances of getting the quote in a year is to calculate the chances of missing on every attempt: the chances of getting the quote are 100% minus the chances of missing on every attempt. I need a really amazingly precise calculator to do this, because the chances of missing are so close to 100% that most calculators will round them off to 100%. The calculation is as follows (for simplicity, it uses a 365-day year).

 probability of missing on one attempt = 1 - 1/(32^41)
 ...of missing for a minute straight = (1 - 1/(32^41)) ^ 60
 ...of missing for an hour straight = ((1 - 1/(32^41)) ^ 60) ^ 60
 ...of missing for a day straight = (((1 - 1/(32^41)) ^ 60) ^ 60) ^ 24
 ...of missing for a year straight = ((((1 - 1/(32^41)) ^ 60) ^ 60) ^ 24) ^ 365
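Here is why an ordinary calculator, or a double-precision float, gives up. The number 1 - 1/(32^41) differs from 1 by about 2 × 10^-62, far below the roughly 10^-16 resolution of a double, so it rounds to exactly 1. The Python sketch below sidesteps this by computing the chance of success directly with `math.log1p` and `math.expm1`, which stay accurate for tiny arguments. This is a swapped-in technique, not the calculator method described here. `p_hit_within` is a hypothetical helper, and the 31,536,000 attempts assume a 365-day year, as in the formula above.

```python
import math

P_HIT = 2.0 ** -205             # 1/(32**41), exactly representable as a float
ATTEMPTS_PER_YEAR = 31_536_000  # 365-day year, one line per second

# Naive approach: 1 - P_HIT rounds to exactly 1.0 in double precision,
# so the "probability of missing" comes out as a certainty.
naive_miss = (1 - P_HIT) ** ATTEMPTS_PER_YEAR


def p_hit_within(attempts: int, p: float = P_HIT) -> float:
    """Chance of at least one success, 1 - (1 - p)**attempts, without cancellation."""
    return -math.expm1(attempts * math.log1p(-p))


print(naive_miss)                       # 1.0
print(p_hit_within(ATTEMPTS_PER_YEAR))  # roughly 6.13e-55
```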

If you have access to Unix, you can calculate this with the dc command, but be warned that it may take quite a while and annoy other users, because the computer is so slow. Running it under the nice command is suggested. Should you care to try, start dc, then type the following lines (99k sets the precision to 99 digits after the decimal point, and p prints the result).

  99k
  1 1 32 41 ^ / - 60 ^ 60 ^ 24 ^ 365 ^
  p
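If dc isn’t handy, Python’s `decimal` module can mirror the same session. This sketch assumes that 100 significant digits are comparable to dc’s `99k`. dc’s `k` counts digits after the decimal point, whereas `decimal` counts significant digits, and dc truncates rather than rounds, so the trailing digits may differ from dc’s output. The run of nines and the first digits after it should agree. `p_miss_one_year` is a hypothetical name.

```python
from decimal import Decimal, getcontext

# 100 significant digits, roughly in the spirit of dc's "99k".
getcontext().prec = 100


def p_miss_one_year() -> Decimal:
    """(1 - 1/32**41) raised to 60, 60, 24 and 365 in turn, as in the dc session."""
    miss_once = 1 - Decimal(1) / Decimal(32) ** 41
    return (((miss_once ** 60) ** 60) ** 24) ** 365


print(p_miss_one_year())
```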

The figure that is eventually printed will be the probability (expressed as a value between zero and one) of our monkey not typing our little phrase from Hamlet in the space of one year’s worth of continuous attempts. The answer that it prints looks like this:

0.999999999999999999999999999999999999999999999999999999386721844366784484760952487499968756116464000

Notice all the nines? Even to fifty or more significant figures, this reads 100%. Okay, so realistically, there is no way that our monkey can do its job in a year. Maybe we should start talking centuries? Millennia? As I understand it, common scientific wisdom suggests that the universe is about 15 billion years old (although they may have revised their dating since I last heard about it). We can easily extend our current figure of one year to count many years: the chance of missing for n years straight is the one-year figure raised to the power n. Our calculator will be much faster if n is a power of two, because the calculation then breaks down into repeated “square” operations. So let’s choose a nice even power of two like 2^34, which is about 17 billion (17,179,869,184 to be precise) and takes just 34 squarings. The new figure is:
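The squaring trick can be sketched the same way: raising a number to the power 2^34 takes 34 squarings rather than billions of multiplications. This again uses Python’s `decimal` module at 100 significant digits as a stand-in for dc. `square_repeatedly` is a hypothetical helper name, and trailing digits may differ slightly from dc’s.

```python
from decimal import Decimal, getcontext

getcontext().prec = 100  # 100 significant digits, in the spirit of dc's "99k"


def square_repeatedly(x: Decimal, times: int) -> Decimal:
    """Return x ** (2 ** times) by squaring `times` times."""
    for _ in range(times):
        x = x * x
    return x


miss_once = 1 - Decimal(1) / Decimal(32) ** 41
miss_year = (((miss_once ** 60) ** 60) ** 24) ** 365  # one 365-day year
miss_2_34_years = square_repeatedly(miss_year, 34)    # 2**34 years
print(miss_2_34_years)
```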

0.999999999999999999999999999999999999999999989463961512816564762914005246488858434168051444149065728

The chances of failure are still essentially 100%, even after 2^34 years. Hmmm. It doesn’t look like we are going to get very far with this, but just for the heck of it, let’s see if we are any better off with a lot of monkeys. Let’s not hold back here: I hypothesize 17 billion galaxies, each containing 17 billion habitable planets, each planet with 17 billion monkeys each typing away and producing one line per second for 17 billion years. What are the chances of the phrase “TO BE OR NOT TO BE, THAT IS THE QUESTION.” not being included in the output?

0.999999999999946575937950778196079485682838665648264132188104299326596142975867879656916416973433628

I’d bet money on that. It’s about 99.999999999995% sure that they would fail to produce the sentence. Are you astounded? It’s such a trivial requirement, right? Just one puny sentence. And yet the figures keep coming up “impossible”. Where have we made a mistake? We have fallen into the same trap as the politician who was the subject of my joke, way back up there. We have failed to appreciate the sheer magnitude of the problem. Let’s look at it one more time.
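The percentage quoted above can be checked in ordinary floating point by computing the chance of success directly. That chance, about 5 × 10^-14, is now large enough for a double to represent 1 minus it meaningfully. This Python sketch assumes 2^34 for each “17 billion”, one line per second and a 365-day year, as in the dc figures. It uses `math.log1p` and `math.expm1` rather than the calculator method.

```python
import math

P_HIT = 2.0 ** -205  # chance that one 41-keystroke line is the target
# 2**34 galaxies x planets x monkeys x years, one line per second, 365-day years
ATTEMPTS = 2 ** 34 * 2 ** 34 * 2 ** 34 * 2 ** 34 * 365 * 24 * 60 * 60

p_hit = -math.expm1(ATTEMPTS * math.log1p(-P_HIT))  # chance of success
p_miss_percent = 100 * (1 - p_hit)                  # chance of failure, in percent

print(f"{p_hit:.4e}")
print(f"{p_miss_percent:.12f}%")
```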

The number of 41-character strings that are possible with a 32-character alphabet is 32^41. According to dc, this value is as follows.

51422017416287688817342786954917203280710495801049370729644032

In case you don’t feel like counting, this value is 62 digits long. In our hypothesising above, we imagined 17 billion galaxies, each with 17 billion planets, each with 17 billion monkeys, each of which was producing a line of text per second for 17 billion years. How many lines of text did we wind up producing in this experiment? The math is as follows:

2^34 * 2^34 * 2^34 * 2^34 * 365 * 24 * 60 * 60

And the answer is as follows:

2747173049143991138247931294711870033017962496000

Once again, in case you don’t feel like counting, the answer is 49 digits long. Now, there is no guarantee that our monkeys are going to type something different every time, but even if we managed to rig up the experiment so that they never tried the same thing twice, they would still have produced only 1/18,718,157,355,362 of the possible alternatives. The denominator in that fraction is 14 digits long, by the way: about 18.7 trillion, a figure far bigger than most numbers you will meet in everyday life. Is it any wonder, in light of that, that it is so damn hard to get the right answer by accident?
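The counts in this section are pure integer arithmetic, so they are easy to confirm exactly in Python. This is a sketch; the variable names are my own.

```python
# Exact count of lines typed: 2**34 galaxies x planets x monkeys x years,
# one line per second, 365-day years.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
lines_typed = 2 ** 34 * 2 ** 34 * 2 ** 34 * 2 ** 34 * SECONDS_PER_YEAR
possible_lines = 32 ** 41  # all 41-character strings over a 32-key alphabet

print(lines_typed, len(str(lines_typed)))
print(possible_lines // lines_typed)  # whole part of the ratio
```

Integer division keeps only the whole part of the ratio, which is the denominator quoted above.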

Conclusion

In light of this, I find it impossible to believe that “chance” had anything to do with the process that created life. How can I suppose that Shakespeare himself was the result of a random process when it is quite clearly impossible for even a trivial fragment of his work to have arisen by chance? No, sir, I see information all around me, and I conclude that it is the product of a far, far greater intelligence.

Information is the product of intelligence, not chance.