Next: Twentieth Century: War! Up: Early History of Cryptology Previous: Mayan Hieroglyphics

Nineteenth Century: Statistics

The additive cipher and many other simple cryptosystems are vulnerable to statistical analysis. In fact, cryptography was one of the great motivations for the development of the science of statistics, and breaking ciphers is at the same time one of its crowning achievements.

The simple fact of the matter is that English, like any other human language, is far from random. Over the course of any sufficiently long English text, the letters appear with very predictable frequencies. Here is one measurement of the relative frequencies of the letters in English text.

letter  frequency (%)    letter  frequency (%)
a        8.167           n        6.749
b        1.492           o        7.507
c        2.782           p        1.929
d        4.253           q        0.095
e       12.702           r        5.987
f        2.228           s        6.327
g        2.015           t        9.056
h        6.094           u        2.758
i        6.966           v        0.978
j        0.153           w        2.360
k        0.772           x        0.150
l        4.025           y        1.974
m        2.406           z        0.074

So "e" is the most commonly occurring letter, with a frequency of about 12.7%. Anyone who plays Scrabble probably already has an idea of these frequencies. There is also statistical information on the frequencies of two-letter groups (called bigrams) and three-letter groups (trigrams). The most commonly occurring bigrams are "th" and "he"; the most frequent trigrams are "the" and "and."
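
Frequencies like these can be measured directly from any text. The following is a minimal Python sketch, not the method used to produce the table above; the function name ngram_frequencies and the sample sentence are made up for illustration, and n-grams are counted within words only, which is one of several reasonable conventions.

```python
import re
from collections import Counter


def ngram_frequencies(text, n=1):
    """Return the relative frequency (in percent) of each letter n-gram.

    Assumption: only the letters a-z are counted, case is ignored, and
    n-grams do not span word boundaries.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        word[i:i + n]
        for word in words
        for i in range(len(word) - n + 1)
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: 100.0 * count / total for gram, count in counts.items()}


# A toy sample; a real measurement needs a much longer text.
sample = "the cat sat on the mat"
letters = ngram_frequencies(sample, 1)
bigrams = ngram_frequencies(sample, 2)

# Print the three most frequent letters and bigrams of the sample.
for table in (letters, bigrams):
    top = sorted(table.items(), key=lambda item: item[1], reverse=True)[:3]
    print(", ".join(f"{gram} {freq:.1f}%" for gram, freq in top))
```

On a sample this short the results will not resemble the table; the frequencies only settle down over a substantially large text.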

If we suspect a monoalphabetic cipher (where each letter of plaintext is represented by a single distinct letter of ciphertext), and if we have a large enough piece of ciphertext, we can compute the frequencies of the letters in the ciphertext and match them up against the table. If an additive cipher was used, just identifying the ciphertext letter that stands for "e" gives the key: the key is the distance from "e" to that letter, counted modulo 26. For example, if "h" is the most frequent ciphertext letter, the key is probably 3. Try it out!
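
The attack on an additive cipher can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the helper names shift and guess_key are invented here, the message is a made-up example chosen to be rich in "e", and the guess assumes the most frequent ciphertext letter really does stand for "e", which may fail on short or unusual texts.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase


def shift(text, key):
    """Additive cipher: move each letter `key` places forward, modulo 26.

    Non-letters are left unchanged; a negative key shifts backward.
    """
    result = []
    for ch in text.lower():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            result.append(ch)
    return "".join(result)


def guess_key(ciphertext):
    """Guess the key by assuming the most frequent letter encrypts 'e'."""
    counts = Counter(ch for ch in ciphertext.lower() if ch in ALPHABET)
    if not counts:
        raise ValueError("ciphertext contains no letters")
    most_common_letter = counts.most_common(1)[0][0]
    return (ALPHABET.index(most_common_letter) - ALPHABET.index("e")) % 26


# Hypothetical message, encrypted with key 3.
plaintext = "meet me here when the evening bell echoes between the trees"
ciphertext = shift(plaintext, 3)

key = guess_key(ciphertext)          # distance from 'e' to the top letter
recovered = shift(ciphertext, -key)  # undo the shift
print(key, recovered)
```

If the recovered text is gibberish, the next step would be to try the second or third most frequent ciphertext letter as the image of "e", or simply to try all 26 keys.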

Virtually all of the cryptosystems proposed in the nineteenth century proved to be susceptible to increasingly sophisticated statistical attacks.

Amazingly, there are works of literature that deliberately avoid certain letters. (Georges Perec's novel La Disparition avoids "e.")


David J. Wright
1999-11-19