Letter Frequency/Usage

Where can I find a list of frequency and use of letters and letter combinations in the English language?


I know it is new, but there is a web site for searching. It’s called Google and I used it to find the answer to your question in less than 10 seconds.

Google Search “Engine”

Frequency of Letters in English

Scroll down for various answers.

In the good old days of typeset newspapers, you would often find rows of meaningless letters appearing at random; but they would always say: ETAOIN SHRDLU.


No doubt someone will put me right but I understood that the typesetting machines weren’t Qwerty but arranged the letters in order of commonness, and if a typesetter accidentally leant onto the keyboard, a slug of the commonest letters was produced.

I am probably talking through my hat entirely. But that is why I know the first 12 most frequently used letters…

I used to be fascinated by codes and ciphers, and quickly learned to try ETAOINS first for ciphers.
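For anyone who wants to try that first step on a simple substitution cipher, here is a minimal Python sketch. It ranks the letters of the ciphertext by how often they appear and pairs them off with the ETAOIN SHRDLU order quoted in this thread. The names `COMMON` and `first_guess` are made up for illustration, and the result is only a starting guess to refine by hand, since short messages rarely follow the typical order exactly.

```python
from collections import Counter

# Frequency order as quoted in this thread (an assumption, not a measured table).
COMMON = "ETAOINSHRDLU"

def first_guess(ciphertext):
    """Pair the most frequent ciphertext letters with the most common
    English letters. Returns a dict of cipher letter -> guessed plain letter."""
    counts = Counter(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    # Most frequent first; ties broken alphabetically so the result is stable.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    # zip stops at the shorter sequence, so at most 12 letters get a guess.
    return dict(zip(ranked, COMMON))

print(first_guess("XXXYYZ"))
```

The guesses for the top two or three letters are usually the most useful; further down the list the counts are too close together to trust.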

You could have guessed Cecil addressed this…


Also, a previous thread on this. http://boards.straightdope.com/sdmb/showthread.php?p=6127767&highlight=ETAOIN+SHRDLU#post6127767

In fact, there’s a short story by Fredric Brown called “Etaoin Shrdlu.” Yup, it’s about an evil Linotype machine.

And there was no shift key; there were separate sections of the keyboard for lowercase and uppercase, and a third section for numbers, punctuation, etc.

As a longtime solver of cryptograms, I’m aware of the typical frequencies. (I also know that, if you make your own lists from particular sources, the frequencies change around a little, but not by much.) But there’s something I’ve long wondered about.
What are the frequencies of letter usage in other languages? If I were solving a cryptogram in, say, French, what is the most likely order of letters? Or German, or Italian?

For that matter, are there significant differences (larger than the variations you get in binning letters from different sources) between British English and American English, or with Australian English?

One I’ve wondered about, meanwhile… ETAOIN SHRDLU etc. are the most common letters in English usage, but that’s largely because of a few very common words (“T”, for instance, is as high as it is thanks to common words like “the”, “it”, “at”, “to”, “this”, and “that”). But what’s the frequency order for words in the dictionary? That is, giving the same weight to “the” as to “syzygy”?

Wiki table.

With the web2 word list, I get EIAORNTSLCUPMDHYGBFVKWZXQJ. This word list doesn’t include obvious inflected forms, so adding those would probably move S (for example) up the list.
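If anyone wants to repeat this with their own word list, a rough Python sketch follows. It counts every word once, so “the” weighs the same as “syzygy”. The function name `letter_order` and the four sample words are invented for the example, and this is not necessarily how the ranking above was produced.

```python
from collections import Counter

def letter_order(words):
    """Rank letters by total occurrences across a list of words,
    counting each word once regardless of how common it is in print."""
    counts = Counter()
    for word in words:
        # Keep only the plain letters A-Z; skip hyphens, apostrophes, etc.
        counts.update(ch for ch in word.upper() if "A" <= ch <= "Z")
    # Most frequent first; ties broken alphabetically.
    return "".join(sorted(counts, key=lambda ch: (-counts[ch], ch)))

sample = ["the", "syzygy", "letter", "frequency"]
print(letter_order(sample))
```

For a real run you would pass in the lines of a word-list file (one word per line) instead of `sample`.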

Haven’t you posted this before?
If so, why not link to it, instead of typing it again?

And of course asking questions like this leads to interesting discussions.
Unlike your post.

No, I have not posted it before. Why would I have? And I did answer the question…and was joking about the google thing.

Sorry if I offended you. I’m not sure I even understand if you were talking to me.


I notice that “C” has now surpassed “U,” so we now have “ETAOINSHRDLC.”

Ah, that explains the “R S T L N E” on Wheel of Fortune, then, since the answers are mostly single words in the bonus round.

Professional cryptographers have language-specific frequency distributions for whatever language the target messages are in. The distributions also depend on the characteristics of the target message: military jargon has a different distribution than Valley Girl speak, etc.

In addition to the simple one-character frequency distributions, frequency distributions are also compiled for digraphs, trigraphs, etc.
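Compiling digraph or trigraph counts yourself is not much harder than counting single letters. Here is a minimal Python sketch, assuming you only want letter runs inside words (so nothing is counted across a space or punctuation mark). The name `ngram_counts` and the sample sentence are made up for illustration.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count runs of n consecutive letters that occur inside words."""
    counts = Counter()
    # Turn every non-letter into a space so it acts as a word break.
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.upper())
    for word in cleaned.split():
        # Slide a window of width n across the word.
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

text = "the weather is there"
print(ngram_counts(text, 2).most_common(3))  # digraphs
print(ngram_counts(text, 3).most_common(3))  # trigraphs
```

Whether to count across word boundaries is a design choice; for a cryptogram with the spaces left in, staying inside words is probably the more useful table.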

It used to be that players would choose five and one for the bonus round, but eventually everybody was picking the same six, so the formality seemed silly. So they began to give you those six, and have you request another three and one. I wondered how long till everyone settled on the same four, but so far it is 20+ years and counting.

No doubt the advent of the Internet has changed some of the rankings. You’d think W would have moved up in rank a couple notches with all those www’s out there. The C has also likely gained rank with .com which would mean the O should gain some ranking (which would also have .org working in its favor).

Well, the first six are chosen blindly, but the next four are after you’ve seen some letters come up on the board, so theoretically you ought to be basing your choices on what you already have. For instance, if you have an N in the penultimate position, I and G might be good guesses, and T blank vowel or S blank vowel are likely to have an H in between.

Come on, man! Grab a copy of the New York Times and do it the old-fashioned way! :smiley: