A visitor from another planet, completely unfamiliar with Earth languages, might naively assume that the words of English are distributed evenly among the letters of the alphabet. Of course, in actuality, they are distributed very unevenly. Given the history of the language and alphabet and the nature of the sounds the letters represent, most of that unevenness can be explained. But I can’t figure out why N is so shortchanged.
First, to establish that it is shortchanged, let’s compare the number of N-words with those starting with M. I can’t think of any reason M should have many more than N. The letter M in my desk dictionary takes pages 744 to 822, which is 79 pages. The letter N runs from page 823 to the middle of 854, or 31.5 pages. So M takes up about two and a half times as much space as N.
Not sure why you’re singling out N. In my dictionary there are 9 letters which are less frequent than N as initial letters, and N is pretty close to F and G:
s 25162
p 24461
c 19901
a 17096
u 16387
t 12966
m 12616
b 11070
d 10896
r 9671
h 9027
i 8799
e 8736
o 7849
g 6861
f 6860
n 6780
l 6284
w 3944
v 3440
k 2281
j 1642
q 1152
z 949
y 671
x 385
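For anyone who wants to reproduce a tally like this from their own word list, here is a minimal sketch in Python. The function name `initial_letter_counts` and the tiny sample list are made up for illustration; this is not necessarily how the counts above were produced, and a real run would read a full dictionary file instead.

```python
from collections import Counter

def initial_letter_counts(words):
    """Count words by first letter, ignoring case and non-letter starts."""
    counts = Counter()
    for word in words:
        word = word.strip().lower()
        # Skip empty entries and anything not starting with a-z.
        if word and word[0].isascii() and word[0].isalpha():
            counts[word[0]] += 1
    return counts

# Hypothetical sample; replace with the lines of a real word list.
sample = ["narwhal", "newt", "Moose", "mouse", "mole", "nutria", "3D", ""]
counts = initial_letter_counts(sample)
for letter, n in counts.most_common():
    print(letter, n)
```

Results will vary a lot with the word list used, since dictionaries differ in how many prefixed and derived forms they include.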
A more intriguing question to me is why there are so few animal names that start with N. Narwhal, newt, nutria, and nightingale are pretty much it, and none are very common.
There are also a bunch of dog breeds that start with N, although most or all are named after places, e.g., Newfoundland, Norfolk terrier, Neapolitan mastiff. Among cats, there’s the Norwegian forest cat.
It is strange. N is much less common as an initial letter than as a letter in general. Also, its rarity seems anti-Zipf, as n is easier to write than m (2 downstrokes instead of 3) and easier to say, since your mouth doesn’t have to close all the way and open all the way. Alternative spellings of the sound, such as “kn” and “gn”, aren’t common enough to explain its lack.
Maybe the strong association of “n” with negation (no; non- and similar in other languages like nein, nicht) makes it a poor choice as an initial letter to name something and would lead to misunderstandings in communication if it were used at the beginning of a word.
True, it’s not a perfect relationship, but it is a good first-order approximation. At any rate, @markn_1’s list (thank you for that) also shows that N has about half as many as M. And that supports my question. I can understand why the other letters have roughly their word counts relative to the other letters, for reasons too numerous to list, but N stands out as being way too low.
BTW, the initial letters do not follow Zipf’s Law, although the total number of letters used does.
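One rough way to check this against the counts posted above is to compare each letter's count with what a simple Zipf curve would predict, namely the top count divided by the rank. The sketch below assumes that simplest form of the law (exponent 1); the variable names are just for illustration.

```python
# Initial-letter counts as posted earlier in the thread.
counts = {
    "s": 25162, "p": 24461, "c": 19901, "a": 17096, "u": 16387,
    "t": 12966, "m": 12616, "b": 11070, "d": 10896, "r": 9671,
    "h": 9027, "i": 8799, "e": 8736, "o": 7849, "g": 6861,
    "f": 6860, "n": 6780, "l": 6284, "w": 3944, "v": 3440,
    "k": 2281, "j": 1642, "q": 1152, "z": 949, "y": 671, "x": 385,
}

# Sort from most to least frequent initial letter.
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
top = ranked[0][1]

# Under a plain Zipf curve, the count at rank r would be about top / r,
# so the last column would stay close to 1.0 all the way down.
ratios = {}
for rank, (letter, n) in enumerate(ranked, start=1):
    predicted = top / rank
    ratios[letter] = n / predicted
    print(f"{rank:2d} {letter} {n:6d} {predicted:9.0f} {ratios[letter]:5.2f}")
```

If the last column drifts well away from 1.0, the initial-letter counts are not tracking that simple curve.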
I think it will also depend on how we’re counting these words. I see “u” is listed perhaps surprisingly high, and that’s, I suppose, because so many more words can be made with the “un-” prefix. “P” has a lot of “pre-”s and “s” – that one doesn’t seem like it would have all that much extra added. “Semi-” perhaps. I’d be interested in also seeing a ranking based on root words without prefixes.
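A crude way to approximate a root-word ranking is sketched below in Python. It assumes a word counts as "prefixed" only if it starts with one of a short list of prefixes and the remainder is itself in the word list. The prefix list, the function name `root_initial_counts`, and the sample words are all hypothetical, and this heuristic will miss plenty of cases that real morphological analysis would catch.

```python
from collections import Counter

# Hypothetical prefix list for illustration; a real one would be much longer.
PREFIXES = ("un", "non", "pre", "semi", "mis", "multi")

def root_initial_counts(words, prefixes=PREFIXES):
    """Count initial letters, skipping words that look like prefix + known word."""
    vocab = {w.strip().lower() for w in words if w.strip()}
    counts = Counter()
    for word in vocab:
        # Treat the word as derived if stripping a prefix (and any hyphen)
        # leaves another word that is also in the vocabulary.
        is_derived = any(
            word.startswith(p) and word[len(p):].lstrip("-") in vocab
            for p in prefixes
        )
        if not is_derived:
            counts[word[0]] += 1
    return counts

sample = ["happy", "unhappy", "uncle", "nonstop", "stop", "north", "premium"]
root_counts = root_initial_counts(sample)
for letter, n in root_counts.most_common():
    print(letter, n)
```

In this sample, "unhappy" and "nonstop" are dropped because "happy" and "stop" are present, while "uncle" and "premium" are kept because what follows the prefix is not a word in the list.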
Right. In my dictionary, most of the non- words are in a list with no definition or pronunciation. It takes up over a page and a half. There are a couple of similar lists in the M’s for mis- and multi-, but they’re much shorter. At any rate, that list compresses the N pages so that N looks even more shortchanged than otherwise, but it doesn’t really change my thesis.