Why are there so few words starting with N?

Dr.Strangelove · April 24, 2021, 11:16pm

I guess it sometimes works the other way, but the “a n-vowel-word” conversion into “an vowel-word” seems more common. A few examples beyond napron:
a naddre => an addre => an adder
a nauger => an auger
an eke name => a neke name => a nickname
a nought => an ought

LSLGuy · April 24, 2021, 11:38pm

Forget thee not the illustrious nudibranch.

Monty · April 25, 2021, 5:20am

Why would said visitor assume such a thing?

dtilque · April 25, 2021, 5:38am

It’s just a way of saying it’s the naive assumption. Don’t take it so literally.

ALOHA_HATER · April 25, 2021, 7:36am

Mmm is the easiest sound to make. That’s why your mama is called mama.

Mangetout · April 25, 2021, 7:53am

It might be sensible to expect a uniform distribution of letter usage if the language had been designed from scratch with that objective in mind, but it wasn’t - English has loads of imported words (that have arrived in big clumps from other places at different points in history), as well as shifts where an influx of people into the speaking culture will have modified the sounds of existing words (such as replacing an initial N sound with an M sound, or vice versa, or something else).
Different languages have different sets of vocal sounds for their words, which means when a word crosses a boundary, it either has to bring a foreign sound with it, or it gets coerced into the existing set of sounds in the target language, with the latter case being more probable, I think.

Mijin · April 25, 2021, 10:31am

Also ‘a norange’ => ‘an orange’

I think this is basically the answer to the OP.
N-starting words in English tend to lose their N because our indefinite article is “a / an”

Mangetout · April 25, 2021, 10:44am

The norange thing is apparently a bit of a myth, based on the assumption that it was rebracketed like napron, nadder, newt and nickname; it seems the word lost its N before it reached the English language.

Mijin · April 25, 2021, 11:37am

Ah, right you are. I saw it on QI, and it made intuitive sense to me, since I knew “naranja” is orange in Spanish. On googling, it does seem to be a myth so, thanks, ignorance fought.

Those same cites mention nuncle => ‘an uncle’ as an example, though that one might also be arguable, as at least one cite implies that uncle and nuncle do not have exactly the same meaning.

Exapno_Mapcase · April 25, 2021, 2:34pm

That’s a great lecture, though he drifts off into too much philosophical explanation at the end. I’ll be hunting down more of the VSauce stuff.

It’s not a naive assumption, it’s the totally ignorant one that any intelligent species would know to be wrong. To find out why, watch the youtube video that Mangetout posted.

blabbermeister · April 25, 2021, 3:08pm

You got me. (But your nanny is called nanny, that’s pretty close.)

running_coach · April 25, 2021, 3:11pm

Or nana.

markn_1 · April 25, 2021, 4:15pm

I did a little more analysis to compare each letter’s appearances as an initial letter versus the total appearances of the letter. In this table, the second column is the letter’s share of all letter appearances, the third column is the share of words that start with the letter, and the last column is the second column divided by the third. So the less the letter appears as an initial compared to its total occurrence, the higher the last column will be. The list is sorted on the last column, so it is in order of the least frequently appearing initial letters. N is third on the list, behind Y and E. It turns out that Y is the real outlier. I think this is largely driven by the large number of words that END with Y. Almost 13% of the words in my dictionary end with Y. However L, X, O and I are not far behind N as infrequent initials. (BTW in this table I removed proper nouns, so the numbers are a little different than my first table posted above.)

letter  total  initial  ratio
y   2.12%   0.25%   8.4102
e   9.57%   3.71%   2.5784
n   6.39%   2.89%   2.2067
l   5.31%   2.47%   2.1486
x   0.28%   0.14%   2.0295
o   6.93%   3.43%   2.0212
i   7.96%   3.94%   2.0202
r   6.55%   4.25%   1.5415
t   6.27%   5.41%   1.1593
a   7.49%   6.90%   1.0862
z   0.33%   0.34%   0.9534
k   0.64%   0.82%   0.7734
g   1.93%   2.77%   0.6943
h   2.50%   3.74%   0.6686
d   2.72%   4.72%   0.5756
v   0.84%   1.46%   0.5755
m   2.79%   5.08%   0.5495
s   5.63%  10.80%   0.5212
c   4.14%   8.26%   0.5017
u   3.56%   7.68%   0.4634
b   1.64%   4.59%   0.3570
f   1.03%   3.03%   0.3390
w   0.58%   1.71%   0.3378
p   3.17%  10.52%   0.3015
q   0.15%   0.51%   0.2990
j   0.11%   0.55%   0.2027
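
For anyone who wants to try this kind of count on their own word list, here is a minimal Python sketch of the calculation described above. It is not the script or dictionary behind the table: the function name `initial_vs_total` and the tiny sample list are made up for illustration, and it assumes plain a-z words with anything else ignored.

```python
from collections import Counter
import string


def initial_vs_total(words):
    """Return (letter, total share, initial share, ratio) rows,
    sorted so the rarest initials relative to overall use come first."""
    total = Counter()    # every appearance of each letter
    initial = Counter()  # appearances as the first letter of a word
    for word in words:
        letters = [ch for ch in word.lower() if ch in string.ascii_lowercase]
        if not letters:
            continue
        total.update(letters)
        initial[letters[0]] += 1

    n_letters = sum(total.values())
    n_words = sum(initial.values())
    rows = []
    for letter in string.ascii_lowercase:
        if total[letter] == 0:
            continue
        total_share = total[letter] / n_letters
        initial_share = initial[letter] / n_words
        # A letter that never starts a word gets an infinite ratio.
        ratio = total_share / initial_share if initial_share else float("inf")
        rows.append((letter, total_share, initial_share, ratio))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical sample; a real run would load a full dictionary file.
    sample = ["nickname", "newt", "adder", "apron", "orange", "yes", "happy", "many"]
    for letter, total_share, initial_share, ratio in initial_vs_total(sample):
        print(f"{letter}  {total_share:6.2%}  {initial_share:6.2%}  {ratio:.4f}")
```

With a real dictionary the interesting part is the top of the output, where letters that are common inside words but rare as initials end up.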

Thudlow_Boink · April 25, 2021, 4:24pm

That’s a whole nother thing.

dtilque · April 25, 2021, 7:06pm

Yes, but your naive alien has no knowledge of that history of English. They’re making an initial and extremely simple assumption. Slightly less simple would be to assume that the distribution of initial letters is the same as the distribution of letters in either general text or in all dictionary words. Any of these three assumptions would turn out to be wrong on further examination, but that’s usually the case for naive assumptions.

Are you naively assuming that the Zipf relationship applies to all alien species? It doesn’t even apply to everything we do. Specifically, it doesn’t apply to initial letters of words, although I expect it does to all letters in general text. (I haven’t looked to make sure it does, so could be wrong here.)

Yes, but Y being a primarily end-letter is understandable based on historical changes of English. N being an outlier is not so easily understood.

Exapno_Mapcase · April 25, 2021, 9:06pm

It’s not just Zipf’s Law, it’s the Pareto Principle, which I absolutely expect to be universal. There are no universal rules that I’m aware of that predict natural distributions to be evenly scattered. Any advanced alien culture would surely be aware of this. The naive assumption is always that stuff obeys known statistical laws until proven otherwise.

dtilque · April 26, 2021, 6:05am

Maybe it’s just me, but I would expect that there’s a one-to-one correspondence between the phonemes of an unknown alphabetic language and the letters in its alphabet. And since I have no knowledge of which, if any, sounds are favored, I’d initially assume there’s an equal chance of any of those sounds being the first in a random word. Which means my naive assumption is an even distribution of the words when sorted by initial letter.

Exapno_Mapcase · April 26, 2021, 2:41pm

You’re using “naive assumption” as a synonym for not knowing anything at all. But those with any experience in a subject do have preconceptions, and scientists will have preconceptions based on science. They will not expect to encounter magic or time flowing backward on even an unknown planet. They will be surprised if the planets’ orbits don’t conform to a plane around the sun. They will be certain that some local enforced rule must be in place if every city has exactly the same population. They will assume genetic manipulation if every member of a species is exactly the same size.

Thousands of years of manipulation were needed before dice or coins could be manufactured so precisely that using them in games to produce random outcomes might be possible.

Randomness and equality never go together in finite structures. That’s the naive assumption.

dtilque · April 26, 2021, 3:35pm

Since this is a side issue and not one that I really am interested in, I’m going to drop the argument. I’ll concede you’re right.

But I noticed that the distribution of English words by first letter follows neither Zipf’s Law nor the Pareto Principle. That could perhaps be due to English phonemes not having anything like a one-to-one relationship to the letters. However, there are languages that have a much better phoneme-letter correspondence. Does anyone have a word list from such a language where they can do a count similar to what markn_1 did above? It’d be interesting to see if those languages do fall into either Zipf or Pareto.
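
A rough way to check the Zipf and Pareto point against the numbers already posted is sketched below in Python. The initial-letter shares are copied from markn_1's table; everything else is an assumption for illustration: the helper names are invented, "Zipf" is taken to mean shares proportional to 1/rank, and "Pareto" is read as the 80/20 rule of thumb.

```python
# Share of words starting with each letter (percent), from markn_1's table.
INITIAL_SHARE = {
    "y": 0.25, "e": 3.71, "n": 2.89, "l": 2.47, "x": 0.14, "o": 3.43,
    "i": 3.94, "r": 4.25, "t": 5.41, "a": 6.90, "z": 0.34, "k": 0.82,
    "g": 2.77, "h": 3.74, "d": 4.72, "v": 1.46, "m": 5.08, "s": 10.80,
    "c": 8.26, "u": 7.68, "b": 4.59, "f": 3.03, "w": 1.71, "p": 10.52,
    "q": 0.51, "j": 0.55,
}


def zipf_expected(n):
    """Ideal shares (percent) for ranks 1..n if share is proportional to 1/rank."""
    harmonic = sum(1 / k for k in range(1, n + 1))
    return [100 / (k * harmonic) for k in range(1, n + 1)]


def top_fraction_share(shares, fraction=0.2):
    """Fraction of the total held by the top `fraction` of items."""
    ranked = sorted(shares, reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)


if __name__ == "__main__":
    observed = sorted(INITIAL_SHARE.values(), reverse=True)
    expected = zipf_expected(len(observed))
    # Side-by-side view of the first few ranks.
    for rank, (obs, exp) in enumerate(zip(observed[:5], expected[:5]), start=1):
        print(f"rank {rank}: observed {obs:5.2f}%  ideal Zipf {exp:5.2f}%")
    share = top_fraction_share(INITIAL_SHARE.values())
    print(f"top 20% of letters start {share:.0%} of words")
```

Under those assumptions an ideal Zipf curve would give the top letter roughly a quarter of all words, while the table's top letter (S) has about 11%, and the top five letters cover well under half of the words rather than 80%. The same two helpers could be pointed at counts from another language's word list.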

Little_Nemo · April 26, 2021, 3:39pm

They needed two volumes for “big blue wobbly thing that mermaids live in”?
