Why are there so few words starting with N?

I guess it sometimes works the other way, but the “a n-vowel-word” conversion into “an vowel-word” seems more common. A few examples beyond napron:
a naddre => an addre => an adder
a nauger => an auger
an eke name => a neke name => a nickname
a nought => an ought

Forget thee not the illustrious nudibranch.

Why would said visitor assume such a thing?

It’s just a way of saying it’s the naive assumption. Don’t take it so literally.

Mmm is the easiest sound to make. That’s why your mama is called mama.

It might be sensible to expect a uniform distribution of letter usage if the language had been designed from scratch with that objective in mind, but it wasn’t. English has loads of imported words that have arrived in big clumps from other places at different points in history, as well as shifts where an influx of people into the speaking culture will have modified the sounds of existing words (such as replacing an initial N sound with an M sound, or vice versa, or something else).
Different languages have different sets of vocal sounds for their words, which means that when a word crosses a boundary, it either has to bring a foreign sound with it, or it gets coerced into the existing set of sounds in the target language, with the latter case being more probable, I think.

Also ‘a norange’ => ‘an orange’

I think this is basically the answer to the OP.
N-starting words in English tend to lose their N because our indefinite article is “a / an”

The norange thing is apparently a bit of a myth, based on the assumption that it was rebracketed like napron, nadder, newt and nickname, but it seems to have lost its N before it reached the English language.

Ah, right you are. I saw it on QI, and it made intuitive sense to me, since I knew “naranja” is orange in Spanish. On googling, it does seem to be a myth so, thanks, ignorance fought.

Those same cites mention nuncle => ‘an uncle’ as an example, though that one might also be arguable, as at least one cite implies that uncle and nuncle do not have exactly the same meaning.

That’s a great lecture, though he drifts off into too much philosophical explanation at the end. I’ll be hunting down more of the VSauce stuff.

It’s not a naive assumption, it’s the totally ignorant one that any intelligent species would know to be wrong. To find out why, watch the YouTube video that Mangetout posted.

You got me. (But your nanny is called nanny, that’s pretty close :) )

Or nana.

I did a little more analysis to compare each letter’s appearances as an initial letter versus the total appearances of the letter. In this table, the second column is the letter’s percentage of all letter appearances, the third column is its percentage of appearances as an initial, and the last column is the ratio of the second column to the third. So the less the letter appears as an initial compared to its total occurrence, the higher the last column will be. The list is sorted on the last column, so it is in order of the least frequently appearing initial letters. N is third on the list, behind E and Y. It turns out that Y is the real outlier. I think this is largely driven by the large number of words that END with Y. Almost 13% of the words in my dictionary end with Y. However L, X, O and I are not far behind N as infrequent initials. (BTW in this table I removed proper nouns, so the numbers are a little different than my first table posted above.)

y   2.12%   0.25%   8.4102
e   9.57%   3.71%   2.5784
n   6.39%   2.89%   2.2067
l   5.31%   2.47%   2.1486
x   0.28%   0.14%   2.0295
o   6.93%   3.43%   2.0212
i   7.96%   3.94%   2.0202
r   6.55%   4.25%   1.5415
t   6.27%   5.41%   1.1593
a   7.49%   6.90%   1.0862
z   0.33%   0.34%   0.9534
k   0.64%   0.82%   0.7734
g   1.93%   2.77%   0.6943
h   2.50%   3.74%   0.6686
d   2.72%   4.72%   0.5756
v   0.84%   1.46%   0.5755
m   2.79%   5.08%   0.5495
s   5.63%  10.80%   0.5212
c   4.14%   8.26%   0.5017
u   3.56%   7.68%   0.4634
b   1.64%   4.59%   0.3570
f   1.03%   3.03%   0.3390
w   0.58%   1.71%   0.3378
p   3.17%  10.52%   0.3015
q   0.15%   0.51%   0.2990
j   0.11%   0.55%   0.2027
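
For anyone who wants to rerun this kind of count on their own word list, here is a minimal Python sketch of the calculation described for the table: each letter's share of all letters, its share of initial letters, and the ratio of the two, sorted with the rarest initials first. It is not the script that produced the numbers above. The function name `initial_vs_total` is made up for illustration, it only counts the plain letters a to z, and filtering out proper nouns is left to whoever supplies the words.

```python
from collections import Counter
import string


def initial_vs_total(words):
    """Return rows (letter, total_share, initial_share, ratio).

    total_share   = letter's fraction of all letters in the list
    initial_share = letter's fraction of all word-initial letters
    ratio         = total_share / initial_share (high = rare as an initial)
    Rows are sorted by ratio, highest first.
    """
    total = Counter()
    initial = Counter()
    for word in words:
        # Assumption: keep only a-z, so apostrophes and hyphens are ignored.
        letters = [c for c in word.lower() if c in string.ascii_lowercase]
        if not letters:
            continue
        total.update(letters)
        initial[letters[0]] += 1

    n_total = sum(total.values())
    n_initial = sum(initial.values())
    rows = []
    for letter in string.ascii_lowercase:
        if total[letter] == 0 or initial[letter] == 0:
            continue  # ratio is undefined if the letter never starts a word
        t = total[letter] / n_total
        i = initial[letter] / n_initial
        rows.append((letter, t, i, t / i))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


if __name__ == "__main__":
    # Tiny made-up sample; swap in a real word list to get meaningful numbers.
    for letter, t, i, ratio in initial_vs_total(["any", "nanny", "yes"]):
        print(f"{letter}   {t:.2%}   {i:.2%}   {ratio:.4f}")
```

With a real dictionary file you would read one word per line and pass the list in. The results will depend heavily on which dictionary you use and whether inflected forms are included.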

That’s a whole nother thing.

Yes, but your naive alien has no knowledge of that history of English. They’re making an initial and extremely simple assumption. Slightly less simple would be to assume that the distribution of initial letters is the same as the distribution of letters in either general text or in all dictionary words. Any of these three assumptions would turn out to be wrong on further examination, but that’s usually the case for naive assumptions.

Are you naively assuming that the Zipf relationship applies to all alien species? It doesn’t even apply to everything we do. Specifically, it doesn’t apply to initial letters of words, although I expect it does to all letters in general text. (I haven’t looked to make sure it does, so I could be wrong here.)

Yes, but Y being a primarily end-letter is understandable based on historical changes of English. N being an outlier is not so easily understood.

It’s not just Zipf’s Law, it’s the Pareto Principle, which I absolutely expect to be universal. There are no universal rules that I’m aware of that predict natural distributions to be evenly scattered. Any advanced alien culture would surely be aware of this. The naive assumption is always that stuff obeys known statistical laws until proven otherwise.

Maybe it’s just me, but I would expect that there’s a one-to-one correspondence between the phonemes of an unknown alphabetic language and the letters in its alphabet. And since I have no knowledge of which, if any, sounds are favored, I’d initially assume there’s an equal chance of any of those sounds being the first in a random word. Which means my naive assumption is an even distribution of the words when sorted by initial letter.

You’re using “naive assumption” as a synonym for not knowing anything at all. But those with any experience in a subject do have preconceptions, and scientists will have preconceptions based on science. They will not expect to encounter magic or time flowing backward on even an unknown planet. They will be surprised if the planets’ orbits don’t conform to a plane around the sun. They will be certain that some local enforced rule must be in place if every city has exactly the same population. They will assume genetic manipulation if every member of a species is exactly the same size.

Thousands of years of manipulation were needed before dice or coins could be manufactured so precisely that using them in games to produce random outcomes might be possible.

Randomness and equality never go together in finite structures. That’s the naive assumption.

Since this is a side issue and not one that I really am interested in, I’m going to drop the argument. I’ll concede you’re right.

But I noticed that the distribution of English words by first letter follows neither Zipf’s Law nor the Pareto Principle. That could perhaps be due to English phonemes not having anything like a one-to-one relationship to the letters. However, there are languages that have a much better phoneme-letter correspondence. Does anyone have a word list from such a language where they can do a count similar to what markn_1 did above? It’d be interesting to see if those languages do fall into either Zipf or Pareto.
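
One rough way to put a number on "does it look like Zipf" is to rank the initial-letter shares from most to least common and fit a straight line to log(share) against log(rank). The classic Zipf form predicts a slope near -1. Below is a minimal sketch of that check in Python. The shares are the third column of markn_1's table, typed in by hand, and `zipf_slope` is just an illustrative name. A single slope is a crude summary and not a proper goodness-of-fit test, so treat the result as a hint at best.

```python
import math

# Initial-letter shares in percent, copied from the third column of the table.
INITIAL_SHARE = {
    "y": 0.25, "e": 3.71, "n": 2.89, "l": 2.47, "x": 0.14, "o": 3.43,
    "i": 3.94, "r": 4.25, "t": 5.41, "a": 6.90, "z": 0.34, "k": 0.82,
    "g": 2.77, "h": 3.74, "d": 4.72, "v": 1.46, "m": 5.08, "s": 10.80,
    "c": 8.26, "u": 7.68, "b": 4.59, "f": 3.03, "w": 1.71, "p": 10.52,
    "q": 0.51, "j": 0.55,
}


def zipf_slope(shares):
    """Least-squares slope of log(share) vs log(rank), rank 1 = most common.

    All shares must be positive, since the logarithm is taken.
    """
    ranked = sorted(shares, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(share) for share in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    print(f"log-log slope: {zipf_slope(INITIAL_SHARE.values()):.3f}")
```

The same function works on counts from any other language's word list, which would make the comparison asked about here fairly quick to do.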

They needed two volumes for “big blue wobbly thing that mermaids live in”?