The Zipf Mystery!

Yes, it works for individual writers as well as when grouping them together. I have not seen data specifically for E. E. Cummings, but it would not be hard to test.

No shit?

https://www.bing.com/videos/search?q=the+wire+shit+scene&view=detail&mid=8C1AFD8E0D22626CB0D58C1AFD8E0D22626CB0D5&FORM=VIRE

Don’t forget “of”. It’s used in “should of”, “could of” and “would of”.

One point often overlooked is that the total population in a pure Zipf distribution is
k * (1 + 2[sup]-1[/sup] + 3[sup]-1[/sup] + 4[sup]-1[/sup] + 5[sup]-1[/sup] + …)
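which is unbounded: the part in parentheses is the harmonic series, and it diverges. Yet populations (of people in a country, or words in a library) are bounded! So the pure distribution cannot hold exactly. This is avoided by using an exponent slightly larger than 1, which makes the sum converge: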
k * (1 + 2[sup]-1.06[/sup] + 3[sup]-1.06[/sup] + 4[sup]-1.06[/sup] + 5[sup]-1.06[/sup] + …)
And indeed the exponent in Zipf’s Law often is slightly larger than 1. (The Wiki article cites 1.07 as the exponent for cities.)
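For anyone who wants to see the difference between the two sums, here is a minimal Python sketch. The function names are my own, and the tail bound is just the standard integral comparison, so treat it as an illustration rather than a careful numerical method.

```python
import math

def zipf_partial_sum(n_terms, exponent):
    """Sum of rank**-exponent for rank = 1 .. n_terms (k is taken as 1)."""
    return math.fsum(rank ** -exponent for rank in range(1, n_terms + 1))

def tail_bound(n_terms, exponent):
    """Upper bound on the sum over ranks above n_terms, from the integral
    of x**-exponent. Infinite when exponent <= 1, since the sum diverges."""
    if exponent <= 1:
        return math.inf
    return n_terms ** (1 - exponent) / (exponent - 1)

# With exponent 1 the partial sums keep growing (roughly like log n);
# with exponent 1.06 they level off below a finite limit.
for n in (10, 1_000, 100_000):
    print(n, zipf_partial_sum(n, 1.0), zipf_partial_sum(n, 1.06))
```

Adding `tail_bound(n, 1.06)` to the partial sum gives a ceiling on the whole series, which is what makes a bounded population possible.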

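That exponent relates to the share held by the largest element. For example, with that (1 + .07) exponent I think you’d expect roughly 7% of the population to reside in the largest city: the full series sums to about 1/.07 + 0.58 ≈ 14.9, so the top share is about 6.7% if the tail runs on forever, and more than that for any finite number of cities.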
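The share of the largest city can be checked the same way. This is a rough sketch with hypothetical function names; it assumes city sizes are exactly proportional to rank raised to the negative exponent, which real data only approximates.

```python
import math

def largest_share(exponent, n_cities):
    """Fraction of the total held by rank 1 when sizes are proportional
    to rank**-exponent for ranks 1 .. n_cities."""
    total = math.fsum(rank ** -exponent for rank in range(1, n_cities + 1))
    return 1.0 / total

def limiting_share(exponent, n_terms=100_000):
    """Estimate of the rank-1 share with infinitely many cities: a partial
    sum plus an integral estimate of the remaining tail. Needs exponent > 1;
    for exponent <= 1 the total diverges, so the share tends to 0."""
    if exponent <= 1:
        return 0.0
    partial = math.fsum(rank ** -exponent for rank in range(1, n_terms + 1))
    tail = n_terms ** (1 - exponent) / (exponent - 1)
    return 1.0 / (partial + tail)

# The share shrinks as more cities are included and approaches the limit.
for n in (100, 10_000, 100_000):
    print(n, largest_share(1.07, n))
print("limit", limiting_share(1.07))
```

The finite-list figures depend heavily on how many cities you count, which is one reason to be careful about reading too much into the top of the list.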

Speaking of largest city, Zipf’s Law is a general tendency, not a legislated statute. :smack: One has to think about specifics.

In particular, it is quite common for a country to emphasize a particular city, usually its capital. Jakarta has more than thrice the population of Indonesia’s 2nd-largest city. The numbers in Thailand are much more extreme: in distant 2nd and 3rd place for city size are … suburbs of Bangkok.

It is silly to reject the general utility of Zipf’s Law by focusing on the special case of a country’s very largest cities.