How would you perform this calculation regarding all potential book combinations

Let’s say you wanted to write every possible 200 page book in existence.

That’s about 50,000 words, maybe 300,000 characters, excluding capitalization and only using letters, numbers, and common punctuation. I’m guessing that’s roughly 60 distinct characters.

So what’s the calculation to determine how many books you’d have?

Is it 60^300,000 or something else?

60^300,000 would give you all the possible character combinations, but to get “number of books”, you would need to exclude all the “unreadable” combinations of letters. 300,000 "a"s in a row may fill 200 pages, but would not be a “book”.
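Just to put a number on the raw count, here’s a quick back-of-the-envelope sketch in Python (the 60 and 300,000 are the guesses from the original post):

[CODE]
# How many decimal digits does 60**300000 have?
# Logarithms avoid building the full half-million-digit integer.
import math

alphabet_size = 60        # guessed character set from the original post
chars_per_book = 300_000  # guessed character count from the original post

digits = int(chars_per_book * math.log10(alphabet_size)) + 1
print(digits)  # 533446 -- the raw count is a number with over half a million digits
[/CODE]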

Next, you somehow need to filter all those combinations down to the 50,000 words, which means placements of blanks and punctuation that make actual “words” and “sentences”:
callmeishmael
callme ishmael
call meishmael
call me ishmael
So you would need to filter by some standard dictionary (and have logic enough to identify names like “Ishmael”).
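To make that word-boundary filtering concrete, here’s a minimal word-break sketch in Python; the three-entry dictionary is obviously just a stand-in for a real word list (and, as noted, names like “Ishmael” need their own handling):

[CODE]
# Minimal sketch: which splits of a blank-free string consist entirely of
# dictionary words?  Toy three-word dictionary, purely for illustration.
from functools import lru_cache

DICTIONARY = {"call", "me", "ishmael"}  # hypothetical stand-in word list

@lru_cache(maxsize=None)
def segmentations(text):
    """Return every way to split `text` into dictionary words."""
    if not text:
        return ("",)
    results = []
    for i in range(1, len(text) + 1):
        prefix = text[:i]
        if prefix in DICTIONARY:
            for rest in segmentations(text[i:]):
                results.append((prefix + " " + rest).strip())
    return tuple(results)

print(segmentations("callmeishmael"))
# ('call me ishmael',) -- the "callme ishmael" and "call meishmael" splits fail
[/CODE]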

Then you need to filter those down to sentences that make some semblance of sense.

The conversion from raw character combinations to words and sentences would be a difficult and complicated process. You would also end up with books containing the exact same words in the exact same order, each perfectly “understandable”, yet with very different meanings based purely on the punctuation.

FWIW, what you are describing is known as the Library of Babel.

Randall Munroe discussed something similar in a What If? column. Back in 1950, Claude Shannon determined that English transmitted about 1.1 bits of information per letter. So you could estimate that the 300,000 characters would create about 2^(300,000 * 1.1) meaningfully different books in English. That number has 99,340 digits.
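That digit count is easy to check with the same back-of-the-envelope arithmetic:

[CODE]
# Check the "99,340 digits" figure: decimal digits of 2**(300000 * 1.1)
import math

bits = 300_000 * 1.1                    # ~1.1 bits of information per letter
digits = int(bits * math.log10(2)) + 1
print(digits)  # 99340
[/CODE]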

Nitpick: Well over 99.999999% of those books would still have misspelled words or bad grammar. Even among the books which pass that test over 99.99999999% would have fake facts, poorly developed plots, or would otherwise get an F in any English Composition class.

A compact way to represent the Library of Babel (with a 60-character alphabet) would be to just write: “the base-60 expansion of arctan(1).” Admittedly it would be an effort to find exactly where the U.S. Constitution is written in those digits (especially if you insist that commas be misplaced just as in the original), but you’d have a similar search problem using the more conventional Library of Babel.
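If you want to see what a base-60 “digit” of arctan(1) even looks like, here’s a toy sketch using the third-party mpmath library; mapping each digit back onto one of the 60 characters, and then actually searching for a given book, is left as the (astronomically hard) exercise:

[CODE]
# Toy sketch: the first few base-60 digits of arctan(1) = pi/4.
# Uses the third-party mpmath library for arbitrary-precision arithmetic.
from mpmath import mp, atan

mp.dps = 50          # plenty of working precision for a handful of digits
x = atan(1)          # pi/4 = 0.7853981...

digits = []
for _ in range(10):
    x *= 60
    d = int(x)       # next base-60 digit, in the range 0..59
    digits.append(d)
    x -= d

print(digits)  # [47, 7, 26, ...] -- each value would index one of the 60 characters
[/CODE]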

Representing all books concisely reminds me of the old-timers who only knew a thousand different jokes.

[SPOILER]To save time they memorized and numbered all the jokes. “Number 431.” “Har de har har har. That’s a real whiz-banger of a joke, Billie! Ha ha ha! … Hahh!”

Newcomer shows up and tries to join in the fun. “Number 522.” Dead silence.

“Whassa matter? Isn’t #522 a funny joke?”
“Oh, #522 is a funny enough joke. You just don’t tell it very well.”
[/SPOILER]
Did you like this joke? If so, just say “Number 814” next time to get a good laugh.

Previous experiments in typing multiple letter combinations to create readable books have not been very successful.

Actually, it’s not known that pi (or pi/4) is normal. There are numbers which are known to be normal, but they generally amount to “List all of the books, in order”.

Some article I saw once discussed making fake text by analyzing thousands of source documents to create an “odds table” of letters: what are the odds that, for example, “e” follows “l”, or “a” follows “k”? Then they extended it to an odds table for the preceding two letters. If you include spaces and some punctuation, you can make random quasi-English-looking words and sentences that would make Lewis Carroll proud. Perhaps you could filter out any “novel” where the occurrence of spaces was extreme: not enough separate “words”, or words that run too long.

The trouble with randomness is that words are not random, and their association is not random. You could extend the letter logic to instead take the 10,000 most commonly used words and come up with a table: what are the odds “red” follows “the”, etc.? But with successive refinements of comprehensibility, you are removing a degree of randomness and reducing the output. And you risk missing the odd novel that quotes a foreign language, uses onomatopoeia to describe something, or has totally made-up proper names or novel words (“hobbit”?).
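A rough sketch of that letter-odds idea, conditioning on the two preceding characters (a second-order character Markov chain); the one-line “corpus” here is just a placeholder for the thousands of source documents:

[CODE]
# Sketch of the "odds table" idea: a second-order character Markov chain.
# Build counts of P(next char | previous two chars), then sample from them.
import random
from collections import defaultdict, Counter

source_text = "call me ishmael. some years ago, never mind how long precisely..."  # placeholder corpus

# Count how often each character follows each two-character context.
odds = defaultdict(Counter)
for i in range(len(source_text) - 2):
    odds[source_text[i:i + 2]][source_text[i + 2]] += 1

def generate(length=200, seed="ca"):
    out = list(seed)
    for _ in range(length):
        counter = odds.get("".join(out[-2:]))
        if not counter:                      # dead end: fall back to the seed context
            counter = odds[seed]
        chars, weights = zip(*counter.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

print(generate())   # quasi-English gibberish with the corpus's letter statistics
[/CODE]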

So you are best off saying “any random collection of characters”.

Using an odds table based on the three preceding letters (i.e., tetragraph statistics), I just generated random text with the same statistics as Darwin’s Origin of Species. Here’s an excerpt:

(The entropy was about 2 bits per character.)
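For anyone wanting to reproduce that kind of figure, here’s a sketch that estimates bits per character as the conditional entropy of the next letter given the three preceding ones (the same tetragraph statistics); the filename is just a placeholder:

[CODE]
# Rough sketch: bits per character from tetragraph statistics, i.e. the
# conditional entropy H(next char | previous 3 chars) of a text.
import math
from collections import Counter

text = open("origin_of_species.txt", encoding="utf-8").read().lower()  # placeholder filename

tetra = Counter(text[i:i + 4] for i in range(len(text) - 3))  # 4-character blocks
tri = Counter(text[i:i + 3] for i in range(len(text) - 2))    # 3-character contexts
total = sum(tetra.values())

entropy = 0.0
for block, count in tetra.items():
    p_block = count / total            # p(abcd)
    p_cond = count / tri[block[:3]]    # ~ p(d | abc)
    entropy -= p_block * math.log2(p_cond)

print(f"~{entropy:.2f} bits per character")
[/CODE]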