How would you perform this calculation regarding all potential book combinations

Let’s say you wanted to write every possible 200 page book in existence.

That’s about 50,000 words, maybe 300,000 characters, excluding capitalization and only using letters, numbers, and common punctuation. I’m guessing that’s roughly 60 distinct characters.

So what’s the calculation to determine how many books you’d have?

Is it 60^300,000 or something else?

60^300,000 would give you all the possible character combinations, but to get “number of books”, you would need to exclude all the “unreadable” combinations of letters. 300,000 "a"s in a row may fill 200 pages, but would not be a “book”.
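Just to put a number on the raw count, here’s a quick back-of-the-envelope sketch in Python (the 60 and 300,000 are the guesses from the original post):

[CODE]
# How many decimal digits does 60**300000 have?
# Logarithms avoid building the full half-million-digit integer.
import math

alphabet_size = 60        # guessed character set from the original post
chars_per_book = 300_000  # guessed character count from the original post

digits = int(chars_per_book * math.log10(alphabet_size)) + 1
print(digits)  # 533446 -- the raw count is a number with over half a million digits
[/CODE]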

Next, you somehow need to filter all those combinations down to the 50,000 words, which means placements of blanks and punctuation that make actual “words” and “sentences”:
callmeishmael
callme ishmael
call meishmael
call me ishmael
So you would need to filter by some standard dictionary (and have logic enough to identify names like “Ishmael”).
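To make that word-boundary filtering concrete, here’s a minimal word-break sketch in Python; the three-entry dictionary is obviously just a stand-in for a real word list (and, as noted, names like “Ishmael” need their own handling):

[CODE]
# Minimal sketch: which splits of a blank-free string consist entirely of
# dictionary words?  Toy three-word dictionary, purely for illustration.
from functools import lru_cache

DICTIONARY = {"call", "me", "ishmael"}  # hypothetical stand-in word list

@lru_cache(maxsize=None)
def segmentations(text):
    """Return every way to split `text` into dictionary words."""
    if not text:
        return ("",)
    results = []
    for i in range(1, len(text) + 1):
        prefix = text[:i]
        if prefix in DICTIONARY:
            for rest in segmentations(text[i:]):
                results.append((prefix + " " + rest).strip())
    return tuple(results)

print(segmentations("callmeishmael"))
# ('call me ishmael',) -- the "callme ishmael" and "call meishmael" splits fail
[/CODE]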

Then you need to filter those down to sentences that make some semblance of sense.

The conversion from raw character combinations to words and sentences would be a difficult and complicated process. You would also end up with books containing the exact same words in the exact same order, each perfectly “understandable”, yet with very different meanings based purely on the punctuation.

FWIW, what you are describing is known as the Library of Babel.

Randall Munroe discussed something similar in a What If? column. Back in 1950, Claude Shannon determined that English transmitted about 1.1 bits of information per letter. So you could estimate that the 300,000 characters would create about 2^(300,000 * 1.1) meaningfully different books in English. That number has 99,340 digits.
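That digit count is easy to check with the same back-of-the-envelope arithmetic:

[CODE]
# Check the "99,340 digits" figure: decimal digits of 2**(300000 * 1.1)
import math

bits = 300_000 * 1.1                    # ~1.1 bits of information per letter
digits = int(bits * math.log10(2)) + 1
print(digits)  # 99340
[/CODE]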

Nitpick: Well over 99.999999% of those books would still have misspelled words or bad grammar. Even among the books which pass that test over 99.99999999% would have fake facts, poorly developed plots, or would otherwise get an F in any English Composition class.

A compact way to represent the Library of Babel (with a 60-character alphabet) would be to just write: “the base-60 expansion of arctan(1).” Admittedly it would be an effort to find exactly where the U.S. Constitution is written in those digits (especially if you insist that commas be misplaced just as in the original), but you’d have a similar search problem using the more conventional Library of Babel.
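If you want to see what a base-60 “digit” of arctan(1) even looks like, here’s a toy sketch using the third-party mpmath library; mapping each digit back onto one of the 60 characters, and then actually searching for a given book, is left as the (astronomically hard) exercise:

[CODE]
# Toy sketch: the first few base-60 digits of arctan(1) = pi/4.
# Uses the third-party mpmath library for arbitrary-precision arithmetic.
from mpmath import mp, atan

mp.dps = 50          # plenty of working precision for a handful of digits
x = atan(1)          # pi/4 = 0.7853981...

digits = []
for _ in range(10):
    x *= 60
    d = int(x)       # next base-60 digit, in the range 0..59
    digits.append(d)
    x -= d

print(digits)  # [47, 7, 26, ...] -- each value would index one of the 60 characters
[/CODE]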

Representing all books concisely reminds me of the old-timers who only knew a thousand different jokes.

[SPOILER]To save time they memorized and numbered all the jokes. “Number 431.” “Har de har har har. That’s a real whiz-banger of a joke, Billie! Ha ha ha! … Hahh!”

Newcomer shows up and tries to join in the fun. “Number 522.” Dead silence.

“Whassa matter? Isn’t #522 a funny joke?”
“Oh, #522 is a funny enough joke. You just don’t tell it very well.”
[/SPOILER]
Did you like this joke? If so, just say “Number 814” next time to get a good laugh.

Previous experiments in typing multiple letter combinations to create readable books have not been very successful.

Actually, it’s not known that pi (or pi/4) is normal. There are numbers which are known to be normal, but they generally amount to “List all of the books, in order”.

Some article I saw once discussed making fake text by analyzing thousands of source documents to create an “odds table” of letters: what are the odds that, for example, “e” follows “l”, or “a” follows “k”? Then they extended it to an odds table for the preceding two letters. If you include spaces and some punctuation, you can make random quasi-English-looking words and sentences that would make Lewis Carroll proud. Perhaps you could filter out any “novel” where the occurrence of spaces was extreme: not enough separate “words”, or words that run too long.

The trouble with randomness is that words are not random, and their association is not random. You could extend the letter logic to instead take the 10,000 most commonly used words and come up with a table: what are the odds “red” follows “the”, etc.? But with successive refinements of comprehensibility, you are removing a degree of randomness and reducing the output. And you risk missing the odd novel that quotes a foreign language, uses onomatopoeia to describe something, or has totally made-up proper names or novel words (“hobbit”?).
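A rough sketch of that letter-odds idea, conditioning on the two preceding characters (a second-order character Markov chain); the one-line “corpus” here is just a placeholder for the thousands of source documents:

[CODE]
# Sketch of the "odds table" idea: a second-order character Markov chain.
# Build counts of P(next char | previous two chars), then sample from them.
import random
from collections import defaultdict, Counter

source_text = "call me ishmael. some years ago, never mind how long precisely..."  # placeholder corpus

# Count how often each character follows each two-character context.
odds = defaultdict(Counter)
for i in range(len(source_text) - 2):
    odds[source_text[i:i + 2]][source_text[i + 2]] += 1

def generate(length=200, seed="ca"):
    out = list(seed)
    for _ in range(length):
        counter = odds.get("".join(out[-2:]))
        if not counter:                      # dead end: fall back to the seed context
            counter = odds[seed]
        chars, weights = zip(*counter.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

print(generate())   # quasi-English gibberish with the corpus's letter statistics
[/CODE]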

So you are best off saying “any random collection of characters”.

Using an odds table based on the three preceding letters (i.e., tetragraph statistics), I just generated random text with the same statistics as Darwin’s Origin of Species. Here’s an excerpt:

(The entropy was about 2 bits per character.)
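For anyone wanting to reproduce that kind of figure, here’s a sketch that estimates bits per character as the conditional entropy of the next letter given the three preceding ones (the same tetragraph statistics); the filename is just a placeholder:

[CODE]
# Rough sketch: bits per character from tetragraph statistics, i.e. the
# conditional entropy H(next char | previous 3 chars) of a text.
import math
from collections import Counter

text = open("origin_of_species.txt", encoding="utf-8").read().lower()  # placeholder filename

tetra = Counter(text[i:i + 4] for i in range(len(text) - 3))  # 4-character blocks
tri = Counter(text[i:i + 3] for i in range(len(text) - 2))    # 3-character contexts
total = sum(tetra.values())

entropy = 0.0
for block, count in tetra.items():
    p_block = count / total            # p(abcd)
    p_cond = count / tri[block[:3]]    # ~ p(d | abc)
    entropy -= p_block * math.log2(p_cond)

print(f"~{entropy:.2f} bits per character")
[/CODE]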