  #1  
Old 02-26-2020, 03:58 PM
Wesley Clark is offline
Guest
 
Join Date: Aug 2003
Posts: 23,466

How would you perform this calculation regarding all potential book combinations


Let's say you wanted to write every possible 200 page book in existence.

That's about 50,000 words, maybe 300,000 characters. Excluding capitalization and using only letters, numbers, and common punctuation, I'm guessing there are roughly 60 characters.

So what's the calculation to determine how many books you'd have?

Is it 60^300,000 or something else?
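
For a sense of scale, the size of that number can be worked out without ever writing it down. Below is a minimal Python sketch, assuming exactly 60 symbols and exactly 300,000 character positions (both are the rough guesses from this post, not measured values). The name `digit_count` is made up for this illustration.

```python
import math

def digit_count(base, exponent):
    """Number of decimal digits in base**exponent, computed via logarithms."""
    return math.floor(exponent * math.log10(base)) + 1

ALPHABET_SIZE = 60      # assumption: letters, digits, common punctuation
BOOK_LENGTH = 300_000   # assumption: characters in a 200-page book

# Each position is an independent choice among 60 symbols, so the number of
# distinct strings of exactly this length is 60**300_000.
print(digit_count(ALPHABET_SIZE, BOOK_LENGTH))  # 533446
```

Under those assumptions, 60^300,000 is a number with 533,446 decimal digits. Using logarithms avoids building the full integer just to measure its length.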
__________________
Sometimes I doubt your commitment to sparkle motion
  #2  
Old 02-26-2020, 04:14 PM
cormac262 is offline
Member
 
Join Date: Feb 2003
Location: San Diego, CA
Posts: 1,394
60^300,000 would give you all the possible character combinations, but to get "number of books", you would need to exclude all the "unreadable" combinations of letters. 300,000 "a"s in a row may fill 200 pages, but would not be a "book".

Next somehow you need to filter all those combinations to then get the 50,000 words - which means placement of blanks and punctuation that makes "sentences".
callmeishmael
callme ishmael
call meishmael
call me ishmael
So you would need to filter by some standard dictionary (and have logic enough to identify names like "Ishmael").

Then you need to filter those by sentences that make some (semblance) of sense.

The conversion from raw character combinations to words and sentences would be a difficult and complicated process. You would also end up with books containing the exact same words in the exact same order, and being perfectly "understandable", but having very different meaning just based on the punctuation.
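
The dictionary-filtering step described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `TOY_VOCAB` is a tiny made-up word list standing in for "some standard dictionary", and `segmentations` is a hypothetical helper name. It lists every way to cut an unspaced string into words from that list.

```python
from functools import lru_cache

# Assumption: a toy word list, nowhere near a real dictionary.
TOY_VOCAB = {"call", "me", "ishmael", "is", "hm", "a", "el"}

def segmentations(s, vocab):
    """Return every way to split s into a sequence of words found in vocab."""
    @lru_cache(maxsize=None)
    def split_from(i):
        if i == len(s):
            return [[]]          # one way to split the empty remainder
        results = []
        for j in range(i + 1, len(s) + 1):
            word = s[i:j]
            if word in vocab:
                results.extend([word] + rest for rest in split_from(j))
        return results
    return [" ".join(words) for words in split_from(0)]

for reading in segmentations("callmeishmael", TOY_VOCAB):
    print(reading)
```

Even with seven words the string has more than one valid reading, which is the ambiguity the post points at. A dictionary check alone does not pick out the sentence that makes sense.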
  #3  
Old 02-26-2020, 04:40 PM
Darren Garrison is offline
Guest
 
Join Date: Oct 2016
Posts: 13,330
FWIW, what you are describing is known as the Library of Babel.
  #4  
Old 02-27-2020, 04:33 AM
Kimble is offline
Guest
 
Join Date: Jun 2001
Location: Nashville, TN
Posts: 541
Quote:
Originally Posted by cormac262
60^300,000 would give you all the possible character combinations, but to get "number of books", you would need to exclude all the "unreadable" combinations of letters. 300,000 "a"s in a row may fill 200 pages, but would not be a "book".

Next somehow you need to filter all those combinations to then get the 50,000 words - which means placement of blanks and punctuation that makes "sentences".
callmeishmael
callme ishmael
call meishmael
call me ishmael
So you would need to filter by some standard dictionary (and have logic enough to identify names like "Ishmael").

Then you need to filter those by sentences that make some (semblance) of sense.

The conversion from raw character combinations to words and sentences would be a difficult and complicated process. You would also end up with books containing the exact same words in the exact same order, and being perfectly "understandable", but having very different meaning just based on the punctuation.
Randall Munroe discussed something similar to this in this What If? column. Back in 1950, Claude Shannon determined that English transmitted about 1.1 bits of information per letter. So, you could estimate that the 300,000 characters would create about 2^(300,000 * 1.1) meaningfully different books in English. That number has 99,340 digits.
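
The digit count quoted above can be checked with a short Python sketch. It assumes the 1.1 bits-per-character figure and the 300,000-character length from this thread, plus the 60-symbol alphabet guessed in the opening post. The function name is made up for this example.

```python
import math

CHARS = 300_000        # characters per book, from the opening post
BITS_PER_CHAR = 1.1    # entropy estimate quoted in this post
ALPHABET_SIZE = 60     # assumption from the opening post

def decimal_digits_of_power_of_two(bits):
    """Number of decimal digits in 2**bits, computed via logarithms."""
    return math.floor(bits * math.log10(2)) + 1

total_bits = CHARS * BITS_PER_CHAR
raw_bits_per_char = math.log2(ALPHABET_SIZE)

print(round(total_bits))                           # 330000
print(decimal_digits_of_power_of_two(total_bits))  # 99340
print(round(raw_bits_per_char, 2))                 # 5.91
```

For comparison, a uniformly random choice among 60 symbols carries log2(60), or about 5.91 bits per character. The 1.1-bit figure is what shrinks the count from 60^300,000 to roughly 2^330,000.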
  #5  
Old 02-27-2020, 06:10 AM
septimus is offline
Guest
 
Join Date: Dec 2009
Location: the Land of Smiles
Posts: 21,132
Quote:
Originally Posted by Kimble
Randall Munroe discussed something similar to this in this What If? column. Back in 1950, Claude Shannon determined that English transmitted about 1.1 bits of information per letter. So, you could estimate that the 300,000 characters would create about 2^(300,000 * 1.1) meaningfully different books in English. That number has 99,340 digits.
Nitpick: Well over 99.999999% of those books would still have misspelled words or bad grammar. Even among the books which pass that test, over 99.99999999% would have fake facts, poorly developed plots, or would otherwise get an F in any English Composition class.

A compact way to represent the Library of Babel (with a 60-character alphabet) would be to just write: "the base-60 expansion of arctan(1)." Admittedly it would be an effort to find exactly where the U.S. Constitution is written in those digits (especially if you insist that commas be misplaced just as in the original), but you'd have a similar search problem using the more conventional Library of Babel.
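
As a rough illustration of what "the base-60 expansion of arctan(1)" looks like, here is a Python sketch that computes its leading base-60 digits with integer arithmetic. It uses Machin's formula, pi/4 = 4*arctan(1/5) - arctan(1/239), which is my choice for the example and not something from the post. The function names and the `guard` parameter are made up. Whether every finite string actually turns up in these digits depends on pi being normal in base 60, which has not been proven.

```python
def arctan_inverse(x, scale):
    """Approximate arctan(1/x) * scale using the alternating Taylor series."""
    term = scale // x
    total = term
    x_squared = x * x
    n = 1
    sign = -1
    while term:
        term //= x_squared
        n += 2
        total += sign * (term // n)
        sign = -sign
    return total

def quarter_pi_base60(n_digits, guard=10):
    """First n_digits base-60 digits after the point of pi/4 = arctan(1)."""
    scale = 60 ** (n_digits + guard)       # extra guard digits absorb rounding
    q = 4 * arctan_inverse(5, scale) - arctan_inverse(239, scale)
    q //= 60 ** guard                      # drop the guard digits
    digits = []
    for _ in range(n_digits):
        q, d = divmod(q, 60)
        digits.append(d)
    return digits[::-1]                    # most significant digit first

print(quarter_pi_base60(5))  # [47, 7, 26, 0, 11]
```

Each digit from 0 to 59 would then stand for one symbol of the 60-character alphabet, so a run of 300,000 consecutive digits reads as one candidate book.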

Representing all books concisely reminds me of the old-timers who only knew a thousand different jokes.
SPOILER:
To save time they memorized and numbered all the jokes. "Number 431." "Har de har har har. That's a real whiz-banger of a joke, Billie! Ha ha ha! ... Hahh!"

Newcomer shows up and tries to join in the fun. "Number 522." Dead silence.

"Whassa matter? Isn't #522 a funny joke?"
"Oh, #522 is a funny enough joke. You just don't tell it very well."

Did you like this joke? If so, just say "Number 814" next time to get a good laugh.
  #6  
Old 02-28-2020, 03:26 AM
Banksiaman is offline
Guest
 
Join Date: Apr 2012
Location: Straya
Posts: 1,274
Previous experiments in typing multiple letter combinations to create readable books have not been very successful.
  #7  
Old 02-28-2020, 09:32 AM
Chronos is online now
Charter Member
Moderator
 
Join Date: Jan 2000
Location: The Land of Cleves
Posts: 87,418
Actually, it's not known that pi (or pi/4) is normal. There are numbers which are known to be normal, but they generally amount to "List all of the books, in order".
  #8  
Old 02-28-2020, 12:50 PM
md2000 is offline
Guest
 
Join Date: Feb 2009
Posts: 15,485
Some article I saw once discussed making fake text by analyzing thousands of source documents to create an "odds table" of letters - what are the odds that, for example, "e" follows "l" or "a" follows "k"? Then they extended it to an odds table for the preceding two letters. If you include spaces and some punctuation, you can make random quasi-English-looking words and sentences that would make Lewis Carroll proud. Perhaps you could filter out any "novel" where the occurrence of spaces was extreme - not enough separate "words" or too-long words? The trouble with randomness is that words are not random, and their association is not random. You could extend the letter logic to instead take all 10,000 commonly used words and come up with a table - what are the odds "red" follows "the", etc.? But with successive refinements of comprehensibility, you are removing a degree of randomness and reducing the output. And you risk missing the odd novel which quotes a foreign language, uses onomatopoeia to describe something, or has totally made-up proper names or novel words ("hobbit"?).

So you are best off saying "any random collection of characters".
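
For anyone who wants to try the "odds table" idea, here is a minimal Python sketch. It is not the method from the article I half-remember, just one plain way to do it, and it assumes `order` is at least 1 and the source text is longer than `order`. The names `build_odds_table` and `generate` and the sample sentence are made up for the example.

```python
import random
from collections import Counter, defaultdict

def build_odds_table(text, order=2):
    """Map each `order`-character context to counts of the character after it."""
    table = defaultdict(Counter)
    for i in range(len(text) - order):
        table[text[i:i + order]][text[i + order]] += 1
    return table

def generate(table, order, length, seed=0):
    """Sample `length` characters, each drawn from the odds for its context."""
    rng = random.Random(seed)
    contexts = sorted(table)
    context = rng.choice(contexts)
    out = list(context)
    while len(out) < length:
        counts = table.get(context)
        if not counts:
            # Dead end (context never seen with a successor): jump elsewhere.
            context = rng.choice(contexts)
            continue
        chars = list(counts)
        nxt = rng.choices(chars, weights=[counts[c] for c in chars])[0]
        out.append(nxt)
        context = (context + nxt)[-order:]
    return "".join(out[:length])

# Assumption: a toy source text; a real run would use a large corpus.
sample = "the quick brown fox jumps over the lazy dog while the other dog naps. " * 3
table = build_odds_table(sample, order=2)
print(generate(table, order=2, length=80))
```

Raising `order` makes the output look more like the source and less random, which is the trade-off described above.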
  #9  
Old 02-28-2020, 01:00 PM
septimus is offline
Guest
 
Join Date: Dec 2009
Location: the Land of Smiles
Posts: 21,132
Quote:
Originally Posted by md2000
Some article I saw once discussed making fake text by analyzing thousands of source documents to create an "odds table" of letters - what are the odds that, for example, "e" follows "l" or "a" follows "k"? Then they extended it to an odds table for the preceding two letters. If you include spaces and some punctuation, you can make random quasi-English-looking words and sentences that would make Lewis Carroll proud....
Using an odds table based on the three preceding letters (i.e., tetragraph statistics), I just generated random text with the same statistics as Darwin's Origin of Species. Here's an excerpt:
Quote:
which gradaptility of gras anothe of the led that to the degreason gardly
have
eminatural severa are to charact of species be legs of hight fine same; anot a climily be procend some moder afterst noticultant adaption, years has
instably different somes heave detely sal; and the have surrelaterier, but
fathe pland we could perictly under jaws of that of migrangement, have facesservate having
has in this degreat of then would from are the spack at the more not bees, which
two
in a size of so-call
songe of
growth. As F; justries, but that not the in is in and to play, one of huntructionall the see my visinglistitudescent as having and like improportainstimall headinature. Nor fresember to becommongard, that should by surrese of ineve grough well-gland or repland less will regious new liever slight, the enomachecked by to the one; fore for been of the Glace of
divilition to and
inhabits confinitions of the can allief fere on the
production, to cerous species of the in Daws
not at is lended; as largumined: but
wellighbouth the lants of name generium parts; an
acted in due two belose or in thosediated, large naterbalapable ord the one specked on more save, and In that select: which it damplex
reasonsiders;
contince tincreason of time are ferespecies showere could cur, I am could in a greathe pare in the up a case of butely
(The entropy was about 2 bits per character.)
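
A figure like that can be estimated from the same kind of counts. The Python sketch below is one plain way to do it, not necessarily how the number above was computed. It measures the average uncertainty, in bits, of the next character given the preceding `order` characters, using raw frequencies from the text itself. The function name is made up, and on a short text this estimate comes out too low because most contexts are seen only once or twice.

```python
import math
from collections import Counter

def conditional_entropy(text, order=3):
    """Bits per character of the next symbol given the previous `order` symbols."""
    context_counts = Counter()
    gram_counts = Counter()
    for i in range(len(text) - order):
        context_counts[text[i:i + order]] += 1
        gram_counts[text[i:i + order + 1]] += 1
    total = sum(gram_counts.values())
    entropy = 0.0
    for gram, count in gram_counts.items():
        p_joint = count / total                            # P(context, next)
        p_next = count / context_counts[gram[:order]]      # P(next | context)
        entropy -= p_joint * math.log2(p_next)
    return entropy

# Two equally likely letters with no context: exactly 1 bit per character.
print(conditional_entropy("abab", order=0))  # 1.0
```

With `order=3` and a book-length source, the result is the per-character entropy of a tetragraph model of that source.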