Is this an accurate factoid about pi?

Digressing a bit into an amusement I’m fond of, there is a kind of digit-sequence that has a sort of maximal super-normality (whose consequences ARE observable in finite time): sequences where, at any moment, the number of occurrences so far of any two strings of the same length differs by at most 1 (e.g., among other things, if there have at any point been 3 "CAB"s so far, there have been either 2 or 3 or 4 "DAD"s).

A jargon term for such a thing is an “infinite de Bruijn sequence” and, interestingly, such sequences exist in every base EXCEPT base 2. The decimal digits of π, however, manifestly don’t form such a sequence (at 3.141, say, we’ve already gotten two instances of 1 and no instances of 2).

Whoops, rather than what I just said, what I actually meant is that an infinite de Bruijn sequence is one such that, for any two strings of the same length, there do not exist two instances of one before there are any of the other; I’m not sure this implies more generally that the counts never later differ by more than one. Well, you may enjoy thinking about this (and regardless, I will, on my own time…).
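
Spelled out in code, the rule I mean is roughly this (a quick Python sketch; the function name is mine):

```python
def is_valid_prefix(seq, base):
    """Check the rule above: for every window length n, no length-n string may
    occur twice while some other length-n string has not occurred at all.
    `seq` is a string of single-character digits in the given base (<= 10)."""
    for n in range(1, len(seq)):
        counts = {}
        for i in range(len(seq) - n + 1):
            window = seq[i:i + n]
            counts[window] = counts.get(window, 0) + 1
        has_repeat = any(c >= 2 for c in counts.values())
        has_missing = len(counts) < base ** n
        if has_repeat and has_missing:
            return False
    return True

print(is_valid_prefix("3141", 10))      # False: two 1s before any 2 has appeared
print(is_valid_prefix("01100101", 2))   # True
```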

Hmm, neat. Let’s try binary, since you said that wasn’t possible:

0: No loss of generality to start here
01: Forced (can’t have a second 0 before 1).
011: Try this first.
0110: Forced (can’t have a second 11 before 00).
01100: Forced (can’t have a second 01 before 00).
011001: Another guess.
0110010: Forced (can’t have a second 011 before 000).
01100101: Forced (can’t have a second 100 before 000).
But now we’re stuck, because we need a 111 but can only produce a second 011 or 010.

It’s late, so I’ll try the other paths later. Something to do in boring meetings tomorrow… then, base 3 :).

Somehow, the infinite de Bruijn sequence thing reminds me of four-coloring maps. You pick the first few colors arbitrarily, and then eventually get to a point where most of your choices are forced.

Which is to say in a hand-wavy sense that indeterminism “converges” to determinism in the course of traversing the decision space. I hadn’t really thought of those classes of problems in those terms before. Hmmm. Thanks for that.

Very meta. :slight_smile:

Yeah, and since the first few colors are equivalent under permutation anyway, there end up being only a very small number of final patterns.

The other de Bruijn binary sequences turned out to be trivial; there are really only three distinct ones:
0100
01100101
011000101

I guess that proves that. Now to see if I can come up with an infinite pattern that works in base-3…
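
For what it’s worth, a quick exhaustive search (reusing the is_valid_prefix sketch from upthread) turns up exactly those three dead ends:

```python
def maximal_binary_sequences():
    """Exhaustively extend from '0' (no loss of generality) and collect every
    dead end: a valid prefix with no valid one-digit extension.
    Uses is_valid_prefix() from the sketch upthread."""
    dead_ends, stack = [], ["0"]
    while stack:
        seq = stack.pop()
        extensions = [seq + d for d in "01" if is_valid_prefix(seq + d, 2)]
        if extensions:
            stack.extend(extensions)
        else:
            dead_ends.append(seq)
    return sorted(dead_ends, key=len)

print(maximal_binary_sequences())   # ['0100', '01100101', '011000101'], matching the list above
```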

… BIGGER database …

Incorrect, because the continuation is 3.141592654, so, if you express pi to 6 decimal digits with proper rounding, it spells 3.LSLGUA

I wrote a horrid little script to just brute-force de Bruijn sequences. It just runs through the digits in sequence, backtracking if necessary. I stopped it at 400 digits since it’s really slow, but you can definitely see that the higher digits are used less than the lower ones. Not terribly surprising since it only uses a higher digit when it absolutely needs to; it can frequently get away with a smaller one.
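
Roughly, the idea was this (a cleaned-up sketch rather than the actual script; the 60-digit cutoff and the names here are just for illustration):

```python
from collections import Counter

def debruijn_prefix(base, target_len):
    """Depth-first search with backtracking, trying lower digits first (so a
    higher digit only appears when it's forced).  Digits 0-9 only, so bases
    up to 10.  Sketch only: the validity check is quadratic, so it gets slow
    well before 400 digits."""
    digits = [str(d) for d in range(base)]

    def ok_to_append(seq, d):
        s = seq + d
        # seq is already valid, so only the new suffixes of s can create a
        # repeat; check each suffix length against the "nothing missing" rule.
        for n in range(1, len(s)):
            windows = Counter(s[i:i + n] for i in range(len(s) - n + 1))
            if windows[s[-n:]] >= 2 and len(windows) < base ** n:
                return False
        return True

    def extend(seq):
        if len(seq) >= target_len:
            return seq
        for d in digits:
            if ok_to_append(seq, d):
                result = extend(seq + d)
                if result is not None:
                    return result
        return None   # dead end: caller backtracks and tries its next digit

    return extend("0")

seq = debruijn_prefix(3, 60)
print(seq)
print(Counter(seq))   # digit histogram, as in the stats below
```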

Clearly, the first rule that Indistinguishable mentioned would produce normal sequences; these are just disjunctive. At least they seem like it. I wonder if the statistics converge to some value. I’ll bet they do, because the influence of the substring criterion goes down rapidly as the strings get longer: there are exponentially more possibilities but only linearly more chances of a collision. At a certain point, you barely have to check for a collision because the odds go down so rapidly.

The histograms look a bit weird to me. It’s not a curve that I’ve really seen before, like an exponential or Pareto distribution. Instead, it has an early drop, and then levels off for a bit, and then has an extra dip near the end. I dunno, maybe just an artifact of the limited samples, but it seems strange.

I also collected some stats on the number of places between each instance of the largest digit (value n-1 for base n). Just eyeballing things, most spacings are approximately n (what you would expect if it were random), but there are huge outliers. Base 8, for instance:
7, 12, 10, 8, 6, 4, 2, 0, 98, 2, 8, 8, 8, 8, 5, 2, 2, 2, 2, 2, 56, 8, 8, 8, 8, 5, 2, 2, 2, 2, 34, 8, 8, 8, 5, 2, 2

Low numbers are common, but there’s a 98, 56, and 34 in there. Weird.
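
A sketch of that counting, for anyone who wants to reproduce it (not the actual script; note that the run before the very first instance is counted too, which is where the leading 7 in the base-8 list comes from):

```python
def top_digit_gaps(seq, base):
    """Places between consecutive instances of the largest digit (base-1),
    including the run before the first instance."""
    top = str(base - 1)
    positions = [-1] + [i for i, d in enumerate(seq) if d == top]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]

# e.g. with base8_sequence set to the base-8 string in the spoiler below,
# top_digit_gaps(base8_sequence, 8) starts 7, 12, 10, 8, 6, 4, 2, 0, ...
```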

[spoiler]base 3:
01200221100010111210202122201001101021002000021111022001211201122020122121202222101201010000011100100210101210002010200102210012020020202201100220210211011020111112000122000222002120011202112110212101121211101222111221022212012122112222202212201200000020011000010001102101100111100021000022000012001001010102010002210100121010220101101110100201101202100102020001021110020210121100122100012120002022102000
histo:
0: 162
1: 131
2: 111

base 4:
01230020311332210001011021112003013022023103212131222323303331320100111002101020000221101201121022200121122012212020131003101321030003200230103120302021223021300133003301130303131103310232022311113133112312123211322032222132303232233123320332133302333323130120001100000120210010002001021020101002011010121002200021100122002020021200112010220103010030010320111101112110202203002131010330000302103100031
histo:
0: 132
1: 104
2: 91
3: 74

base 5:
01234002030411314224433210001011021112003103201300401402202302403303410420431141212213222312413313421432324233343440441444243010011100210102000022102220012010300030112103110121103310032112201130122211320023101310232020124000410042104001321202121302022300141014200430024102420143021311113300332031203220333013322123014400340104120402041303022401140044013402140303130403140404141104320422044202431044303
histo:
0: 112
1: 87
2: 77
3: 66
4: 59

base 6:
012345002030405113141522425335544321000101102111200310320130041042022043014005015023024025033034035044045105205305411421213221431151222312412513313413514414521531542252323324333423524424532543435344454550551552555354010011100210102000022102220012010300030104000401121031101211033100321122011301140122113200231013102320201240013212021223001410042104110431044101420043203120322023300241024201430033204020
histo:
0: 100
1: 82
2: 74
3: 58
4: 52
5: 36

base 7:
01234560020304050611314151622425263353644665543210001011021112003103201300410420220430140051052053023033054015006016024025026034035036044045046055056106206306406511421213221431152153122231331541161241251261341351361441451461551562163164165225323243325422623334235236244245246255256326426533634344354445346355356436545464555656606616626636664650100111002101020000221022200120103000301040004010500050112
histo:
0: 79
1: 62
2: 58
3: 51
4: 51
5: 51
6: 49

base 8:
01234567002030405060711314151617224252627335363744647557766543210001011021112003103201300410420220430140051052053023033054015006106206306402403404406501600701702502602703503603704504604705505605706606710720730740750761142121322143115215312223133154116216316412413414416511712512612713513613714514614715515615716616721731741751762253232433254226326423334244265227235236237245246247255256257266267327427
histo:
0: 63
1: 63
2: 62
3: 46
4: 44
5: 43
6: 43
7: 37

base 9:
01234567800203040506070811314151617182242526272833536373844647485575866887765432100010110211120031032013004104202204301400510520530230330540150061062063064024034044065016007107207307407502503504505507601700801802602702803603703804604704805605705806606706807707810820830840850860871142121322143115215312223133154116216316412413414416511721731741751251351451551761181261271281361371381461471481561571581
histo:
0: 80
1: 71
2: 40
3: 38
4: 38
5: 38
6: 34
7: 34
8: 28

base 10:
01234567890020304050607080911314151617181922425262728293353637383944647484955758596686977998876543210001011021112003103201300410420220430140051052053023033054015006106206306402403404406501600710720730740750250350450550760170081082083084085086026036046056066087018009019027028029037038039047048049057058059067068069077078079088089109209309409509609709811421213221431152153122231331541162163164124134144
histo:
0: 99
1: 48
2: 38
3: 36
4: 36
5: 31
6: 31
7: 28
8: 28
9: 26[/spoiler]

Obviously we’re all kidding here. But continuing to play right along …

If you believe rounding is appropriate, change the mapping of Y from 2 to 3 then remap A to any other unused value. CED still applies. That’s the wonder of numerology; it always overcomes any logical objection!

Although, on a more solid logical footing, I’d like to see you offer a quality defense of your idea that, when discussing partial sequences extracted from longer sequences, you should be rounding the digits at the cut points. That sounds pretty far-fetched to me.

Note I never claimed that pi merely spelled 3.LSLGUY and nothing more. I claimed, or at least was trying to claim, that pi *started* with 3.LSLGUY. The fact that the following umptygazillion digits keep rattling out nonsense and Shakespeare and Voltaire (and Trump :eek: but I repeat myself :)) doesn’t alter my intended claim.

I’m sorry if I wasn’t explicit enough.

Not a defense, per se, but it is an old habit from my slip-stick days. Once you have to take a number off, you look at where the hairline is and you pick the closest mark. After that, rounding up 5/4 at the tail becomes second nature.

In Numberphile’s very long (but weirdly fascinating) Making of a Mile of Pi video they discuss the letters-in-pi question from around the 13:12 mark onwards. The results are underwhelming though. The longest string is YNHEWHOLB, and the first five-letter string that (kind of) makes sense is COROT: “A French landscape and portrait painter”.

Yeah, using a particular encoding where pairs of decimal digits may code letters (01 for A, 02 for B, through 26 for Z), but most pairs of decimal digits don’t code letters, so most strings of decimal digits end up not coding just strings of letters. Mind you, we could pick all kinds of different encodings and get different results, including variable-length encodings in which all strings of decimal digits become strings of letters.
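
For instance, under that pairs-of-digits encoding, a decoder is just something like this (a quick sketch; the framing into pairs is arbitrary here, and shifting the start by one digit changes every run):

```python
def letter_runs(digits):
    """Read non-overlapping pairs of decimal digits, mapping 01->A ... 26->Z;
    any pair outside that range breaks the current run of letters."""
    runs, current = [], []
    for i in range(0, len(digits) - 1, 2):
        pair = int(digits[i:i + 2])
        if 1 <= pair <= 26:
            current.append(chr(ord("A") + pair - 1))
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs

# ['NO', 'Z', 'S', 'J'] for this particular framing of pi's first 50 decimals
print(letter_runs("14159265358979323846264338327950288419716939937510"))
```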

Glad you enjoyed exploring this concept! Incidentally, you may wish to know how you can be assured, at any point, that what you have so far is good and further backtracking will not force its erasure; for this, it is convenient to know the following fact (from which the construction of infinite de Bruijn sequences really arises): let’s say a sequence is de Bruijn of order n if every length-n string occurs in it precisely once. Then, it turns out, every order n de Bruijn sequence (over base > 2) is the start of at least one order n + 1 de Bruijn sequence.

So an infinite de Bruijn sequence is just an order 0 de Bruijn sequence extended into an order 1 de Bruijn sequence extended into an order 2 de Bruijn sequence extended into an order 3…, and so on. And at each point where you’ve completed one of those, you know all your digits so far are solid and need never be revised.
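
In code, spotting those safe checkpoints is easy (a sketch; the base-3 example below is just the first ten digits of the sequence in the spoiler upthread):

```python
from collections import Counter

def is_de_bruijn_of_order(seq, base, n):
    """True if every base-`base` string of length n occurs in seq exactly once."""
    windows = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    return len(windows) == base ** n and all(c == 1 for c in windows.values())

print(is_de_bruijn_of_order("012", 3, 1))          # True: order 1 checkpoint
print(is_de_bruijn_of_order("0120022110", 3, 2))   # True: order 2, all 9 pairs once each
```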

[As for WHY it is that order n de Bruijn sequences can always be extended to order n + 1 de Bruijn sequences (over base > 2), this is, interestingly, a graph-theoretic fact: every Hamiltonian path over the de Bruijn graph of a given order can be extended into an Eulerian path, by Euler’s eponymous characterization of the conditions under which Eulerian paths exist; the only tricky part is the connectivity condition. I may write this up in more detail later.]

So I guess my intuition was heading in the right direction, with four-coloring. Or that was a lucky guess.

It also smells to me a bit like Gray coding is a very degenerate case of this idea. Or is maybe a lower “dimensionality” of the same idea considered in complexity space.

E.g., to extend a Gray coding from 2 bits to 3, first prepend a zero to each entry, then replay the 2-bit entries in reverse order while prepending a one. The (n+1)-bit sequence contains the n-bit sequence twice, with a simple amendment.
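
That reflect-and-prepend step is easy to write down (a quick sketch):

```python
def gray_code(bits):
    """Reflected binary Gray code: prepend 0 to the n-bit list, then replay it
    in reverse order with 1 prepended, as described above."""
    codes = [""]
    for _ in range(bits):
        codes = ["0" + c for c in codes] + ["1" + c for c in reversed(codes)]
    return codes

print(gray_code(3))
# ['000', '001', '011', '010', '110', '111', '101', '100']
```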

I don’t see a strong connection to four-coloring (or graph-coloring more generally) in particular, but I suppose the thing about graph theory that makes it such great math is that it’s so abstract, its core ideas and phenomena are ubiquitous.

Gray codes are more closely related to de Bruijn sequences. Just as de Bruijn sequences are Hamiltonian paths/cycles through de Bruijn graphs, Gray codes are Hamiltonian paths/cycles through Hamming graphs. And just as the de Bruijn graph of a particular order is generated inductively from the previous order in a simple way (as its edge graph), similarly, the Hamming graph of a particular order is generated inductively from the previous order in a simple way (as its Cartesian/“box” product with the complete graph on digits).
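
Here’s a tiny concrete check of the edge-graph relation for de Bruijn graphs (a sketch; it only illustrates the de Bruijn half, not the Hamming-graph product):

```python
from itertools import product

def de_bruijn_graph(base, n):
    """Directed de Bruijn graph of order n: nodes are length-n digit strings,
    and each node u has an edge to u[1:] + d for every digit d."""
    digits = "0123456789"[:base]
    nodes = ["".join(p) for p in product(digits, repeat=n)]
    edges = {u: [u[1:] + d for d in digits] for u in nodes}
    return nodes, edges

# Label the edge u -> u[1:]+d with the (n+1)-string u+d.  Those labels are
# exactly the nodes of the order-(n+1) graph, i.e. it is the edge graph of
# the order-n one.
nodes2, _ = de_bruijn_graph(2, 2)
_, edges1 = de_bruijn_graph(2, 1)
edge_labels = sorted(u + v[-1] for u, targets in edges1.items() for v in targets)
print(edge_labels == sorted(nodes2))   # True
```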

I would be interested in this. I may have to look up some of the big words…

Can’t every Hamiltonian path be turned Eulerian (or vice versa) just by converting the graph to its dual? Or is that not quite what’s going on here?

Regarding the statistics listed above, I think I figured out what’s going on. Clearly, when you reach an order-N stopping point, the digits must be represented equally (ignoring the stragglers at the ends). But the density across the sequence doesn’t have to be constant. By choosing an arbitrary stopping point, and because the algorithm tries lower numbers first, I preferentially captured the early parts of sequences where the high-digit density was lower.

So nothing really going on there, though I couldn’t say what this means for the “real” distribution of digits. Most prefix sequences will be biased low, but they all eventually recover.

For general graphs, the Hamiltonian path problem is a much more difficult one than the Eulerian path problem (the theory of Eulerian paths is simple: this is essentially the same as the theory of Eulerian cycles, and Eulerian cycles in a directed graph exist if and only if there are as many ways into as out of each node and all the nodes are connected; random stumbling will then decompose the graph into disjoint cycles, and then all these cycles can be joined up into one big one. No such good theory exists for Hamiltonian paths; indeed, finding Hamiltonian paths or cycles is NP-complete). However, Hamiltonian paths in the edge-graph of G are the same as Eulerian paths in G itself, and the de Bruijn graph of order n + 1 happens to be the edge-graph of the de Bruijn graph of order n, making everything easy here.
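
That cycle-splicing argument is essentially Hierholzer’s algorithm, and it’s short enough to sketch; here it reads a finite, cyclic order-n de Bruijn sequence off an Eulerian circuit of the order n-1 graph (this is just the standard construction, not the infinite-extension argument itself, and the names are mine):

```python
from itertools import product

def de_bruijn_cycle(base, n):
    """Cyclic de Bruijn sequence of order n, read off an Eulerian circuit of the
    order n-1 de Bruijn graph (Hierholzer: walk until stuck, then splice in the
    remaining cycles).  Digits 0-9 only, so bases up to 10."""
    digits = "0123456789"[:base]
    if n == 1:
        return digits
    # Unused outgoing edges at each node; a node is a string of length n-1.
    out = {"".join(p): list(digits) for p in product(digits, repeat=n - 1)}
    stack, circuit = ["0" * (n - 1)], []
    while stack:
        u = stack[-1]
        if out[u]:
            d = out[u].pop()
            stack.append(u[1:] + d)   # follow the edge labelled u + d
        else:
            circuit.append(stack.pop())
    circuit.reverse()
    # The last digit of each node after the first spells the cyclic sequence.
    return "".join(v[-1] for v in circuit[1:])

print(de_bruijn_cycle(2, 3))   # '11101000': every 3-bit string appears once, cyclically
```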

I’m not quite sure which notion of graph dual you had in mind or how you intended to use it, but at any rate, the above is what goes on with the de Bruijn sequence phenomena. [The one tricky thing we need for our extension property is showing that removing a Hamiltonian path (and thus at most one outgoing edge from each node) from a de Bruijn graph (over an alphabet of size > 2) leaves the remainder connected; this follows in a clever way from the fact that each node has at most one outgoing edge removed.]

More details later (perhaps, depending on what else I get done today…)!