# One-Time Pads and Transcendental/Irrational Numbers

1. Would the decimal expansions of transcendental or irrational numbers make secure one-time pad keys (excepting, of course, well-known ones such as pi, e, phi, etc.)? I suppose this is equivalent to asking if the sequences of digits in such numbers’ decimal expansion are random, but I’m not certain.

2. If a truly random one-time pad key was used to encrypt a message (say, by basing the sequence of digits in the key on radioactive decay or thermal ‘noise’), and subsequently an attempt was made to decipher the encrypted message using a quantum computer, would the result be that the computer spits out every possible message of the same length (and it would be impossible to determine which one of them was the ‘correct’ one)?

Thanks!

1. No. Or to be more clear, what you need is entropy in your key. One of the ideals in working out security is to assume that everything about the encryption system is publicly known except for the key. So in your example, knowledge that you use transendental numbers is known. Just not what those numbers are. So your problem is expressing a key in a form that has lots of entropy. Pi is hopless - the string “pi” is enough to encode the number, and thus your key. A transendental or irrational number has the same entropy as a mathematical expression for the number. So even though the number has an infinite non-repeating pattern, there isn’t infinite entopy. You could say “use the 10,000,000th digit onward of Pi” but this only has the entropy of the expression. In general, the number sequnce has the same entropy as the entropy in the offset number. So you have not gained anything useful. You might as well have used the offset number as the key in the first place.

This relates to the second part of your question. The sequence of digits in a decimal expansion (or any other expansion) of any irrational or transendental number are most certainly not random. They are the same every time you perform the expansion. Thus there is no entropy in the expansion at all, the only entropy was in the choice of the expression for the number in the first place.

1. Yes. And nicely worked out.

Well, even a quantum computer can’t invent information out of thin air. When you have a message that was encrypted using an arbitrary key the same length as the message, there is absolutely no way to recover the original message without knowing the key. As you say, every possible message of that length is a valid plaintext for some key. If you have no information about what the key is, you have no information about what the message is.

It’s like I give you a single integer, 5, and tell you I arrived at it by subtracting one integer (my “key”) from another (my “message”). There are naturally an infinite number of possible messages that could result in ciphertext 5, so without any basis for knowing the key, you have no information about my message. No possible cryptanalysis can discover what isn’t there.

As for your first question, I imagine you’re suggesting that two interested parties could use such a number as a convenient “infinite” source of key data? I suspect it would be a very bad idea, and would undermine most of the security of the one-time pad.

Let’s say you use such a scheme for a while. You’re up to the millionth digit or so of your transcendental number x. Then your foes somehow get their dirty paws on the plaintext for some of your old messages, and are able to correlate them with intercepted ciphertext (by context, time, message length, etc.). If you had been using a proper one-time pad, you’d have nothing to fear; those messages were old and the discovery of the plaintext doesn’t worry you. But since all the numbers in your pad are related mathematically, there is the real risk that some mathematical wiseguy in the enemy’s crypto department will look for such a correspondence and find it.

I don’t know how hard that would actually be; obviously if you chose something like pi your code would be broken in a jiffy and your opponents would be reading your messages as soon as the intended recipient, if not sooner. My understanding is that known transcendental numbers fall into a small number of classes with strict relationships, so I would not trust my life to a “one-time pad” based on such a scheme.

Basically, though, no matter what the practical difficulties would be, using a transcendental number as your pad would change your code from “theoretically unbreakable” to “theoretically breakable,” which is a pretty big shift.

minor quibble about question 1 - what you are really asking is whether we can have stream ciphers http://en.wikipedia.org/wiki/Stream_cipher based on decimal expansions. One time pad is simply a (best known) example of stream cipher. While the Wikipedia article explains that in practice the streams are usually generated using an algebraically specified recurrence plus some boolean expressions added for non-linearity, it would certainly be interesting to hear from the experts as to the theoretical feasibility of what you are suggesting.

The second question is definitely “yes”. The plaintext could have been any sequence of letters of the same length. All you can do with a faster computer is generate the (huge) set of all possible such sequences and maybe filter out the (still huge) subset containing the ones that say something in a human language. So as long as you don’t reuse the pads, you are good to go.

Minor quibble. A one time pad isn’t a stream cypher. A stream cypher is one way of generating a one-time pad from a much lower entropy initial condition.

The critical part of a one-time pad is that it has maximum entropy, and is only ever used once. So, the best possible one-time pad is one that is generated from a true random sequence. A stream cypher is a poorer quality approximation to that. So not all one time pads are stream cyphers.

But relating a stream cypher to the decimal expansion of an irrational is exactly right. It is exactly a possible stream cypher. Just a very poor one, for just the reasons SP outlines above. Again, it is all about entropy. There are an infinite number of irrationals (in fact more than infinite) and thus in principle they provide a rich source of streams. But the expression needed to compute them is the deterministic definition. The search space for the cypher is not infinite, it is the size of the space you allow yourself to provide definitions. It is the entropy of defining expressions in that space that determines the entropy of your key.

One small difference between the standard stream cyphers and using an irrational is that the common stream cyphers are all based upon a finite state space, and will all eventually repeat their stream. So they map to the rational numbers. This doesn’t make any difference for the value of the cypher, but is an interesting difference.

Thank you! Very helpful, all.

To clarify, the type of transcendental number I was getting at might be, for example, (pi)[sup]sqrt(7.234)[/sup] * e[sup]1/3.9873[/sup], i.e. the decimal expansion of which has probably never been considered before.

as a practical consideration, note that even if you come up with a way to generate irrational numbers with sufficiently random decimal expansions, it might be pretty complicated to compute these expansions in practice. Well, we can generate the first 10 digits on the calculator, but here you would need to generate tens of thousands of digits. Just think about how many multiplications and divisions you would need to compute the series for the exponential functions. Plus you would need to make sure that the rounding is being done the same way on both ends (computers are not really intended by design to do arithmetic accurate to thousands of digits). Meanwhile the recurrences used in professional stream ciphers are apparently really easy to compute.

You underestimate the NSA!

More seriously, you are, of course, absolutely right.

Well, no. The whole point of the OTP is that it’s provably unbreakable. There’s no theory about it: Claude Shannon, the father of information theory, created the proof in the 1940s. That’s why they’re used whenever possible, however rarely that might be.

The point about a OTP is that it must be the same length as the original text. If the OTP is purely random (i.e. has the maximum entropy possible) then it is unbreakable. The issue with a stream cypher is that although the stream is used as a OTP the stream does not have the maximum entropy possible - it has only the entropy present in the stream generator parameters. Thus if the plaintext message is longer than the length of the generator parameters it is no longer unbreakable. So my point about the space used to describe the transendantal may be viewed as this. If the textual description of the transendental number to be used is shorter than the textual length of the plaintext, the cypher can be broken. If one realises that the possible set of valid expressions for a transendental is much smaller than all possible strings in the description space, one realises that the entropy possible in the description space is far less than it might at first appear.

The difference between provably breakable and theoretically unbreakable strikes me as one of hairsplitting semantics. I read “theortically unbreakable” as “unbreakable from the application of theory.” In this case application of Shannon’s coding theorem. So it has identical semantics to provably unbreakable. YMMV

How can a string of number be purely random?

I guess if derives from some truelly random physical process you would “know” its random.

Now, if you have that string of numbers, I suppose you can run tests on it and give a measure of its randomness but its not like the measure is going to result in say a 1 for perfectly random and a 0 for totally non-random.

I can understand in a very vague way how the less random it is, the more likely or easier it is to break, but I don’t see how you can differentiate between purely random and darn close to random when it comes to its unbreakabiltiy or not.

I am pretty sure I won’t understand the answer, but I thought I’d ask anyway.

In a strange way you can. Although it isn’t a trivial (or practical) process. The basic idea is that you are given a random number sequence. Can you find a computer program that emits that same sequence? Clearly there is a trivial program - one that contains the sequence and just prints it out. So, can you find a program that prints out the sequence and is smaller than the sequence? If you can, the number isn’t purely random.

So, think about stream cyphers. They are a simple computer program that takes a couple of very large numbers, and uses them to generate a much longer stream of numbers that is used as the cypher key. That basic program, plus the parameters are smaller than the generated sequence, and rather than find the entire sequence to break the code, all you need do is search for the parameters. The Enigma machine is a great example. The initial parameters were the initial rotor settings. So a search space that is much smaller than the length of a typical message. As it tuned out, along with a few issues with the way the machine worked (for instance the machine could not encode a letter to the same letter) allowed a mechanical search engine to be built to break the codes.

Most encryption works with a stream cypher, because one time pads are impossibly cumbersome to use. The pads must not be reused, and must be the same length as the plaintext to be encrypted. And both sender and recipient must have copies. The key exchange problem becomes the limiting factor. (Diplomats with briefcases handcuffed to their arms is the sort of thing one gets with one time pads. The modern high tech key exchange is quantum encryption.) So stream cyphers use generator functions for which the initial parameters are very hard to discover. (Actually, typically a very secure, but very computationally expensive algorithm is used to exchange a session key, and the session key is used to drive a computationally easier stream cypher for each encrypted exchange. )

Of course in practice this is true. It is infeasible, and as close to impossible as you probably like. But what it isn’t, is absolutely provably impossible. It may still be that with a less than perfect sequence generator there remains some useful traction to be had with an encrypted message. Perhaps you could make a good guess as to how long the plaintext message is. That alone might provide useful intelligence information. Any sort of statistical traction applied over a long series of messages might also allow something useful to be elicited.

If I can hijack my own thread, this reminds of a question that I almost added to the OP but didn’t.

In any case, when one says “random” as for example in a sequence that has only its trivial program as its generator, does the sequence contain all digits one-at-a-time each occurring 0.1 of any chosen (infinite) subsequence, all 2-digit combinations each occurring 0.01 of any chosen (infinite) subsequence, all 3-digit combinations each occurring 0.001 of any (infinite) subsequence, etc. Can such a property even be proved for a sequence?

Thanks!

Yes, although the notion of randomness being discussed there is slightly stronger than what’s been discussed so far. Here we only require the whole string to be incompressible, but there they require every proper prefix to be incompressible as well.

I know next to nothing about this, but I know the layman’s definition of Kolmogorov complexity, and this sounds related. Say we have a sequence S of length n. Is this sequence random iff K(S) = n + O(1), where K is the Kolmogorov complexity function?

(i.e. can you define randomness wholly in terms of Kolmogorov complexity?)

Interesting, thanks. I hadn’t appreciated the equivalence of the three (qualitiative) definitions of randomness.

More to the point, in that link there was a reference to normal numbers. After looking there, I now see that what I was asking was whether a “random” sequence meant a “normal” sequence. I think.

Assuming that we are talking about computers, which is going to be the case for all serious uses of encryption, it’s worth noting that all data is encoded as series of bits. An integer, a fractional value, an imaginary number, etc. no matter what it can only be represented as 1s and 0s. A program just knows to treat the bits in a certain location as being of that type and uses code which treats the bits in that way to utilize it. But since it’s all just 1s and 0s to begin with, the max data that can be encoded is capped by the number of bits. What that data means is meaningless, just whether it can or can’t achieve the cap.

Well, that’s what I meant by “theoretically unbreakable.” I wasn’t using “theoretical” in the sense of “speculatively or probably true”; I just meant that it follows from information theory that OTP is unbreakable.