Cryptogrpahy questions

I figure some of y’all should have some knowledge of this, so here’s a few questions I’d like to posit.

  1. One-way cryptography. I remember in the early to mid-90s, when we were still using telnet to login to the Internet, we could go up in the UNIX directory and find encrypted versions of anyone’s password (that happened to be on the same server.) These encryptions were said to be one-way encrypted, so you could not simply reverse the algorithm and find the original password. Hence, there were programs that encrypted the entire dictionary plus a few common password keystroke combinations (such as “qwerty” or “asdf” or “aaaa”) compared the encrypted word with the encrypted password, and if it found a match, told you the password. Hence, we were later told to throw in numbers and punctuation marks to throw off such programs.

If I am understanding this correctly, since it’s one-way encryption, would this mean there’s a loss of data somewhere, and theoretically there could be several combinations that would be valid as the password?

  1. Simple ciphers. Now, we all know the letter-substitution or symbol substitution ciper. Ya know, like a=z, b=y, etc.
    The way to solve this is analyze a piece of encrypted text with letter frequency charts. Now, a cipher I’ve always liked would be one that say, takes a sentence, “The quick brown fox jumped over the lazy dogs” and then adds a certain phrase over it. Say your password is “toledo,86orange” The program would then take your message and add “toledo,86orangetoledo,86orangetoledo,86orange” to it (ASCII values, or whatnot) to come up with a new string.
    You obviously can’t use letter frequency to decipher this, so what could you use? To me this encrytion doesn’t seem solvable, but obviously it must be.

To wit:

The quick brown fox jumped over the lazy dogs
toledo,86orangetoledo,86orangetolego,86orange PLUS

Just add straight down, value for value “T+t” “h+o” “e+l”
and get a junk string. To decode, you take the junk
string and just work everything in reverse. This is the
kind of encryption I made up when I still used a Commie 64,
and I just wonder how effective it actually is…

Sorry for the length …

  1. A cryptographic hash takes a value of any size and creates a value of a fixed size. Typically the input size is larger, so yes, it is possible for two inputs to hash to the same output. These similarities are called “collisions”. The definition of a good hash is one that cannot be backed out, i.e. the input can’t be easily discovered from the output, and a collision can’t be easily discovered. So, by definition, a good hash won’t have the problem you describe. In fact, there are many good hashes with these properties.

  2. I dunno. I know plenty about cryptographic algorithms, and lots about information theory as well, which plays into this, but very little about actual cryptanalysis techniques.

The way to solve the cipher in your second example is still through letter frequency analysis. The trick is you must first discover the period, that is the length, of the phrase you were adding to the clear text. It’s a lot more tedious then solving a simple substitution cipher, but given a long enough cipher text and a helpful computer, you can crack it easily.

Bill H, I believe what you said, is wrong in the sense that it does not answer the question that was asked. Let me try:

Part 1

The answer to this is NO. A one way function preserves all the information. One way just means “easy one way, difficult the other way”. A simple example which you can do by hand:

The function F(x)=Y^^X(mod P) (Y elevated to X mod P where X is the variable and Y and P are given) and choose (say) Y=7, P=11 (Y must be smaller than P). You can work out F(x) fairly easily but to work out the reverse you just have to work out all the direct functions until you find the one that matches. It can be done but it requires exponentially more computing power. If the computing power for the direct function is near the limit of what you have, then the reverse is beyond your limits.

Hash functions are an entirely different and separate things used for electronic signatures (authentication). If you do not use a hash function, the secret key could be deduced from several signed messages, so the hash function is introduced. In this case, yes there is the immensely remote possibility that two different messages could give the same hash output. But hash functions are used for signatures and NOT for encryption.

Part 2: By doing that you have diluted the frequency somewhat but it is still there. No dice. By adding spurious filling (these signs are called “nulls”) you just make the message much longer and only marginally more difficult to decode

Using any kind of compression solves this problem though. When you compress (winzip for instance) almost all of the redundancy is gone and frequency analysis won’t help. That is why PGP compresses before it encrypts. So you need to shorten the message, not the other way around. Note that compression and encryption preserve 100% of the integrity of the message.

One way or another all forms of encryption are solvable in theory if you have enough cyphertext to work on. The only way a message would be 100% undecryptable is when the key length equals or exceeds the length of the message. This is absolutely secure (and not practical).

Cryptography is a subject of my interest and I know a bit about it but you cannot learn a complicated subject in a thread here. To get you started I recommend The Code Book which mixes history with theory to make for interesting reading.

Some websites on this topic:
http://members.aol.com/nbrass/enigma.htm
http://www.eclipse.net/~dhamer/
http://www.eclipse.net/~dhamer/aspects.htm
http://www.nsa.gov/museum/
http://www.nsa.gov/docs/venona/index.html
http://members.aol.com/nbrass/abwehr.htm
http://www.jjtc.com/stegdoc/
http://www.cryptograph.com/instruct.htm
http://www.rsasecurity.com/
http://ntrg.cs.tcd.ie/mepeirce/Project/Chaum/sciam.html

For interesting reading check out the Venona project. Even when you read the decrypted messages you don’t have a clue what they are talking about because they had so many code names for their spies and other things.

For Unix, passwords come from an alphabet of 93 characters and can have a length of 0-8 characters. The encrypted versions are 13 characters long and come from an alphabet of 64 characters (I think). If I did the math right, the probability of two given passwords enrypting to the same string (assuming the hashing function’s output is distributed roughly uniformly over all possible strings) is 1 in 53,000,000.

Quoth sailor:

Maybe it’s not practical anymore, in the face of RSA encryption techniques, but before computers became as ubitquious as they now are, it was the standard for maximum-security communications. A spy would be given one copy of a codebook, which consisted of nothing but a very long sequence of random letters, while the other copy was kept back at headquaters. When a spy needed to transmit a secure message, he’d use as many pages of the book as necessary, and then tear out those pages and shred/burn/eat them so they’d never be used again. The result was a codebook which carried zero information, if captured before use, and an openly-transmitted message, which likewise carried zero information, without the codebook. Nowadays, I suspect that it’s done with computer cryptography methods like the modular exponentiation function mentioned by sailor… I believe that’s the function that RSA is based on. These can theoretically be cracked, but last I heard, the record for a short-key crack was six months, and for long-key, it’s estimated that it would take trillions of years.

It’s called the Vigenere cipher, and a method for attacking it was invented by Friedman–hmm, that says Philip Friedman. The Vigenere was once considered unbreakable–and apparently was used as the official diplomatic code of some nations well into the middle of the twentieth century, even after it was trivial to break it.

I once wrote a small interpreted Basic program that solved it on an old IBM 4.77MHz 8088-based PC. If you gave it an encrypted text that was a few times longer than than the password, it would spit out the password in about a minute.

Chronos, that is called a one time pad and have been used and continue to be used. The main problem with one time pads is they are unmanageable for large volumes of information. They might be practical for a spy who sends home very brief messages infrequently but for communications on the scale of WWII it is not feasible. Even symmetrical encryption was getting to the point of becoming unmanageable because of the logistics of transporting keys. That is a very weak link and a compromised key is worse than open communication because the parties think they are communicating in secret.

It was for this reason that Diffie-Hellman-Merkle invented a key exchange system that allows two parties to exchange a key publicly and yet no one else can guess the key. In other words you and I can exchange information on this board that would allow us both to arrive at the same key and yet, everybody else reading the same information, cannot guess the key. Neat, eh?

The DHM key exchange scheme was never widespread because almost simultaneously Diffie invented the concept of assymmetrical encryptio and soon after (1977) RSA was the first practical system of asymmetrical encryption.

Just to add stuff I will say PGP is not strictly asymetrical. Rather it encrypts the message with a symetrical one time key which, in turn is encrypted with the recipient’s key and sent with the message.

The recipient’s software decrypts the key and then uses the ksy to decrypt the message. This makes it much faster to encrypt and decrypt (I assume it also makes it more vulnerable).

I guess what I’ve always wondered (and never bothered to really check out) is how, using letter-frequency tables and whatnot, any kind of automated decryption technique can give the one key and decrypted message.

What I’m saying is that what I’m writing now, for example, probably will not correspond exactly to the letter frequencies in the table. Sure, it’ll probably be close, but an exact match would require some help from chance.

If I encrypted pure gibberish, the would-be decrypter would have nothing to work with, right (I know, such a message would be useless)? Or am I missing something?

It seems to me that some amount of human intervention would be necessary. The decryption program could spit out several “reasonable” guesses, but I don’t see how it could be trusted to determine, for sure, what the message was without some human help. Especially if the message is only a short sentence, say. 50 characters, with a 5-character encryption key. I’d think that the letter-frequency in a message that short could only get you close…

I’m not really doubting, so much as seeking enlightenment. :slight_smile:

“If I encrypted pure gibberish, the would-be decrypter would have nothing to work with, right?”

Right.

But with normal text (which is why “normal” text is often avoided) the “reasonable guesses” would each have a computed probability. I think with fifty characters of normal English, and a five letter password, the password would be fairly obvious.

Many times, with text only twice or three times longer than the password (which itself was gibberish), the password was not completely known, but I was able to discern the message enough to read it.

Which is one thing I’ve wondered, code-breakers have to know what language the code is in, I suppose?

Does the US military (for example) encrypt messages in a different language that uses the same alphabet? That would make the messages much harder to decrypt I would guess. Just make sure that each unit has a person that speaks, say, albanian, and voilà! If I remember right, Navajo native americans were used in World War II for this purpose.

FWIW, I recently read an excellent book on the history of crytography and cryptanalysis called “The Code Book,” by Simon Singh (also the author of “Fermat’s Enigma”). The book also includes a challenge, with 10 cyphers of increasing complexity, with a $15,000 prize.

I guess you could teach people to use a different form of spelling. lots of abbreviations…use doubled consonants in proper names, spell some things phonetically, run short words together, etc.

Most of this, if decrypted with the known key would be fairly easy to muddle through – but it would make it much harder to determine the key by frequency analysis.

I’m sure all of that is well below the capabilities of modern encryption technology, but at least you’d leave some of the newspaper cryptologists in the dark.

Wouldn’t work with passwords, of course.

A one-time pad, as sailor says, is impractical for large volumes of data, since that needs an equal volume of key. It would be possible to use a long-period random number generator to create keys, but the algorithm would need to be kept secret, or the seed value could probably be guessed fairly easily, especially if a known message is intercepted.

The RSA algorithm can be public, since the difficulty lies in recognizing the key pairs, not in creating or applying them. IIRC, deriving a private key from the public one of the pair has been proved to be equivalent to factoring large numbers. The only general method known for factoring is to try each possible factor, i.e. every prime number from 2 to the square root of the number. That is hard! Of course, if a quick method of factoring were to be discovered, RSA would be out of business, but mathematicians have been working on it for a long time and it hasn’t happened yet.

RM Mentock, that is not a Vigenère cipher which is a polialphabetic cipher (not related to untroducing nulls).

Cryptanalysis is based on language analysis, redundancy etc. All decryption is based on having a vague idea of what you are looking for (language) and in repetition; the more cyphertext you have, the more you have to work with. If you have a single short message it may be impossible to decrypt. If all the messages were totally random, they would be unbreakable (but of no use either). If you teach your spies a new language it may well be unbreakeable. In fact, many messages can be decrypted and not understood. If you read the Venona papers you will see what I mean, everything and everybody has code names which you would have to identify.

During WWII this was a common case as places were identified by code names and there were several techniques for breaking this layer of encoding.

All of the things mentioned to obstruct decryption, can and have been done, but they are not major obstacles and might confuse the users more than the cryptanalists. They are not practical at a large level. (Can you imagine the entire US army having to learn "when I say the Russians, I mean the Japanese, etc… this would be a Monty Pithon moment) In any case such things do not remain secret for long. But if you and your friend have your own code, that might be unbreakable because the cryptanalists have so little cyphertext to go on and they are not interested in your affairs anyway. But this is not workable when trying to exchange thousands of messages daily.

The English could read the german messages but did not know the locations they were referring to. One of the techniques was called “gardening” which was planting mines in a given area and then listen to German ships report “hey, lookout, mines in area XYZ!”

Part of the reason enigma was cracked was sloppy use (repeating the settings at the beginning of each message, use of standard language etc)

This was also the case with the japanese who started every message with the formality like "I humbly beg to inform your excellency…) this is called a crib and greatly helps the cryptanalist.

The german navy was much more disciplined in their use of enigma and their codes went unbroken much longer

No, it is a variant Vigenère, using ascii instead of just the 26 letters of the alphabet. What do you mean by “untroducing nulls,” in this case?

In general, to evaluate a cipher’s effectiveness, you have to assume that the “enemy” knows your base language and your method of encryption, and they have a whole buncha samples. When you change the language, you are basically using a code instead of a cipher–and codes are extremely expensive, usually. Just imagine a small dictionary with different meanings for every single word. And, of course, if you lose one code book, you probably have to start over.

RM Mentock, on rereading what pulykamell wrote I realize I misinterpreted it. You are right and he is talking about a Vigenère type encryption. Sorry about the misundertanding.

OK,

Well, RM, thanks! That pretty much solves that question. Seems my little code wasn’t as difficult to break as I thought. After reading the explanation on the web site, it seemed ridiculously easy, in fact.

But this only applies to codes with short keys. Now, we were discussing the impracticality of keeping long keys.
Well, we have zillions and zillions combinations of text just sitting around us, so what if the spy was, say, instructed to encode his message in the Vigencre manner, but instead of using a repeated word, would, say, give a key of (for simplicities sake) 031200020403, which would correspond to March 12, 2000, paragraph 2, word 4, letter 3 of the Chicago Tribune and start the encryption from there?
Or the Bible? Or page 24 of the Penguin edition of Charles Dickens’s Hard Times, for instance?

That seems to be extremely difficult, if not impossible to decode due to the fact that the password length is equal to the message length.

Am I spy material yet? :slight_smile: