# Please explain encryption to me.

Can someone explain encryption and decryption to me? I’ve tried, but I really can’t get it. What’s with public and private keys, rainbow tables, quantum encryption, hash algorithm etc.

Most things I’ve read talk about sharing keys amongst different people. Is this the same as encrypting files on my computer? (Because it’s only me who will (hopefully) have access to those files.) If I wanted to create an uncrackable password that I could remember, could I do it? I appreciate that this may be quite a big subject, but any guidance would be appreciated.

Public key encryption is based on the fact some mathematical problems are very difficult to solve. In specific, the prime factorization of very large numbers takes a huge amount of time when the numbers are huge but still manageable with modern computer hardware (factoring a 2048-bit number would take you until all the stars in the Universe died, but 2048 bits is tiny compared to current RAM and hard disk sizes). I don’t know how much detail you want, but it’s possible to use mathematical jiggery-pokery to implement some very interesting properties:
[ul]
[li]Breaking the cipher is just as hard as factoring a huge number.[/li][li]Any message encrypted with your private key can be decrypted with your public key.[/li][li]Any message encrypted with your public key can only be decrypted with your private key.[/li][li]Using hashes, it can prove your identity even if it isn’t tied to your real name.[/li][/ul]I don’t know how much detail you want. Look up RSA to begin with, here.

A rainbow table is used to make breaking a hash quicker, at the expense of taking more RAM. In specific, it is used to recover passwords, which are always stored in hashed form on secure systems, given a set of guesses about what the password might be (to create the table) and a bunch of reduction functions specific to the hash algorithm used to hash the passwords. Use of salts, or random data introduced into the hash, make rainbow tables a lot less effective (ideally, completely ineffective).

This is pie-in-the-sky stuff right now. As I understand it, it only allows you to detect whether someone is snooping on your communications link, by measuring the polarization of the photons used in that link. This is founded on quantum entanglement and frankly resembles Alice in Wonderland more than a little.

This is an algorithm that applies a complex function to turn arbitrary-sized input into a fixed-size output such that it is virtually impossible to guess what the input was based on the output, or to craft an input you know beforehand will yield a given output. This is used to detect changes in the data that was hashed, either accidental or malicious, and can be used to verify that the data came from a specific person or entity: You do that by creating a hash of the data and then encrypting the hash with your private key. That way, others can decrypt the hash (your digital signature for that file) and verify that the decrypted hash is the same value they get when they hash the data.

Thanks for your reply, but can we move back a few steps here. Think more ‘Encryption For Dummies’. Er, like what’s ‘hashing’, and ‘prime factorization’ amongst others. My level of knowledge here is very basic.

OK, most basic first: Prime factorization is taking a number (an integer, to be precise) and determining what prime numbers (a prime is an integer only evenly divisible by itself and 1, and 1 is not a prime*) you multiply to get that number. For example, the factorization of 15 is 5 and 3, and the factorization of 21 is 7 and 3. Determining the prime factorization of a small number is very easy, determining the prime factorization of a large number is hard, and determining the prime factorization of a very large number is extremely hard. In this case, ‘very large’ means 1024 bits or 2048 bits or larger, meaning when you write the number in base 2 it takes 1024 or 2048 ones and zeroes to do it. (‘Base two’ means the only numbers you have are 0 and 1, and moving right one position multiplies by 2: ‘1’ in base 2 is 1, ‘10’ in base 2 is 2, and ‘100’ in base 2 is 4.)

*(One is not a prime because we really want to preserve the property that all integers have a unique factorization: A prime’s factorization is itself, and a non-prime (or composite) number’s factorization is a unique sequence of primes.)

A hash is a complex function (in the mathematical sense, meaning it always returns the same output when given the same input) that turns arbitrary-sized inputs (a whole file, say) into fixed-size outputs (a 256-bit number, for example). It’s obvious that it isn’t possible to give all files a unique 256-bit number, but the function is complex enough it is very very difficult to predict what number a given file will hash to, or to create a file that is guaranteed to hash to a certain predetermined number. This is useful because it means that sending a file plus the number that file is supposed to hash to makes it very very difficult for someone to fiddle with that file in transit: If the file’s been changed, the hash of the new file won’t match the hash the recipient is expecting.

To give you some idea of where I’m at, I know that if you want to encrypt some text, you can move all the letters up one (a becomes b etc) and tell your recipient that that’s what you’ve done. I am guessing, but what you tell your recipient is the key, and that’s probably private. But where do public keys come in? And is the method of encryption the things that are called AES and Blowfish?

Multiplying two large numbers together is fairly simple; working backwards to find the original two numbers is a far more difficult problem. There are mathematical “one-way functions” (very easy to find the function, very difficult to work the other way and find the reverse function) based on multiplying two large prime numbers.

With a one-way function, a user can construct a public encryption key and a private encryption key. The public key gives information for finding the one-way function; the private key is a clue that makes it easy to find the reverse function. Using that function, numbers (and any data can be expressed in numbers) can be coded in a way that is practically irreversible – unless you have that clue. So, you can publish the “public key” and people can use it to send messages only you (with the “private key”) can read.

That is a very simple cipher, called the Caesar cipher because it was first used by Caesar (well, some Caesar) in Ancient Rome. We’ve come a long long way since then. The Caesar cipher is not usefully similar to any modern cryptographic algorithms.

Again, it’s a lot more complex now.

Public keys are part of a whole new way of encrypting messages which I’ve already outlined above.

Yes, those are called ‘encryption algorithms’. AES is the Advanced Encryption Standard, which is a standard algorithm chosen by the American Federal Government. It is widely thought to be secure if your key is 256 bits long. Blowfish is another cipher, also thought to be secure but not as widely-used. Neither of them are public key ciphers, but instead are symmetric key ciphers: The only key is the private key, and you use the same key to encrypt (render unreadable) and decrypt (make readable again). They are not like the Caesar cipher at all.

Some things here. Are you saying that some mathematical function works on the binary contents of the file to turn it in to 256 bits? If you send the file with the ‘the number that file is supposed to hash to’, doesn’t that mean that if someone intercepts it, he will be able to read the file? And if he doesn’t change it, you won’t know if it’s been read.

And how does this relate to a file I have encrypted on my hard disk (using say, Truecrypt or PGP) that has a user created password, where the secuity is only to prevent someone else from seeing the file? I presume I have encrypted it (somehow), which makes it unreadable, but in some way the password decrypts it. How?

Well, some fixed number of bits, yes. (To a computer, everything is binary. All files are ‘binary files’.)

Unless you encrypt the file, yes, others will be able to read the file.

Yes, that is true.

It’s important to realize that you have to use these tools correctly for the given scenario. Sometimes, you don’t care who reads a document as long as it isn’t changed. Contracts work like this, for example. To keep a contract secure means you know if it’s been changed (even down to changing the spaces between words) and you only need a hash to do that.

It can tell you if someone is trying to change files on your hard drive. It can also allow you to send encrypted files to someone else and allow that person to know they came from you (not someone pretending to be you) and they weren’t modified in transit.

That is somewhat complex. To understand it, you really need to do some research on your own, because cryptographic algorithms are complex and subtle beasts with properties even the experts barely understand. I suggest Applied Cryptography by Bruce Schneier.

(requires some math understanding).

Hashing is not really encryption. The goal of hashing isn’t to prevent someone from reading a file, it’s to allow you to verify that a file hasn’t been modified since the hash was created.

I’m sorry, but if you’ve got the patience, I’d like to tackle this very slowly. Using very simple numbers (I presume the principle is the same), if I multiply 4 x 8, I get 32. The ‘4 x 8’ is the one way function? So what makes it difficult is in finding the numbers I used to make 32 if you don’t know it was?

So here, the public key will tell whoever I used 32. So, someone else sends me a file, I can use my private key (4 x 8) to decrypt it. And what makes it difficult is that I could have 2 x 16, or any other available combinations?

(use of a calculator is allowed)
Multiply 52403 x 68025
(how long did that take?)

Now, tell me what two numbers, when multiplied together equal 3564834504?
(how long did that take?)

2 x 1782417252

About the same length of time as the first problem, since I could see that one was divisible by 2. (and 4, and 8…)

Here’s a better example. This one will take a wee bit longer to factorize.

1157502029

*** Ponder

Actually, the numbers you would use are prime (), so 4 x 8 is a bad example. So if you’re using xy = z and you’re trying to find the factors of z, there are no options other than x and y. The thing making it difficult is that x, y, and z are very large numbers. See beowulf’s example about this, but understand the numbers we’re actually talking about are much, much, much larger.

(*) There’s a lot of active research into determining whether a given number is prime. So this might not be true anymore, but until recently you could determine with a high level of confidence if a very large number is prime, but you couldn’t be sure in a reasonable amount of time.

It sounds like you want to read Simon Singh’s The Code Book which takes you from a very basic level right through to more modern ciphers.

I second this recommendation. It was a very easy to read, but very informative, book.

Ed

No, this is still true: Remember that proving a given number is prime is the same as factoring that number. There are various algorithms you can use to generate ‘pseudo-primes’, or numbers that are very likely to be prime but aren’t proven to be prime. These are also called industrial-grade primes (or something like that), with the implication that you’d need to be very unlucky to have picked a composite number as a result of those tests.

Primality testing can be done in polynomial time (proved in 2003 by Agrawal and two students at IIT). Asking of a number “Is it prime?” is the same as asking for its factorization only insofar as the input actually is prime; in the case where it isn’t, the former is obviously a less intense question.

Of course, primality testing is a different task than generating large primes, though the two obviously have connections.

Sorry, 2002, not 2003.