In *general*, how do encryption algorithms prevent 0s mid-string?

I’m trying to implement a string-to-string encryption.

My understanding is that good encryption must not be biased: any given byte in the encrypted result should have an equal probability of being any value between 0 and 255.

Even if the bias is only slight, though, every encryption algorithm I’m familiar with does a lot of XOR-ing and bit-shifting, and you could still end up with a zero.

I have XTEA working in my code, string to byte array, but the byte arrays often have zeros smack dab in the middle. You can’t have that in a string.

The encryption has to send the message in a string field across the interwebs (Windows to Mac OS, but that’s not important - I know how I can make this independent by hand-mapping the characters to an encoding).

I know this has been solved many times - there are string-to-string encryptions out there.

I want to understand *how* they work. And google-fu is failing me.

Can you simply encode the resulting string with something like base64 or percent encoding? I don’t know what language you’re using but PHP, for example, has functions for doing this.
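In C#, say, the percent-encoding variant looks like this sketch (cipherBytes standing in for whatever the encryption step produced):

[CODE]
using System.Text;

// Percent-encode the ciphertext: printable ASCII passes through untouched,
// everything else (including 0x00) becomes %XX, so no null byte survives.
static string PercentEncode(byte[] cipherBytes)
{
    var sb = new StringBuilder();
    foreach (byte b in cipherBytes)
    {
        if (b > 0x20 && b < 0x7F && b != (byte)'%')
            sb.Append((char)b);
        else
            sb.AppendFormat("%{0:X2}", b);
    }
    return sb.ToString();
}
[/CODE]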

Of course you can have a zero (null byte) in a string. No one is forced to write code that assumes that a null byte terminates a string.

The best thing to do of course is not call it a string! It’s an array of unsigned bytes with a length count. End of hassle.

This is the right answer. Keep track of the length yourself.

Well, the mechanism I have for sending the message to the other party is a string. I can’t get around it.

I guess I could do what davidm is saying.

What I’m writing in is C# (and Objective-C, but that’s not actually me). The more it’s like straight-ahead C the better.

But, I am interested in the general solution. String to string algorithms exist. How do they do it?

C# and Objective-C both have string classes that are not null-terminated and can have 0s in the middle of the string.
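For instance:

[CODE]
using System;

// .NET strings are length-counted, so an embedded null is just another char.
string s = "abc\0def";
Console.WriteLine(s.Length);        // 7: the \0 counts toward the length
Console.WriteLine(s.IndexOf('\0')); // 3: it's addressable like any other char
[/CODE]

The trouble only starts when that string crosses an API that treats \0 as a terminator.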

That’s where the randomization comes in. The resultant data string you generate will be processed with random numbers, preventing a run of zeros. This was an issue with older algorithms, not just for a bunch of zeros but for any repetitive data output.

It’s wasteful, but you could encode each byte of your cleartext as two bytes of string. Represent encoded zeroes as 1111111111111111 or something.
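Hex encoding is the usual version of this byte-doubling idea; a quick sketch in C#, with cipherBytes standing in for the encrypted output:

[CODE]
// Each ciphertext byte becomes two hex characters, so 0x00 comes out as "00"
// and the string itself never contains a null. Doubles the size, as noted.
string hex = BitConverter.ToString(cipherBytes).Replace("-", "");

// Reversing it: parse the characters back two at a time.
byte[] bytes = new byte[hex.Length / 2];
for (int i = 0; i < bytes.Length; i++)
    bytes[i] = Convert.ToByte(hex.Substring(2 * i, 2), 16);
[/CODE]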

They encode the resulting binary encrypted data as ASCII, e.g. using Base64.

Encryption algorithms operate on numbers. Once you have the encrypted numbers, you can turn them into a string however you like.
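In C#, for instance, it’s one line in each direction (a sketch, with cipherBytes standing in for whatever the XTEA step produced):

[CODE]
// Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=', so it can never
// contain a null byte, and it round-trips the ciphertext exactly.
string wire = Convert.ToBase64String(cipherBytes);
byte[] roundTrip = Convert.FromBase64String(wire);
[/CODE]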

Specifically mentioned here:

[QUOTE=MSDN]
In the .NET Framework, a String object can include embedded null characters, which count as a part of the string’s length. However, in some languages such as C and C++, a null character indicates the end of a string; it is not considered a part of the string and is not counted as part of the string’s length. This means that the following common assumptions that C and C++ programmers or libraries written in C or C++ might make about strings are not necessarily valid when applied to String objects:
[/QUOTE]

Zero-terminated strings are really a C/C++ thing, whereas strings prefixed by their length (which don’t worry about null elements) are a Pascal tradition.

Yeah, there’s that. Or, limit the length of what I can encrypt to 255 bytes. Then the string I produce always begins with an extra byte saying how many zeros have been replaced (with 255 meaning none, since the count byte itself can’t be zero). The next x bytes give the positions of those replacements so I can put the zeros back in before decrypting.
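Something like this, maybe (a sketch). I’d store each position as index + 1, since a zero at index 0 would otherwise put a null byte into the header, and use 0x01 as an arbitrary placeholder:

[CODE]
using System.Collections.Generic;
using System.Linq;

// Sketch of the scheme: note where the zeros are, overwrite them with a
// nonzero placeholder (0x01, chosen arbitrarily), and prepend a header.
// The count byte uses 255 for "none" so the header never starts with 0,
// and positions are stored as index + 1 so they can't be 0 either.
// Assumes the ciphertext is at most 255 bytes.
static byte[] EscapeZeros(byte[] cipher)
{
    var body = (byte[])cipher.Clone();
    var positions = new List<byte>();
    for (int i = 0; i < body.Length; i++)
        if (body[i] == 0) { positions.Add((byte)(i + 1)); body[i] = 0x01; }

    var result = new List<byte> { (byte)(positions.Count == 0 ? 255 : positions.Count) };
    result.AddRange(positions);
    result.AddRange(body);
    return result.ToArray();
}

static byte[] RestoreZeros(byte[] escaped)
{
    int count = escaped[0] == 255 ? 0 : escaped[0];
    byte[] cipher = escaped.Skip(1 + count).ToArray();
    for (int i = 1; i <= count; i++)
        cipher[escaped[i] - 1] = 0;
    return cipher;
}
[/CODE]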

Thanks for the idea.

**gazpacho** - I appreciate that, but when I serialize what I’m encrypting, it’s going through Amazon queues which don’t allow for pretty much anything but zero-terminated strings. Even ints and so forth have to be parsed from string on the other end.

Right. But the numbers can be any value, from 0 to 255. If I decide to make the number 0 encode to something other than ‘string terminator,’ then I stomp on whatever value that was.

If the encryption allows for any value between 0 and 255, I don’t see how to get around that.

This is not a good idea. You are giving away information that should be secret.
The better way would be to limit your clear text to 128 permissible characters. Each character would encode into 7 bits, be encrypted, and then be converted back to ASCII by adding one bit in a non-random fashion. This way you can have up to 128 formatting characters, or just the one.
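Something like this sketch, forcing the added eighth bit to 1 (strictly that puts the output in 0x80-0xFF rather than ASCII proper, but no byte can ever come out zero):

[CODE]
using System.Collections.Generic;

// Regroup the ciphertext bits into 7-bit chunks and set the eighth bit to 1,
// so every output byte falls in 0x80-0xFF and can never be zero.
static byte[] Pack7(byte[] cipher)
{
    var output = new List<byte>();
    int acc = 0, bits = 0;
    foreach (byte b in cipher)
    {
        acc = (acc << 8) | b;
        bits += 8;
        while (bits >= 7)
        {
            bits -= 7;
            output.Add((byte)(0x80 | ((acc >> bits) & 0x7F)));
            acc &= (1 << bits) - 1;   // keep only the bits not yet emitted
        }
    }
    if (bits > 0)                     // pad the final partial chunk with zeros
        output.Add((byte)(0x80 | ((acc << (7 - bits)) & 0x7F)));
    return output.ToArray();
}
[/CODE]

Unpacking is the mirror image: mask off the top bit, accumulate 7 bits at a time, emit full bytes, and discard the sub-byte remainder as padding.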

I disagree. After encrypting, I’m just putting meta-info on there that would have been obvious if I had been able to pass a byte array in the first place. You would know that zeros belong in bytes 8, 31, and 49 (for instance), because there they are.

That might work…

You are still giving away information about the clear text message. It would be enough to identify the message from the set that I intercepted. Say I captured two different-looking messages, both the same length and both with zeros in bytes 8, 31, and 49. I can be very confident that it is the same message encrypted using two different keys (which should never happen, but mistakes do occur). This would give me a ton of info.

The best way is to first compress your data and then encrypt that.

The zeros aren’t in the original clear text. I’m only replacing zeros in the encrypted bytes.

Am I missing something?

I think you are running into problems with conflating this “remove the zeros” step with encryption. There is nothing secret about the zero-removal algorithm (hell, I recommend using base64 encoding like davidm suggests in post #2), since reversing the zero-removal will just get you the cyphertext, not the plaintext.

[QUOTE=Punoqllads]
The best way is to first compress your data and then encrypt that.
[/QUOTE]

That might make for better encryption, but it won’t give zero-free cyphertext.

How does that ensure the encrypted message has no zeros mid-array?