In *general*, how do encryption algorithms prevent 0s mid-string?

I’m trying to implement a string-to-string encryption.

My understanding is that good encryption must not be biased: any given byte in the encrypted result should have an equal probability of being any value between 0 and 255.

Even if the bias is only slight, though, every encryption algorithm I’m familiar with does a lot of XOR-ing and bit-shifting, and you could still end up with a zero.

I have XTEA working in my code, string to byte array, but the byte arrays often have zeros smack dab in the middle. You can’t have that in a string.

The encryption has to send the message in a string field across the interwebs (Windows to Mac OS, but that’s not important - I know how I can make this independent by hand-mapping the characters to an encoding).

I know this has been solved many times - there are string-to-string encryptions out there.

I want to understand *how* they work. And google-fu is failing me.

Can you simply encode the resulting string with something like base64 or percent encoding? I don’t know what language you’re using but PHP, for example, has functions for doing this.
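In C#, say, the percent-encoding variant looks like this sketch (cipherBytes standing in for whatever the encryption step produced):

[CODE]
using System.Text;

// Percent-encode the ciphertext: printable ASCII passes through untouched,
// everything else (including 0x00) becomes %XX, so no null byte survives.
static string PercentEncode(byte[] cipherBytes)
{
    var sb = new StringBuilder();
    foreach (byte b in cipherBytes)
    {
        if (b > 0x20 && b < 0x7F && b != (byte)'%')
            sb.Append((char)b);
        else
            sb.AppendFormat("%{0:X2}", b);
    }
    return sb.ToString();
}
[/CODE]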

Of course you can have a zero (null byte) in a string. No one is forced to write code that assumes that a null byte terminates a string.

The best thing to do of course is not call it a string! It’s an array of unsigned bytes with a length count. End of hassle.

This is the right answer. Keep track of the length yourself.

Well, the mechanism I have for sending the message to the other party is a string. I can’t get around it.

I guess I could do what davidm is saying.

What I’m writing in is C# (and Objective-C, but that’s not actually me). The more it’s like straight-ahead C the better.

But, I am interested in the general solution. String to string algorithms exist. How do they do it?

C# and Objective-C both have string classes that are not null-terminated and can have 0s in the middle of the string.
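For instance:

[CODE]
using System;

// .NET strings are length-counted, so an embedded null is just another char.
string s = "abc\0def";
Console.WriteLine(s.Length);        // 7: the \0 counts toward the length
Console.WriteLine(s.IndexOf('\0')); // 3: it's addressable like any other char
[/CODE]

The trouble only starts when that string crosses an API that treats \0 as a terminator.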

That’s where the randomization comes in. The resultant data string you generate will be processed with random numbers, preventing a run of zeros. This was an issue with older algorithms, not just for a bunch of zeros but for any repetitive data output.

It’s wasteful, but you could encode each byte of your cleartext as two bytes of string. Represent encoded zeroes as 1111111111111111 or something.
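Hex encoding is the usual version of this byte-doubling idea; a quick sketch in C#, with cipherBytes standing in for the encrypted output:

[CODE]
// Each ciphertext byte becomes two hex characters, so 0x00 comes out as "00"
// and the string itself never contains a null. Doubles the size, as noted.
string hex = BitConverter.ToString(cipherBytes).Replace("-", "");

// Reversing it: parse the characters back two at a time.
byte[] bytes = new byte[hex.Length / 2];
for (int i = 0; i < bytes.Length; i++)
    bytes[i] = Convert.ToByte(hex.Substring(2 * i, 2), 16);
[/CODE]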

They encode the resulting binary encrypted data as ASCII, e.g. using Base64.

Encryption algorithms operate on numbers. Once you have the encrypted numbers, you can turn them into a string however you like.
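In C#, for instance, it’s one line in each direction (a sketch, with cipherBytes standing in for whatever the XTEA step produced):

[CODE]
// Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=', so it can never
// contain a null byte, and it round-trips the ciphertext exactly.
string wire = Convert.ToBase64String(cipherBytes);
byte[] roundTrip = Convert.FromBase64String(wire);
[/CODE]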

Specifically mentioned here:

[QUOTE=MSDN]
In the .NET Framework, a String object can include embedded null characters, which count as a part of the string’s length. However, in some languages such as C and C++, a null character indicates the end of a string; it is not considered a part of the string and is not counted as part of the string’s length. This means that the following common assumptions that C and C++ programmers or libraries written in C or C++ might make about strings are not necessarily valid when applied to String objects:
[/QUOTE]

Zero-terminated strings are really a C/C++ thing, whereas strings prefixed by their length (which don’t worry about null elements) are a Pascal tradition.

Yeah, there’s that. Or, limit the length of what I can encrypt to 255 bytes. Then the string I produce always begins with an extra byte saying how many zeros have been replaced (with 255 meaning none, since the count byte itself can’t be zero). The next x bytes give the positions of those replacements so I can put the zeros back in before decrypting.
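Something like this, maybe (a sketch). I’d store each position as index + 1, since a zero at index 0 would otherwise put a null byte into the header, and use 0x01 as an arbitrary placeholder:

[CODE]
using System.Collections.Generic;
using System.Linq;

// Sketch of the scheme: note where the zeros are, overwrite them with a
// nonzero placeholder (0x01, chosen arbitrarily), and prepend a header.
// The count byte uses 255 for "none" so the header never starts with 0,
// and positions are stored as index + 1 so they can't be 0 either.
// Assumes the ciphertext is at most 255 bytes.
static byte[] EscapeZeros(byte[] cipher)
{
    var body = (byte[])cipher.Clone();
    var positions = new List<byte>();
    for (int i = 0; i < body.Length; i++)
        if (body[i] == 0) { positions.Add((byte)(i + 1)); body[i] = 0x01; }

    var result = new List<byte> { (byte)(positions.Count == 0 ? 255 : positions.Count) };
    result.AddRange(positions);
    result.AddRange(body);
    return result.ToArray();
}

static byte[] RestoreZeros(byte[] escaped)
{
    int count = escaped[0] == 255 ? 0 : escaped[0];
    byte[] cipher = escaped.Skip(1 + count).ToArray();
    for (int i = 1; i <= count; i++)
        cipher[escaped[i] - 1] = 0;
    return cipher;
}
[/CODE]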

Thanks for the idea.

**gazpacho** - I appreciate that, but when I serialize what I’m encrypting, it’s going through Amazon queues which don’t allow for pretty much anything but zero-terminated strings. Even ints and so forth have to be parsed from string on the other end.

Right. But the numbers can be any value, from 0 to 255. If I decide to make the number 0 encode to something other than ‘string terminator,’ then I stomp on whatever value that was.

If the encryption allows for any value between 0 and 255, I don’t see how to get around that.

This is not a good idea. You are giving away information that should be secret.
The better way would be to limit your clear text to 128 permissible characters. Each character would encode into 7 bits, be encrypted, and then be converted back to ASCII by adding one bit in a non-random fashion. This way you can have up to 128 formatting characters, or just the one.
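Something like this sketch, forcing the added eighth bit to 1 (strictly that puts the output in 0x80-0xFF rather than ASCII proper, but no byte can ever come out zero):

[CODE]
using System.Collections.Generic;

// Regroup the ciphertext bits into 7-bit chunks and set the eighth bit to 1,
// so every output byte falls in 0x80-0xFF and can never be zero.
static byte[] Pack7(byte[] cipher)
{
    var output = new List<byte>();
    int acc = 0, bits = 0;
    foreach (byte b in cipher)
    {
        acc = (acc << 8) | b;
        bits += 8;
        while (bits >= 7)
        {
            bits -= 7;
            output.Add((byte)(0x80 | ((acc >> bits) & 0x7F)));
            acc &= (1 << bits) - 1;   // keep only the bits not yet emitted
        }
    }
    if (bits > 0)                     // pad the final partial chunk with zeros
        output.Add((byte)(0x80 | ((acc << (7 - bits)) & 0x7F)));
    return output.ToArray();
}
[/CODE]

Unpacking is the mirror image: mask off the top bit, accumulate 7 bits at a time, emit full bytes, and discard the sub-byte remainder as padding.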

I disagree. After encrypting, I’m just putting meta-info on there that would have been obvious if I had been able to pass a byte array in the first place. You would know that zeros belong in bytes 8, 31, and 49 (for instance), because there they are.

That might work…

You are still giving away information about the clear text message. It would be enough to identify the message from the set that I intercepted. Say I captured two different-looking messages, both the same length and both with zeros in bytes 8, 31, and 49. I can be very confident that it is the same message encrypted using two different keys (which should never happen, but mistakes do occur). This would give me a ton of info.

The best way is to first compress your data and then encrypt that.

The zeros aren’t in the original clear text. I’m only replacing zeros in the encrypted bytes.

Am I missing something?

I think you are running into problems with conflating this “remove the zeros” step with encryption. There is nothing secret about the zero-removal algorithm (hell, I recommend using base64 encoding like davidm suggests in post #2), since reversing the zero-removal will just get you the cyphertext, not the plaintext.

[QUOTE=Punoqllads]
The best way is to first compress your data and then encrypt that.
[/QUOTE]

That might make for better encryption, but it won’t give zero-free cyphertext.

How does that ensure the encrypted message has no zeros mid-array?