In *general*, how do encryption algorithms prevent 0s mid-string?

Here’s an idea: have the acceptable range of the encrypted output be 0 to 254, then add 1 to each encrypted byte.
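Something like this toy sketch, say (not a real cipher, just the mod-255 shape of the idea; it assumes the plaintext bytes are already in 0..254, e.g. ASCII text, and that the keystream comes from somewhere real):

```c
#include <stdint.h>
#include <stddef.h>

/* Toy additive "cipher" working mod 255 instead of mod 256.
   NOT real encryption -- it only illustrates the "0..254, then
   add 1" trick.  Assumes every plaintext byte is < 255 (true
   for ASCII text) and that keystream[] comes from a real cipher. */
void toy_encrypt(const uint8_t *plain, uint8_t *out, size_t n,
                 const uint8_t *keystream)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t c = (uint8_t)((plain[i] + (keystream[i] % 255)) % 255);
        out[i] = (uint8_t)(c + 1);  /* shift 0..254 up to 1..255: no zero bytes */
    }
}

void toy_decrypt(const uint8_t *cipher, uint8_t *out, size_t n,
                 const uint8_t *keystream)
{
    for (size_t i = 0; i < n; i++) {
        int c = cipher[i] - 1;      /* back to 0..254 */
        out[i] = (uint8_t)((c + 255 - (keystream[i] % 255)) % 255);
    }
}
```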

Because long sequences of 0s can, by definition, be compressed.

That doesn’t mean the compressed result will have no zeros in it.

It is a common need, e.g. in whole-file encodings like JPEG. Those boasting of systems that avoid the problem entirely may not have worked with many real-world high-performance encodings.

A simple solution (and IIRC something along these lines is used for JPEG, though it’s solving a slightly different problem):

Replace FF or 00 with FF FF or FF 01 respectively. (Codes FF 02 through FF FE are freed up for use as markers.) 1/128 of the time, 16 bits instead of 8 bits are used to represent a byte, for an average loss of 1/16 bit per byte.

With this approach, your file will grow by 0.78% on average (if 00 and FF each occur with 1/256 probability). If this is too much waste, other, more complicated approaches are possible. (Note that the “waste” does build in a crisp, possibly useful, way to insert markers.)
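For the curious, the two passes might look roughly like this in C (just a sketch; the caller owns the buffers, and the worst-case output is 2x the input):

```c
#include <stdint.h>
#include <stddef.h>

/* Escape: 00 -> FF 01, FF -> FF FF.  Output buffer must hold up
   to 2*n bytes in the worst case.  Returns bytes written. */
size_t escape(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == 0x00)      { out[j++] = 0xFF; out[j++] = 0x01; }
        else if (in[i] == 0xFF) { out[j++] = 0xFF; out[j++] = 0xFF; }
        else                      out[j++] = in[i];
    }
    return j;
}

/* Unescape: inverse of the above.  Returns bytes written, or
   (size_t)-1 if the input ends mid-escape or uses a marker code. */
size_t unescape(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] != 0xFF) { out[j++] = in[i]; continue; }
        if (++i == n) return (size_t)-1;        /* truncated escape */
        if (in[i] == 0x01)      out[j++] = 0x00;
        else if (in[i] == 0xFF) out[j++] = 0xFF;
        else return (size_t)-1;                 /* FF 02..FF FE: marker */
    }
    return j;
}
```

Note the escaped stream can still contain 0x01 and 0xFF, just never a bare 0x00, which is the whole point.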

Exactly. It’s no longer a string; it’s an octet sequence with a specific length.

Agreed. The algorithm can be ‘unbiased’ within any range of possible values.

In the same sort of way that a six-sided die is no more or less ‘random’ than a 12-sided die.

That’s exactly what base64 is for. Unfortunately it takes more space, but it’s pretty trivial and fast to convert.

Well, nobody’s “forced” into a lot of things. It becomes a matter of relative convenience. For some specific problem environments, something like OP’s proposal may often be best. I’ve done something like this at least a dozen times (though I think I’ve also written more code than most Dopers).

Yes, but the simplicity of modular-256 arithmetic over mod-255 will often be too good to pass up.

No, AdamF is.

Another advantage of base64 is that, since data is passed as strings, you may not know what kind of processing might be applied to “text strings”. If you stick to base64, every byte will contain a well-defined printable ASCII character, with the least likelihood of getting mangled.

With base64, your text gets 33% bigger (it takes 4 bytes to carry 3 bytes of data). That’s the biggest disadvantage.
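You can see where the 33% comes from in the core of an encoder; here’s a bare-bones C sketch (in real code you’d grab a library, but the 3-in/4-out shape is the whole story):

```c
#include <stdint.h>
#include <stddef.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode n bytes; out must hold 4*((n+2)/3) chars plus a NUL.
   Every 3 input bytes (24 bits) become 4 output chars (6 bits
   each), hence the 33% growth. */
void b64_encode(const uint8_t *in, size_t n, char *out)
{
    size_t i = 0, j = 0;
    while (i + 3 <= n) {                      /* whole 3-byte groups */
        uint32_t v = (uint32_t)in[i] << 16 | (uint32_t)in[i+1] << 8 | in[i+2];
        out[j++] = B64[v >> 18 & 63];
        out[j++] = B64[v >> 12 & 63];
        out[j++] = B64[v >> 6  & 63];
        out[j++] = B64[v       & 63];
        i += 3;
    }
    if (n - i == 1) {                         /* 1 leftover byte */
        uint32_t v = (uint32_t)in[i] << 16;
        out[j++] = B64[v >> 18 & 63];
        out[j++] = B64[v >> 12 & 63];
        out[j++] = '=';
        out[j++] = '=';
    } else if (n - i == 2) {                  /* 2 leftover bytes */
        uint32_t v = (uint32_t)in[i] << 16 | (uint32_t)in[i+1] << 8;
        out[j++] = B64[v >> 18 & 63];
        out[j++] = B64[v >> 12 & 63];
        out[j++] = B64[v >> 6  & 63];
        out[j++] = '=';
    }
    out[j] = '\0';
}
```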

There’s no way to predict how many zeros you’ll get, so with a “zero-replacement” protocol, you have no idea how long your messages might be. If your protocol is efficient enough that the max buffer size can still handle your max ciphertext plus your zero-hiding trick, then sure, go for that if you prefer. But base64 will be recognized and understood by others: less to explain.

PGP uses this, or something like it (its “ASCII armor” is Radix-64, essentially base64 plus a checksum), when encrypting emails, to ensure that the encrypted email is nice ASCII text with nothing weird in it that would get corrupted by some program along the way.

What’s the objection to just using Base64? Is there some need to minimize the length or something?

Base64 is a fine solution. But sometimes reducing the average length will have merit, e.g. compression schemes, or large tables.

Pffft, these kids today with their fancy base64. In my day we used UUEncode and were glad to have it.

Reminder: the issue isn’t a string of 0s, it’s having a zero in a string.

I also don’t see why the length variable should be constrained to a byte.

Note: the OP’s question really has nothing to do with encryption. It’s just about dealing with nulls in strings. How they got there doesn’t really matter.

You can get really creative if you want to. Suppose the string has 53 null bytes. First say: “I’m sending you 54 strings.” Then send the string. At the other end, keep reading until you’ve got 54 strings, concatenating them together including the null byte. (And remembering that 2+ nulls in a row results in empty strings.)
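A sketch of the sending side in C (send_string() is hypothetical, standing in for whatever your transport does; it assumes the caller keeps a sentinel NUL at buf[len] so the last piece is properly terminated):

```c
#include <stddef.h>

/* Count how many "pieces" a buffer of len bytes splits into:
   one more than the number of embedded NULs. */
size_t count_pieces(const char *buf, size_t len)
{
    size_t nuls = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\0')
            nuls++;
    return nuls + 1;
}

/* Send each piece as an ordinary C string; the receiver rejoins
   them with NULs.  Assumes the caller keeps a sentinel NUL at
   buf[len] so the final piece is terminated. */
void send_pieces(const char *buf, size_t len,
                 void (*send_string)(const char *))
{
    size_t start = 0;
    for (size_t i = 0; i <= len; i++) {
        if (i == len || buf[i] == '\0') {
            send_string(buf + start);  /* empty string for 2+ NULs in a row */
            start = i + 1;
        }
    }
}
```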

And if you are using something like “getstring” on the other end, may God have mercy on your code.

Ah - I get it! (I think - at least, I can now do what I want, and understand how string to string works).

By taking the 4-byte integers that XTEA gives me, and dividing them out into 6 bytes (each one holding a number between 0 and 63), I now have a few extra bits. I can map the values 0-63 to any characters I want, thus I can make it a string with no zeroes.

And I can make it encoding-independent by doing it with math, instead of depending on what bits are where. Take it mod 64, then divide by 64, then take the mod 64 again…
I get a 6-digit base-64 number.
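In C, that digit extraction for one 32-bit XTEA word might look like the sketch below (the 64-character alphabet is an arbitrary choice of mine; any 64 non-NUL characters will do):

```c
#include <stdint.h>

/* 64 printable characters, none of them '\0', to map digits onto. */
static const char ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Turn one 32-bit cipher word into 6 base-64 digits, pure
   arithmetic, no assumptions about bit layout.  Digits come out
   least-significant first; out must hold 6 chars. */
void word_to_digits(uint32_t w, char out[6])
{
    for (int i = 0; i < 6; i++) {
        out[i] = ALPHABET[w % 64];   /* take mod 64...      */
        w /= 64;                     /* ...then divide by 64 */
    }
}

/* Inverse: rebuild the 32-bit word from the 6 characters. */
uint32_t digits_to_word(const char in[6])
{
    uint32_t w = 0;
    for (int i = 5; i >= 0; i--) {
        uint32_t d = 0;
        while (d < 64 && ALPHABET[d] != in[i])
            d++;                     /* linear search; assumes valid input */
        w = w * 64 + d;
    }
    return w;
}
```

Six digits give you 36 bits of room for 32 bits of data, which is where the spare bits come from.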

Thanks for all the input, everyone! Anyone I didn’t understand - I’m sorry.

I think I became a little bit better of a code jockey today!

The other big advantage to using base64 is that it’s a standard technique, so there are standard tools for it. You probably could write your own implementation of it… but you don’t need to, because you can probably find a library for your language of choice that’s already implemented it.

Olde-tyme C programmers were routinely derided for their reluctance to use C Standard Library functions, preferring to re-invent all their own wheels every time.

This discussion clarifies exactly why.

How do C programmers hunt elephants?

How Programmers Hunt Elephants

You are correct, my mistake.

Comments like this confuse me. How many programmer man-minutes do you think are needed to build base64 routines from scratch? Or the simple escape mechanism I described?

Oh, it might only take a few minutes to implement a straightforward base64 routine that works with some nice clean input. Dealing with the edge cases, error conditions, input that does not conform to expected parameters - that takes the time. And if you don’t take the time when you write it originally, you will take huge amounts of time trying to figure out the unexpected behaviour of your wider system.

This is why using a standard library routine that has all the edge cases/error conditions/every possible input covered does actually save time and make the code more reliable.

And, as always, I quote one of my Computing lecturers on implementation estimation: “take your original guess, double it, and shift to the next order of magnitude.”

Experience has shown me that he was pretty accurate.