Which error-correcting code did I (re)discover?

I have a little project that requires error-correcting codes, but so far my research hasn’t turned up a good introduction to the subject. My specific need is for a very computationally efficient code, ideally one that can be implemented with a lookup table.

I ran across the BCH(7,4) code, which sounds fairly nice: it encodes 4 bits into 7, with 1-bit correction. Trouble is, 8 bits is better for my purposes, and I’d like to actually get something out of that extra bit instead of just wasting it. Also, I couldn’t find any nice simplified descriptions of the code that would be useful in actually implementing it.

So I decided to just search for a code. I coded up a really naïve program that looks like it should take around 256!/(256-16)! operations, but in practice it has enough early-outs that it finishes instantly. And indeed, it found an (8,4) code with 1-bit correction and 2-bit detection.
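For the curious, the search boils down to something like this Python sketch (not my exact program, just the same idea; greedy_code and hamming_distance are names I made up for this post):

[code]
def hamming_distance(a, b):
    # Number of differing bit positions between two words.
    return bin(a ^ b).count("1")

def greedy_code(n_bits, n_words, min_dist=4):
    # Pick the first unused word that keeps pairwise Hamming
    # distance >= min_dist from everything chosen so far
    # (distance 4 gives 1-bit correction + 2-bit detection).
    code = []
    for w in range(1 << n_bits):
        if all(hamming_distance(w, c) >= min_dist for c in code):
            code.append(w)
            if len(code) == n_words:
                return code
    return None  # the greedy order couldn't reach n_words

# Prints the 16 codewords listed below:
print([format(w, "08b") for w in greedy_code(8, 16)])
[/code]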

Here are the 16 codewords:
00000000
00001111
00110011
00111100
01010101
01011010
01100110
01101001
10010110
10011001
10100101
10101010
11000011
11001100
11110000
11111111

A very pretty pattern, I thought! Some nice symmetries in there. Each word has 0, 4, or 8 bits set. It’s obvious also that there is a general class of 1-bit correcting/2-bit detecting codes where the codewords have bitcounts of 4N (provided words of equal bitcount are also kept at Hamming distance 4 from one another). A single-bit error means you have 4N+1 or 4N-1 bits set, which can’t be confused with 4(N+1)-1 or 4(N-1)+1. There is of course overlap between 4N+2 and 4(N+1)-2, so for that you would only get detection.
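It’s also exactly the lookup-table-friendly thing I was after. A Python sketch of what I mean (the assignment of data nibbles to codewords is arbitrary):

[code]
# Encode: 4-bit data nibble -> 8-bit codeword (table index = data).
ENCODE = [0b00000000, 0b00001111, 0b00110011, 0b00111100,
          0b01010101, 0b01011010, 0b01100110, 0b01101001,
          0b10010110, 0b10011001, 0b10100101, 0b10101010,
          0b11000011, 0b11001100, 0b11110000, 0b11111111]

# Decode: any received byte -> data nibble if it's within Hamming
# distance 1 of a codeword, else None (2+ bit errors detected).
DECODE = [None] * 256
for nibble, cw in enumerate(ENCODE):
    DECODE[cw] = nibble                    # exact match
    for bit in range(8):
        DECODE[cw ^ (1 << bit)] = nibble   # 1-bit flips corrected

assert DECODE[0b00001111] == 1     # clean codeword
assert DECODE[0b00001101] == 1     # single flip, corrected
assert DECODE[0b00001100] is None  # double flip, detected only
[/code]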

Obviously, a code this simple has been discovered already–so which one is it? I perused the list on Wikipedia but didn’t really see a match.

I haven’t really checked, but isn’t it just the (8,4) Hamming code (the (7,4) code extended with an extra parity bit)? Here’s a list (right side of the table) of the codewords; from a quick glance, it looks like yours can be obtained from them by a bit shift.

Yes, it’s Hamming’s (7,4) augmented as Wiki describes: “If, in addition, an overall parity bit (bit 0) is included, the code can detect (but not correct) any two-bit error, making a SECDED code.” Any such code can be rearranged without affecting its properties, as OP has done here to make certain symmetries more apparent.
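If you want to check the rearrangement explicitly, a brute force over all 8! column orders is instant in Python (I’m assuming the standard p1 p2 d1 p3 d2 d3 d4 bit layout for Hamming(7,4), with the overall parity bit appended):

[code]
from itertools import permutations

def ext_hamming(data):
    # Hamming(7,4) in the usual p1 p2 d1 p3 d2 d3 d4 layout,
    # plus an overall parity bit appended at the end.
    d1, d2, d3, d4 = ((data >> i) & 1 for i in range(4))
    bits = [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1,
            d2 ^ d3 ^ d4, d2, d3, d4]
    return tuple(bits + [sum(bits) & 1])

HAMMING = {ext_hamming(d) for d in range(16)}
POSTED = {tuple(map(int, w)) for w in (
    "00000000 00001111 00110011 00111100 01010101 01011010 "
    "01100110 01101001 10010110 10011001 10100101 10101010 "
    "11000011 11001100 11110000 11111111").split()}

# Hunt for a column reordering that maps one code onto the other.
for perm in permutations(range(8)):
    if {tuple(w[i] for i in perm) for w in HAMMING} == POSTED:
        print("equivalent; column order:", perm)
        break
[/code]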

(BTW, if you remove any column from your array, along with all the rows that have a 1 in that column, you’ll get an elegant 8-element group that turns up frequently in combinatorial solutions.)

Once upon a time my work as a circuit designer and diagnostician involved me with the nitty-gritty of IBM 370 ECC, similar to your scheme but (72,8) instead of (8,4). There were various details that seemed amusing when sleep-deprived. For example, the same circuitry that checked ECC also developed ordinary parity bits to pass on the bus to the CPU. Those parity bits were generated before the data was corrected, so they had to be corrected themselves. There were certain triple-bit error combinations which would cause Storage Data Bus Out parity checks! :wink:

Good catch, folks. Somehow I had missed that section of the Hamming wiki. Also, I suppose I should have guessed that it was a Hamming code, given that I observed it being a perfect code. It’s sorta trivially obvious that a perfect code for 1-bit ECC must have 2^n-1 length: each radius-1 sphere covers n+1 words, so n+1 has to be a power of two for the spheres to tile the space exactly. 7 bits fits.
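For the record, the counting works out exactly:

[code]
# A perfect 1-bit-correcting code: the radius-1 spheres (n+1 words
# each) around the 2^k codewords tile all 2^n words with no overlap.
n, k = 7, 4
assert (1 << k) * (n + 1) == 1 << n   # 16 * 8 == 128
[/code]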

'Course, now I’m curious about the next one I found: a [12,7] SECDED code. It’s no longer a perfect code; it has 87.5% efficiency. Clearly not a Hamming. Spoilered, since it’s fairly big:

[spoiler]
000000000000
000000001111
000000110011
000000111100
000001010101
000001011010
000001100110
000001101001
000010010110
000010011001
000010100101
000010101010
000011000011
000011001100
000011110000
000011111111
001100000011
001100001100
001100110000
001100111111
001101010110
001101011001
001101100101
001101101010
001110010101
001110011010
001110100110
001110101001
001111000000
001111001111
001111110011
001111111100
010100000101
010100001010
010100110110
010100111001
010101010000
010101011111
010101100011
010101101100
010110010011
010110011100
010110100000
010110101111
010111000110
010111001001
010111110101
010111111010
011000000110
011000001001
011000110101
011000111010
011001010011
011001011100
011001100000
011001101111
011010010000
011010011111
011010100011
011010101100
011011000101
011011001010
011011110110
011011111001
100100000110
100100001001
100100110101
100100111010
100101010011
100101011100
100101100000
100101101111
100110010000
100110011111
100110100011
100110101100
100111000101
100111001010
100111110110
100111111001
101000000101
101000001010
101000110110
101000111001
101001010000
101001011111
101001100011
101001101100
101010010011
101010011100
101010100000
101010101111
101011000110
101011001001
101011110101
101011111010
110000000011
110000001100
110000110000
110000111111
110001010110
110001011001
110001100101
110001101010
110010010101
110010011010
110010100110
110010101001
110011000000
110011001111
110011110011
110011111100
111100000000
111100001111
111100110011
111100111100
111101010101
111101011010
111101100110
111101101001
111110010110
111110011001
111110100101
111110101010
111111000011
111111001100
111111110000
111111111111
[/spoiler]

The same patterns exist here as in my original code. As expected, each codeword has 0, 4, 8, or 12 bits set. I also find it curious that a totally naïve search is still instant. It never actually has to backtrack: the algorithm is really just “pick the first unused codeword that doesn’t violate the neighbor constraints, and repeat”.
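(In fact, the greedy sketch from my earlier post regenerates this exact list, since it’s the same pick-the-first-word-that-fits procedure. A pairwise check over the result, reusing greedy_code and hamming_distance from above:)

[code]
from itertools import combinations

code = greedy_code(12, 128)   # same greedy searcher as before
assert min(hamming_distance(a, b)   # SECDED needs min distance 4
           for a, b in combinations(code, 2)) == 4
assert all(bin(w).count("1") % 4 == 0 for w in code)  # weights 0, 4, 8, 12
[/code]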

I was hoping that a [12,8] SEC code existed, since 2^12/(1+12) > 2^8. But that hasn’t been forthcoming.

(I see I got the terminology backward – what I called IBM’s (72,8) code was actually a (72,64) code.)

The key bound, IIRC, is that K check bits can provide SECDED for at most 2[SUP]K-1[/SUP] total bits, i.e. (2[SUP]K-1[/SUP]-K) data bits. So your 2nd code, with K=5, could check 11 data bits; and thus has an easy job checking only 7 data bits.

For SEC, instead of SECDED, a check bit can be saved, so your (12,8) is no problem.
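In code form, if that helps (a trivial sketch; max_total_bits is just my name for it):

[code]
def max_total_bits(check_bits, secded):
    # SEC (Hamming): total bits <= 2^K - 1
    # SECDED (extended Hamming): total bits <= 2^(K-1)
    return 2 ** (check_bits - 1) if secded else 2 ** check_bits - 1

assert max_total_bits(4, secded=True) == 8    # your (8,4) sits right at the limit
assert max_total_bits(5, secded=True) == 16   # 12 total bits is comfortable
assert max_total_bits(4, secded=False) == 15  # so a (12,8) SEC fits
[/code]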

Ok, interesting. I’m not a huge fan of thinking of codes in terms of check bits (as I see it, there’s no reason to make any distinction between data and check bits), but a bound is a bound. As it turns out, the only reason I couldn’t find a [12,8] SEC code was a typo on my part. Only 81% efficiency, but that’s still not bad.
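(With the typo fixed, the same greedy sketch from upthread pops it out immediately:)

[code]
code = greedy_code(12, 256, min_dist=3)   # SEC only: distance 3 suffices
assert code is not None and len(code) == 256
[/code]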

I agree; and the notion of “data bit” will degrade if you come up with a code where the number of valid states is not a power of two.

OTOH, my background is in data processing where … well, data bits are rather key! :cool:

Indeed. A few times I did try to see whether I could pack one more codeword into a configuration with low efficiency, but so far I haven’t found a case where it works. Doesn’t mean it can’t, though.

Fair enough! FWIW, my application is for flash memory on a nanosatellite. Bitflips happen…

For any real, practical error-correcting code, you first want to figure out what sorts of errors you’re most likely to have. Then there are considerations like how common you expect errors to be, what the consequences of lost data are, and whether you have two-way communication or only one-way.

For instance, if errors are rare enough and you have two-way communication, you could just stick a single parity bit on a whole bunch of data bits, and if it doesn’t match up, ask the sender to re-send the whole bunch. On the other hand, if you’ve got transposition errors, a single parity bit will never notice the difference. Or maybe your errors are rare but tend to clump, such that you can get two, three, or a dozen bits in a row all flipped, or all forced to 0, or all forced to 1. Or maybe you’ve got problems with analog-to-digital conversion, such that some bits are in between and you might assign them to the wrong end, but you’re pretty sure which bits are the suspect ones.

Rather than re-inventing the wheel, you probably want to dig into the literature, find someone else who’s been working under constraints similar to yours, and just use their solution off the shelf.

We could probably stand to do more digging, but as far as we know, single-bit correction is good for our purposes. This is for flash memory, which, as it degrades, tends to produce isolated stuck bits. There’s no real clustering, so there’s no need to handle long strings of errors like you’d get with radio communication or optical storage.

We’ll also have block-level CRC, though that’s more to detect whether a flash page is damaged badly enough that we shouldn’t use it again. Obviously, since this is storage, there’s no ability to re-request the data.

Also, we have resource constraints, both in physical hardware and manpower. A 90% solution that ships is better than a 99.9% solution that doesn’t. It’s not the end of the world if some errors still fall through the cracks, but it would be nice to have some basic protection.