Like when you zip a file, how is data integrity maintained? I know that binary is just 1s and 0s but how do you compress them?
Here’s a link that explains the basics.
Zipping programs take advantage of the fact that most computer files contain repeating patterns of information. Think about how many times the word ‘the’ appears in this thread. If we could replace all of them with a number referencing ‘the’ in a separate table, we’d only need to save the pattern once and could replace the rest with an integer. This is a rough approximation of the dictionary-based LZ family of algorithms (LZW, or the LZ77 used in zip’s DEFLATE) that drives a zip compressor.
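To make the lookup-table idea concrete, here’s a minimal LZW-style encoder sketch in Python (the function name and toy input are my own; a real zip tool uses the related DEFLATE scheme and packs the codes into bits rather than keeping them in a list):

```python
def lzw_compress(data: bytes) -> list[int]:
    # Start with a table holding every single byte (codes 0-255).
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            # Keep growing the match while it's already in the table.
            current = candidate
        else:
            # Emit the code for the longest known match, then learn the new pattern.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

text = b"the cat sat on the mat and the dog sat too"
codes = lzw_compress(text)
print(len(text), "bytes became", len(codes), "codes")
```

Running it on text with lots of repeated words produces fewer codes than input bytes, because repeated fragments get encoded with progressively fewer codes as the table learns them; in a real compressor those codes are then packed tightly into bits.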
Thanks for the links.
Since atypicalcarl provided a link for info on compression, I’ll just add a bit about data integrity. What you do is take all the bits in a given “chunk” of data and “mix” them up with what is known as a hash function. You then append the result of the hash function (called a “checksum” for purposes of data integrity) to the end of that chunk. Then you proceed to the next chunk, perform the same function on that, and so forth. This adds a bit of redundancy to the data, but the overhead is fairly minimal. On read-back, for every chunk, you again calculate its checksum and compare it to the value right after the chunk. If these don’t match, you know there’s been some corruption.
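Here’s a rough sketch of that scheme in Python, using CRC-32 from the standard library as the hash (so a 32-bit checksum rather than the 64-bit one in the numbers below; the chunk layout and function names are just my own illustration):

```python
import struct
import zlib

CHUNK_SIZE = 4096  # bytes of payload per chunk

def add_checksums(payload: bytes) -> bytes:
    """Append a CRC-32 checksum (4 bytes) after every chunk of payload."""
    out = bytearray()
    for i in range(0, len(payload), CHUNK_SIZE):
        chunk = payload[i:i + CHUNK_SIZE]
        out += chunk
        out += struct.pack(">I", zlib.crc32(chunk))
    return bytes(out)

def verify_and_strip(stored: bytes) -> bytes:
    """Recompute each chunk's checksum and compare it to the stored value."""
    out = bytearray()
    record_size = CHUNK_SIZE + 4  # payload plus its 4-byte checksum
    for i in range(0, len(stored), record_size):
        record = stored[i:i + record_size]
        chunk, (expected,) = record[:-4], struct.unpack(">I", record[-4:])
        if zlib.crc32(chunk) != expected:
            raise ValueError("checksum mismatch: data corrupted in this chunk")
        out += chunk
    return bytes(out)

original = b"some data" * 3000          # ~27 KB, spans several chunks
stored = add_checksums(original)
assert verify_and_strip(stored) == original
```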
So, if your chunk size is 4096 bytes and your checksum is 64 bits, in the ideal case you expand your data size by about 0.2%, while ensuring that random corruption of the data has only 1 chance in 2^64 (18,446,744,073,709,551,616) of going undetected.
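Checking that arithmetic quickly (my own back-of-the-envelope, assuming one 64-bit checksum per 4096-byte chunk):

```python
chunk_bytes = 4096
checksum_bytes = 8                                        # 64 bits
print(f"overhead: {checksum_bytes / chunk_bytes:.3%}")    # 0.195%, roughly 0.2%
print(f"undetected-corruption odds: 1 in {2**64:,}")      # 18,446,744,073,709,551,616
```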