Discrete Cosine Transform - why does it work?

Reuben · September 25, 2001, 10:27am

One for the maths buffs…

The Discrete Cosine Transform (DCT) is used in JPEG and MPEG image compression. Given an input image (a 2D array of color values), the transform calculates a same-sized array of coefficients that is very easy to compress into a much smaller amount of data without losing much picture information.

My question is: Why does it work?

It really is an incredible trick - just by multiplying each original pixel against a series of cosine values, you get something that’s easy to compress. But why??

I know this is a rather obscure question, and am expecting 0 posts, but hopefully someone out there will have the answer…

Ms2001 · September 25, 2001, 11:27am

The transformed data isn’t inherently more compressible, it’s just that the human eye will notice the compression less if it’s done that way. You don’t notice one gradient changing into a slightly different gradient the way you might notice pixels getting slightly blockier.

That’s why JPEG is mostly only good for compressing photos - try compressing a line drawing or text (something with a lot of sharp edges), and you can easily tell the difference. You need to combine a lot of cosines to get something that doesn’t look like a gradient, and any loss will ruin it.
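To make the gradient-versus-sharp-edge point concrete, here is a minimal sketch using a 1D orthonormal DCT-II on 8 samples. The helper names (dct1, high_freq_share) and both test signals are made up for illustration, and a real JPEG codec works on 2D blocks rather than a single row.

```python
import math

N = 8  # samples per row, matching JPEG's usual 8-pixel block width


def dct1(signal):
    """Orthonormal 1D DCT-II of a length-N list."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(
            signal[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for n in range(N)))
    return out


def high_freq_share(signal):
    """Fraction of the non-DC energy carried by the upper half of the frequencies."""
    c = dct1(signal)
    ac_energy = sum(v * v for v in c[1:])
    high_energy = sum(v * v for v in c[N // 2:])
    return high_energy / ac_energy


gradient = [32 * n for n in range(N)]             # smooth ramp: 0, 32, ..., 224
edge = [0] * (N // 2) + [255] * (N // 2)          # hard step in the middle

print("gradient:", high_freq_share(gradient))
print("edge:    ", high_freq_share(edge))
```

The step puts a noticeably larger share of its energy into the high-frequency cosines than the ramp does, so dropping or coarsening those cosines hurts the edge far more.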

sailor · September 25, 2001, 12:08pm

Yes, JPEG works best with graphics with many colors and not sharp or strong transitions, like photos. GIF works best with graphics with a limited number of flat colors which means sharp transitions.

Ms2001 · September 25, 2001, 6:14pm

http://www-mtl.mit.edu/Courses/6.095/notes/dct.html shows the specific cosine patterns that are used by DCT. When compressing an image with sharp edges, you might see bits of garbage that look like these patterns… that’s because a block with sharp edges most likely has some component of all the patterns, and if one of them is dropped it leaves a pattern-shaped ghost.

JPEG considers high-frequency data less important than low-frequency data (since it appears less in natural scenes), so you’ll see artifacts like the patterns toward the lower right more often than the ones toward the upper left.

KeithB · September 25, 2001, 6:34pm

I might be wrong wrt the DCT, but in general a transform does not yield data that is easier to compress. The actual transform does not get rid of any data. What it does is change the data into a different domain that will make it easier to get at and throw away the data that is not needed in your case – in the case of JPEG, it is the detail you cannot see.

In the case of the DCT, the throwing away of data might be part of the transform, so the above is not strictly true.

Arjuna34 · September 25, 2001, 6:48pm

You’re right, KeithB. The DCT is just a linear transform, completely invertible. What it does do is separate the low-frequency changes from the high-frequency changes (i.e. quickly changing patterns in the image). Since the eye doesn’t see high-frequency patterns well, these are quantized more heavily (i.e. with fewer bits) than the low-frequency ones, or even discarded. When you put the image back together with the inverse DCT, it looks pretty much the same (hopefully).

The pixels aren’t multiplied by cosines; what happens is that the image is decomposed into a series of 2D cosines. It can be shown that any signal (of any dimension, such as 1D for speech or WAV files, or 2D for images) can be broken down into a summation of cosines of various frequencies, phases, and amplitudes, i.e. terms of the form a*cos(2*pi*w*x + p), where a is the amplitude, w is the frequency, p is the phase, x is the position, and pi = 3.14… The DCT fixes the frequencies and phases in advance, so the “coefficients” it computes are just the amplitude of each cosine. If you apply these coefficients to the cosines, and add all the cosines up, you’ll get the original image back. Thus, the coefficients represent the image. Actually, the image is broken up into small blocks (8x8 pixels is typical), and the coefficients are found for each block.
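As a rough illustration of the decomposition described above, here is a minimal sketch of a forward and inverse 2D DCT on one 8x8 block. It assumes the orthonormal DCT-II scaling, which is one common convention and not necessarily the exact scaling a given JPEG codec uses. The function names and the sample block are made up, and the direct quadruple loop is written for clarity, not speed.

```python
import math

N = 8  # block size; 8x8 is the typical choice


def basis(k, n):
    """Value at sample n of the k-th orthonormal DCT-II cosine."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))


def dct2(block):
    """Forward 2D DCT: one coefficient per (vertical v, horizontal u) frequency pair."""
    return [[sum(block[y][x] * basis(v, y) * basis(u, x)
                 for y in range(N) for x in range(N))
             for u in range(N)]
            for v in range(N)]


def idct2(coeffs):
    """Inverse 2D DCT: weight each 2D cosine by its coefficient and add them all up."""
    return [[sum(coeffs[v][u] * basis(v, y) * basis(u, x)
                 for v in range(N) for u in range(N))
             for x in range(N)]
            for y in range(N)]


# Made-up 8x8 block of pixel values, for illustration only.
block = [[(37 * x + 11 * y * y) % 256 for x in range(N)] for y in range(N)]

coeffs = dct2(block)
restored = idct2(coeffs)

max_err = max(abs(block[y][x] - restored[y][x])
              for y in range(N) for x in range(N))
print("largest round-trip error:", max_err)
```

The round-trip error is only floating-point noise, which is the "completely invertible" part: nothing is lost until the coefficients are quantized.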

Now, if you keep the coefficients exactly, you haven’t compressed anything. But since the eye is less sensitive to high frequency changes, the coefficients of the high-frequency cosines can be encoded with fewer bits, or even discarded. These compressed coefficients are then saved as a JPG file.

Then, to put the image back together, you apply the compressed coefficients to the cosines, and add them up.
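The quantize-then-rebuild steps can be sketched as follows. This is only an illustration under stated assumptions: the step(v, u) rule is a hypothetical quantizer that gets coarser with frequency, not one of the quantization tables real JPEG encoders use, and the smooth test block is made up.

```python
import math

N = 8


def basis(k, n):
    """Value at sample n of the k-th orthonormal DCT-II cosine."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))


def dct2(block):
    return [[sum(block[y][x] * basis(v, y) * basis(u, x)
                 for y in range(N) for x in range(N))
             for u in range(N)]
            for v in range(N)]


def idct2(coeffs):
    return [[sum(coeffs[v][u] * basis(v, y) * basis(u, x)
                 for v in range(N) for u in range(N))
             for x in range(N)]
            for y in range(N)]


def step(v, u):
    """Hypothetical quantizer step size: larger (coarser) for higher frequencies."""
    return 1 + 2 * (u + v)


# Made-up smooth block: a gentle gradient in both directions.
block = [[100 + 5 * x + 3 * y for x in range(N)] for y in range(N)]

coeffs = dct2(block)
# Lossy step: store each coefficient as a small integer multiple of its step.
quantized = [[round(coeffs[v][u] / step(v, u)) for u in range(N)] for v in range(N)]
# Decoder side: scale the integers back up and run the inverse transform.
dequantized = [[quantized[v][u] * step(v, u) for u in range(N)] for v in range(N)]
restored = idct2(dequantized)

nonzero = sum(1 for row in quantized for q in row if q != 0)
max_err = max(abs(block[y][x] - restored[y][x])
              for y in range(N) for x in range(N))
print("nonzero coefficients:", nonzero, "of", N * N)
print("largest pixel error:", max_err)
```

Most of the quantized coefficients come out as zero for a smooth block like this one, and long runs of zeros are what the later lossless coding stage squeezes down. The pixel error stays small because only the coarse, high-frequency amplitudes were rounded heavily.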

There are an infinite number of ways to decompose an image, cosines using the DCT is just one.

Arjuna34

KeithB · September 25, 2001, 7:35pm

Thanks for the confirmation, Arjuna34.

One other point about JPEG: They separate the image into brightness and color components and take advantage of the deficiencies in human sight to compress things:

http://www.faqs.org/faqs/jpeg-faq/
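A minimal sketch of the brightness/color split, assuming the full-range YCbCr conversion constants from the JFIF convention and 2x2 averaging of the color planes (often called 4:2:0 subsampling). The function names and the 4x4 test image are made up, and real encoders may subsample differently or not at all.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range RGB -> YCbCr using the JFIF constants (Y is brightness)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_2x2(plane):
    """Average each 2x2 group of samples into one (plane sides must be even)."""
    h, w = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1] +
              plane[y + 1][x] + plane[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


# Made-up 4x4 RGB image, for illustration only.
image = [[(200, 30 * x, 30 * y) for x in range(4)] for y in range(4)]

ycc = [[rgb_to_ycbcr(*px) for px in row] for row in image]
y_plane = [[p[0] for p in row] for row in ycc]                   # kept at full resolution
cb_small = subsample_2x2([[p[1] for p in row] for row in ycc])   # color at half resolution
cr_small = subsample_2x2([[p[2] for p in row] for row in ycc])

full_samples = 3 * len(image) * len(image[0])
kept_samples = (sum(len(r) for r in y_plane) +
                sum(len(r) for r in cb_small) +
                sum(len(r) for r in cr_small))
print(full_samples, "samples before,", kept_samples, "after")
```

Brightness keeps every sample while each color plane keeps one sample per 2x2 group, so half the samples are gone before the DCT is even applied.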

Reuben · September 26, 2001, 9:10am

Thanks guys! I understand it a lot better now - much obliged!

sailor · September 26, 2001, 1:04pm

OTOH the Indiscreet Cosine Transform is worthless as it has no discernment or good judgment.

Crafter_Man · September 26, 2001, 1:07pm

Just a nit pick… there is a difference between data compression and data reduction. Aren’t we really discussing the latter?
