Understanding a DCT

Started by wond3rboy October 26, 2012
Hello, I am sorry for this naive question but it is something that has been
nagging me. I wanted to understand the interpretation of the DCT of any
image. These are the questions I have, hope someone can answer them for
me:

1. Are DCTs always done on 8x8 pixel sets? If so, each DCT block will
consist of 64 basis image coefficients (a white represents a presence and a
black indicates an absence).

2. Can DCTs be done on 16x16 pixel sets of an image? If so then each DCT
coefficient block will consist of 16x16=256 coefficients as weights for its
basis images?

3. What is the effect of performing a larger or smaller size DCT?

Thank you.
wond3rboy <47228@dsprelated> wrote:

> Hello, I am sorry for this naive question but it is something that has been
> nagging me. I wanted to understand the interpretation of the DCT of any
> image. These are the questions I have, hope someone can answer them for
> me:
>
> 1. Are DCTs always done on 8x8 pixel sets? If so, each DCT block will
> consist of 64 basis image coefficients (a white represents a presence and a
> black indicates an absence).
>
> 2. Can DCTs be done on 16x16 pixel sets of an image? If so then each DCT
> coefficient block will consist of 16x16=256 coefficients as weights for its
> basis images?
>
> 3. What is the effect of performing a larger or smaller size DCT?

These sound suspiciously like homework, but I will answer them anyway. The answers won't be quite good enough to turn in, but might get you in the right direction.

The DCT, in general, is one dimensional. Like the Fourier transform in general, and other discrete transforms, it is separable in rectangular coordinates. (Look up separable in any book on partial differential equations.) Being separable makes the computation in 2D easier.

In any case, the DCT itself can be, and often is, done on lengths other than 8. For image processing, 8x8 is popular, but it is a tradeoff that has to be made in any image compression algorithm based on the DCT. If it works right, the 8x8 squares will not be visible in the decompressed image, but that isn't always true. Especially for MPEG, with fast changing scenes, they are often visible. I remember noticing it during fireworks for the 2008 Olympics, first believing it to be a neat special effect, and then, after a few seconds, deciding that it was a side effect of the algorithm.

The DCT is preferred over the DST or DFT for image compression, as the boundaries between squares are less visible.

My guess is that, as processing power increases, they will move to larger transforms like 16x16. Best would be to transform the whole image, but that takes too long.

-- glen
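The separability glen mentions can be checked numerically. The following is a minimal sketch, assuming the orthonormal DCT-II (the variant commonly used for 8x8 image blocks); the function names are made up for this illustration. It computes the 2D transform once as 1D transforms over rows and then columns, and once from the full double sum, so the two results can be compared.

```python
import math

def _scale(k, n):
    # Orthonormal DCT-II scaling (an assumption; other conventions exist).
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct_1d(x):
    """Orthonormal DCT-II of a sequence of any length."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                           for i in range(n))
        for k in range(n)
    ]

def dct_2d_separable(block):
    """2D DCT done as 1D DCTs over the rows, then over the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]  # one 1D DCT per column
    return [list(r) for r in zip(*cols)]          # transpose back to [u][v]

def dct_2d_direct(block):
    """2D DCT from the full double sum; slow, for comparison only."""
    h, w = len(block), len(block[0])
    out = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0.0
            for y in range(h):
                for x in range(w):
                    acc += (block[y][x]
                            * math.cos(math.pi * (y + 0.5) * u / h)
                            * math.cos(math.pi * (x + 0.5) * v / w))
            out[u][v] = _scale(u, h) * _scale(v, w) * acc
    return out
```

For an N x N block the separable form needs on the order of N^3 multiplications instead of N^4, which is why separability makes the 2D computation easier.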
wond3rboy wrote:
> Hello, I am sorry for this naive question but it is something that has been
> nagging me. I wanted to understand the interpretation of the DCT of any
> image. These are the questions I have, hope someone can answer them for
> me:
>
> 1. Are DCTs always done on 8x8 pixel sets? If so, each DCT block will
> consist of 64 basis image coefficients (a white represents a presence and a
> black indicates an absence).
>
> 2. Can DCTs be done on 16x16 pixel sets of an image? If so then each DCT
> coefficient block will consist of 16x16=256 coefficients as weights for its
> basis images?
>
> 3. What is the effect of performing a larger or smaller size DCT?
>
> Thank you.

If all else fails, what is/are the relevant definition(s)?
On 26.10.12 22:42, wond3rboy wrote:
> 1. Are DCTs always done on 8x8 pixel sets? If so, each DCT block will
> consist of 64 basis image coefficients (a white represents a presence and a
> black indicates an absence).

As pointed out by glen, the DCT, like the DFT, can be done at any size; the size does not even have to be a power of 2. A power of 2 just leads to the fastest algorithm in general.

> 2. Can DCTs be done on 16x16 pixel sets of an image? If so then each DCT
> coefficient block will consist of 16x16=256 coefficients as weights for its
> basis images?

8x8 is the standard size for JPEG still image compression, but others are possible and used. For instance, H.264 (also known as MPEG-4 AVC) can use 4x4 or 8x8. Nothing prevents you from applying basically the same principles with a 16x16 or even a 17x13 transform.

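To illustrate that the block size is not tied to 8 or to powers of 2, here is a small sketch that transforms a 17x13 block and inverts it again. It assumes the orthonormal DCT-II written as a matrix product; the helper names (`dct_matrix`, `dct2`, `idct2`) are hypothetical and chosen only for this example.

```python
import math

def dct_matrix(n):
    """n x n orthonormal DCT-II matrix; row k is the k-th basis vector."""
    return [
        [(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
         * math.cos(math.pi * (i + 0.5) * k / n)
         for i in range(n)]
        for k in range(n)
    ]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(r) for r in zip(*m)]

def dct2(block):
    """Forward 2D DCT of an h x w block: C_h * X * C_w^T."""
    h, w = len(block), len(block[0])
    return matmul(matmul(dct_matrix(h), block), transpose(dct_matrix(w)))

def idct2(coeffs):
    """Inverse 2D DCT: C_h^T * Y * C_w (valid because C is orthonormal)."""
    h, w = len(coeffs), len(coeffs[0])
    return matmul(matmul(transpose(dct_matrix(h)), coeffs), dct_matrix(w))

# A 17-row by 13-column test block with arbitrary made-up pixel values.
block = [[float((7 * y + 3 * x) % 19) for x in range(13)] for y in range(17)]
coeffs = dct2(block)      # 17 x 13 = 221 coefficients, one per basis image
restored = idct2(coeffs)  # matches the block up to rounding error
```

The coefficient array always has the same shape as the block, so a 16x16 block gives 256 coefficients, as the question supposes.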
> 3. What is the effect of performing a larger or smaller size DCT?
The idea of using the DCT is to detect correlations between the pixels and reduce the blocks to a small number of coefficients. This works well when you have a constant color or something like a smooth variation - in essence only low-frequency components.

But have a look at the higher order basis functions for large block sizes. Unless you are taking photographs of a zebra, where could you find a regularly spaced stripe pattern in an arbitrarily cut block from an image? The inverse transform does exist mathematically, so these zebra patterns are needed to describe any possible input.

When you increase the block size, you are looking for longer correlations at the expense of distortions at sudden changes, where the higher order basis functions are desperately needed. Look for "Gibbs phenomenon" if you are not familiar with that. Larger blocks do better in smooth areas, where they can exploit the long range correlations, whereas smaller blocks are better at edges, where locality is needed.

Christian
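The difference between smooth areas and edges can be made concrete with a 1D experiment. This is only a sketch under assumptions: it uses the orthonormal DCT-II, two made-up 16-sample signals (a ramp and a step), and a hypothetical helper that counts how many of the largest coefficients are needed to hold 99% of the signal energy.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a sequence of any length."""
    n = len(x)
    return [
        (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
        * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]

def coeffs_for_energy(signal, fraction=0.99):
    """Number of largest DCT coefficients holding `fraction` of the energy."""
    energies = sorted((c * c for c in dct_1d(signal)), reverse=True)
    total = sum(energies)
    running = 0.0
    for count, e in enumerate(energies, start=1):
        running += e
        if running >= fraction * total:
            return count
    return len(energies)

smooth = [float(i) for i in range(16)]  # smooth ramp
edge = [0.0] * 8 + [1.0] * 8            # sudden step in the middle

n_smooth = coeffs_for_energy(smooth)
n_edge = coeffs_for_energy(edge)
print(n_smooth, n_edge)  # the ramp needs fewer coefficients than the step
```

The ramp concentrates its energy in the lowest frequencies, while the step spreads energy into the higher order basis functions, which is the tradeoff described above.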
"Christian Gollwitzer" <auriocus@gmx.de> wrote:

> 8x8 is the standard size for JPEG still image compression, but others are
> possible and used. For instance, H.264 or MPEG4 AVC can use 4x4 or 8x8.
> Nothing prevents you from applying basically the same principles with a
> 16x16 or even 17x13 transform

I saw a book where the author proposed a 3-dimensional DCT for moving picture compression. He claimed a lower computing burden compared to traditional motion compensation.

Vladimir Vassilevsky
DSP and Mixed Signal Consultant
www.abvolt.com
Thank you all for your replies. This is not homework; we had a lecture about
DCTs that just went over my head. I have gone through separability and
understood that the kernels for the DCT (forward and reverse) are separable.
One more question I have is how to interpret DCT outputs. I wanted to know
whether my understanding is correct and would be very thankful if you would
help me out.

When I do a DCT on a 256x256 image in 8x8 pixel blocks, I should get 1024
blocks in the DCT output. Each block will consist of 64 basis
coefficients (represented by squares of intensity from white through black)
arranged in 8 rows and 8 columns. An intensity of white means a strong
presence and an intensity of black means no presence? In the top left
corner is the DC and then the increasing frequencies in both directions (to
the sides and downward).

Thank you. 
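The counts in this description can be checked with a short sketch. It assumes the orthonormal 8x8 DCT-II and a synthetic 256x256 image made up for the example; `split_blocks` and `dct_block` are hypothetical helper names. One point worth noting: the coefficients are signed, so a white-to-black display normally shows their magnitude, and a "black" square then means a coefficient near zero.

```python
import math

N = 8  # block size

# Orthonormal 8-point DCT-II basis, C[k][i] (scaling is an assumption).
C = [
    [(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
     * math.cos(math.pi * (i + 0.5) * k / N)
     for i in range(N)]
    for k in range(N)
]

def split_blocks(image, n=N):
    """Cut an image (list of rows) into n x n blocks, row by row."""
    h, w = len(image), len(image[0])
    return [[row[x:x + n] for row in image[y:y + n]]
            for y in range(0, h, n)
            for x in range(0, w, n)]

def dct_block(block):
    """coeff[u][v]: u is the vertical frequency, v the horizontal one."""
    return [[sum(C[u][y] * block[y][x] * C[v][x]
                 for y in range(N) for x in range(N))
             for v in range(N)]
            for u in range(N)]

# Synthetic 256x256 test image (made-up gradient, not real data).
image = [[(x + 2 * y) % 256 for x in range(256)] for y in range(256)]

blocks = split_blocks(image)    # (256 / 8) * (256 / 8) = 1024 blocks
coeffs = dct_block(blocks[0])   # 8 x 8 = 64 coefficients for one block
mean = sum(map(sum, blocks[0])) / (N * N)

# With this scaling the top-left (DC) coefficient is 8 times the block mean.
dc = coeffs[0][0]
```

Frequencies increase to the right (index v) and downward (index u) from the DC term at `coeffs[0][0]`, matching the layout described in the post.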
wond3rboy <47228@dsprelated> wrote:

(snip)

> When I do a DCT on a 256x256 image in 8x8 pixel blocks, I should get 1024
> blocks in the DCT output. Each block will consist of 64 basis
> coefficients (represented by squares of intensity from white through black)
> arranged in 8 rows and 8 columns. An intensity of white means a strong
> presence and an intensity of black means no presence? In the top left
> corner is the DC and then the increasing frequencies in both directions (to
> the sides and downward).

For DCT, there is a choice for each end of having the boundary on or half way between sample points. Otherwise,

f(x)=sum A(k)cos(k x pi/8) and g(y)=sum B(l)cos(l y pi/8)

where the appropriate x, y, k, and l, depend on the boundary conditions.

Then h(x,y)=sum C(k,l)cos(k x pi/8)cos(l y pi/8)

The x's, y's, k's and l's are either integers or odd half integers.

-- glen
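As one concrete instance of the formula above, the sketch below picks the boundary choice of the DCT-II (the type used for JPEG's 8x8 blocks): x and y are odd half integers (0.5, 1.5, ..., 7.5) and k and l are integers 0 to 7. The function names are invented for this illustration, and the basis images are left unnormalized.

```python
import math

N = 8

def basis_image(k, l, n=N):
    """cos(k x pi/n) * cos(l y pi/n) sampled at x, y = 0.5, 1.5, ..., n - 0.5.

    Rows are indexed by y and columns by x.
    """
    return [[math.cos(k * (x + 0.5) * math.pi / n)
             * math.cos(l * (y + 0.5) * math.pi / n)
             for x in range(n)]
            for y in range(n)]

def inner(a, b):
    """Inner product of two images (sum of elementwise products)."""
    return sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))

# The 64 basis images of an 8x8 block, keyed by (k, l).
basis = {(k, l): basis_image(k, l) for k in range(N) for l in range(N)}

# Distinct basis images are orthogonal, so each coefficient C(k, l)
# measures one pattern independently of the others.
keys = list(basis)
max_cross = max(abs(inner(basis[a], basis[b]))
                for i, a in enumerate(keys) for b in keys[i + 1:])
```

`basis[(0, 0)]` is the constant (DC) image, and larger k or l give faster horizontal or vertical oscillation. Each coefficient in a DCT block is the weight of one of these images.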
>wond3rboy <47228@dsprelated> wrote:
>
>(snip)
>
>> When I do a DCT on a 256x256 image in 8x8 pixel blocks, I should get 1024
>> blocks in the DCT output. Each block will consist of 64 basis
>> coefficients (represented by squares of intensity from white through black)
>> arranged in 8 rows and 8 columns. An intensity of white means a strong
>> presence and an intensity of black means no presence? In the top left
>> corner is the DC and then the increasing frequencies in both directions (to
>> the sides and downward).
>
>For DCT, there is a choice for each end of having the boundary on or
>half way between sample points. Otherwise,
>
>f(x)=sum A(k)cos(k x pi/8) and g(y)=sum B(l)cos(l y pi/8)
>
>where the appropriate x, y, k, and l, depend on the boundary
>conditions.
>
>Then h(x,y)=sum C(k,l)cos(k x pi/8)cos(l y pi/8)
>
>The x's, y's, k's and l's are either integers or odd
>half integers.
>
>-- glen

Thank you very much!