Hello,

I am sorry for this naive question, but it is something that has been nagging me. I wanted to understand the interpretation of the DCT of any image. These are the questions I have; I hope someone can answer them for me:

1. Are DCTs always done on 8x8 pixel sets? If so, each DCT block will consist of 64 basis image coefficients (a white represents a presence and a black indicates an absence).
2. Can DCTs be done on 16x16 pixel sets of an image? If so, then each DCT coefficient block will consist of 16x16=256 coefficients as weights for its basis images?
3. What is the effect of performing a larger or smaller size DCT?

Thank you.

# Understanding a DCT

Started by ●October 26, 2012

Reply by ●October 26, 2012

wond3rboy <47228@dsprelated> wrote:

> Hello, I am sorry for this naive question but it is something that has been nagging me. I wanted to understand the interpretation of the DCT of any image. These are the questions I have, hope someone can answer them for me:
>
> 1. Are DCTs always done on 8x8 pixel sets? If so, each DCT block will consist of 64 basis image coefficients (a white represents a presence and a black indicates an absence).
>
> 2. Can DCTs be done on 16x16 pixel sets of an image? If so then each DCT coefficient block will consist of 16x16=256 coefficients as weights for its basis images?
>
> 3. What is the effect of performing a larger or smaller size DCT?

These sound suspiciously like homework, but I will answer them anyway. The answers won't be quite good enough to turn in, but might get you going in the right direction.

The DCT, in general, is one dimensional. Like the Fourier transform in general, and other discrete transforms, it is separable in rectangular coordinates. (Look up separable in any book on partial differential equations.) Being separable makes the computation in 2D easier.

In any case, the DCT itself can be, and often is, done on lengths other than 8. For image processing, 8x8 is popular, but it is a tradeoff that has to be made in any image compression algorithm based on DCT. If it works right, the 8x8 squares will not be visible in the decompressed image, but that isn't always true. Especially for MPEG, with fast changing scenes, they are often visible. I remember noticing it during fireworks for the 2008 Olympics, first believing it to be a neat special effect, and then, after a few seconds, deciding that it was a side effect of the algorithm.

DCT is preferred over DST or DFT for image compression, as the boundaries between squares are less visible.

My guess is that as processing power increases, they will move to larger transforms like 16x16. Best would be to transform the whole image, but that takes too long.

-- glen
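The separability point above can be checked numerically. The following is a minimal sketch, assuming the orthonormal DCT-II (one common convention; the post does not say which variant is meant) and using only the Python standard library. The function names `dct_1d`, `dct_2d` and `dct_2d_direct` are made up for this illustration. It transforms a 3x5 block, which is neither square nor a power of 2, once by rows then columns and once by the direct double sum, and compares the results.

```python
import math

def dct_1d(signal):
    """Orthonormal DCT-II of a sequence of any length N (N need not be 8)."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        total = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, v in enumerate(signal))
        out.append(scale * total)
    return out

def dct_2d(block):
    """2D DCT via separability: 1D DCT on every row, then on every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]  # transform each column
    return [list(r) for r in zip(*cols)]              # transpose back

def dct_2d_direct(block):
    """Direct double-sum definition, for comparison only (much slower)."""
    h, w = len(block), len(block[0])
    out = [[0.0] * w for _ in range(h)]
    for k in range(h):
        for l in range(w):
            sk = math.sqrt((1.0 if k == 0 else 2.0) / h)
            sl = math.sqrt((1.0 if l == 0 else 2.0) / w)
            total = 0.0
            for y in range(h):
                for x in range(w):
                    total += (block[y][x]
                              * math.cos(math.pi * (y + 0.5) * k / h)
                              * math.cos(math.pi * (x + 0.5) * l / w))
            out[k][l] = sk * sl * total
    return out

# Arbitrary test data in a 3-row by 5-column block.
block = [[float((3 * r + 7 * c) % 11) for c in range(5)] for r in range(3)]
fast = dct_2d(block)
slow = dct_2d_direct(block)
max_diff = max(abs(a - b) for ra, rb in zip(fast, slow) for a, b in zip(ra, rb))
print("largest difference between the two methods:", max_diff)
```

The two results should agree to within floating-point rounding. For an NxN block the row-column route needs on the order of N^3 multiplications with this naive 1D routine, against N^4 for the direct sum, which is the sense in which separability makes the 2D computation easier.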

Reply by ●October 26, 2012

wond3rboy wrote:

> Hello, I am sorry for this naive question but it is something that has been nagging me. I wanted to understand the interpretation of the DCT of any image. These are the questions I have, hope someone can answer them for me:
>
> 1. Are DCTs always done on 8x8 pixel sets? If so, each DCT block will consist of 64 basis image coefficients (a white represents a presence and a black indicates an absence).
>
> 2. Can DCTs be done on 16x16 pixel sets of an image? If so then each DCT coefficient block will consist of 16x16=256 coefficients as weights for its basis images?
>
> 3. What is the effect of performing a larger or smaller size DCT?
>
> Thank you.

If all else fails, what is/are the relevant definition(s)?

Reply by ●October 27, 2012

On 26.10.12 22:42, wond3rboy wrote:

> 1. Are DCTs always done on 8x8 pixel sets? If so, each DCT block will consist of 64 basis image coefficients (a white represents a presence and a black indicates an absence).

As pointed out by glen, the DCT, like the DFT, can be done at any size; the size need not even be a power of 2. A power of 2 just leads to the fastest algorithm in general.

> 2. Can DCTs be done on 16x16 pixel sets of an image? If so then each DCT coefficient block will consist of 16x16=256 coefficients as weights for its basis images?

8x8 is the standard size for JPEG still image compression, but others are possible and used. For instance, H.264 (also known as MPEG-4 AVC) can use 4x4 or 8x8. Nothing prevents you from applying basically the same principles with a 16x16 or even a 17x13 transform.

> 3. What is the effect of performing a larger or smaller size DCT?

The idea of using the DCT is to detect correlations between the pixels and reduce the blocks to a small number of coefficients. This works well when you have a constant color or something like a smooth variation, in essence only low-frequency components. But have a look at the higher order basis functions for large block sizes. Unless you are taking photographs of a zebra, where could you find a regularly spaced stripe pattern in an arbitrarily cut block from an image? The inverse transform does exist mathematically, so these zebra patterns are needed to describe any possible input.

When you increase the block size, you are looking for longer correlations at the expense of distortions at sudden changes, where the higher order basis functions are desperately needed. Look for "Gibbs phenomenon" if you are not familiar with that. Larger blocks do better in smooth areas, where they can exploit the long range correlations, whereas smaller blocks are better at edges, where locality is needed.

Christian
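To make the smooth-versus-edge argument concrete, here is a small 1D sketch, assuming an orthonormal DCT-II and an arbitrary "99% of the energy" criterion chosen only for this illustration. The helper names and the two test signals are hypothetical. It counts how many of the largest coefficients are needed for a smooth ramp and for a sharp edge, once with a single 16-point transform and once with two 8-point transforms.

```python
import math

def dct_1d(signal):
    """Orthonormal DCT-II of a sequence of any length."""
    n = len(signal)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n)
            * sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                  for i, v in enumerate(signal))
            for k in range(n)]

def coeffs_for_energy(signal, fraction=0.99):
    """Number of largest-magnitude coefficients holding `fraction` of the energy."""
    energies = sorted((c * c for c in dct_1d(signal)), reverse=True)
    total = sum(energies)
    if total == 0.0:
        return 0  # an all-zero block needs no coefficients
    running = 0.0
    for count, energy in enumerate(energies, start=1):
        running += energy
        if running >= fraction * total:
            return count
    return len(energies)

def blockwise_count(signal, block_size):
    """Total coefficient count when the signal is cut into equal blocks."""
    return sum(coeffs_for_energy(signal[i:i + block_size])
               for i in range(0, len(signal), block_size))

ramp = [float(n) for n in range(16)]   # smooth variation
edge = [0.0] * 5 + [100.0] * 11        # sudden change at sample 5

for name, sig in (("ramp", ramp), ("edge", edge)):
    print(name,
          "| one 16-point block:", blockwise_count(sig, 16),
          "| two 8-point blocks:", blockwise_count(sig, 8))
```

For the ramp, the single 16-point block needs fewer coefficients than the two 8-point blocks together, which is the "longer correlations" benefit. The edge needs noticeably more coefficients than the ramp in the 16-point case. How the two block sizes compare on the edge depends on where the edge falls relative to the block boundaries, so that part is only printed here, not claimed.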

Reply by ●October 27, 2012

"Christian Gollwitzer" <auriocus@gmx.de> wrote:

> 8x8 is the standard size for JPEG still image compression, but others are possible and used. For instance, H.264 or MPEG4 AVC can use 4x4 or 8x8. Nothing prevents you from applying basically the same principles with a 16x16 or even 17x13 transform

I saw a book where the author proposed a 3-dimensional DCT for moving picture compression. He claimed a lower computing burden compared to traditional motion compensation.

Vladimir Vassilevsky
DSP and Mixed Signal Consultant
www.abvolt.com

Reply by ●October 27, 2012

Thank you all for your replies. This is not homework; we had a lecture about DCTs that just went over my head. I have gone through separability and understood that the kernels for the DCT (forward and reverse) are separable.

One more question I have is how to interpret DCT outputs. I wanted to know whether my understanding is correct and would be very thankful if you would help me out.

When I do a DCT on a 256x256 image in 8x8 pixel blocks, I should get 1024 blocks in the DCT output. Each block will consist of 64 basis coefficients (represented by squares of intensity from white through black) arranged in 8 rows and 8 columns. An intensity of white means a strong presence and an intensity of black means no presence? In the top left corner is the DC, and then the increasing frequencies in both directions (to the sides and downward).

Thank you.
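The block arithmetic and the coefficient layout described above can be checked with a short sketch. It assumes the orthonormal DCT-II and the index order `coeffs[v][u]`, with vertical frequency `v` running down and horizontal frequency `u` running to the right; both are assumptions made for this illustration, as are the function names and the two test blocks.

```python
import math

def dct_1d(signal):
    """Orthonormal DCT-II of a sequence of any length."""
    n = len(signal)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n)
            * sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                  for i, v in enumerate(signal))
            for k in range(n)]

def dct_2d(block):
    """Row-column 2D DCT; result is indexed [vertical freq][horizontal freq]."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

# Block bookkeeping for a 256x256 image cut into 8x8 blocks.
image_size, block_size = 256, 8
num_blocks = (image_size // block_size) ** 2   # 32 * 32 = 1024
coeffs_per_block = block_size ** 2             # 8 * 8 = 64
print(num_blocks, coeffs_per_block)

# A flat block: everything ends up in the top-left (DC) coefficient.
flat = [[100.0] * 8 for _ in range(8)]
flat_coeffs = dct_2d(flat)
print("flat block DC:", round(flat_coeffs[0][0], 6))

# A block that brightens from left to right: only the top row is non-zero,
# because there is no variation in the vertical direction.
gradient = [[10.0 * x for x in range(8)] for _ in range(8)]
grad_coeffs = dct_2d(gradient)
print("gradient block, first horizontal AC term:", round(grad_coeffs[0][1], 3))
```

Note that the first horizontal AC term of the gradient block comes out negative. Coefficients are signed, so a white-to-black picture of a coefficient block is usually a plot of the magnitude (often on a log scale), in which black means "close to zero" and white means "large in either direction".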

Reply by ●October 28, 2012

wond3rboy <47228@dsprelated> wrote:

(snip)

> When I do a DCT on a 256x256 image in 8x8 pixel blocks. I should get 1024 blocks in the DCT output. Each block will consist of 64 basis coefficients (represented by squares of intensity of white through black) arranged in 8 rows and 8 columns. An intensity of white means a strong presence and an intensity of black means no presence? In the top left corner is the DC and then the increasing frequencies in both directions (to the sides and downward).

For DCT, there is a choice for each end of having the boundary on or half way between sample points. Otherwise,

f(x) = sum A(k) cos(k x pi/8) and g(y) = sum B(l) cos(l y pi/8)

where the appropriate x, y, k, and l depend on the boundary conditions.

Then h(x,y) = sum C(k,l) cos(k x pi/8) cos(l y pi/8)

The x's, y's, k's and l's are either integers or odd half integers.

-- glen
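As one concrete instance of the formula above, here is the choice of boundary conditions used for the 8x8 blocks in JPEG (the DCT-II and its inverse): the sample positions are odd half integers and the frequency indices are integers. The integer pixel indices m and n and the normalization factor alpha are notation introduced for this illustration; in glen's form the factor alpha(k) alpha(l) is simply absorbed into C(k,l).

```latex
h(x, y) = \sum_{k=0}^{7} \sum_{l=0}^{7} \alpha(k)\,\alpha(l)\, C(k, l)\,
          \cos\!\left(\frac{k x \pi}{8}\right) \cos\!\left(\frac{l y \pi}{8}\right),
\qquad x = m + \tfrac{1}{2},\quad y = n + \tfrac{1}{2},\quad m, n \in \{0, \dots, 7\}

\alpha(0) = \frac{1}{2\sqrt{2}}, \qquad \alpha(k) = \frac{1}{2} \quad \text{for } k \ge 1
```

With k = l = 0 both cosines equal 1 everywhere, so C(0,0) weights a constant block: that is the DC term in the top left corner. Each increase in k or l adds another half cycle of the cosine across the block in that direction.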

Reply by ●October 29, 2012

> wond3rboy <47228@dsprelated> wrote:
>
> (snip)
>
>> When I do a DCT on a 256x256 image in 8x8 pixel blocks. I should get 1024 blocks in the DCT output. Each block will consist of 64 basis coefficients (represented by squares of intensity of white through black) arranged in 8 rows and 8 columns. An intensity of white means a strong presence and an intensity of black means no presence? In the top left corner is the DC and then the increasing frequencies in both directions (to the sides and downward).
>
> For DCT, there is a choice for each end of having the boundary on or half way between sample points. Otherwise,
>
> f(x) = sum A(k) cos(k x pi/8) and g(y) = sum B(l) cos(l y pi/8)
>
> where the appropriate x, y, k, and l depend on the boundary conditions.
>
> Then h(x,y) = sum C(k,l) cos(k x pi/8) cos(l y pi/8)
>
> The x's, y's, k's and l's are either integers or odd half integers.
>
> -- glen

Thank you very much!