comp.dsp | Understanding a DCT

Hello, I am sorry for this naive question but it is something that has been
nagging me. I wanted to understand the interpretation of the DCT of any
image. These are the questions I have, hope someone can answer them for
me:

1. Are DCTs always done on 8x8 pixel sets? If so, each DCT block will
consist of 64 basis image coefficients (a white represents a presence and a
black indicates an absence).

2. Can DCTs be done on 16x16 pixel sets of an image? If so then each DCT
coefficient block will consist of 16x16=256 coefficients as weights for its
basis images?

3. What is the effect of performing a larger or smaller size DCT?

Thank you.

Reply by glen herrmannsfeldt ● October 26, 2012

wond3rboy <47228@dsprelated> wrote:

> Hello, I am sorry for this naive question but it is something that has been
> nagging me. I wanted to understand the interpretation of the DCT of any
> image. These are the questions I have, hope someone can answer them for
> me:

> 1. Are DCTs always done on 8x8 pixel sets? If so, each DCT block will
> consist of 64 basis image coefficients (a white represents a presence and a
> black indicates an absence).

> 2. Can DCTs be done on 16x16 pixel sets of an image? If so then each DCT
> coefficient block will consist of 16x16=256 coefficients as weights for its
> basis images?

> 3. What is the effect of performing a larger or smaller size DCT?

These sound suspiciously like homework, but I will answer them anyway.

The answers won't be quite good enough to turn in, but might get you
in the right direction.

The DCT, in general, is one dimensional. Like the Fourier transform
and other discrete transforms, its multidimensional form is separable in
rectangular coordinates. (Look up separable in any book on partial
differential equations.)

Being separable makes the computation in 2D easier: you can apply 1D
transforms to the rows and then to the columns.
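
To make the separability point concrete, here is a minimal sketch in plain Python. It assumes the unnormalized DCT-II (the variant used in JPEG-style coding, up to scaling); the function names and the small test block are made up for illustration. It computes the 2D transform once by row and column 1D transforms and once from the full double sum, then compares the two.

```python
import math

def dct_1d(v):
    """Unnormalized 1D DCT-II, computed directly from the defining sum."""
    n = len(v)
    return [sum(v[x] * math.cos(math.pi * (x + 0.5) * k / n) for x in range(n))
            for k in range(n)]

def dct_2d_separable(block):
    """2D DCT-II via 1D transforms: every row first, then every column."""
    rows_done = [dct_1d(row) for row in block]
    cols_done = [dct_1d(list(col)) for col in zip(*rows_done)]
    return [list(r) for r in zip(*cols_done)]   # transpose back to [k][l]

def dct_2d_direct(block):
    """The same transform written as one double sum, for comparison."""
    h, w = len(block), len(block[0])
    return [[sum(block[y][x]
                 * math.cos(math.pi * (y + 0.5) * k / h)
                 * math.cos(math.pi * (x + 0.5) * l / w)
                 for y in range(h) for x in range(w))
             for l in range(w)]
            for k in range(h)]

# Arbitrary 3x4 test block (made-up values, deliberately not square).
block = [[float((3 * y + 5 * x * x) % 7) for x in range(4)] for y in range(3)]
a = dct_2d_separable(block)
b = dct_2d_direct(block)
max_diff = max(abs(a[k][l] - b[k][l]) for k in range(3) for l in range(4))
print(max_diff)   # the two results differ only by rounding error
```

For an NxN block the double sum needs on the order of N^4 multiply-adds, while the row-column version needs about 2N^3, before any fast algorithm is used for the 1D transforms.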

In any case, the DCT itself can be, and often is done, on lengths
other than 8. For image processing, 8x8 is popular, but it is a
tradeoff that has to be made in any image compression algorithm
based on DCT. 

If it works right, the 8x8 squares will not be visible in the
decompressed image, but that isn't always true. Especially for MPEG,
with fast changing scenes, they are often visible. 

I remember noticing it during the fireworks at the 2008 Olympics,
first believing it to be a neat special effect, and then, after
a few seconds, deciding that it was a side effect of the
algorithm.

DCT is preferred over DST or DFT for image compression, as
the boundaries between squares are less visible. 
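
One way to see why the DCT behaves better at block edges is to compare it with the DFT on a simple ramp. The DFT implicitly repeats the block periodically, so a ramp gets a jump at the wrap-around point, while the DCT-II implicitly mirrors the block, which has no jump. The sketch below is only an illustration under those assumptions (orthonormal DCT-II, an 8-sample ramp, made-up helper names): it keeps a few low-frequency terms of each transform and compares the squared reconstruction error.

```python
import cmath
import math

def dct(v):
    """Orthonormal DCT-II."""
    n = len(v)
    return [(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            * sum(v[x] * math.cos(math.pi * (x + 0.5) * k / n) for x in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of dct() above."""
    n = len(c)
    return [sum((math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
                * c[k] * math.cos(math.pi * (x + 0.5) * k / n) for k in range(n))
            for x in range(n)]

def dft(v):
    n = len(v)
    return [sum(v[x] * cmath.exp(-2j * math.pi * k * x / n) for x in range(n))
            for k in range(n)]

def idft(c):
    """Inverse DFT, returning the real part only."""
    n = len(c)
    return [sum(c[k] * cmath.exp(2j * math.pi * k * x / n) for k in range(n)).real / n
            for x in range(n)]

def squared_error(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

ramp = [float(i) for i in range(8)]

# DCT: keep only k = 0 and k = 1 (two real numbers).
c = dct(ramp)
dct_approx = idct(c[:2] + [0.0] * 6)

# DFT: keep k = 0 and the conjugate pair k = 1, 7 (three real numbers).
f = dft(ramp)
dft_approx = idft([f[0], f[1]] + [0j] * 5 + [f[7]])

dct_err = squared_error(ramp, dct_approx)
dft_err = squared_error(ramp, dft_approx)
print("DCT error:", dct_err, " DFT error:", dft_err)
```

Even though the DFT is given one more real degree of freedom here, its error on the ramp is much larger, and most of it sits at the two ends of the block.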

My guess is that as processing power increases, they will
move to larger transforms like 16x16.

Best would be to transform the whole image, but that takes too long.

-- glen

Reply by Richard Owlett ● October 26, 2012

wond3rboy wrote:
> Hello, I am sorry for this naive question but it is something that has been
> nagging me. I wanted to understand the interpretation of the DCT of any
> image. These are the questions I have, hope someone can answer them for
> me:
>
> 1. Are DCTs always done on 8x8 pixel sets? If so, each DCT block will
> consist of 64 basis image coefficients (a white represents a presence and a
> black indicates an absence).
>
> 2. Can DCTs be done on 16x16 pixel sets of an image? If so then each DCT
> coefficient block will consist of 16x16=256 coefficients as weights for its
> basis images?
>
> 3. What is the effect of performing a larger or smaller size DCT?
>
> Thank you.
>


If all else fails:
what is/are the relevant definition(s)?

Reply by Christian Gollwitzer ● October 27, 2012

On 26.10.12 22:42, wond3rboy wrote:
> 1. Are DCTs always done on 8x8 pixel sets? If so, each DCT block will
> consist of 64 basis image coefficients (a white represents a presence and a
> black indicates an absence).

As pointed out by glen, the DCT, like the DFT, can be done at any size; it
need not even be a power of 2. A power of 2 just leads to the fastest
algorithm in general.

> 2. Can DCTs be done on 16x16 pixel sets of an image? If so then each DCT
> coefficient block will consist of 16x16=256 coefficients as weights for its
> basis images?


8x8 is the standard size for JPEG still image compression, but others 
are possible and used. For instance, H.264 (MPEG-4 AVC) can use 4x4 or 
8x8. Nothing prevents you from applying basically the same principles 
with a 16x16 or even 17x13 transform.
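
As a quick check that nothing ties the transform to 8 or to a power of 2, here is a small sketch of a direct (slow, O(N^2)) orthonormal DCT-II and its inverse, run on lengths 13 and 17. The function names and the test data are made up for illustration; a real codec would use a fast algorithm instead.

```python
import math

def dct(v):
    """Orthonormal DCT-II of any length n >= 1 (direct sum, no fast algorithm)."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[x] * math.cos(math.pi * (x + 0.5) * k / n) for x in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of dct() above (an orthonormal DCT-III)."""
    n = len(c)
    out = []
    for x in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (x + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def roundtrip_error(v):
    """Largest absolute difference between v and idct(dct(v))."""
    return max(abs(a - b) for a, b in zip(v, idct(dct(v))))

# Arbitrary test data of length 13 and 17 (values are made up).
sig13 = [float((7 * i) % 13) for i in range(13)]
sig17 = [float((5 * i * i) % 17) for i in range(17)]
print(roundtrip_error(sig13), roundtrip_error(sig17))
```

A 17x13 block transform would apply the length-13 version along one axis and the length-17 version along the other.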

> 3. What is the effect of performing a larger or smaller size DCT?

The idea of using DCT is to detect correlations between the pixels and 
reduce the blocks to a small number of coefficients. This works well 
when you have a constant color or something like a smooth variation - in 
essence only low-frequency components. But have a look at the higher 
order basis functions for large block sizes. Unless you are taking 
photographs of a zebra, where could you find a regularly spaced 
stripe pattern in an arbitrarily cut block from an image?

The inverse transform does exist mathematically, so these zebra patterns 
are needed to describe any possible input. When you increase the block 
size, you are looking for longer correlations at the expense of 
distortions at sudden changes, where the higher order basis functions 
are desperately needed. Look for "Gibbs phenomenon" if you are not 
familiar with that. Larger blocks do better in smooth areas, where they 
can exploit the long range correlations, whereas smaller blocks are 
better at edges, where locality is needed.
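
The locality argument can be illustrated in 1D with a rough sketch. Assume a 16-sample signal with one sharp edge, an orthonormal DCT-II, and a crude "compression" that simply keeps the lowest-frequency coefficients (real codecs quantize instead; the helper names here are made up). Coding it as one 16-point block with 4 coefficients is compared against two 8-point blocks with 2 coefficients each, so the coefficient budget is the same.

```python
import math

def dct(v):
    """Orthonormal DCT-II."""
    n = len(v)
    return [(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            * sum(v[x] * math.cos(math.pi * (x + 0.5) * k / n) for x in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of dct() above."""
    n = len(c)
    return [sum((math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
                * c[k] * math.cos(math.pi * (x + 0.5) * k / n) for k in range(n))
            for x in range(n)]

def lowpass_reconstruct(v, keep):
    """Zero all but the `keep` lowest-frequency coefficients, then invert."""
    c = dct(v)
    return idct([ck if k < keep else 0.0 for k, ck in enumerate(c)])

edge = [0.0] * 11 + [100.0] * 5      # flat, then a sharp edge at sample 11

# One 16-sample block, 4 coefficients kept.
big = lowpass_reconstruct(edge, 4)
# Two 8-sample blocks, 2 coefficients kept in each (same total budget).
small = lowpass_reconstruct(edge[:8], 2) + lowpass_reconstruct(edge[8:], 2)

err_big = [abs(a - b) for a, b in zip(edge, big)]
err_small = [abs(a - b) for a, b in zip(edge, small)]
print("flat half, 16-point block :", max(err_big[:8]))
print("flat half, 8-point blocks :", max(err_small[:8]))
```

With the large block, the ripple caused by the missing high-order terms spreads into the flat first half of the signal. With the small blocks, the flat half is reproduced exactly and the distortion stays inside the block that contains the edge.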

	Christian

Reply by Vladimir Vassilevsky ● October 27, 2012

"Christian Gollwitzer" <auriocus@gmx.de> wrote:

> 8x8 is the standard size for JPEG still image compression, but others are 
> possible and used. For instance, H.264 (MPEG-4 AVC) can use 4x4 or 8x8. 
> Nothing prevents you from applying basically the same principles with a 
> 16x16 or even 17x13 transform.

I saw a book where the author proposed a 3-dimensional DCT for moving picture 
compression. He claimed a lower computing burden than traditional 
motion compensation.

Vladimir Vassilevsky
DSP and Mixed Signal Consultant
www.abvolt.com

Reply by wond3rboy ● October 27, 2012

Thank you all for your replies. This is not homework; we had a lecture about
DCTs that just went over my head. I have gone through separability and
understood that the kernels for the DCT (forward and inverse) are separable.
One more question I have is how to interpret DCT outputs. I wanted to know
whether my understanding is correct and would be very thankful if you would
help me out.

When I do a DCT on a 256x256 image in 8x8 pixel blocks, I should get 1024
blocks in the DCT output. Each block will consist of 64 basis
coefficients (represented by squares with intensities from white through
black) arranged in 8 rows and 8 columns. Does a white intensity mean a strong
presence and a black intensity mean no presence? The DC term is in the top
left corner, and the frequencies then increase in both directions (to
the side and downward).

Thank you.

Reply by glen herrmannsfeldt ● October 28, 2012

wond3rboy <47228@dsprelated> wrote:

(snip)

> When I do a DCT on a 256x256 image in 8x8 pixel blocks, I should get 1024
> blocks in the DCT output. Each block will consist of 64 basis
> coefficients (represented by squares of intensity of white through black)
> arranged in 8 rows and 8 columns. An intensity of white means a strong
> presence and an intensity of black means no presence? In the top left
> corner is the DC and then the increasing frequencies in both directions (to
> the sides and downward).

For DCT, there is a choice for each end of having the boundary on or
half way between sample points. Otherwise, 

f(x)=sum A(k)cos(k x pi/8) and g(y)=sum B(l)cos(l y pi/8)

where the appropriate x, y, k, and l, depend on the boundary
conditions.

Then h(x,y)=sum C(k,l)cos(k x pi/8)cos(l y pi/8)

The x's, y's, k's and l's are either integers or odd
half integers.
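
To connect these formulas to the 8x8 picture of coefficients, here is a minimal sketch. It assumes the DCT-II choice of boundary conditions (x and y are odd half integers, i.e. sample n sits at n + 1/2, and k, l are the integers 0..7) with orthonormal scaling; the index convention C[k][l] with k as vertical and l as horizontal frequency, and all names, are assumptions made for the example.

```python
import math

N = 8

def scale(k):
    """Orthonormal scaling factor for frequency index k."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def dct_block(block):
    """2D DCT-II of an NxN block; C[k][l], k = vertical, l = horizontal frequency."""
    return [[scale(k) * scale(l) * sum(
                block[y][x]
                * math.cos(math.pi * (y + 0.5) * k / N)
                * math.cos(math.pi * (x + 0.5) * l / N)
                for y in range(N) for x in range(N))
             for l in range(N)]
            for k in range(N)]

# A 256x256 image cut into 8x8 blocks gives 32 * 32 blocks.
blocks_per_image = (256 // N) ** 2

# A constant gray block: only the top-left (DC) coefficient is nonzero.
flat = [[50.0] * N for _ in range(N)]
C_flat = dct_block(flat)

# A block equal to minus the (k=0, l=1) basis image: it varies only from
# left to right, so only C[0][1] is nonzero, and that coefficient is negative.
stripe = [[-math.cos(math.pi * (x + 0.5) / N) for x in range(N)] for _ in range(N)]
C_stripe = dct_block(stripe)

print(blocks_per_image, C_flat[0][0], C_stripe[0][1])
```

Note that coefficients are signed. A display where black means "absent" and white means "strongly present" therefore has to show the magnitude of each coefficient, or else map zero to mid-gray.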

-- glen

Reply by wond3rboy ● October 29, 2012

>wond3rboy <47228@dsprelated> wrote:
>
>(snip)
>
>> When I do a DCT on a 256x256 image in 8x8 pixel blocks, I should get 1024
>> blocks in the DCT output. Each block will consist of 64 basis
>> coefficients (represented by squares of intensity of white through black)
>> arranged in 8 rows and 8 columns. An intensity of white means a strong
>> presence and an intensity of black means no presence? In the top left
>> corner is the DC and then the increasing frequencies in both directions (to
>> the sides and downward).
>
>For DCT, there is a choice for each end of having the boundary on or
>half way between sample points. Otherwise, 
>
>f(x)=sum A(k)cos(k x pi/8) and g(y)=sum B(l)cos(l y pi/8)
>
>where the appropriate x, y, k, and l, depend on the boundary
>conditions.
>
>Then h(x,y)=sum C(k,l)cos(k x pi/8)cos(l y pi/8)
>
>The x's, y's, k's and l's are either integers or odd
>half integers.
>
>-- glen
>

Thank you very much!
