why DCT

Started by May 3, 2005
```Hi guys,
I was curious to know why is it that DCT is used in image compression.
why not FFT?

```
```never mind, I got it.

```
```Neo wrote:
>
> never mind, I got it.

You got it?
I doubt it.
Can you elaborate?

Regards
Guido
```
```Closest approximation to the Karhonen Loeve transform?

```
```porterbo...@yahoo.com wrote:

> Closest approximation to the Karhonen Loeve transform?

Same acronym as the Dallas Children's Theater? (www.dct.org)

Ciao,

Peter K.

```
```porterboy76@yahoo.com wrote:
>
> Closest approximation to the Karhonen Loeve transform?

That is usually mentioned, but I have discovered a more
compelling reason.
But I would like to know what the original poster has found.

Regards
Guido
```
```Neo wrote:

> I was curious to know why is it that DCT is used in image compression.
> why not FFT?

The difference in use between DST, DCT, and DFT are the boundary
conditions.  (This is explained in Numerical Recipes, where I looked
when I wondered some time ago.)

DFT has periodic boundary conditions, DST has the function go to
zero at the boundary, and DCT has the derivative go to zero at the
boundary.  It seems that DCT has a smaller effect on the image
than the others.   That is, the effect of the blocks is less
visible with DCT than DFT or DST.   FFT is an algorithm,
or class of algorithms, for evaluating DFT.

-- glen

```
```glen herrmannsfeldt wrote:
> Neo wrote:
>
>
> > I was curious to know why is it that DCT is used in image
compression.
> > why not FFT?
>
>
> The difference in use between DST, DCT, and DFT are the boundary
> conditions.  (This is explained in Numerical Recipes, where I looked
> when I wondered some time ago.)
>
> DFT has periodic boundary conditions, DST has the function go to
> zero at the boundary, and DCT has the derivative go to zero at the
> boundary.  It seems that DCT has a smaller effect on the image
> than the others.   That is, the effect of the blocks is less
> visible with DCT than DFT or DST.   FFT is an algorithm,
> or class of algorithms, for evaluating DFT.
>
> -- glen

Hi

Can you explain what is meant by "derivative go to zero" or
"function
go to zero" ? and when does it go to zero ?

Thanks
Sandeep

```
```guido,
I found out that the DCT distributes most of the energy in the lower
order co-efficients compared to the FFT and also gives a purely real
output. This energy localisation aids in more efficient encoding later
on.
Is this reasonable?

```
```Neo wrote:
>
> guido,
> I found out that the DCT distributes most of the energy in the lower
> order co-efficients compared to the FFT and also gives a purely real
> output. This energy localisation aids in more efficient encoding later
> on.
> Is this reasonable?

No!;-)
Your question was why is the DCT used in *image* compression!
The general term "energy compaction" does not explain why it is good
just for *image* compression, you could say the same thing for any
other compression object.

The actual reason is much simpler, and therefore apparently very
difficult to recognize by complicated-thinking people.

Here is the explanation:

What are people doing when they have a bunch of images and want
a quick preview?  They use thumbnails!  What are thumbnails?
Thumbnails are small downscaled versions of the original image!
If you want more details of the image, you can zoom in stepwise
by enlarging (upscaling) the image.
That is the key to understanding the use of DCT for image compression:

The fundamental property of lossy image compression is the similarity
of different resolutions of the same image.  "Lossy" compression means
that we assign *the same* output representation to *multiple*, *similar*
input representations.  The basic similarity relation for images is
resolution, or scale, invariance:  If we see the same image in different
resolutions (scales, sizes), or the same subject from different distances,
we talk about *the same* image (or subject).

The  DCT provides the best resolution separation property for digital images.
The 8-point DCT gives you 8 linearly increasing resolution representations
from 8 spatial sample values.  You can hardly do better than that.
Wavelet transforms, as used in JPEG2000, for example, do *not*
provide such optimal resolution separation.

Everybody who knows the DCT knows that the DC term represents a
1/8 scale  of the input sequence ("thumbnail" version).
The DC and first AC together represent a 2/8 or 1/4 scale of the
input sequence.  The DC and first 2 ACs together represent a 3/8
scale of the input sequence, and so on.
Every DCT coefficient adds corresponding resolution detail.
This is easy to demonstrate, but was not known before.
(See new JPEG scaling features presented at http://jpegclub.org
which directly apply this property.)
This fundamental DCT property explains why the DCT is the best
transform for image compression.

Regards
Guido
```
```Sorry, I should have said an odd-order *generalized* DFT (the GDFT
includes arbitrary shifts in the input/output indices).  To express the
half-sample shifts in terms of an ordinary DFT, as you noticed, you
need to multiply by 2 or 4.

e.g. the DCT-4 and DCT-8 correspond to what is sometimes called an "O^2
DFT", i.e. half-sample shifts in the input and output, for which to
convert to a regular DFT you have to multiply N by 4 and interleave
with zeros.

Steven

```
```<stevenj@alum.mit.edu> wrote in message

> Actually, there are eight DCTs, although only four are commonly used.
> This is easy to see if you realize that DCTs are just real-even DFTs
> with optional half-sample shifts in the input and/or output.  There are
> 4 possible half-sample shifts, and the "logical" DFT can have a
length
> that is either even (DCTs I-IV) or odd (DCTs V-VIII).

OK, this is just one of DSP things I just don't get, so please
educate me:

Transform        Cosine argument             Normalized
DCT-1             j*k*pi/(N-1)               j*k*2*pi/(2*N-2)
DCT-2       (j+1/2)*k*pi/N             (2*j+1)*k*2*pi/(4*N)
DCT-3       j*(k+1/2)*pi/N             j*(2*k+1)*2*pi/(4*N)
DCT-4 (j+1/2)*(k+1/2)*pi/N       (2*j+1)*(2*k+1)*2*pi/(8*N)
DCT-5             j*k*pi/(N-1/2)             j*k*2*pi/(2*N-1)
DCT-6       (j+1/2)*k*pi/(N-1/2)       (2*j+1)*k*2*pi/(4*N-2)
DCT-7       j*(k+1/2)*pi/(N-1/2)       j*(2*k+1)*2*pi/(4*N-2)
DCT-8 (j+1/2)*(k+1/2)*pi/(N+1/2) (2*j+1)*(2*k+1)*2*pi/(8*N+4)

If you normalize the numerator and denominator of the arguments
of the cosines in the various transforms so that the numerator
is the product of integers times 2*pi, as would be the case in
the associated DFT, only the DCT-5 has an odd denominator. DCTs
6 & 7 seem to come from DFTs that have order twice an odd number
and DCT-8 from order 4X an odd number, but I don't see the odd
order DFT associated with these orders.

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end

```
```glen herrmannsfeldt wrote:
> The book by Rao and Hwang, "Techniques and Standards for Image,
Video,
> and Audio Coding" has a description of the four DCT's.  That is, with
> the forward and inverse samples at integer or half integer points.

Actually, there are eight DCTs, although only four are commonly used.
This is easy to see if you realize that DCTs are just real-even DFTs
with optional half-sample shifts in the input and/or output.  There are
4 possible half-sample shifts, and the "logical" DFT can have a length
that is either even (DCTs I-IV) or odd (DCTs V-VIII).

Steven G. Johnson

```
```jim wrote:

(snip)

> Yes, I'll buy that. The 'interval boundary' is between pixels and the
> period of each interval extends from the mid point of the previous
> adjoining interval to the mid-point of the next adjoining interval. It's
> essentially what you get when you take the DFT and apply the Shift
> Theorem with a non-integer half sample shift.
> 	But I said in my previous post the derivative at the "first and last
> sample point" which is what I interpreted Glen's comments to mean.

It was supposed to be the boundary condition for the transform, which,
as you say, in this case is not at a sample point.

(To me a pixel has a finite width, as in an image, where a sample point
has no width.  I don't know if that is standard notation, though.)

The book by Rao and Hwang, "Techniques and Standards for Image, Video,
and Audio Coding" has a description of the four DCT's.  That is, with
the forward and inverse samples at integer or half integer points.

They call DCT-II (roman numeral two) the one with forward samples at
half integers, and inverse at integers.  At some point they indicate
that this is the one that DCT means through the rest of the book.

Somehow the symmetry of DCT-I and DCT-IV seem nicer to me, but DCT-II
seems to be what is being used.

-- glen

```
```
Guido Vollbeding wrote:
>
> jim wrote:
> >
> > The DCT coefficients are discrete numbers they don't have derivatives.
> > But, you are talking about the underlying continuous function. The DCT
> > used in JPEG is not really a cosine function. Its something like
> > cos(pi*(2*x+1)/16) (for x:0..7) so the derivative of underlying function
> > at the first and last sample point is different for each of the 8
> > frequencies.
>
> Right.
> I will present a clear diagram and explanation later when uploading
> my draft magazine article on http://jpegclub.org .
>
> Here is an excerpt (in German, sorry, too lazy to translate now):
>
>  "Ausgangspunkt sind kontinuierliche Cosinus-Schwingungen mit
>   unterschiedlicher Frequenz:
>
>     cos t u pi   ;   u = 0,1,...,7   ;   0 < t < 1
>
>   u ist der diskrete Index mit wachsender Frequenz, t ist die
>   kontinuierliche Variable im Interval 0 bis 1.
>   So ergibt sich f&#2013266172;r u = 0 eine konstante Funktion,
f&#2013266172;r u = 1 ergibt
>   sich eine halbe Cosinus-Vollschwingung, f&#2013266172;r u = 2 eine ganze
>   Cosinus-Vollschwingung usw.
>   Die kontinuierliche Variable t wird nun statt im ganzen Interval
>   0 bis 1 an den diskreten Stellen
>
>         2 x + 1
>     t = -------   ;   x = 0,1,...,7
>           16
>
>   abgetastet. So entsteht innerhalb des Intervals (Block bei JPEG)
>   eine &#2013265924;quidistante Abtastung mit Abstand 1/8 zwischen den
Punkten,
>   an den Intervalgrenzen (Blockgrenzen bei JPEG) vorn und hinten
>   ein Abstand von 1/16 zum Randpunkt. Auf diese Weise entsteht
>   durch die Aneinanderreihung von Bl&#2013266166;cken bei JPEG ein
gleichm&#2013265924;&#2013265951;iger
>   Abstand (1/8) zwischen den Pixeln &#2013266172;ber das gesamte
Bild."
>
> So the slope of the underlying continuous functions is indeed zero
> at the interval boundaries.

Yes, I'll buy that. The 'interval boundary' is between pixels and the
period of each interval extends from the mid point of the previous
essentially what you get when you take the DFT and apply the Shift
Theorem with a non-integer half sample shift.
But I said in my previous post the derivative at the "first and last
sample point" which is what I interpreted Glen's comments to mean.

I would be curious to see the German translated.

-jim

> The discretization is done in such a way that an equidistant grid
> of sample points is accomplished when putting more blocks together.
>
> Regards
> Guido

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! 120,000+ Newsgroups
----= East and West-Coast Server Farms - Total Privacy via Encryption =----
```
```jim wrote:
>
> The DCT coefficients are discrete numbers they don't have derivatives.
> But, you are talking about the underlying continuous function. The DCT
> used in JPEG is not really a cosine function. Its something like
> cos(pi*(2*x+1)/16) (for x:0..7) so the derivative of underlying function
> at the first and last sample point is different for each of the 8
> frequencies.

Right.
my draft magazine article on http://jpegclub.org .

Here is an excerpt (in German, sorry, too lazy to translate now):

"Ausgangspunkt sind kontinuierliche Cosinus-Schwingungen mit
unterschiedlicher Frequenz:

cos t u pi   ;   u = 0,1,...,7   ;   0 < t < 1

u ist der diskrete Index mit wachsender Frequenz, t ist die
kontinuierliche Variable im Interval 0 bis 1.
So ergibt sich f&#2013266172;r u = 0 eine konstante Funktion,
f&#2013266172;r u = 1 ergibt
sich eine halbe Cosinus-Vollschwingung, f&#2013266172;r u = 2 eine ganze
Cosinus-Vollschwingung usw.
Die kontinuierliche Variable t wird nun statt im ganzen Interval
0 bis 1 an den diskreten Stellen

2 x + 1
t = -------   ;   x = 0,1,...,7
16

abgetastet. So entsteht innerhalb des Intervals (Block bei JPEG)
eine &#2013265924;quidistante Abtastung mit Abstand 1/8 zwischen den Punkten,
an den Intervalgrenzen (Blockgrenzen bei JPEG) vorn und hinten
ein Abstand von 1/16 zum Randpunkt. Auf diese Weise entsteht
durch die Aneinanderreihung von Bl&#2013266166;cken bei JPEG ein
gleichm&#2013265924;&#2013265951;iger
Abstand (1/8) zwischen den Pixeln &#2013266172;ber das gesamte Bild."

So the slope of the underlying continuous functions is indeed zero
at the interval boundaries.
The discretization is done in such a way that an equidistant grid
of sample points is accomplished when putting more blocks together.

Regards
Guido
```
```"Guido Vollbeding" <guido@jpegclub.org> wrote in message
news:427F1599.9C942CAE@jpegclub.org...
> These are just phrases, and they don't explain anything.

I provided a very concrete mathematical explanation two posts down.

--
Matt

```
```
glen herrmannsfeldt wrote:
>
> Matt Timmermans wrote:
>
> (someone wrote)
>
> >>I was curious to know why is it that DCT is used in
>  >> image compression.  why not FFT?
>
> > Argh.  I don't like any of the answers you have so far.
>
> > The energy compaction property is useful for the lossless parts of the
> > compression process, but the reason the DCT is used for *lossy*
compression
> > is that it separates the image into relatively important components, to
> > which they eye is quite sensitive, and relatively unimportant components,
to
> > which the eye is relatively insensitive.
>
> Well, there is that, but there is also that many signals don't have so
> much high frequency content.  It is possible that if there was more high
> frequency content, evolution would have supplied us with eyes to see it
> better, and required different compression algorithms.
>

Get a good magnifying glass or better yet a microscope. There's lots of
high frequency that you aren't seeing.

> > This allows the important components to be recorded accurately, with more
> > bits, while the unimportant components can be encoded less accurately,
with
> > fewer bits.  Reducing the accuracy of these unimportant components is
called
> > quantization.  It is the lossy part of lossy compression.
>
> > The DCT is used instead of the DFT, because it accomplishes this
separation
> > better w.r.t the human visual response.  It has basis vectors for the
most
> > important components like overall brightness and ramps, while the DFT and
> > DST do not.
>
> I believe that the main difference between DFT, DST, DCT, and the
> various MDCTs is the boundary conditions.  They are all based on
> sinusoidal basis functions but which functions are used depends on the
> boundary conditions.   DFT have periodic boundary conditions, DST have
> the function going to zero at the boundary.  It isn't hard to see that
> those would cause large artifacts in the reconstructed image.  DCT has
> the derivative go to zero at the boundary, relatively less visible.

The DCT coefficients are discrete numbers they don't have derivatives.
But, you are talking about the underlying continuous function. The DCT
used in JPEG is not really a cosine function. Its something like
cos(pi*(2*x+1)/16) (for x:0..7) so the derivative of underlying function
at the first and last sample point is different for each of the 8
frequencies.

-jim

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! 120,000+ Newsgroups
----= East and West-Coast Server Farms - Total Privacy via Encryption =----
```
```sandeep_mc81@yahoo.com wrote:

> glen herrmannsfeldt wrote:

(snip)

>>The difference in use between DST, DCT, and DFT are the boundary
>>conditions.  (This is explained in Numerical Recipes, where I looked
>>when I wondered some time ago.)

>>DFT has periodic boundary conditions, DST has the function go to
>>zero at the boundary, and DCT has the derivative go to zero at the
>>boundary.  It seems that DCT has a smaller effect on the image
>>than the others.   That is, the effect of the blocks is less
>>visible with DCT than DFT or DST.   FFT is an algorithm,
>>or class of algorithms, for evaluating DFT.

> Can you explain what is meant by "derivative go to zero" or
"function
> go to zero" ? and when does it go to zero ?

Discrete transforms are used for blocks of data of a finite size,
8 by 8 in many image compression algorithms.

A transform such as DFT, DST, or DCT represents the input as a sum
of basis functions with transform supplied coefficients.

Consider the sine transform for f(x) where x is from 0 to 8.

f(x)=A sin(pi x/8) + B sin(2 pi x/8) + C sin(3 pi x/8)
+ D sin(4 pi x/8) + E sin(5 pi x/8) + F sin(6 pi x/8)
+ G sin(7 pi x/8);

Since sin() goes to zero at zero and pi, f(0) and f(8) will be zero.
DST is a useful transform for functions that naturally go to zero.

Consider the vibration of a violin string, tied down at each end.
The sine transform would be a useful transform in that case.

The DFT has periodic boundary conditions, so it would generate
reconstructions where f(0)=f(8), also not a property of image blocks.

DCT has the boundary condition f'(0)=f'(8), which, while not necessarily
true is much less visible than the others.

-- glen

```
```Guido Vollbeding wrote:

(snip)

> I think the DFT and DST provide similar features, but the DCT is
> real (while DFT is complex) and has better boundary conditions
> for block decomposition than the DST (see cosine zero slope
> property at interval ends).

Well, for the DFT case consider it as sines and cosines with
the appropriate amplitude.   For real input the cos terms have real
coefficients and the sin terms imaginary coefficients.  After you
remove half the terms, you are still left with periodic boundary
conditions.

So, for a given number of coefficients, which transform provides
the best compression and resultant reconstruction?

-- glen

```