DCT in video compression

Started by RichD July 15, 2018
The mp3 standard uses the discrete cosine transform (DCT), 
rather than the full DFT.

Doesn't that imply throwing away half the information?
Can anyone explain this?

--
Rich
Am 15.07.18 um 08:17 schrieb RichD:
> The mp3 standard uses the discrete cosine transform (DCT), > rather than the full DFT.
Yes
> Doesn't that imply throwing away half the information?
No. It implies different boundary conditions. While the mathematical definition of the (continuous) Fourier transform operates on an infinite signal, these discrete transforms only operate over a buffer. Therefore, it needs to be defined, what the data "outside" this buffer looks like. If you do DFT (or FFT, which is the same result but faster), then it is assumed that everything repeats at the boundary from start. If you do DCT, it is assumed that it is mirrored. Let's say your buffer contains 1,2,3,4 Then for DFT it is assumed to be ....1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,.... For DCT: ....1,2,3,4,4,3,2,1,1,2,3,4,4,3,2,1,.... When the latter is transformed, due to symmetry, only cosine terms can be there, the sine terms give zero. When you transform the former, due to symmetry, half of the coefficients are the complex conjugate of the other and also redundant. Therefore, there is always the same amount of information. The DCT assumption has the advantage that there is can not be a jump at the boundary. Christian
On Sat, 14 Jul 2018 23:17:27 -0700 (PDT), RichD
<r_delaney2001@yahoo.com> wrote:

>The mp3 standard uses the discrete cosine transform (DCT), >rather than the full DFT. > >Doesn't that imply throwing away half the information? >Can anyone explain this? > >-- >Rich
No, it's real-input, real-output.
Eric Jacobsen <theman@ericjacobsen.org> wrote:

><r_delaney2001@yahoo.com> wrote:
>>The mp3 standard uses the discrete cosine transform (DCT), >>rather than the full DFT.
>>Doesn't that imply throwing away half the information? >>Can anyone explain this?
>No, it's real-input, real-output.
The math works. The input is extended by reflection to create an even signal twice as long as the original signal; this is then subjected to a discrete cosine transform (preserving information, as the signal is even). In this cosine transform, the bins are more closely spaced by a factor of two, relative to the base case of taking the original (unextended) signal and subjecting it to a discrete Fourier transform. So there are twice as many bins with the cosine method. Had the original input been complex, this cosine method would still work -- but the transformed values would also be complex. Steve
spope384@gmail.com (Steve Pope) writes:

> Eric Jacobsen <theman@ericjacobsen.org> wrote: > >><r_delaney2001@yahoo.com> wrote: > >>>The mp3 standard uses the discrete cosine transform (DCT), >>>rather than the full DFT. > >>>Doesn't that imply throwing away half the information? >>>Can anyone explain this? > >>No, it's real-input, real-output. > > The math works. The input is extended by reflection to create an even > signal twice as long as the original signal; this is then subjected > to a discrete cosine transform (preserving information, as the signal > is even).
Steve, I don't see anything about "extension by reflection" in the definition of the DCT here (e.g., DCT-II): https://en.wikipedia.org/wiki/Discrete_cosine_transform#Formal_definition To what are you referring? I think the key concept to focus on, and which is mentioned early on in the Wikipedia article, is that the transform can be represented by an invertible NxN matrix, where N is the input sequence length. The fact it is invertible is what guarantees the transform does not "throw away information." -- Randy Yates, DSP/Embedded Firmware Developer Digital Signal Labs http://www.digitalsignallabs.com
On Monday, July 16, 2018 at 12:35:20 AM UTC-7, Randy Yates wrote:
> spope384@gmail.com (Steve Pope) writes:
...
> > I don't see anything about "extension by reflection" in the definition > of the DCT here (e.g., DCT-II): > > https://en.wikipedia.org/wiki/Discrete_cosine_transform#Formal_definition > > To what are you referring? >
...
> Randy Yates
Randy Look for "implicit even/odd extensions of DCT input data". I think the mp3 mentioned by the OP is DCT-IV. But what do you expect from using a wikipedia as a technical reference? Dale B. Dalrymple
dbd <d.dalrymple@sbcglobal.net> writes:

> On Monday, July 16, 2018 at 12:35:20 AM UTC-7, Randy Yates wrote: >> spope384@gmail.com (Steve Pope) writes: > ... >> >> I don't see anything about "extension by reflection" in the definition >> of the DCT here (e.g., DCT-II): >> >> https://en.wikipedia.org/wiki/Discrete_cosine_transform#Formal_definition >> >> To what are you referring? >> > ... >> Randy Yates > > Randy > > Look for "implicit even/odd extensions of DCT input data". I think the > mp3 mentioned by the OP is DCT-IV. But what do you expect from using a > wikipedia as a technical reference?
Hi Dale, It seemed to me that Steve was saying you first have to extend the input sequence before applying the DCT. Based on the wikipedia article DCT definitions, that is not the case. Are the Wikipedia article definitions wrong? If not, then the quality of the rest of the article is irrelevent as I understand the OP's question. The OP's question seems to be: how can we represent the input sequence with only cosines instead of both sines and cosines? Don't we lose information? By "lose information" I take him to mean that we cannot fully represent an arbitrary N-sample input sequence with the transformed output sequence. My answer is that, since the transform is invertible, you can always get back the exact N input samples given the N transformed samples. As far as I can tell, the discussions regarding extensions of the DCT input data give insight on the utility of the DCT in compression and are not relevent to the OPs "loss of information" question. -- Randy Yates, Embedded Linux Developer The Garner Underground, Inc. http://www.garnerundergroundinc.com
Randy Yates  <randyy@garnerundergroundinc.com> wrote:

>dbd <d.dalrymple@sbcglobal.net> writes:
>> On Monday, July 16, 2018 at 12:35:20 AM UTC-7, Randy Yates wrote:
>>> spope384@gmail.com (Steve Pope) writes:
>>> I don't see anything about "extension by reflection" in the definition >>> of the DCT here (e.g., DCT-II):
>>> https://en.wikipedia.org/wiki/Discrete_cosine_transform#Formal_definition
>>> To what are you referring?
>> Look for "implicit even/odd extensions of DCT input data". I think the >> mp3 mentioned by the OP is DCT-IV. But what do you expect from using a >> wikipedia as a technical reference?
>Hi Dale, > >It seemed to me that Steve was saying you first have to extend the input >sequence before applying the DCT. Based on the wikipedia article >DCT definitions, that is not the case.
You don't have to literally extend the data by reflection, this is just one way of visualizing the math. And technically, if you do not perform said extension, then the transform you subsequently perform is not strictly speaking a DCT, it could maybe be called a discrete half-cosine transform. In any case, the math works, and can be shown to be the equivalent of a DFT. Steve
spope384@gmail.com (Steve Pope) writes:

> Randy Yates <randyy@garnerundergroundinc.com> wrote: > >>dbd <d.dalrymple@sbcglobal.net> writes: > >>> On Monday, July 16, 2018 at 12:35:20 AM UTC-7, Randy Yates wrote: > >>>> spope384@gmail.com (Steve Pope) writes: > >>>> I don't see anything about "extension by reflection" in the definition >>>> of the DCT here (e.g., DCT-II): > >>>> https://en.wikipedia.org/wiki/Discrete_cosine_transform#Formal_definition > >>>> To what are you referring? > >>> Look for "implicit even/odd extensions of DCT input data". I think the >>> mp3 mentioned by the OP is DCT-IV. But what do you expect from using a >>> wikipedia as a technical reference? > >>Hi Dale, >> >>It seemed to me that Steve was saying you first have to extend the input >>sequence before applying the DCT. Based on the wikipedia article >>DCT definitions, that is not the case. > > You don't have to literally extend the data by reflection, this > is just one way of visualizing the math. > > And technically, if you do not perform said extension, then > the transform you subsequently perform is not strictly speaking a DCT, it > could maybe be called a discrete half-cosine transform.
[mallat, p.349] does use your exact terminology. I'll have to study more on the definitions and notions of extensions. @book{mallat, title = "A Wavelet Tour of Signal Processing", author = "St{\'e}phane Mallat", publisher = "Academic Press", edition = "second", year = "1999"} -- Randy Yates, DSP/Embedded Firmware Developer Digital Signal Labs http://www.digitalsignallabs.com
On Saturday, July 14, 2018 at 11:46:59 PM UTC-7, Christian Gollwitzer wrote:
> Am 15.07.18 um 08:17 schrieb RichD: > > The mp3 standard uses the discrete cosine transform (DCT), > > rather than the full DFT.
> Yes
> > Doesn't that imply throwing away half the information?
> No. It implies different boundary conditions. While the mathematical > definition of the (continuous) Fourier transform operates on an infinite > signal, these discrete transforms only operate over a buffer. Therefore, > it needs to be defined, what the data "outside" this buffer looks like. > If you do DFT (or FFT, which is the same result but faster), then it is > assumed that everything repeats at the boundary from start. If you do > DCT, it is assumed that it is mirrored. Let's say your buffer contains
> 1,2,3,4
The more usual description is that the FFT has periodic boundary conditions, DST has the function go to zero at the boundary, and DCT has the derivative go to zero. Putting transform blocks together, as is usually done in image reconstruction, the boundaries are less visible with DCT.
> Then for DFT it is assumed to be
> ....1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,....
Specifically periodic boundary conditions.
> For DCT:
> ....1,2,3,4,4,3,2,1,1,2,3,4,4,3,2,1,....
The derivative is zero, such that the basis functions, and so the result of the transform have reflection symmetry.
> When the latter is transformed, due to symmetry, only cosine terms can > be there, the sine terms give zero. When you transform the former, due > to symmetry, half of the coefficients are the complex conjugate of the > other and also redundant. Therefore, there is always the same amount of > information.
> The DCT assumption has the advantage that there is can not be a jump at > the boundary.
The derivative has a jump, though. It is the same math as standing waves with shorted (DST) or open (DCT) boundary conditions on a transmission line. As to the OP question, there are the same number of basis functions, but they are different functions. In the 8 element case, the DFT has four sines and four cosines, DST has 8 sines, and DCT has 8 cosines. (Given real input data.)