DCT in video compression

Started by RichD July 15, 2018
Randy Yates  <yates@digitalsignallabs.com> wrote:

>spope384@gmail.com (Steve Pope) writes:
>> You don't have to literally extend the data by reflection, this >> is just one way of visualizing the math. >> >> And technically, if you do not perform said extension, then >> the transform you subsequently perform is not strictly speaking a DCT, it >> could maybe be called a discrete half-cosine transform.
>[mallat, p.349] does use your exact terminology. I'll have to study >more on the definitions and notions of extensions. > >@book{mallat, > title = "A Wavelet Tour of Signal Processing", > author = "St{\'e}phane Mallat", > publisher = "Academic Press", > edition = "second", > year = "1999"} >
Cool. Thanks. Steve
On July 15, Steve Pope wrote:
>>>The mp3 standard uses the discrete cosine transform (DCT), >>>rather than the full DFT. >>>Doesn't that imply throwing away half the information? >>>Can anyone explain this? > >> No, it's real-input, real-output. > > The math works. The input is extended by reflection to create an even > signal twice as long as the original signal; this is then subjected > to a discrete cosine transform (preserving information, as the signal > is even). In this cosine transform, the bins are more closely spaced by a > factor of two, relative to the base case of taking the original > (unextended) signal and subjecting it to a discrete Fourier > transform. So there are twice as many bins with the cosine method.
OK, throw away the sine terms, but double the number of bins. Thus from an information sense, considering compression, where's the profit? - Rich
Am 19.07.2018 um 19:23 schrieb RichD:
> On July 15, Steve Pope wrote: >> The math works. The input is extended by reflection to create an even >> signal twice as long as the original signal; this is then subjected >> to a discrete cosine transform (preserving information, as the signal >> is even). In this cosine transform, the bins are more closely spaced by a >> factor of two, relative to the base case of taking the original >> (unextended) signal and subjecting it to a discrete Fourier >> transform. So there are twice as many bins with the cosine method. > > OK, throw away the sine terms, but double the number of > bins. Thus from an information sense, considering compression, > where's the profit?
Both transforms are invertible and transform N data points into N coefficients, so there is no compression in this step. The advantage comes with lossy compression. This basically means that you set small coefficients to zero. Now, with DFT, you have a discontinuity at the boundary of the buffer, because usually your input is not periodic. If you set the higher frequencies to zero, you'll end up with ringing artifacts - just imagine that these higher frequencies represent a sinc function in frequency space to correspond to a jump in real space. For DCT, there is no jump over the edge, only the first derivative is discontinuous. This leads to much smaller ringing artifacts when you quantize the coefficients. The answer is therefore, with DCT you can disturb the coefficients more (and more easily) for the same image quality in the end. Christian
RichD  <r_delaney2001@yahoo.com> wrote:

>OK, throw away the sine terms, but double the number of >bins. Thus from an information sense, considering compression, >where's the profit?
The advantage comes in when you quantize the DCT coefficients. It has different (one hopes, preferable) qualities compared to quantizing the equivalent DFT coefficients. Steve
Steve Pope <spope384@gmail.com> wrote:

>RichD <r_delaney2001@yahoo.com> wrote: > >>OK, throw away the sine terms, but double the number of >>bins. Thus from an information sense, considering compression, >>where's the profit? > >The advantage comes in when you quantize the DCT coefficients. >It has different (one hopes, preferable) qualities compared to >quantizing the equivalent DFT coefficients.
Oops, Christian already described this... Steve
On Sunday, July 15, 2018 at 8:17:30 AM UTC+2, RichD wrote:

> The mp3 standard uses the discrete cosine transform (DCT), > rather than the full DFT.
> Doesn't that imply throwing away half the information?
No ... because you also have this 50% window overlap which cancels out the loss. It works out. :-) More specifically, it's an MDCT (modified DCT) which is a kind of lapped transform which can be factored into a bunch of parallel "rotation butterflies" across block boundaries followed by type-IV DCTs within block boundaries. The resulting mapping is linear and orthogonal. There is no information thrown away and no redundancy is added. You can view this kind of transform of a long signal as a bank of filters working in parallel. In that context, the term "critically- sampled" is used to describe the fact that we don't throw anything away and also don't add any redundancy. The nice properties of the MDCT are that the resulting transform coefficients describe a part of the original signal that is pretty localized both in time and frequency, the *whole* mapping is orthogonal and energy preserving (by *whole* I mean the transforms applied on the whole signal, so, multiple blocks seen as one big transform) and that thing is critically-sampled which is nice for compression. You can't get this with an FFT. If you just apply FFTs on consecutive blocks of your signal with no windowing, you'd have an effect called "spectral leakage" (i.e. bad localization in frequency domain) or you could use overlapped FFTs with other windows but then it wouldn't be critically sampled anymore. I believe, there is actually a mathematical proof that says we can't have a filterbank with complex-valued transform coefficients that is critically sampled while being well localized in time and frequency at the same time. This is only possible for a transform that gives you real-valued coefficients (not violating Heisenberg, of course). So, these real-to-real or compelx-to-real transforms are really as good as it gets. At least for non-parametric audio coding. I'm ignoring advanced things like SBR (subband replication) here which make use of non-critically sampled complex-valued filterbanks.
> Can anyone explain this?
I hope I could help clearing this up and add valuable information. Cheers! SG
On Sunday, July 15, 2018 at 2:17:30 AM UTC-4, RichD wrote:
> The mp3 standard uses the discrete cosine transform (DCT), > rather than the full DFT. > > Doesn't that imply throwing away half the information? > Can anyone explain this? > > -- > Rich
I maybe didn't read all of the responses, but few seem to talk about the statistical nature of the DCT. This is all in laymans terms: it is mathematically proven to be close to optimum (optimum would be KLT -- another transform, but needs to be tuned for every set of compressed data) for the kind of data that tends to be (laymans terms) not varying very much. It packs the data in a way that statistically maximizes the amount of signal represented requiring less 'information' or 'power' than other transforms. FFT is less efficient IN GENERAL, DST is less efficient for the statistically correlated (doesn't change much) kind of data, DLT and other transforms are also not quite as good as DCT. There are some variations of DCT which are better for certain cases -- MDCT, for example -- which handles edges in the data better, and then generally have better compression stats also. So, the choice of DCT isn't really strongly driven by 'real to real', or 'not needing all of the FFT', but is more a matter of the compression efficiency of sorts. There are whole, very tedious books on the subject of DCT. The book that I use as a reference is: Discrete Cosine Transform, by Rao and Yip. They are much more formal (and correct) in the information that they provide than I am!!! Again, I wrote this in laymans terms as reasonably as I can. Again, I just wanted to clarify that the DCT is a mathematically desirable transform in its own right. John
On Thursday, July 19, 2018 at 12:06:03 PM UTC-7, Christian Gollwitzer wrote:

(snip)

> Both transforms are invertible and transform N data points into N > coefficients, so there is no compression in this step. The advantage > comes with lossy compression. This basically means that you set small > coefficients to zero.
> Now, with DFT, you have a discontinuity at the boundary of the buffer, > because usually your input is not periodic. If you set the higher > frequencies to zero, you'll end up with ringing artifacts - just imagine > that these higher frequencies represent a sinc function in frequency > space to correspond to a jump in real space. For DCT, there is no jump > over the edge, only the first derivative is discontinuous. This leads to > much smaller ringing artifacts when you quantize the coefficients. The > answer is therefore, with DCT you can disturb the coefficients more (and > more easily) for the same image quality in the end.
yes. And to finish the discussion, the DST has zeros for boundary conditions.