Reply by February 7, 20192019-02-07
On Thursday, July 19, 2018 at 12:06:03 PM UTC-7, Christian Gollwitzer wrote:

(snip)

> Both transforms are invertible and transform N data points into N > coefficients, so there is no compression in this step. The advantage > comes with lossy compression. This basically means that you set small > coefficients to zero.
> Now, with DFT, you have a discontinuity at the boundary of the buffer, > because usually your input is not periodic. If you set the higher > frequencies to zero, you'll end up with ringing artifacts - just imagine > that these higher frequencies represent a sinc function in frequency > space to correspond to a jump in real space. For DCT, there is no jump > over the edge, only the first derivative is discontinuous. This leads to > much smaller ringing artifacts when you quantize the coefficients. The > answer is therefore, with DCT you can disturb the coefficients more (and > more easily) for the same image quality in the end.
yes. And to finish the discussion, the DST has zeros for boundary conditions.
Reply by October 11, 20182018-10-11
On Sunday, July 15, 2018 at 2:17:30 AM UTC-4, RichD wrote:
> The mp3 standard uses the discrete cosine transform (DCT), > rather than the full DFT. > > Doesn't that imply throwing away half the information? > Can anyone explain this? > > -- > Rich
I maybe didn't read all of the responses, but few seem to talk about the statistical nature of the DCT. This is all in laymans terms: it is mathematically proven to be close to optimum (optimum would be KLT -- another transform, but needs to be tuned for every set of compressed data) for the kind of data that tends to be (laymans terms) not varying very much. It packs the data in a way that statistically maximizes the amount of signal represented requiring less 'information' or 'power' than other transforms. FFT is less efficient IN GENERAL, DST is less efficient for the statistically correlated (doesn't change much) kind of data, DLT and other transforms are also not quite as good as DCT. There are some variations of DCT which are better for certain cases -- MDCT, for example -- which handles edges in the data better, and then generally have better compression stats also. So, the choice of DCT isn't really strongly driven by 'real to real', or 'not needing all of the FFT', but is more a matter of the compression efficiency of sorts. There are whole, very tedious books on the subject of DCT. The book that I use as a reference is: Discrete Cosine Transform, by Rao and Yip. They are much more formal (and correct) in the information that they provide than I am!!! Again, I wrote this in laymans terms as reasonably as I can. Again, I just wanted to clarify that the DCT is a mathematically desirable transform in its own right. John
Reply by SG August 31, 20182018-08-31
On Sunday, July 15, 2018 at 8:17:30 AM UTC+2, RichD wrote:

> The mp3 standard uses the discrete cosine transform (DCT), > rather than the full DFT.
> Doesn't that imply throwing away half the information?
No ... because you also have this 50% window overlap which cancels out the loss. It works out. :-) More specifically, it's an MDCT (modified DCT) which is a kind of lapped transform which can be factored into a bunch of parallel "rotation butterflies" across block boundaries followed by type-IV DCTs within block boundaries. The resulting mapping is linear and orthogonal. There is no information thrown away and no redundancy is added. You can view this kind of transform of a long signal as a bank of filters working in parallel. In that context, the term "critically- sampled" is used to describe the fact that we don't throw anything away and also don't add any redundancy. The nice properties of the MDCT are that the resulting transform coefficients describe a part of the original signal that is pretty localized both in time and frequency, the *whole* mapping is orthogonal and energy preserving (by *whole* I mean the transforms applied on the whole signal, so, multiple blocks seen as one big transform) and that thing is critically-sampled which is nice for compression. You can't get this with an FFT. If you just apply FFTs on consecutive blocks of your signal with no windowing, you'd have an effect called "spectral leakage" (i.e. bad localization in frequency domain) or you could use overlapped FFTs with other windows but then it wouldn't be critically sampled anymore. I believe, there is actually a mathematical proof that says we can't have a filterbank with complex-valued transform coefficients that is critically sampled while being well localized in time and frequency at the same time. This is only possible for a transform that gives you real-valued coefficients (not violating Heisenberg, of course). So, these real-to-real or compelx-to-real transforms are really as good as it gets. At least for non-parametric audio coding. I'm ignoring advanced things like SBR (subband replication) here which make use of non-critically sampled complex-valued filterbanks.
> Can anyone explain this?
I hope I could help clearing this up and add valuable information. Cheers! SG
Reply by Steve Pope July 19, 20182018-07-19
Steve Pope <spope384@gmail.com> wrote:

>RichD <r_delaney2001@yahoo.com> wrote: > >>OK, throw away the sine terms, but double the number of >>bins. Thus from an information sense, considering compression, >>where's the profit? > >The advantage comes in when you quantize the DCT coefficients. >It has different (one hopes, preferable) qualities compared to >quantizing the equivalent DFT coefficients.
Oops, Christian already described this... Steve
Reply by Steve Pope July 19, 20182018-07-19
RichD  <r_delaney2001@yahoo.com> wrote:

>OK, throw away the sine terms, but double the number of >bins. Thus from an information sense, considering compression, >where's the profit?
The advantage comes in when you quantize the DCT coefficients. It has different (one hopes, preferable) qualities compared to quantizing the equivalent DFT coefficients. Steve
Reply by Christian Gollwitzer July 19, 20182018-07-19
Am 19.07.2018 um 19:23 schrieb RichD:
> On July 15, Steve Pope wrote: >> The math works. The input is extended by reflection to create an even >> signal twice as long as the original signal; this is then subjected >> to a discrete cosine transform (preserving information, as the signal >> is even). In this cosine transform, the bins are more closely spaced by a >> factor of two, relative to the base case of taking the original >> (unextended) signal and subjecting it to a discrete Fourier >> transform. So there are twice as many bins with the cosine method. > > OK, throw away the sine terms, but double the number of > bins. Thus from an information sense, considering compression, > where's the profit?
Both transforms are invertible and transform N data points into N coefficients, so there is no compression in this step. The advantage comes with lossy compression. This basically means that you set small coefficients to zero. Now, with DFT, you have a discontinuity at the boundary of the buffer, because usually your input is not periodic. If you set the higher frequencies to zero, you'll end up with ringing artifacts - just imagine that these higher frequencies represent a sinc function in frequency space to correspond to a jump in real space. For DCT, there is no jump over the edge, only the first derivative is discontinuous. This leads to much smaller ringing artifacts when you quantize the coefficients. The answer is therefore, with DCT you can disturb the coefficients more (and more easily) for the same image quality in the end. Christian
Reply by RichD July 19, 20182018-07-19
On July 15, Steve Pope wrote:
>>>The mp3 standard uses the discrete cosine transform (DCT), >>>rather than the full DFT. >>>Doesn't that imply throwing away half the information? >>>Can anyone explain this? > >> No, it's real-input, real-output. > > The math works. The input is extended by reflection to create an even > signal twice as long as the original signal; this is then subjected > to a discrete cosine transform (preserving information, as the signal > is even). In this cosine transform, the bins are more closely spaced by a > factor of two, relative to the base case of taking the original > (unextended) signal and subjecting it to a discrete Fourier > transform. So there are twice as many bins with the cosine method.
OK, throw away the sine terms, but double the number of bins. Thus from an information sense, considering compression, where's the profit? - Rich
Reply by Steve Pope July 19, 20182018-07-19
Randy Yates  <yates@digitalsignallabs.com> wrote:

>spope384@gmail.com (Steve Pope) writes:
>> You don't have to literally extend the data by reflection, this >> is just one way of visualizing the math. >> >> And technically, if you do not perform said extension, then >> the transform you subsequently perform is not strictly speaking a DCT, it >> could maybe be called a discrete half-cosine transform.
>[mallat, p.349] does use your exact terminology. I'll have to study >more on the definitions and notions of extensions. > >@book{mallat, > title = "A Wavelet Tour of Signal Processing", > author = "St{\'e}phane Mallat", > publisher = "Academic Press", > edition = "second", > year = "1999"} >
Cool. Thanks. Steve
Reply by July 18, 20182018-07-18
On Saturday, July 14, 2018 at 11:46:59 PM UTC-7, Christian Gollwitzer wrote:
> Am 15.07.18 um 08:17 schrieb RichD: > > The mp3 standard uses the discrete cosine transform (DCT), > > rather than the full DFT.
> Yes
> > Doesn't that imply throwing away half the information?
> No. It implies different boundary conditions. While the mathematical > definition of the (continuous) Fourier transform operates on an infinite > signal, these discrete transforms only operate over a buffer. Therefore, > it needs to be defined, what the data "outside" this buffer looks like. > If you do DFT (or FFT, which is the same result but faster), then it is > assumed that everything repeats at the boundary from start. If you do > DCT, it is assumed that it is mirrored. Let's say your buffer contains
> 1,2,3,4
The more usual description is that the FFT has periodic boundary conditions, DST has the function go to zero at the boundary, and DCT has the derivative go to zero. Putting transform blocks together, as is usually done in image reconstruction, the boundaries are less visible with DCT.
> Then for DFT it is assumed to be
> ....1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,....
Specifically periodic boundary conditions.
> For DCT:
> ....1,2,3,4,4,3,2,1,1,2,3,4,4,3,2,1,....
The derivative is zero, such that the basis functions, and so the result of the transform have reflection symmetry.
> When the latter is transformed, due to symmetry, only cosine terms can > be there, the sine terms give zero. When you transform the former, due > to symmetry, half of the coefficients are the complex conjugate of the > other and also redundant. Therefore, there is always the same amount of > information.
> The DCT assumption has the advantage that there is can not be a jump at > the boundary.
The derivative has a jump, though. It is the same math as standing waves with shorted (DST) or open (DCT) boundary conditions on a transmission line. As to the OP question, there are the same number of basis functions, but they are different functions. In the 8 element case, the DFT has four sines and four cosines, DST has 8 sines, and DCT has 8 cosines. (Given real input data.)
Reply by Randy Yates July 16, 20182018-07-16
spope384@gmail.com (Steve Pope) writes:

> Randy Yates <randyy@garnerundergroundinc.com> wrote: > >>dbd <d.dalrymple@sbcglobal.net> writes: > >>> On Monday, July 16, 2018 at 12:35:20 AM UTC-7, Randy Yates wrote: > >>>> spope384@gmail.com (Steve Pope) writes: > >>>> I don't see anything about "extension by reflection" in the definition >>>> of the DCT here (e.g., DCT-II): > >>>> https://en.wikipedia.org/wiki/Discrete_cosine_transform#Formal_definition > >>>> To what are you referring? > >>> Look for "implicit even/odd extensions of DCT input data". I think the >>> mp3 mentioned by the OP is DCT-IV. But what do you expect from using a >>> wikipedia as a technical reference? > >>Hi Dale, >> >>It seemed to me that Steve was saying you first have to extend the input >>sequence before applying the DCT. Based on the wikipedia article >>DCT definitions, that is not the case. > > You don't have to literally extend the data by reflection, this > is just one way of visualizing the math. > > And technically, if you do not perform said extension, then > the transform you subsequently perform is not strictly speaking a DCT, it > could maybe be called a discrete half-cosine transform.
[mallat, p.349] does use your exact terminology. I'll have to study more on the definitions and notions of extensions. @book{mallat, title = "A Wavelet Tour of Signal Processing", author = "St{\'e}phane Mallat", publisher = "Academic Press", edition = "second", year = "1999"} -- Randy Yates, DSP/Embedded Firmware Developer Digital Signal Labs http://www.digitalsignallabs.com