Forums

FFT vs DCT

Started by Raeldor September 26, 2011
Hi All,

I'm trying to understand the differences between FFT and DCT and was
hoping someone may be able to help me with a couple of questions.

1. Is DCT basically the same as an FFT that operates on real numbers?

2. I keep reading about 'real-odd' and 'real-even'.  What does this
mean?

3. The fftw documentation also talks about being 'shifted by half a
sample'.  If by sample it's talking about each sample value at a
specific time point, how can it be shifted by half, since it's just an
1d array of numbers?

Thanks
Rael


Raeldor wrote:

> Hi All, > > I'm trying to understand the differences between FFT and DCT and was > hoping someone may be able to help me with a couple of questions. > > 1. Is DCT basically the same as an FFT that operates on real numbers? > > 2. I keep reading about 'real-odd' and 'real-even'. What does this > mean? > > 3. The fftw documentation also talks about being 'shifted by half a > sample'. If by sample it's talking about each sample value at a > specific time point, how can it be shifted by half, since it's just an > 1d array of numbers? > > Thanks > Rael >
You Again?

You know, I feel a little sorry for you.  It's obvious you've been
bullied or abused at some point and now feel like you need to prove
yourself by name-calling people who are taking an interest in a
subject you know something about.  It doesn't make you look any
smarter, it just makes you look like someone who needs help.  I've met
people like you before.  You might know a lot, and you might even be
smart, but you will never be content until you start treating people
with respect, because no matter how much you know or how clever you
are they'll never respect YOU until you do.
Getting back to my original question (and to qualify a little more), I
tried implementing the 1D DCT myself based on formulas from Wikipedia,
and came up with...

        // now roll dct on mel log buffer
        for (int s=0; s < quantizeCount; s++) {
            float mfcc[256];
            float total = 0.0f;
            for (int m=0; m < mfccCount; m++) {
                mfcc[m] = 0.0f;
                for (int k=0; k < melCount; k++) {
                    mfcc[m] += cos(m*(k+0.5f)*M_PI/
(float)melCount)*melBuffer[s*melCount+k];
                }
                total += mfcc[m];
                mfccBuffer[s*mfccCount+m] = mfcc[m];
            }
        }


But the output is quite different to that of an inverse-FFT, hence the
questions.  Also, from other DCT code I've seen, my code seems maybe
overly simple?

Thanks

On Sep 26, 10:47&#2013266080;am, Raeldor <rael...@gmail.com> wrote:
> Getting back to my original question (and to qualify a little more), I > tried implementing the 1D DCT myself based on formulas from Wikipedia, > and came up with... > > &#2013266080; &#2013266080; &#2013266080; &#2013266080; // now roll dct on mel log buffer > &#2013266080; &#2013266080; &#2013266080; &#2013266080; for (int s=0; s < quantizeCount; s++) { > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; float mfcc[256]; > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; float total = 0.0f; > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; for (int m=0; m < mfccCount; m++) { > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; mfcc[m] = 0.0f; > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; for (int k=0; k < melCount; k++) { > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; mfcc[m] += cos(m*(k+0.5f)*M_PI/ > (float)melCount)*melBuffer[s*melCount+k]; > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; } > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; total += mfcc[m]; > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; mfccBuffer[s*mfccCount+m] = mfcc[m]; > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; } > &#2013266080; &#2013266080; &#2013266080; &#2013266080; } > > But the output is quite different to that of an inverse-FFT, hence the > questions. &#2013266080;Also, from other DCT code I've seen, my code seems maybe > overly simple? > > Thanks
I just took a VERY quick look at your code. Do you want to interchange total += mfcc[m] and the brace above it?
On 9/26/11 11:15 AM, Raeldor wrote:
> You Again? > > You know, I feel a little sorry for you. It's obvious you've been > bullied or abused at some point and now feel like you need to prove > yourself by name-calling people who are taking an interest in a > subject you know something about. It doesn't make you look any > smarter, it just makes you look like someone who needs help. I've met > people like you before. You might know a lot, and you might even be > smart, but you will never be content until you start treating people > with respect, because no matter how much you know or how clever you > are they'll never respect YOU until you do.
maybe what Vlad was trying to say is that there *are* a couple of online resources that can answer your question with sufficient (but not too much) mathematical sophistication to get to the core of it. Wikipedia is a good place to start. my recollection of the DCT is like this: first start with N/2 real data points: x[n] for 0 <= n < N/2 then create another sequence by appending a reflection of those N/2 points as so: x[N/2 + n] = +x[N/2-1 - n] for 0 <= n < N/2 so even though this sequence is of length N, it is completely defined by N/2 real points. now, if you consider this x[n] to be periodic (which you should whenever dealing with the DFT despite what some here at comp.dsp would say), so x[n-N]= x[n] for all n, you can consider those N/2 reflected points to be on the left (before x[0]) and reflected. so x[-1-n] = x[n] for 0 <= n < N/2 the axis of symmetry is halfway between x[-1] and x[0]. this is where this "shifted by half a sample" comes from. then compute the DFT (or FFT) of this sequence and you get X[k] and after the DFT, we consider only the even indexed samples X[2*k]. the reason why is a sorta "conservation of information" theorem. you're result can have only N/2 degrees of freedom since it is defined by only N/2 independent numbers. now, because it was shifted half a sample to the left that means you want to compensate by multiplying X[k] by X[k] <--- X[k] * e^(j*2*pi*k*(1/2)/N) you'll see that the 2 and 1/2 kill each other off. but, since you're only considering the even indices, it's: X[2*k] <--- X[2*k] * e^(j*2*pi*k/N) for 0 <= k < N/2 after doing that, you should have N/2 real values that totally define the spectrum of that even symmetry input. Wikipedia calls this the "DCT-II" or "DCT Type 2". for the Discrete Sine Transform it's similar. let x[N/2 + n] = -x[N/2-1 - n] for 0 <= n < N/2 and after the DFT, we consider only odd indices X[2*k + 1]. X[2*k+1] <--- X[2*k+1] * e^(j*2*pi*k/N) for 0 <= k < N/2 -- r b-j rbj@audioimagination.com "Imagination is more important than knowledge."
On Sep 26, 9:18&#2013266080;am, maury <maury...@core.com> wrote:
> On Sep 26, 10:47&#2013266080;am, Raeldor <rael...@gmail.com> wrote: > > > > > > > > > > > Getting back to my original question (and to qualify a little more), I > > tried implementing the 1D DCT myself based on formulas from Wikipedia, > > and came up with... > > > &#2013266080; &#2013266080; &#2013266080; &#2013266080; // now roll dct on mel log buffer > > &#2013266080; &#2013266080; &#2013266080; &#2013266080; for (int s=0; s < quantizeCount; s++) { > > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; float mfcc[256]; > > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; float total = 0.0f; > > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; for (int m=0; m < mfccCount; m++) { > > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; mfcc[m] = 0.0f; > > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; for (int k=0; k < melCount; k++) { > > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; mfcc[m] += cos(m*(k+0.5f)*M_PI/ > > (float)melCount)*melBuffer[s*melCount+k]; > > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; } > > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; total += mfcc[m]; > > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; mfccBuffer[s*mfccCount+m] = mfcc[m]; > > &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; } > > &#2013266080; &#2013266080; &#2013266080; &#2013266080; } > > > But the output is quite different to that of an inverse-FFT, hence the > > questions. &#2013266080;Also, from other DCT code I've seen, my code seems maybe > > overly simple? > > > Thanks > > I just took a VERY quick look at your code. Do you want to interchange > total += mfcc[m] and the brace above it?
Thanks for the reply. I think it's in the right place, but it was just placed for debugging and isn't actually used to calculate the end result.
On Sep 26, 9:49&#2013266080;am, robert bristow-johnson <r...@audioimagination.com>
wrote:
> On 9/26/11 11:15 AM, Raeldor wrote: > > > You Again? > > > You know, I feel a little sorry for you. &#2013266080;It's obvious you've been > > bullied or abused at some point and now feel like you need to prove > > yourself by name-calling people who are taking an interest in a > > subject you know something about. &#2013266080;It doesn't make you look any > > smarter, it just makes you look like someone who needs help. &#2013266080;I've met > > people like you before. &#2013266080;You might know a lot, and you might even be > > smart, but you will never be content until you start treating people > > with respect, because no matter how much you know or how clever you > > are they'll never respect YOU until you do. > > maybe what Vlad was trying to say is that there *are* a couple of online > resources that can answer your question with sufficient (but not too > much) mathematical sophistication to get to the core of it. &#2013266080;Wikipedia > is a good place to start. > > my recollection of the DCT is like this: &#2013266080;first start with N/2 real data > points: > > &#2013266080; &#2013266080; x[n] &#2013266080; &#2013266080;for &#2013266080; 0 <= n < N/2 > > then create another sequence by appending a reflection of those N/2 > points as so: > > &#2013266080; &#2013266080; x[N/2 + n] = +x[N/2-1 - n] &#2013266080; &#2013266080;for &#2013266080; 0 <= n < N/2 > > so even though this sequence is of length N, it is completely defined by > N/2 real points. &#2013266080;now, if you consider this x[n] to be periodic (which > you should whenever dealing with the DFT despite what some here at > comp.dsp would say), so x[n-N]= x[n] for all n, you can consider those > N/2 reflected points to be on the left (before x[0]) and reflected. &#2013266080;so > > &#2013266080; &#2013266080; &#2013266080;x[-1-n] = x[n] &#2013266080; &#2013266080;for &#2013266080; 0 <= n < N/2 > > the axis of symmetry is halfway between x[-1] and x[0]. &#2013266080;this is where > this "shifted by half a sample" comes from. > > then compute the DFT (or FFT) of this sequence and you get X[k] > and after the DFT, we consider only the even indexed samples X[2*k]. the > reason why is a sorta "conservation of information" theorem. &#2013266080;you're > result can have only N/2 degrees of freedom since it is defined by only > N/2 independent numbers. > > now, because it was shifted half a sample to the left that means you > want to compensate by multiplying X[k] by > > &#2013266080; &#2013266080; &#2013266080;X[k] &#2013266080;<--- &#2013266080; X[k] * e^(j*2*pi*k*(1/2)/N) > > you'll see that the 2 and 1/2 kill each other off. &#2013266080;but, since you're > only considering the even indices, it's: > > &#2013266080; &#2013266080; &#2013266080;X[2*k] &#2013266080;<--- &#2013266080; X[2*k] * e^(j*2*pi*k/N) &#2013266080; &#2013266080;for &#2013266080; 0 <= k < N/2 > > after doing that, you should have N/2 real values that totally define > the spectrum of that even symmetry input. &#2013266080;Wikipedia calls this the > "DCT-II" or "DCT Type 2". > > for the Discrete Sine Transform it's similar. &#2013266080;let > > &#2013266080; &#2013266080; x[N/2 + n] = -x[N/2-1 - n] &#2013266080; &#2013266080;for &#2013266080; 0 <= n < N/2 > > and after the DFT, we consider only odd indices X[2*k + 1]. > > &#2013266080; &#2013266080; &#2013266080;X[2*k+1] &#2013266080;<--- &#2013266080; X[2*k+1] * e^(j*2*pi*k/N) &#2013266080; &#2013266080;for &#2013266080; 0 <= k < N/2 > > -- > > r b-j &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080; &#2013266080;r...@audioimagination.com > > "Imagination is more important than knowledge."
Very interesting reply, thank you. So let me make sure I understand what you're saying. In order to calculate the DCT using FFT, I have to reflect my input data using the formula x[N/2 + n] = +x[N/2-1 - n] (what is the '+' before the x here, is that significant? Should I be making it an absolute value?). So, for example is N=8, and data = abcd (array indexed at zero) then n=0, x[8/2+0] = x[8/2-1--0], x[4] = x[3], x[4] = d n=1, x[8/2+1] = x[8/2-1--1], x[5] = x[2], x[5] = c n=2, x[8/2+2] = x[8/2-1--2], x[6] = x[1], x[6] = b n=3, x[8/2+3] = x[8/2-1--3], x[7] = x[0], x[7] = a So, I end up with abcddcba. You're then saying this is considered 'half-shifted' because if I put them on the left before x[0], it becomes x[-4] = d x[-3] = c x[-2] = b x[-1] = a x[0] = a x[1] = b x[2] = c x[3] = d Where x[0]=x[-1] so the reflection is essentially at x[-0.5]? By that reasoning if it WASN'T half shifted the original output would be abcdcba? After this, I run it through the FFT and from the output I get back N values. Now, normally when doing FFT to generate a spectrogram, I don't reflect the data and on the returning values I create my magnitude by taking the real and imaginary and doing sqrt(real^2+imag^2). This time, are you saying that the REAL numbers (even) represent the DCT and the IMAGINARY (even) represent the DST? Also, on that last formula X[2*k] * e^(j*2*pi*k/N), what do 'e' and 'j' represent? Are they constants? Thanks! Rael
On 9/26/2011 11:15 AM, Raeldor wrote:
> You Again? > > You know, I feel a little sorry for you. It's obvious you've been > bullied or abused at some point and now feel like you need to prove > yourself by name-calling people who are taking an interest in a > subject you know something about. It doesn't make you look any > smarter, it just makes you look like someone who needs help. I've met > people like you before. You might know a lot, and you might even be > smart, but you will never be content until you start treating people > with respect, because no matter how much you know or how clever you > are they'll never respect YOU until you do.
Vlad, like me, he has little patience for those who ask here before looking for elementary answers in likely places. We differ mostly in my ignoring lazy questioners and his berating them. "Is DCT basically the same as an FFT that operates on real numbers?" is a classic stupid question. It is stupid as a question to others because it can be easily answered by looking up (say, in MathWorld or Wikipedia) what DCT and FFT are. Asking others to do your scut work is a sign of disrespect. You don't get respect without showing respect. Jerry -- Engineering is the art of making what you want from things you can get.
On 9/26/11 1:30 PM, Raeldor wrote:
> On Sep 26, 9:49 am, robert bristow-johnson<r...@audioimagination.com> > wrote:
...
>> for the Discrete Sine Transform it's similar. let >> >> x[N/2 + n] = -x[N/2-1 - n] for 0<= n< N/2 >> >> and after the DFT, we consider only odd indices X[2*k + 1]. >> >> X[2*k+1] <--- X[2*k+1] * e^(j*2*pi*k/N) for 0<= k< N/2
i made a mistake here. should be (for the DST-II): X[2*k+1] <--- X[2*k+1] * e^(j*pi*(2*k+1)/N) for 0<= k< N/2 and the resulting value should be purely imaginary. Re{X[.]} = 0.
> In order to calculate the DCT using FFT, I have > to reflect my input data using the formula x[N/2 + n] = +x[N/2-1 - n] > (what is the '+' before the x here, is that significant)?
only to differentiate the DCT from the DST.
> Should I be making it an absolute value?).
no, i've been keeping n and k to in [0, N/2) interval but not necessarily the index to the FFT which is, for addressing purposes, in the [0, N) interval.
> So, for example is N=8, and data = > abcd (array indexed at zero) then > > n=0, x[8/2+0] = x[8/2-1--0], x[4] = x[3], x[4] = d > n=1, x[8/2+1] = x[8/2-1--1], x[5] = x[2], x[5] = c > n=2, x[8/2+2] = x[8/2-1--2], x[6] = x[1], x[6] = b > n=3, x[8/2+3] = x[8/2-1--3], x[7] = x[0], x[7] = a > > So, I end up with abcddcba. You're then saying this is considered > 'half-shifted' because if I put them on the left before x[0], it > becomes > > x[-4] = d > x[-3] = c > x[-2] = b > x[-1] = a > x[0] = a > x[1] = b > x[2] = c > x[3] = d > > Where x[0]=x[-1] so the reflection is essentially at x[-0.5]?
that's where the axis of symmetry is. now we know something about a spectrum that is real and symmetrical about 0. it means that the signal is real and symmetric about 0. but this is symmetric about -1/2. so, if the signal is modeled as a real symmetric about 0, but then delayed by 1/2 sample which does this: X[k] <--- X[k] * e^(-j*pi*k/N) to the spectrum. so you undo that operation.
> By that > reasoning if it WASN'T half shifted the original output would be > abcdcba?
you have only one d. what happened to the other one?
> After this, I run it through the FFT and from the output I get back N > values. Now, normally when doing FFT to generate a spectrogram, I > don't reflect the data and on the returning values I create my > magnitude by taking the real and imaginary and doing > sqrt(real^2+imag^2).
and it's always positive. you lose sign or polarity information. the result of the DCT-II (as is designated in the Wikipedia article), is real, but you didn't lose sign information which the abs value operation does to real numbers.
> This time, are you saying that the REAL numbers > (even) represent the DCT and the IMAGINARY (even) represent the DST?
actually Im{ } of odd indexed samples (after the phase correction). i wrote that wrong about the DST originally.
> Also, on that last formula X[2*k] * e^(j*2*pi*k/N), what do 'e' and > 'j' represent? Are they constants?
oh dear. i wonder what will happen when Vlad sees this. natural log of e is 1 and j^2 = -1. -- r b-j rbj@audioimagination.com "Imagination is more important than knowledge."