DSPRelated.com
Forums

FFT & averaging

Started by Rolf Keller May 10, 2005
Hi,

I'm FFTing speech signals and analysing the energy spectra. Because
these signals flutter extremely, I average 16 or 32 FFT results.
So far, I've taken the energy, i.e. the value |Re|^2 + |Im|^2,
of each individual bin and averaged 10 or 20 of these values.

But recently I read that this "incoherent" averaging does not
reduce the noise power. This seems correct to me. Additionally, I
read that the noise power can be reduced by "coherent" averaging,
i.e. by taking the values |Re| and |Im|, averaging them, and
computing the energy AFTER averaging.

Any ideas on that? Pros and cons? Unfortunately my math
knowledge is not sufficient for this.

-- 
Rolf Keller 

hi rolf,

> I'm FFTing speech signals and analysing the energy spectra. Because
> these signals flutter extremely, I average 16 or 32 FFT results.
> So far, I've taken the energy, i.e. the value |Re|^2 + |Im|^2,
> of each individual bin and averaged 10 or 20 of these values.

i prefer to use amplitude = sqrt(re*re + im*im) and
logamp = log2(amplitude).

would you tell us about your sample rate, window size and amount of
overlap? 10 or 20 frames can be a lot; you might sum up different
vowels and consonants.

> But recently I read that this "incoherent" averaging does not
> reduce the noise power. This seems correct to me. Additionally, I
> read that the noise power can be reduced by "coherent" averaging,
> i.e. by taking the values |Re| and |Im|, averaging them, and
> computing the energy AFTER averaging.
>
> Any ideas on that? Pros and cons? Unfortunately my math
> knowledge is not sufficient for this.

did you consider subtracting the calculated noise floor from the
FFTed signal? you can smear the bin amplitudes in two dimensions,
once with their previous values and once with adjacent bins.
then you end up with an adaptive noise profile that you can later
subtract from the FFTed signal.

but i don't see why this should not reduce noise. the smearing will
reduce fluctuations, while the final subtraction reduces the average
noise; not completely, of course, but usually enough for proper
speech recognition.

hope it helps,
carsten neubauer
http://www.c14sw.de
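The smoothing-and-subtraction idea in the post above can be sketched roughly as follows. This is only one possible reading of "smearing": a 3-point average across adjacent bins and an exponential blend with the previous profile. The function names, the smoothing constant `alpha`, and the assumption that the profile is updated only on noise-only frames are all hypothetical and not taken from the post.

```python
def smooth_across_bins(amps):
    """3-point moving average over adjacent bins (edge bins use 2 points)."""
    n = len(amps)
    out = []
    for k in range(n):
        lo, hi = max(0, k - 1), min(n, k + 2)
        out.append(sum(amps[lo:hi]) / (hi - lo))
    return out


def update_noise_profile(profile, amps, alpha=0.9):
    """Blend the previous profile with the bin-smoothed current frame.

    alpha is an assumed smoothing constant: closer to 1 means slower adaptation.
    """
    smoothed = smooth_across_bins(amps)
    if profile is None:
        return smoothed
    return [alpha * p + (1.0 - alpha) * s for p, s in zip(profile, smoothed)]


def subtract_noise(amps, profile):
    """Subtract the noise profile per bin, clamping negative results to zero."""
    return [max(a - p, 0.0) for a, p in zip(amps, profile)]


# Made-up bin amplitudes: two noise-only frames, then a frame with one strong bin.
noise_frames = [[1.0, 1.2, 0.8, 1.0], [1.1, 0.9, 1.0, 1.0]]
profile = None
for frame in noise_frames:
    profile = update_noise_profile(profile, frame)

speech_frame = [1.0, 5.0, 1.0, 1.0]
cleaned = subtract_noise(speech_frame, profile)
print(cleaned)
```

Clamping at zero is a design choice made here to keep amplitudes non-negative; how to decide which frames are noise-only is left open.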
> But recently I read that this "incoherent" averaging does not
> reduce the noise power. This seems correct to me.
But it does increase the SNR.
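A small simulation may help reconcile the quoted statement with this reply. It assumes a noise-only bin whose real and imaginary parts are independent unit-variance Gaussians (an assumption for illustration, not something stated in the thread). Averaging the per-frame power leaves the mean noise power where it was, but the frame-to-frame spread of the estimate shrinks, which is what makes a steady component easier to see above the noise.

```python
import random
import statistics


def noise_bin_power(rng):
    """|Re|^2 + |Im|^2 of one noise-only bin (expected value 2 under the assumed model)."""
    re = rng.gauss(0.0, 1.0)
    im = rng.gauss(0.0, 1.0)
    return re * re + im * im


def averaged_power(rng, n_frames):
    """Incoherent average: mean of the per-frame powers."""
    return sum(noise_bin_power(rng) for _ in range(n_frames)) / n_frames


rng = random.Random(1)
single = [averaged_power(rng, 1) for _ in range(4000)]
avg16 = [averaged_power(rng, 16) for _ in range(4000)]

mean_single = statistics.mean(single)
mean_avg16 = statistics.mean(avg16)
spread_single = statistics.pstdev(single)
spread_avg16 = statistics.pstdev(avg16)

print("mean power:  1 frame", mean_single, " 16 frames", mean_avg16)
print("std of est.: 1 frame", spread_single, " 16 frames", spread_avg16)
```

Under this model the spread should fall by roughly sqrt(16) = 4 while both means stay near 2.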
Rolf Keller wrote:

> I'm FFTing speech signals and analysing the energy spectra. Because these
> signals flutter extremely, I average 16 or 32 FFT results. So far, I've
> taken the energy, i.e. the value |Re|^2 + |Im|^2, of each individual bin
> and averaged 10 or 20 of these values.

> But recently I read that this "incoherent" averaging does not reduce
> the noise power. This seems correct to me. Additionally, I read that the
> noise power can be reduced by "coherent" averaging, i.e. by taking the
> values |Re| and |Im|, averaging them, and computing the energy AFTER
> averaging.

Coherent averaging would seem to me to require the source signals to
have the same phase. That seems unlikely with voice signals.

Most of the discussion about coherence comes from optics, though it
could be applied in other cases. You might look at an optics book.

If you have two incoherent audio sources (two people screaming), the
power would be twice that of one source. If you have two coherent
sources (two amplifiers and speakers with one signal generator), the
power (intensity) you measure will be somewhere between zero and four
times that of one source, depending on constructive or destructive
interference.

With incoherent sources you add intensity (power); with coherent
sources in phase you add the amplitudes and then square for intensity
(power).

-- glen
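The numbers in glen's post can be checked with unit-amplitude phasors. The sketch below is an illustration under simple assumptions (two equal-amplitude sources, and phases drawn uniformly at random to stand in for "incoherent"); the function names are made up for this example. The second part shows why coherent averaging of FFT bins needs aligned phases: with random phases across frames, the averaged complex value mostly cancels, signal included.

```python
import cmath
import math
import random


def power_of_sum(phase_difference):
    """Power of two unit-amplitude sources added with a given phase difference."""
    a = 1.0 + 0j
    b = cmath.exp(1j * phase_difference)
    return abs(a + b) ** 2


def coherent_average_power(phases):
    """Average unit phasors first, then take the squared magnitude."""
    mean = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean) ** 2


rng = random.Random(0)

in_phase = power_of_sum(0.0)          # constructive: four times one source
out_of_phase = power_of_sum(math.pi)  # destructive: essentially zero
trials = 20000
incoherent_avg = sum(
    power_of_sum(rng.uniform(0.0, 2.0 * math.pi)) for _ in range(trials)
) / trials                            # random phase: about twice one source

aligned = coherent_average_power([0.3] * 16)
random_phase = sum(
    coherent_average_power([rng.uniform(0.0, 2.0 * math.pi) for _ in range(16)])
    for _ in range(2000)
) / 2000                              # about 1/16 of the single-frame power

print(in_phase, out_of_phase, incoherent_avg)
print(aligned, random_phase)
```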
I am hoping someone in this thread could clarify an issue for me - I am
currently doing a basic app that takes in the speech signal and I would
like to compute the signal energy for that particular speech signal.

Do I need to transform the speech signal via FFT and then compute the
signal energy or could I just compute the signal energy without needing
to transform the signal...

what I have understood from the above posts is that I would not need to
transform the signal via FFT - but why is it that in some DSP books
they perform the signal energy calc on the output of the FFT?

thanks
angelo

beat wrote:
> I am hoping someone in this thread could clarify an issue for me - I am
> currently doing a basic app that takes in the speech signal and I would
> like to compute the signal energy for that particular speech signal.
>
> Do I need to transform the speech signal via FFT and then compute the
> signal energy or could I just compute the signal energy without needing
> to transform the signal...
>
> what I have understood from the above posts is that I would not need to
> transform the signal via FFT - but why is it that in some DSP books
> they perform the signal energy calc on the output of the FFT?
>
> thanks
> angelo

Parseval's equality tells us the sum of the squared time sequence is
related to the sum of the squared (magnitude) of the Fourier sequence
by a constant factor, i.e.

    sum( abs(x).^2 ) == K * sum( abs(fft(x)).^2 )

where K depends on the FFT algorithm and the length of the DFT.

-- Mark
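As a quick numerical check of Mark's relation, here is a sketch using a naive O(n^2) DFT with the common unnormalized forward convention. For that convention K = 1/N; other FFT implementations may scale differently, which is the dependence Mark mentions. The helper names `dft` and `energy` are made up for this example.

```python
import cmath
import math
import random


def dft(x):
    """Naive unnormalized forward DFT: X[k] = sum_t x[t] * exp(-2*pi*j*k*t/n)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def energy(v):
    """Sum of squared magnitudes."""
    return sum(abs(s) ** 2 for s in v)


rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(64)]

time_energy = energy(x)
freq_energy = energy(dft(x))
K = 1.0 / len(x)  # valid for the unnormalized convention used in dft() above

print(time_energy, K * freq_energy)
```

So for the question above: if only the total energy is wanted, summing the squared samples in the time domain gives the same number without any transform.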
In simpler words, since the DFT is unitary, the Euclidean length of a
vector is invariant under the DFT.

Regards,
Andor

"Andor" <an2or@mailcircuit.com> wrote in message
news:1115889764.864637.65850@g43g2000cwa.googlegroups.com...
> In simpler words, since the DFT is unitary, the Euclidean length of a
> vector is invariant under the DFT.

Andor,
Perhaps the words are shorter (or the sentence is) but by no means are
they 'simpler' :-)
It took me a while to digest what you said and I'm still not very sure
if I truly understand its implication.

Cheers
Bhaskar

> Regards,
> Andor
Bhaskar Thiagarajan wrote:

> > In simpler words, since the DFT is unitary, the Euclidean length of a
> > vector is invariant under the DFT.
>
> Andor,
> Perhaps the words are shorter (or the sentence is) but by no means are
> they 'simpler' :-)
> It took me a while to digest what you said and I'm still not very sure
> if I truly understand its implication.

Allow me in this case to elaborate by giving the definitions: Let V be
some n-dimensional (where n < infinity) complex vector space with an
inner product. Write the inner product for vectors x and y as (x,y),
meaning

  (x,y) := x^H y = sum_{k=1}^n x_k* y_k,

where x^H is the adjoint of x (write x as a column vector and x^H as
the complex conjugate row vector). Let F: V->V be a linear map, and F^H
be its adjoint (complex conjugate transpose). By definition, if

  F^H F = 1    (where "1" denotes the identity map on V)

then F is called unitary. The simple consequence of this definition is
that

  (F x, F y) = (F x)^H (F y) = x^H F^H F y = x^H y = (x, y).

Now if you let F be the DFT, and X = F x, then Parseval's relation is a
simple consequence of the unitarity of F:

  (x, x) = (X, X).

Note that for the DFT to be unitary, you need to scale the matrices F
and F^H by a factor 1/sqrt(n). This is usual in physics but not so in
EE, but I find it makes notation a lot more compact.

Regards,
Andor
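Andor's definitions can be verified numerically for a small n. The sketch below builds the DFT matrix scaled by 1/sqrt(n), checks that F^H F is the identity to within rounding, and checks that the inner product (x, y) = x^H y is unchanged by F. All helper names and the test vectors are made up for illustration.

```python
import cmath
import math


def unitary_dft_matrix(n):
    """DFT matrix scaled by 1/sqrt(n) so that it is unitary."""
    scale = 1.0 / math.sqrt(n)
    return [
        [scale * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)]
        for k in range(n)
    ]


def adjoint(m):
    """Complex conjugate transpose."""
    return [[m[r][c].conjugate() for r in range(len(m))] for c in range(len(m[0]))]


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply(m, v):
    """Matrix-vector product."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


def inner(x, y):
    """(x, y) = x^H y = sum_k conj(x_k) * y_k."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


n = 8
F = unitary_dft_matrix(n)
FhF = matmul(adjoint(F), F)
identity_error = max(
    abs(FhF[i][j] - (1.0 if i == j else 0.0)) for i in range(n) for j in range(n)
)

x = [complex(i, -i) for i in range(n)]
y = [complex(1, i % 3) for i in range(n)]
inner_before = inner(x, y)
inner_after = inner(apply(F, x), apply(F, y))

print(identity_error)
print(inner_before, inner_after)
```

Setting y = x in the last step gives Parseval's relation (x, x) = (X, X) as a special case.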
Andor wrote:
> Bhaskar Thiagarajan wrote:
>
>>> In simpler words, since the DFT is unitary, the Euclidean length of a
>>> vector is invariant under the DFT.
>>
>> Andor,
>> Perhaps the words are shorter (or the sentence is) but by no means are
>> they 'simpler' :-)
>
> Allow me in this case to elaborate by giving the definitions: Let V be
> some n-dimensional (where n < infinity) complex vector space with an
> inner product. Write the inner product for vectors x and y as (x,y),
> meaning
>
>   (x,y) := x^H y = sum_{k=1}^n x_k* y_k,
>
> where x^H is the adjoint of x (write x as a column vector and x^H as
> the complex conjugate row vector). Let F: V->V be a linear map, and F^H
> be its adjoint (complex conjugate transpose). By definition, if
>
>   F^H F = 1    (where "1" denotes the identity map on V)
>
> then F is called unitary. The simple consequence of this definition is
> that
>
>   (F x, F y) = (F x)^H (F y) = x^H F^H F y = x^H y = (x, y).
>
> Now if you let F be the DFT, and X = F x, then Parseval's relation is a
> simple consequence of the unitarity of F:
>
>   (x, x) = (X, X).
>
> Note that for the DFT to be unitary, you need to scale the matrices F
> and F^H by a factor 1/sqrt(n). This is usual in physics but not so in
> EE, but I find it makes notation a lot more compact.
>
> Regards,
> Andor

Ah, yes. Much simpler. *POP* *(struggles to extract tongue from cheek)*

Sorry Andor, I just couldn't resist. I view Parseval's equality in
terms of what it means to the practitioner, not the mathematical
properties that make it so. I'm sure both viewpoints are needed at
various times.

From a practitioner's standpoint, it allows one to make a sliding
power detector by squaring each sample and boxcar filtering that
sequence -- both operations have minimal CPU requirements independent
of the length of the window. Pretty nifty.

-- Mark
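The sliding power detector Mark describes might look like the sketch below. It assumes a running-sum form of the boxcar filter, so each new sample costs one add and one subtract regardless of window length. The function name and the test signal are made up, and the first `window - 1` outputs treat samples before the start as zero.

```python
def sliding_power(samples, window):
    """Mean of the squared samples over the most recent `window` samples."""
    running = 0.0
    out = []
    for i, s in enumerate(samples):
        running += s * s                # newest squared sample enters the window
        if i >= window:
            old = samples[i - window]
            running -= old * old        # oldest squared sample leaves the window
        out.append(running / window)
    return out


# Made-up test signal: silence, a burst of unit amplitude, silence again.
signal = [0.0] * 8 + [1.0] * 8 + [0.0] * 8
power = sliding_power(signal, 4)
print(power)
```

With floating-point input, a long-running sum like this can slowly accumulate rounding error, so a real implementation might recompute the sum from scratch now and then.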