
Does the 'square of an fft' mean anything?

Started by Richard Owlett December 13, 2003
I'm experimenting with analyzing speech.
I'm also more comfortable thinking in terms of a "spectrum analyzer 
display" than the "spectrograms" that I see on most speech recognition 
pages.

The question is "How does the spectrum of a speech sample vary with time?"

My first approach was:
  1. calculate fft of a window
  2. plot it
  3. calculate fft of a window offset from first window
  4. plot it in same space as 1st, but offset slightly vertically and
horizontally to have a pseudo-3D effect.

This showed some of the structure of interest. But it was, for want of a
better term, noisy. On impulse, I plotted the square of the value in
each bin rather than its value. This allowed me to "see" more clearly
what was happening in time and frequency. [ I've even discovered
formants ;/]

So does the square of an fft "mean" anything?


I believe if your signal is a linear, stationary stochastic process (which
speech is not, but you're using short FFTs so it's not a terrible problem
anyway) then taking the square of the FFT provides you with an estimate of
the power spectral density.  So to partially answer your question, the
square of the FFT is used A LOT and gives you a PSD.  If you haven't tried
it already, you may want to view your PSDs on a log scale (or take the log
of them) as this will expose some of the lower level formants which may be
closer to the noise floor.  Again, this is standard procedure (and puts your
signal's PSD on a dB scale)...

PSDdB = 20*log10(abs(FFT(x))) = 10*log10(abs(FFT(x))^2)
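In case a concrete starting point helps, here is a rough Python/NumPy
sketch of that recipe for a single analysis frame (the window choice,
the small offset to avoid log(0), and all the names are just
illustrative placeholders, not anything standard):

  import numpy as np

  def frame_psd_db(frame, fs):
      """Magnitude-squared FFT of one windowed frame, on a dB scale."""
      w = np.hanning(len(frame))           # taper to reduce spectral leakage
      X = np.fft.rfft(frame * w)           # one-sided complex spectrum
      psd = np.abs(X) ** 2                 # |X(f)|^2, a rough PSD estimate
      psd_db = 10 * np.log10(psd + 1e-12)  # dB scale; offset avoids log(0)
      freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
      return freqs, psd_db

Plotting frame_psd_db for successive, overlapping frames gives the
pseudo-3D picture described above, now in dB.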

"Richard Owlett" <rowlett@atlascomm.net> wrote in message
news:vtm7tv8vgg04d6@corp.supernews.com...
(snip)

> So does the square of an fft "mean" anything?
Richard Owlett <rowlett@atlascomm.net> wrote in message news:<vtm7tv8vgg04d6@corp.supernews.com>...
(snip)

> So does the square of an fft "mean" anything?
Hi Richard. The "sqaure of an FFT" means something, it even has a name, the "periodogram". It differs from the "usual" DFT in that you lose phase information that otherwise is kept in the amplitudes, although seldom plotted. The periodogram expresses the spectral power density spectrum (PSD)of the signal, i.e. in which frequency bands there is lot of signal energy, and in which frequency bands there is only noise. This is useful to the speech processing analysts. Some genuine experts on speech processing frequent this group so I will refrain from commenting on the application. The periodogram has been studied extensively in the literature, and has some well-known properties. For instance, the expected value of a periodogram coefficient is E[P(f)] = p(f) where P(f) is the estimated power and p(f) is the true power at frequency f. However, the variance of the estimate is large: Var[P(f)] = p^2(f) the square of the estimate. So, the periodogram basically tells you that you measured *something* (but I trust you knew that before computing the spectrogram...) but is somewhat vague in specifying exactly *what* you measured. So there is some incentive to improve on the spectrum estimate. Improving the spectrogram can be done in several ways. You can, for instance, average the periodogram in frequency domain. On further contemplation, one will find that multiplying the time window with a non-rectangular window function (Bartlett, Hamming, Hann, Kaiser, triangular,...) achieves this. The gain is a somewhat less noisy frequency spectrum obtained at the expense of less frequency resolution. There are some dangers in doing this, you need to choose the correct window function to guarantee the resulting PSD estimate is positive. Look for Blackman-Tukey Spectrum Estimators in the literature. The second possible approach is to compute several periodograms and average them in time. Again, the gain is a less noisy periodogram that is obtained at the expense of lesser time resolution of features. This is known as the Welch method. The third approach is a combination of these two approches, i.e. an average of several, windowed periodograms. The question remains whether such an approachwould yield useful spectral estimates. Last, I think your comment on data visualization and presentation is very interesting. The fact that different visualization methods work differently for differemt people and/or for presenting different effects, is lost on lots of people. What you describe is completely in line with some results I saw when I worked at a seismic research lab, i.e. that data processing and interpretation is a highly subjective dicipline. Rune
Rune Allnor wrote:
(snip)

> The periodogram has been studied extensively in the literature, and
> has some well-known properties. For instance, the expected value of a
> periodogram coefficient is
>
>   E[P(f)] = p(f)
>
> where P(f) is the estimated power and p(f) is the true power at
> frequency f. However, the variance of the estimate is large:
>
>   Var[P(f)] = p^2(f)
You lost me somewhere in the last two paragraphs. When dealing with speech, in what sense can a certain power be "expected" at a particular frequency? Or are we looking at processing to determine what the current phoneme is?
> i.e. the square of the true power. So the periodogram basically tells
> you that you measured *something* (but I trust you knew that before
> computing the spectrogram...) but is somewhat vague in specifying
> exactly *what* you measured. So there is some incentive to improve on
> the spectrum estimate.
>
> Improving the spectrogram can be done in several ways. You can, for
> instance, average the periodogram in the frequency domain.

(snip)

> The second possible approach is to compute several periodograms and
> average them in time. Again, the gain is a less noisy periodogram,
> obtained at the expense of lesser time resolution of features. This is
> known as the Welch method.
>
> The third approach is a combination of these two approaches, i.e. an
> average of several windowed periodograms.
Would any of these be used when examining speech recorded under near ideal studio conditions [ The sample I'm experimenting with is a CD of Alexander Scourby reading Genesis 37:26. ]? Or are they used in analyzing speech in a noisy channel?
> Last, I think your comment on data visualization and presentation is
> very interesting. The fact that different visualization methods work
> differently for different people and/or for presenting different
> effects is lost on lots of people. What you describe is completely in
> line with some results I saw when I worked at a seismic research lab,
> i.e. that data processing and interpretation is a highly subjective
> discipline.
And on what you saw first. For me, that was hours in front of a Tektronix spectrum analyzer.
> Rune
Matt Roos wrote:

(snip)

> If you haven't tried it already, you may want to view your PSDs on a
> log scale (or take the log of them) as this will expose some of the
> lower level formants which may be closer to the noise floor. Again,
> this is standard procedure (and puts your signal's PSD on a dB
> scale)...
>
> PSDdB = 20*log10(abs(FFT(x))) = 10*log10(abs(FFT(x))^2)
I'll experiment with that. Since I'm using Scilab I'll be using natural logs, but that just means I'll plot in "nepers" rather than "bels". Haven't had a chance to use *that* term in 40 yrs. { Student radio station had one storage cabinet labeled "Danger 100,000 nepers".}
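For amplitude ratios, the two scales differ only by a constant factor
(1 neper = 20/ln(10) dB, about 8.686 dB), so a natural-log plot is just
a rescaled dB plot. A quick check, with an arbitrary ratio:

  import numpy as np

  amp_ratio = 10.0                       # arbitrary amplitude ratio
  nepers = np.log(amp_ratio)             # natural-log (neper) scale
  decibels = 20 * np.log10(amp_ratio)    # dB scale
  assert np.isclose(decibels, nepers * 20 / np.log(10))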
> > "Richard Owlett" <rowlett@atlascomm.net> wrote in message > news:vtm7tv8vgg04d6@corp.supernews.com... > >>I'm experimenting with analyzing speech. >>I'm also more comfortable thinking in terms of a "spectrum analyzer >>display" than the "spectrograms" that I see on most speech recognition >>pages. >> >>The question is "How does the spectrum of a speech sample vary with time?" >> >>My first approach was: >> 1. calculate fft of a window >> 2. plot it >> 3. calculate fft of a window offset from first window >> 4. plot it in same space as 1st, but offset slightly vertically and >>horizontally to have psudeo-3D effect. >> >>This showed some of the structure of interest. But i was, for want of >>better term, noisy. On impulse, I plotted the square of the value in >>each bin rather than it's value. This allowed me to "see" more clearly >>what was happening in time and frequency. [ I've even discovered >>formants ;/] >> >>So does the square of an fft "mean" anything? >> >> > > >
Richard Owlett wrote:

> I'm experimenting with analyzing speech.
> I'm also more comfortable thinking in terms of a "spectrum analyzer
> display" than the "spectrograms" that I see on most speech recognition
> pages.
> The question is "How does the spectrum of a speech sample vary with time?"
(snip)
> So does the square of an fft "mean" anything?
The other posts so far don't seem to mention it, but you want the
square of the magnitude (absolute value). As the FFT is complex, the
distinction is important.

The result is the sum of the squares of the real and imaginary parts
at each frequency.

-- glen
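A tiny NumPy check of that point, with an arbitrary test signal (the
names here are just for illustration):

  import numpy as np

  x = np.random.default_rng(1).standard_normal(256)  # any real signal
  X = np.fft.fft(x)                                   # complex spectrum

  power = np.abs(X) ** 2      # Re(X)^2 + Im(X)^2: a power spectrum
  naive = X ** 2              # complex-valued; not a power spectrum

  assert np.allclose(power, X.real**2 + X.imag**2)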
glen herrmannsfeldt wrote:
(snip)

> The other posts so far don't seem to mention it, but you want the
> square of the magnitude (absolute value). As the FFT is complex, the
> distinction is important.
>
> The result is the sum of the squares of the real and imaginary parts
> at each frequency.
Now if only someone could give me an explanation of what a complex frequency is that made *intuitive* sense. Bugged me when I came across it while attempting an EE degree 40 yrs ago. Still bugs me. I accept that the math does work, so I play by the rules ;)
> -- glen
On Sat, 13 Dec 2003 18:44:47 -0600, Richard Owlett
<rowlett@atlascomm.net> wrote:


>Now if only someone could give me an explanation of what a complex
>frequency is that made *intuitive* sense. Bugged me when I came across
>it while attempting an EE degree 40 yrs ago. Still bugs me. I accept
>that the math does work, so I play by the rules ;)
Just think of it as another unit vector, one that points off over *that*
way as opposed to the one that points over *this* way. A complex
frequency just happens to have a particular orientation in an arbitrary
frame of reference.

--
Rich Webb
Norfolk, VA
Richard Owlett wrote:

   ...

> Now if only someone could give me an explanation of what a complex
> frequency is that made *intuitive* sense. Bugged me when I came across
> it while attempting an EE degree 40 yrs ago. Still bugs me. I accept
> that the math does work, so I play by the rules ;)
It's actually pretty simple. The imaginary part of exp[(sigma+j*omega)*t]
describes the periodic component. (When sigma is zero, we have an
ordinary sinusoid.) Sigma describes the growth (if positive) or decay
(if negative) of that sinusoid with time. Note that

  exp[(sigma+j*omega)*t] = exp(sigma*t) * exp(j*omega*t),

whose imaginary part is e^(sigma*t) * sin(omega*t). You could reasonably
ask, "What's the big deal?"

Jerry
--
Engineering is the art of making what you want from things you can get.
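If it helps to see that algebra run, here is a small NumPy sketch; the
values of sigma and omega are made up purely for illustration:

  import numpy as np

  sigma = -2.0                  # decay rate (negative => decaying envelope)
  omega = 2 * np.pi * 5.0       # 5 Hz oscillation
  s = sigma + 1j * omega        # the complex frequency

  t = np.linspace(0.0, 1.0, 1000)
  y = np.exp(s * t)             # exp[(sigma + j*omega)*t]

  envelope = np.exp(sigma * t)  # pure growth/decay part
  assert np.allclose(y.imag, envelope * np.sin(omega * t))

The imaginary part really is a sinusoid whose amplitude decays (or
grows) with exp(sigma*t).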
Jerry Avins wrote:

> Richard Owlett wrote:
>> Now if only someone could give me an explanation of what a complex
>> frequency is that made *intuitive* sense. Bugged me when I came across
>> it while attempting an EE degree 40 yrs ago. Still bugs me. I accept
>> that the math does work, so I play by the rules ;)
> It's actually pretty simple. The imaginary part of
> exp[(sigma+j*omega)*t] describes the periodic component. (When sigma is
> zero, we have an ordinary sinusoid.) Sigma describes the growth (if
> positive) or decay (if negative) of that sinusoid with time. Note that
>
>   exp[(sigma+j*omega)*t] = exp(sigma*t) * exp(j*omega*t),
>
> whose imaginary part is e^(sigma*t) * sin(omega*t). You could
> reasonably ask, "What's the big deal?"
Along with complex index of refraction, where the real part is the one
you usually learn about, and the imaginary part is absorption. Not so
different from the real and imaginary parts of impedance, though there
the real part is the resistance (absorption) part and the imaginary
part is the reactance.

Just about anything that could go into a sin() or cos() or exp() can
have real and imaginary parts.

-- glen