
Does the 'square of an fft' mean anything?

Started by Richard Owlett December 13, 2003
I'm experimenting with analyzing speech.
I'm also more comfortable thinking in terms of a "spectrum analyzer 
display" than the "spectrograms" that I see on most speech recognition 
pages.

The question is "How does the spectrum of a speech sample vary with time?"

My first approach was:
  1. calculate fft of a window
  2. plot it
  3. calculate fft of a window offset from first window
  4. plot it in same space as 1st, but offset slightly vertically and
horizontally to have a pseudo-3D effect.

This showed some of the structure of interest. But it was, for want of a
better term, noisy. On impulse, I plotted the square of the value in
each bin rather than its value. This allowed me to "see" more clearly
what was happening in time and frequency. [ I've even discovered
formants ;/]

So does the square of an fft "mean" anything?


I believe if your signal is a linear, stationary stochastic process (which
speech is not, but you're using short FFTs so it's not a terrible problem
anyway) then taking the square of the FFT provides you with an estimate of
the power spectral density.  So to partially answer your question, the
square of the FFT is used A LOT and gives you a PSD.  If you haven't tried
it already, you may want to view your PSDs on a log scale (or take the log
of them) as this will expose some of the lower level formants which may be
closer to the noise floor.  Again, this is standard procedure (and puts your
signal's PSD on a dB scale)...

PSDdB = 20*log10(abs(FFT(x))) = 10*log10(abs(FFT(x))^2)
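In case a concrete starting point helps, here is a rough Python/NumPy
sketch of that recipe for a single analysis frame (the window choice,
the small offset to avoid log(0), and all the names are just
illustrative placeholders, not anything standard):

  import numpy as np

  def frame_psd_db(frame, fs):
      """Magnitude-squared FFT of one windowed frame, on a dB scale."""
      w = np.hanning(len(frame))           # taper to reduce spectral leakage
      X = np.fft.rfft(frame * w)           # one-sided complex spectrum
      psd = np.abs(X) ** 2                 # |X(f)|^2, a rough PSD estimate
      psd_db = 10 * np.log10(psd + 1e-12)  # dB scale; offset avoids log(0)
      freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
      return freqs, psd_db

Plotting frame_psd_db for successive, overlapping frames gives the
pseudo-3D picture described above, now in dB.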

"Richard Owlett" <rowlett@atlascomm.net> wrote in message
news:vtm7tv8vgg04d6@corp.supernews.com...
(snip)

> So does the square of an fft "mean" anything?
Richard Owlett <rowlett@atlascomm.net> wrote in message news:<vtm7tv8vgg04d6@corp.supernews.com>...
(snip)

> So does the square of an fft "mean" anything?
Hi Richard. The "sqaure of an FFT" means something, it even has a name, the "periodogram". It differs from the "usual" DFT in that you lose phase information that otherwise is kept in the amplitudes, although seldom plotted. The periodogram expresses the spectral power density spectrum (PSD)of the signal, i.e. in which frequency bands there is lot of signal energy, and in which frequency bands there is only noise. This is useful to the speech processing analysts. Some genuine experts on speech processing frequent this group so I will refrain from commenting on the application. The periodogram has been studied extensively in the literature, and has some well-known properties. For instance, the expected value of a periodogram coefficient is E[P(f)] = p(f) where P(f) is the estimated power and p(f) is the true power at frequency f. However, the variance of the estimate is large: Var[P(f)] = p^2(f) the square of the estimate. So, the periodogram basically tells you that you measured *something* (but I trust you knew that before computing the spectrogram...) but is somewhat vague in specifying exactly *what* you measured. So there is some incentive to improve on the spectrum estimate. Improving the spectrogram can be done in several ways. You can, for instance, average the periodogram in frequency domain. On further contemplation, one will find that multiplying the time window with a non-rectangular window function (Bartlett, Hamming, Hann, Kaiser, triangular,...) achieves this. The gain is a somewhat less noisy frequency spectrum obtained at the expense of less frequency resolution. There are some dangers in doing this, you need to choose the correct window function to guarantee the resulting PSD estimate is positive. Look for Blackman-Tukey Spectrum Estimators in the literature. The second possible approach is to compute several periodograms and average them in time. Again, the gain is a less noisy periodogram that is obtained at the expense of lesser time resolution of features. This is known as the Welch method. The third approach is a combination of these two approches, i.e. an average of several, windowed periodograms. The question remains whether such an approachwould yield useful spectral estimates. Last, I think your comment on data visualization and presentation is very interesting. The fact that different visualization methods work differently for differemt people and/or for presenting different effects, is lost on lots of people. What you describe is completely in line with some results I saw when I worked at a seismic research lab, i.e. that data processing and interpretation is a highly subjective dicipline. Rune
Rune Allnor wrote:
(snip)

> The periodogram has been studied extensively in the literature, and
> has some well-known properties. For instance, the expected value of a
> periodogram coefficient is
>
>   E[P(f)] = p(f)
>
> where P(f) is the estimated power and p(f) is the true power at
> frequency f. However, the variance of the estimate is large:
>
>   Var[P(f)] = p^2(f)
You lost me somewhere in the last two paragraphs. When dealing with speech, in what sense can a certain power be "expected" at a particular frequency? Or are we looking at processing to determine what the current phoneme is?
> i.e. the square of the true power. So the periodogram basically tells
> you that you measured *something* (but I trust you knew that before
> computing the spectrogram...) but is somewhat vague in specifying
> exactly *what* you measured. So there is some incentive to improve on
> the spectrum estimate.
>
> Improving the spectrogram can be done in several ways. You can, for
> instance, average the periodogram in the frequency domain.

(snip)

> The second possible approach is to compute several periodograms and
> average them in time. Again, the gain is a less noisy periodogram,
> obtained at the expense of lesser time resolution of features. This is
> known as the Welch method.
>
> The third approach is a combination of these two approaches, i.e. an
> average of several windowed periodograms.
Would any of these be used when examining speech recorded under near ideal studio conditions [ The sample I'm experimenting with is a CD of Alexander Scourby reading Genesis 37:26. ]? Or are they used in analyzing speech in a noisy channel?
> Last, I think your comment on data visualization and presentation is
> very interesting. The fact that different visualization methods work
> differently for different people and/or for presenting different
> effects is lost on lots of people. What you describe is completely in
> line with some results I saw when I worked at a seismic research lab,
> i.e. that data processing and interpretation is a highly subjective
> discipline.
And on what you saw first. For me, that was hours in front of a Tektronix spectrum analyzer.
> Rune
Matt Roos wrote:

(snip)

> If you haven't tried it already, you may want to view your PSDs on a
> log scale (or take the log of them) as this will expose some of the
> lower level formants which may be closer to the noise floor. Again,
> this is standard procedure (and puts your signal's PSD on a dB
> scale)...
>
> PSDdB = 20*log10(abs(FFT(x))) = 10*log10(abs(FFT(x))^2)
I'll experiment with that. Since I'm using Scilab I'll be using natural logs, but that just means I'll plot in "nepers" rather than "bels". Haven't had a chance to use *that* term in 40 yrs. { Student radio station had one storage cabinet labeled "Danger 100,000 nepers".}
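For amplitude ratios, the two scales differ only by a constant factor
(1 neper = 20/ln(10) dB, about 8.686 dB), so a natural-log plot is just
a rescaled dB plot. A quick check, with an arbitrary ratio:

  import numpy as np

  amp_ratio = 10.0                       # arbitrary amplitude ratio
  nepers = np.log(amp_ratio)             # natural-log (neper) scale
  decibels = 20 * np.log10(amp_ratio)    # dB scale
  assert np.isclose(decibels, nepers * 20 / np.log(10))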
> > "Richard Owlett" <rowlett@atlascomm.net> wrote in message > news:vtm7tv8vgg04d6@corp.supernews.com... > >>I'm experimenting with analyzing speech. >>I'm also more comfortable thinking in terms of a "spectrum analyzer >>display" than the "spectrograms" that I see on most speech recognition >>pages. >> >>The question is "How does the spectrum of a speech sample vary with time?" >> >>My first approach was: >> 1. calculate fft of a window >> 2. plot it >> 3. calculate fft of a window offset from first window >> 4. plot it in same space as 1st, but offset slightly vertically and >>horizontally to have psudeo-3D effect. >> >>This showed some of the structure of interest. But i was, for want of >>better term, noisy. On impulse, I plotted the square of the value in >>each bin rather than it's value. This allowed me to "see" more clearly >>what was happening in time and frequency. [ I've even discovered >>formants ;/] >> >>So does the square of an fft "mean" anything? >> >> > > >
Richard Owlett wrote:

> I'm experimenting with analyzing speech.
> I'm also more comfortable thinking in terms of a "spectrum analyzer
> display" than the "spectrograms" that I see on most speech recognition
> pages.
> The question is "How does the spectrum of a speech sample vary with time?"
(snip)
> So does the square of an fft "mean" anything?
The other posts so far don't seem to mention it, but you want the
square of the magnitude (absolute value). As the FFT is complex, the
distinction is important.

The result is the sum of the squares of the real and imaginary parts
at each frequency.

-- glen
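A tiny NumPy check of that point, with an arbitrary test signal (the
names here are just for illustration):

  import numpy as np

  x = np.random.default_rng(1).standard_normal(256)  # any real signal
  X = np.fft.fft(x)                                   # complex spectrum

  power = np.abs(X) ** 2      # Re(X)^2 + Im(X)^2: a power spectrum
  naive = X ** 2              # complex-valued; not a power spectrum

  assert np.allclose(power, X.real**2 + X.imag**2)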
glen herrmannsfeldt wrote:
(snip)

> The other posts so far don't seem to mention it, but you want the
> square of the magnitude (absolute value). As the FFT is complex, the
> distinction is important.
>
> The result is the sum of the squares of the real and imaginary parts
> at each frequency.
Now if only someone could give me an explanation of what a complex frequency is that made *intuitive* sense. Bugged me when I came across it while attempting an EE degree 40 yrs ago. Still bugs me. I accept that the math does work, so I play by the rules ;)
> -- glen
On Sat, 13 Dec 2003 18:44:47 -0600, Richard Owlett
<rowlett@atlascomm.net> wrote:


>Now if only someone could give me an explanation of what a complex
>frequency is that made *intuitive* sense. Bugged me when I came across
>it while attempting an EE degree 40 yrs ago. Still bugs me. I accept
>that the math does work, so I play by the rules ;)
Just think of it as another unit vector, one that points off over *that*
way as opposed to the one that points over *this* way. A complex
frequency just happens to have a particular orientation in an arbitrary
frame of reference.

--
Rich Webb
Norfolk, VA
Richard Owlett wrote:

   ...

> Now if only someone could give me an explanation of what a complex
> frequency is that made *intuitive* sense. Bugged me when I came across
> it while attempting an EE degree 40 yrs ago. Still bugs me. I accept
> that the math does work, so I play by the rules ;)
It's actually pretty simple. The imaginary part of exp[(sigma+j*omega)*t]
describes the periodic component. (When sigma is zero, we have an
ordinary sinusoid.) Sigma describes the growth (if positive) or decay
(if negative) of that sinusoid with time. Note that

  exp[(sigma+j*omega)*t] = exp(sigma*t) * exp(j*omega*t),

whose imaginary part is e^(sigma*t) * sin(omega*t). You could reasonably
ask, "What's the big deal?"

Jerry
--
Engineering is the art of making what you want from things you can get.
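If it helps to see that algebra run, here is a small NumPy sketch; the
values of sigma and omega are made up purely for illustration:

  import numpy as np

  sigma = -2.0                  # decay rate (negative => decaying envelope)
  omega = 2 * np.pi * 5.0       # 5 Hz oscillation
  s = sigma + 1j * omega        # the complex frequency

  t = np.linspace(0.0, 1.0, 1000)
  y = np.exp(s * t)             # exp[(sigma + j*omega)*t]

  envelope = np.exp(sigma * t)  # pure growth/decay part
  assert np.allclose(y.imag, envelope * np.sin(omega * t))

The imaginary part really is a sinusoid whose amplitude decays (or
grows) with exp(sigma*t).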
Jerry Avins wrote:

> Richard Owlett wrote:
>> Now if only someone could give me an explanation of what a complex
>> frequency is that made *intuitive* sense. Bugged me when I came across
>> it while attempting an EE degree 40 yrs ago. Still bugs me. I accept
>> that the math does work, so I play by the rules ;)
> It's actually pretty simple. The imaginary part of
> exp[(sigma+j*omega)*t] describes the periodic component. (When sigma is
> zero, we have an ordinary sinusoid.) Sigma describes the growth (if
> positive) or decay (if negative) of that sinusoid with time. Note that
>
>   exp[(sigma+j*omega)*t] = exp(sigma*t) * exp(j*omega*t),
>
> whose imaginary part is e^(sigma*t) * sin(omega*t). You could
> reasonably ask, "What's the big deal?"
Along with complex index of refraction, where the real part is the one
you usually learn about, and the imaginary part is absorption. Not so
different from the real and imaginary parts of impedance, though there
the real part is the resistance (absorption) part and the imaginary
part is the reactance.

Just about anything that could go into a sin() or cos() or exp() can
have real and imaginary parts.

-- glen