autocorrelation for frequency estimation question

Started by Unknown August 9, 2003
When using an fft for frequency estimation of a complex audio waveform
(singing voice, piano, etc.), the bin representing half the fundamental
frequency will be mostly empty.  Won't the bin of the autocorrelation
result for double the wavelength of the fundamental frequency have some
energy in it?  How does one handle this false positive?

Also, the fft of short segments requires windowing for reasonable 
results.  Should the segments being autocorrelated also be windowed?

Is the segment duration/frequency resolution trade-off of using an
autocorrelation function similar to that of a complex DFT?  Seems
like they should be about the same.

What's the best textbook on this subject?

Thanks.

-- 
Ron Nicholson   rhn AT nicholson DOT com   http://www.nicholson.com/rhn/ 
#include <canonical.disclaimer>        // only my own opinions, etc.
Ronald H. Nicholson, Jr. <rhn@nojunk.rahul.net> writes:

> When using an fft for frequency estimation of a complex audio waveform (singing voice, piano, etc.), the bin representing half the fundamental frequency will be mostly empty. Won't the bin of the autocorrelation result for double the wavelength of the fundamental frequency have some energy in it? How does one handle this false positive?
If you're doing period detection (i.e. using the autocorrelation in the time domain), then you will come across problems at any integer multiple of the true period (equivalently, at integer submultiples of the true frequency). Of course, you can normally discount the case for lag '0', but it will depend on your expected wavelength.
> Also, the fft of short segments requires windowing for reasonable results. Should the segments being autocorrelated also be windowed?
It depends on the sort of estimator you use. Some autocorrelation estimators include a triangular window (effectively); others include the opposite (to remove the triangular window effect).
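
To make the triangular-window remark concrete, here is a minimal sketch of the two textbook estimators it most likely refers to (the function names are made up for illustration, not taken from any library). Dividing every lag by N tapers the result with an implicit triangle; dividing lag k by N - k undoes that taper.

```python
def autocorr_biased(x, max_lag):
    """Biased estimate: every lag is divided by N (implicit triangular taper)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def autocorr_unbiased(x, max_lag):
    """Unbiased estimate: lag k is divided by the N - k products actually summed."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / (n - k)
            for k in range(max_lag + 1)]


# Constant signal: the true autocorrelation is 1 at every lag.
x = [1.0] * 8
biased = autocorr_biased(x, 4)      # falls off linearly with lag
unbiased = autocorr_unbiased(x, 4)  # stays flat
```

The unbiased form removes the taper, but its values at large lags are averages over very few products, so they are noisier.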
> Is the segment duration/frequency resolution trade-off of using an autocorrelation function similar to that of a complex DFT? Seems like they should be about the same.
They're probably close; that's why many "high resolution" techniques use the autocorrelation matrix.
> What's the best textbook on this subject?
Barry Quinn's and Ted Hannan's book on Frequency Estimation is probably the best.

Ciao,
Peter K.
-- 
Peter J. Kootsookos
"Na, na na na na na na, na na na na" - 'Hey Jude', Lennon/McCartney
Ronald H. Nicholson, Jr. <rhn@nojunk.rahul.net> wrote in message news:<bh3c9e$qua$1@blue.rahul.net>...
> When using an fft for frequency estimation of a complex audio waveform (singing voice, piano, etc.), the bin representing half the fundamental frequency will be mostly empty.
but it's also possible that the bin at the fundamental has zero energy. just because a frequency is contained in a quasi-periodic signal doesn't mean it's the fundamental, and just because a frequency is missing in the quasi-periodic signal doesn't mean it can't be the fundamental frequency.
> Won't the bin of the autocorrelation result for double the wavelength of the fundamental frequency have some energy in it?
yes, double the period of a periodic function is also a period. so is any other integer multiple of the period. that is, a 100 Hz waveform can also be considered mathematically to be a 50 Hz waveform. after all, it repeats 50 times per second.
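
A small numerical check of the point above, as a sketch only: the sample rate, test signal, and the `autocorr` helper are assumptions chosen so the arithmetic is easy to follow. A 100 Hz waveform sampled at 8 kHz has a period of 80 samples, and its autocorrelation is just as large at lag 160 as at lag 80.

```python
import math


def autocorr(x, max_lag):
    """Autocorrelation estimate, averaging the products available at each lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / (n - k)
            for k in range(max_lag + 1)]


fs = 8000   # assumed sample rate, Hz
f0 = 100    # fundamental, Hz (period = fs / f0 = 80 samples)

# Fundamental plus a second harmonic, ten full periods long.
x = [math.sin(2 * math.pi * f0 * i / fs)
     + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / fs)
     for i in range(800)]

r = autocorr(x, 200)
# r[80] and r[160] both match r[0]: one period and two periods are
# indistinguishable to the autocorrelation. r[40] (half a period) is negative.
```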
> How does one handle this false positive?
you have to define *why* that is a false positive and then you can write code to handle it. this octave thing has always been a little problem for pitch detection algorithms (PDAs). what do you do when a 100 Hz waveform has another 50 Hz sine wave, that is 80 dB quieter, added to it? mathematically, it is a 50 Hz waveform and not a 100 Hz, but i doubt it would be perceived as such.
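
One common way to write that rule down, sketched here under assumptions (the 0.9 threshold, the lag range, and the helper names are all illustrative, not something from this thread): call a longer lag a false positive unless its autocorrelation peak is clearly stronger than the peak at a shorter lag, i.e. take the shortest lag whose peak comes within some fraction of the best peak.

```python
import math


def autocorr(x, max_lag):
    """Autocorrelation using the same number of products at every lag."""
    m = len(x) - max_lag
    return [sum(x[i] * x[i + k] for i in range(m)) / m
            for k in range(max_lag + 1)]


def pick_period(r, min_lag, max_lag, threshold=0.9):
    """Return the shortest lag whose local peak is within `threshold` of the best peak."""
    peaks = [k for k in range(min_lag, max_lag)
             if r[k] > r[k - 1] and r[k] >= r[k + 1]]
    if not peaks:
        return None
    best = max(r[k] for k in peaks)
    for k in peaks:               # peaks are already in increasing lag order
        if r[k] >= threshold * best:
            return k
    return None


fs = 8000
# 100 Hz waveform plus a 50 Hz sine that is 80 dB quieter (amplitude 1e-4).
x = [math.sin(2 * math.pi * 100 * i / fs)
     + 0.5 * math.sin(2 * math.pi * 200 * i / fs)
     + 1e-4 * math.sin(2 * math.pi * 50 * i / fs)
     for i in range(1600)]

r = autocorr(x, 400)
period = pick_period(r, min_lag=20, max_lag=400)
pitch_hz = fs / period
```

With this test signal the peaks at lags 80, 160, 240, and 320 are nearly equal in height, so the threshold rule reports the 80-sample period (100 Hz) rather than the mathematically valid 160-sample one.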
> Also, the fft of short segments requires windowing for reasonable results. Should the segments being autocorrelated also be windowed?
i would window the "fixed" piece x(t) but not the sliding piece x(t-tau).
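
A sketch of one way to read that suggestion, with assumed details: the Hann window, the segment placement, and the function names are illustrative choices, not something specified in the post. The fixed segment is multiplied by the window once, and the sliding segment is read straight from the unwindowed signal.

```python
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_autocorr(x, start, length, max_lag):
    """r[k] = sum_i w[i] * x[start + i] * x[start + i - k].

    Only the fixed piece is windowed; the sliding piece x[start + i - k]
    is used as-is. Requires start >= max_lag so the sliding piece stays in range.
    """
    w = hann(length)
    fixed = [w[i] * x[start + i] for i in range(length)]
    return [sum(fixed[i] * x[start + i - k] for i in range(length))
            for k in range(max_lag + 1)]


fs = 8000
x = [math.sin(2 * math.pi * 100 * i / fs) for i in range(1000)]
r = windowed_autocorr(x, start=200, length=400, max_lag=200)
# For this exactly periodic input, r[80] equals r[0]: the window only weights
# the products, it does not taper the result with increasing lag.
```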
> Is the segment duration/frequency resolution trade-off of using an autocorrelation function similar to that of a complex DFT? Seems like they should be about the same.
i dunno. the duration needs to be long enough for the longest anticipated period. other than that, i don't think there is an issue about segment length.
> What's the best textbook on this subject?
beats the hellouta me.
> Thanks.
FWIW. r b-j
Ronald H. Nicholson, Jr. <rhn@nojunk.rahul.net> wrote in message news:<bh3c9e$qua$1@blue.rahul.net>...
> When using an fft for frequency estimation of a complex audio waveform (singing voice, piano, etc.), the bin representing half the fundamental frequency will be mostly empty. Won't the bin of the autocorrelation result for double the wavelength of the fundamental frequency have some energy in it? How does one handle this false positive?
What do you mean by "false positive"? From your description it appears you are talking about the first harmonic. The harmonics are very important in sound, as they help determine how we perceive the sound. When a trumpet and a clarinet playing the same tone sound different, it's because the harmonics in the tone are different.
> Also, the fft of short segments requires windowing for reasonable results. Should the segments being autocorrelated also be windowed?
Yes they should. Find a text on statistical signal processing and look up the difference between the "periodogram" and "Welch's method". The difference is in windowing and averaging.
> Is the segment duration/frequency resolution trade-off of using an autocorrelation function similar to that of a complex DFT? Seems like they should be about the same.
They are the same. If a method can be described in terms of the DFT, the properties of the DFT apply to the method.
> What's the best textbook on this subject?
For spectrum estimation (no application to sound or music) I would suggest Kay: "Modern Spectral Estimation: Theory and Application", Prentice-Hall, 1988.

Rune
In article <f56893ae.0308100014.5f07180d@posting.google.com>,
Rune Allnor <allnor@tele.ntnu.no> wrote:
>Ronald H. Nicholson, Jr. <rhn@nojunk.rahul.net> wrote in message news:<bh3c9e$qua$1@blue.rahul.net>...
>> When using an fft for frequency estimation of a complex audio waveform (singing voice, piano, etc.), the bin representing half the fundamental frequency will be mostly empty. Won't the bin of the autocorrelation result for double the wavelength of the fundamental frequency have some energy in it? How does one handle this false positive?
>
>What do you mean by "false positive"? From your description it appears you are talking about the first harmonic.
No. I was asking about twice the wavelength, which is half the "fundamental" frequency, not twice the frequency (the first harmonic).

Since an X Hz repeating waveform is also an X/2 Hz repeating waveform, but is not perceived as such by humans in audio, my question is how to remove this mathematically correct, but perceptually non-existent, frequency from the list of potential estimates.

One possible method, which depends on the waveform of course, is to check for the presence of any energy in the odd harmonics. Comments?

IMHO. YMMV.
-- 
Ron Nicholson   rhn AT nicholson DOT com   http://www.nicholson.com/rhn/
#include <canonical.disclaimer>        // only my own opinions, etc.
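
A rough sketch of the odd-harmonic check proposed above, under assumptions: the helper names, the number of harmonics examined, and the test signal are illustrative only. If a candidate at X/2 Hz were the real fundamental, some energy should sit at its odd harmonics (X/2, 3X/2, 5X/2, ...); if nearly all the energy sits at the even ones, the candidate is probably an octave too low.

```python
import math


def tone_power(x, freq, fs):
    """Power of x at a single frequency, via a one-bin DFT projection."""
    re = sum(v * math.cos(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    return (re * re + im * im) / (len(x) ** 2)


def odd_harmonic_fraction(x, candidate_hz, fs, n_harmonics=8):
    """Fraction of harmonic power that falls on odd multiples of the candidate."""
    powers = [tone_power(x, h * candidate_hz, fs)
              for h in range(1, n_harmonics + 1)]
    total = sum(powers)
    odd = sum(powers[0::2])   # harmonics 1, 3, 5, ...
    return odd / total if total > 0 else 0.0


fs = 8000
x = [math.sin(2 * math.pi * 100 * i / fs)
     + 0.5 * math.sin(2 * math.pi * 200 * i / fs)
     for i in range(1600)]

frac_50 = odd_harmonic_fraction(x, 50, fs)    # essentially zero: reject 50 Hz
frac_100 = odd_harmonic_fraction(x, 100, fs)  # about 0.8: 100 Hz is plausible
```

As the post says, this depends on the waveform: a signal that genuinely has only even harmonics above a weak fundamental would fool it.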
Ronald H. Nicholson, Jr. <rhn@nojunk.rahul.net> wrote in message news:<bh6et0$glo$1@blue.rahul.net>...
> In article <f56893ae.0308100014.5f07180d@posting.google.com>,
> Rune Allnor <allnor@tele.ntnu.no> wrote:
> >Ronald H. Nicholson, Jr. <rhn@nojunk.rahul.net> wrote in message news:<bh3c9e$qua$1@blue.rahul.net>...
> >> When using an fft for frequency estimation of a complex audio waveform (singing voice, piano, etc.), the bin representing half the fundamental frequency will be mostly empty. Won't the bin of the autocorrelation result for double the wavelength of the fundamental frequency have some energy in it? How does one handle this false positive?
> >
> >What do you mean by "false positive"? From your description it appears you are talking about the first harmonic.
>
> No. I was asking about twice the wavelength, which is half the "fundamental" frequency, not twice the frequency (the first harmonic).
You're right. I didn't read your post properly.
> Since an X Hz repeating waveform is also a X/2 Hz repeating waveform, but is not perceived as such by humans in audio, my question is how to remove this mathematically correct, but perceptually non-existant frequency from the list of potential estimates.
>
> One posssible method, which depends on the waveform of course, is to check for the presence of any energy in the odd harmonics. Comments?
That may provide you with some help, yes. Having said that, keep in mind that the ear weights the spectrum quite severely (check out A-weighting with the audiologists). You don't mention if you have used this, so probably you haven't. However, if you have used some sort of measuring equipment meant for use with audiology, you should check to see whether some sort of audiological weighting was used in the processing of your measurements.

FWIW,
Rune
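
For reference, a minimal sketch of the A-weighting curve mentioned above, using the usual closed-form magnitude expression with rounded constants (the function name is made up here). It could be used, for instance, to weight harmonic energies before comparing pitch candidates; whether that helps in this application is an assumption, not something established in the thread.

```python
import math


def a_weighting_db(f):
    """Approximate A-weighting gain in dB at frequency f (Hz), 0 dB at 1 kHz."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00


gain_1k = a_weighting_db(1000.0)   # close to 0 dB by construction
gain_100 = a_weighting_db(100.0)   # roughly -19 dB: low frequencies are heavily attenuated
```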