Pitch Estimation using Autocorrelation

Started by olivers September 7, 2005
Rune Allnor wrote:

>>>Where did you find that? I am not aware of any simple relation between
>>>the time-domain autocorrelation function and the pitch of the signal.
>>
>>If the signal does have a sinusoidal component at period T, then when
>>correlating with the version of the signal shifted by T, there will be
>>a peak, corresponding to 1/T and all of the multiples (the harmonics).
>>In fact, when shifted by T/2, there will be a peak with negative value,
>>provided that there are no components of lower frequency.
>
> OK, I am sure you are right, provided the signal consists of a single
> sinusoidal. If there are more sinusoidals, or noise present...

Say that the signal X(t) consists of a sinusoid, s(t), plus the rest; let's call the rest y(t). X(t-T) consists of the same sinusoid s(t) (since s(t-T) = s(t)), plus y(t-T). But since y(t-T) is a linear combination (in an infinite-dimensional basis) of sinusoidal components WITHOUT the frequency component of s(t), y(t-T) and s(t) are uncorrelated.

So, the correlation between X(t) and X(t-T) is:

$$
\int_{-\infty}^{\infty} X(t)\,X(t-T)\,dt
= \int_{-\infty}^{\infty} \{s(t) + y(t)\}\,\{s(t) + y(t-T)\}\,dt
= \int_{-\infty}^{\infty} \left[\, s^2(t) + s(t)\,\{y(t) + y(t-T)\} + y(t)\,y(t-T) \,\right] dt
$$

The first term is whatever value it is (greater than zero). The second term is zero, since s(t) and y (at whatever offset) are uncorrelated.

And I think my mistake/oversight (which I see Jerry partly clarified) was to assume that because y(t) has no component of the same frequency as s(t), corresponding to period T, the integral of y(t)*y(t-T) would be zero, which is not the case (I mean, even putting aside the harmonics -- let's assume that y(t) has no frequencies that are a multiple of s(t)'s frequency -- even then, the integral is *not* zero).

Now, if y(t) is just noise, then of course, all the terms except s^2(t) become zero when taking their integral from -oo to oo.

If there are no frequencies close to s(t)'s frequency, it seems intuitive that the integral of y(t)*y(t-T) will be considerably less than the integral of s^2(t) -- since s^2(t) is non-negative, coming from the fact that it is the product of two perfectly aligned sinusoids. If y(t) has no frequencies that are close, then the product will always be of frequencies that are far from aligned, and the integral will tend to have a very low value.

As Jerry pointed out, also, the pitch is typically determined by/associated with the lowest frequency that is *predominant* -- under that assumption, all the other terms in the correlation integral are much lower (possibly negligible) than the term from s^2(t).

Again, this simply justifies (empirically/intuitively, which I know, is dangerous :-)) that the correlation should disclose information about the pitch -- but that doesn't mean that it is a good idea or that the autocorrelation method is well-suited for the task. Computing the DFT (via FFT) sounds definitely like a better idea.

Carlos
--
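
As a rough numerical check of the argument above, the following sketch builds a sinusoid of period T plus white noise and computes a finite-sum autocorrelation. The period (100 samples), the noise level, the signal length, the search ranges and the function names are all assumptions chosen for illustration; none of them come from the thread.

```python
import math
import random

def autocorr(x, max_lag):
    """Finite-sum autocorrelation: r[k] = sum over n of x[n] * x[n + k]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

random.seed(0)                      # fixed seed so the run is repeatable
T = 100                             # assumed period of s(t), in samples
N = 2000                            # assumed signal length
s = [math.sin(2 * math.pi * i / T) for i in range(N)]
y = [random.gauss(0.0, 0.3) for _ in range(N)]   # "the rest": white noise
x = [a + b for a, b in zip(s, y)]

r = autocorr(x, 150)
peak_lag = max(range(50, 151), key=lambda k: r[k])    # skip the lag-0 peak
trough_lag = min(range(1, 100), key=lambda k: r[k])

print("largest value near lag", peak_lag, "(expected close to T)")
print("most negative value near lag", trough_lag, "(expected close to T/2)")
```

With noise as the only other component, the peak should land near lag T and the negative trough near T/2. Other tonal components can shift or add peaks, which is the case the rest of the thread argues about.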
olivers wrote:
> From what I understand the first minimum of an autocorrelation function
> (say 1200 samples with a lag between 0 and 600) will give me the sample
> value which can be directly mapped to frequency and thus pitch.
>
> I have tried to extract this minima from my autocorrelation result with
> varied results. I tried using a C (third fret 5th string) on my guitar and
> got a sample value for the first minimum which varied between 68 and 75.
> From my conversion chart
> (http://grace.evergreen.edu/~arunc/intro_doc/node12.htm#SECTION00092000000000000000)
>
> this corresponds to a note which varies between d4 and e4 which is clearly
> incorrect.
>
> I thought maybe that I am doing something wrong in the extraction of the
> first minimum. Another article I have just read indicates that the minima
> represents half the period where the waveform is out of phase thus, the
> maxima indicates the period of the waveform and directly relates to the
> pitch. At a guess my value of 75 (half the period) which translates to 150
> is still wrong.
>
> I am identifying the first minimum by searching for the first change in
> sign indicating the first zero crossing. Is this how its done?

If the first minimum is negative, it might be between the first two zero crossings. Or you might want to look for a sign change in the first derivative.

IMHO. YMMV.
--
rhn A.T nicholson D.o.T c-O-m
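
The "sign change in the first derivative" suggestion can be sketched as follows. This is a minimal illustration on an idealised autocorrelation (a pure cosine with a 150-sample period, echoing the 75/150 figures in the question). The helper names are hypothetical, and the sample rate is left as a parameter because the thread never states it.

```python
import math

def first_local_min(r):
    """First lag k > 0 where the first difference turns from negative to non-negative."""
    for k in range(1, len(r) - 1):
        if r[k] - r[k - 1] < 0 and r[k + 1] - r[k] >= 0:
            return k
    return None

def first_local_max_after(r, start):
    """First lag k > start where the first difference turns from positive to non-positive."""
    for k in range(start + 1, len(r) - 1):
        if r[k] - r[k - 1] > 0 and r[k + 1] - r[k] <= 0:
            return k
    return None

def lag_to_hz(lag, sample_rate):
    """Convert a period expressed in samples to a frequency in Hz."""
    return sample_rate / lag

# Idealised autocorrelation of a pure tone with a 150-sample period.
r = [math.cos(2 * math.pi * k / 150) for k in range(601)]
k_min = first_local_min(r)                 # half a period
k_max = first_local_max_after(r, k_min)    # one full period
print(k_min, k_max)
```

On real data the autocorrelation is noisy, so a plain sign-change test may trigger on small ripples; some smoothing or a minimum peak height is usually needed, and those choices are outside what the thread specifies.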
Carlos Moreno wrote:
> Rune Allnor wrote:
>
> >>>Where did you find that? I am not aware of any simple relation between
> >>>the time-domain autocorrelation function and the pitch of the signal.
> >>
> >>If the signal does have a sinusoidal component at period T, then when
> >>correlating with the version of the signal shifted by T, there will be
> >>a peak, corresponding to 1/T and all of the multiples (the harmonics).
> >>In fact, when shifted by T/2, there will be a peak with negative value,
> >>provided that there are no components of lower frequency.
> >
> > OK, I am sure you are right, provided the signal consists of a single
> > sinusoidal. If there are more sinusoidals, or noise present...
[-- snip --]
> Again, this simply justifies (empirically/intuitively, which I
> know, is dangerous :-)) that the correlation should disclose
> information about the pitch -- but that doesn't mean that it is
> a good idea or that the autocorrelation method is well-suited
> for the task. Computing the DFT (via FFT) sounds definitely
> like a better idea.

If you got the impression that I think the autocorrelation function does not contain information about the pitch, then I phrased my post poorly. Of course that information is kept somewhere in there.

I just doubt whether that information can be extracted from the time-domain representation of the autocorrelation function. It would make more sense to me to look at it in frequency domain.

Rune
Well, well, well...

Nothing unusual... comp.dsp "experts" discussing something they
don't have a clue about and giving wrong answers to a general
public...
I was going to skip this discussion but just couldn't resist...

Just where did you guys go to school?
Do you know anything other than DFT?
(I guess RBJ also knows ASDF).

What the heck are you talking about?
Did it ever occur to you that there is no such thing as frequency
domain for a general type of (nonlinear) signals ?
It's only a useful abstraction that works well for linear systems and
signals theory.
Fundamental frequency is just the inverse of the fundamental, or pitch,
period, which characterizes the smallest distance between repeating
patterns in the time-domain waveform.

Fundamental, or pitch, period (usually) corresponds to the first
maximum (after the zero-lag peak) in the autocorrelation function,
unless your window is too short, or your signal is too complex (has
formants), or it is non-stationary or not quite periodic, or all of
the above.
Same can be said about AMDF, ASDF and whatever similar modification you
can think about (e.g. YIN).
Only with AMDF you are looking for the first minimum instead of first
maximum.
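
To make the max-versus-min distinction concrete, here is a small sketch that runs both an autocorrelation and an AMDF over the same synthetic periodic signal. The test signal (a fundamental plus one harmonic with an 80-sample period), the lag search range and the function names are assumptions for illustration. Picking the global extremum inside a restricted lag range stands in for "first maximum" or "first minimum", which only works here because the range holds a single period.

```python
import math

def autocorr(x, max_lag):
    """Mean lagged product for each lag 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / (n - k)
            for k in range(max_lag + 1)]

def amdf(x, max_lag):
    """Average magnitude difference for each lag 0..max_lag."""
    n = len(x)
    return [sum(abs(x[i] - x[i + k]) for i in range(n - k)) / (n - k)
            for k in range(max_lag + 1)]

PERIOD = 80   # assumed pitch period in samples
x = [math.sin(2 * math.pi * i / PERIOD)
     + 0.5 * math.sin(4 * math.pi * i / PERIOD + 0.3)
     for i in range(800)]

lags = range(20, 121)            # assumed search range, excludes lag 0
r = autocorr(x, 120)
d = amdf(x, 120)
acf_lag = max(lags, key=lambda k: r[k])    # autocorrelation: look for a maximum
amdf_lag = min(lags, key=lambda k: d[k])   # AMDF: look for a minimum
print(acf_lag, amdf_lag)
```

For a perfectly periodic input both estimates coincide with the period. The caveats listed in the post (short windows, formants, non-stationarity) are exactly the cases this toy signal avoids.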

Better yet, go to http://www.soundmathtech.com/pitch for ICASSP 2002
paper and demo,

and, also, read US Patent Application  20030088401 at
http://www.uspto.gov/patft/

This is a standard reference on pitch estimation nowadays.

Looking at the low- or band-passed time-domain
autocorrelation (max) or AMDF (Average Magnitude Difference
Function) (min) for pitch in speech has been part of the tool
set for vocoding since about 1960.  Read some old books, guys!
It is source coded in several of the vocoding standards that
are available free on the net.

Doing it in the spectral domain is pretty much not done for
several reasons, mostly because it doesn't work very well
for the quick transitions that occur all the time in speech.

-- 
Chip Wood

"Rune Allnor" <allnor@tele.ntnu.no> wrote in message
news:1126167238.026411.242970@g43g2000cwa.googlegroups.com...
>
> Carlos Moreno wrote:
> > Rune Allnor wrote:
> >
> > >>>Where did you find that? I am not aware of any simple relation between
> > >>>the time-domain autocorrelation function and the pitch of the signal.

in article 1126189903.897731.112850@o13g2000cwo.googlegroups.com, fizteh89
at dt@soundmathtech.com wrote on 09/08/2005 10:31:

> Well, well, well...
>
> Nothing unusual... comp.dsp "experts" discussing something they
> don't have a clue about

Dmitry, you live in New Jersey. why don't you take a drive down US46 to Little Ferry. hang a left (north) at Liberty Street and then another left on Alsan Way. there's a big building there with a sign that says "Eventide" on it. ask them if you can check out any of their Harmonizers (they own the trademark on that name), post 1994 (DSP4000 and later). if you like how it sounds, see if you can talk to the prez there (Richard Factor). ask him who did their pitch detection and pitch shifting algorithms.

> and giving wrong answers to a general public...
> I was going to skip this discussion but just couldn't resist...

i'm glad you didn't. there is still much for us to discuss.

> Just where did you guys go to school?

since high school, University of North Dakota (same place where Nyquist got his BSEE) and Northwestern.

> Do you know anything other than DFT?

maybe a little.

> (I guess RBJ also knows ASDF).

maybe a little.

> What the heck are you talking about?
> Did it ever occur to you that there is no such thing as frequency
> domain for a general type of (nonlinear) signals ?

signals are signals. *systems* might be linear vs. non-linear.

> It's only a useful abstraction that works well for linear systems and
> signals theory.
> Fundamental frequency is just the inverse of the fundamental, or pitch,
> period, which characterizes the smallest distance between repeating
> patterns in the time-domain waveform.

and AMDF, ASDF, and autocorrelation give some information about that. and as you point out below, there's some relationship.

> Fundamental, or pitch, period (usually) corresponds to the first
> maximum in the autocorrelation function, unless you window is too
> short, or your signal is too complex (has formants), or it is
> non-stationary or not quite periodic,

that's usually the bugger. if it's perfectly periodic, pitch detection is pretty easy with AMDF, ASDF, autocorrelation. but there are still problems with PDAs and human perception. what if a strong 200 Hz waveform has a weak (say down by 60 dB) 100 Hz waveform added to it? mathematically, it's 100 Hz, but it sounds like 200 Hz.
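
The 200 Hz plus weak 100 Hz case can be checked numerically. The sketch below assumes an 8 kHz sample rate and a summation window of 800 samples (a whole number of periods of both tones, so the cross terms cancel); both values and the helper names are illustrative assumptions. It compares the lagged-product sum at the 200 Hz period with the one at the 100 Hz period.

```python
import math

FS = 8000                    # assumed sample rate in Hz
N = 800                      # summation window: whole periods of both tones
WEAK = 10 ** (-60 / 20)      # -60 dB relative amplitude = 0.001

def x(i):
    """Strong 200 Hz tone plus a 100 Hz tone that is 60 dB weaker."""
    return (math.sin(2 * math.pi * 200 * i / FS)
            + WEAK * math.sin(2 * math.pi * 100 * i / FS))

def r(k):
    """Lagged product summed over a fixed window of N samples."""
    return sum(x(i) * x(i + k) for i in range(N))

lag_200 = FS // 200   # 40 samples: period of the strong tone
lag_100 = FS // 100   # 80 samples: true period of the sum

print(r(lag_200), r(lag_100))
print((r(lag_100) - r(lag_200)) / r(lag_100))   # relative gap between the two peaks
```

The peak at the true 100 Hz period is the larger one, but only by a relative margin on the order of one part in a million, so any practical peak picker with a tolerance will report the 200 Hz period, which happens to agree with what is heard.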

> or all of the above.
> Same can be said about AMDF, ASDF and whatever similar modification you
> can think about (e.g. YIN).
> Only with AMDF you are looking for the first minimum instead of first
> maximum.

or with ASDF. not a lot different.

> Better yet, go to http://www.soundmathtech.com/pitch for ICASSP 2002
> paper and demo,

i could not get the MATLAB demo to work on my Mac. i need source and i doubt you're willing to give that up.

> and, also, read US Patent Application 20030088401 at
> http://www.uspto.gov/patft/

i have more trouble decoding the patent app than your ICASSP paper, but it was clear to me that the method outlined in your paper had, for a variety of lags, a 3 term AMDF (an exceedingly low number of difference magnitudes to sum), a decreasing non-linear function (namely u(a-x) where u(x) is the unit step function), and a histogram to count, for a particular lag, how many of these AMDF sums were below the parameter, "a".

i said that was some kind of glorified AMDF method, you said it wasn't. i also said that the non-linear function actually destroyed information (since it is not invertible) which means multiple waveforms can map to identical results. specifically, that Achilles' heel can be exploited by a demented waveform to fool your algorithm.

> This is a standard reference on pitch estimation nowadays.

that's premature, IMO. but it's all in the eye of the beholder.

--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."

in article 1126167238.026411.242970@g43g2000cwa.googlegroups.com, Rune
Allnor at allnor@tele.ntnu.no wrote on 09/08/2005 04:13:

> I just doubt whether that information can be extracted from
> the time-domain representation of the autocorrelation function.
> It would make more sense to me to look at it in frequency domain.

Rune, actually, at least in my experience, it's the contrary. it's in the time-domain where you get a handle on the period. other than, maybe for speed of calculating the autocorrelation, the frequency domain is pretty useless.

--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."

Rune Allnor wrote:
> If you got the impression that I think the autocorrelation
> function does not contain information about the pitch, then
> I phrased my post poorly. Of course that information is kept
> somewhere in there.
>
> I just doubt whether that information can be extracted from
> the time-domain representation of the autocorrelation function.
> It would make more sense to me to look at it in frequency domain.

To me, the term "pitch" refers to a phenomenon of human perception. I'm no expert on the subject, but I've run across articles that seem to suggest that the ear-brain combination resolves pitch as much or more on time domain than on frequency domain information. The frequency bandpass filters might be very broad, but the nerve "firing" rates and/or modulations are more closely aligned to the relative phases and/or periodicities of the signal bands, and the brain somehow sorts out all this information.

IMHO. YMMV.
--
rhn A.T nicholson d.O.t C-o-M

in article 1126216435.080023.89540@g43g2000cwa.googlegroups.com,
rhnlogic@yahoo.com at rhnlogic@yahoo.com wrote on 09/08/2005 17:53:
 
> To me, the term "pitch" refers to a phenomenon of human perception.

that is true but for a large class of musical sounds, mostly "tones", the "pitch" of the note or tone (measured in octaves) is the base 2 log of the fundamental frequency relative to a standard frequency. but not all musical sounds are these nice quasi-periodic tones. getting the pitch of, say, a recorded belch or fart might be more difficult for the DSP than for the human ear.

--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
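
The base-2 log relation can be written out directly. In this sketch the reference of A4 = 440 Hz, the MIDI-style note numbering and the function names are assumptions added for illustration; the thread only says "a standard frequency".

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def octaves_from(freq_hz, ref_hz=440.0):
    """Pitch in octaves relative to ref_hz (A4 = 440 Hz is an assumed standard)."""
    return math.log2(freq_hz / ref_hz)

def nearest_note(freq_hz, ref_hz=440.0):
    """Nearest equal-tempered note name, using MIDI numbering with A4 = 69."""
    midi = round(69 + 12 * octaves_from(freq_hz, ref_hz))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

print(octaves_from(880.0))     # one octave above the reference
print(nearest_note(130.81))    # about the C at the third fret of the 5th string, standard tuning assumed
```

This mapping only applies once a fundamental frequency has been estimated; it says nothing about how well that estimate matches perceived pitch for sounds that are not quasi-periodic.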
Robert, my comment wasn't aimed at you personally - I am completely
OK with your comments, except, maybe, for a few misconceptions of yours
and a strange attachment to ASDF (just joking). But comments made by
some other people just went astray...

> i said that was some kind of glorified AMDF method, you said it wasn't. i
> also said that the non-linear function actually destroyed information (since
> it is not invertible) which means multiple waveforms can map to identical
> results. specifically, that Achilles' heel can be exploited by a demented
> waveform to fool your algorithm.

This is one of your misconceptions. For periodicity/pitch detection one needs to lose as much information unrelated to signal periodicity as possible. I am quoting from Rabiner & Schafer's "Digital Processing of Speech Signals" (Section 4.8):

"One of the major limitations of the autocorrelation representation is that in a sense it retains too much of the information in the speech signal... As a result... autocorrelation function has many peaks... "

Then they go on to describe a center-clipping technique, which was specifically proposed to lose information in the speech signal - a noninvertible transformation.
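
A center clipper of the general kind referred to here can be sketched in a few lines. The clipping level (30% of the peak magnitude) and the function name are assumptions for illustration, not values taken from the book or from the thread. The two inputs at the end differ only in a low-level sample and map to the same output, which is the non-invertibility both posters are talking about.

```python
def center_clip(x, fraction=0.3):
    """Zero everything within +/- CL of zero and shift the rest toward zero by CL.

    CL is `fraction` times the peak magnitude of x (an assumed choice of level).
    """
    cl = fraction * max(abs(v) for v in x)
    out = []
    for v in x:
        if v > cl:
            out.append(v - cl)
        elif v < -cl:
            out.append(v + cl)
        else:
            out.append(0.0)
    return out

a = center_clip([1.0, 0.2, -0.5, -1.0])
b = center_clip([1.0, 0.1, -0.5, -1.0])
print(a)
print(a == b)   # two different waveforms, identical clipped result
```

In a pitch detector the clipped signal would then be fed to the autocorrelation in place of the raw waveform, so that low-level detail contributes nothing to the lagged products.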

> i could not get the MATLAB demo to work on my Mac. i need source and i
> doubt you're willing to give that up.

Sorry, I am between Unix and Windoze, no Macs around... But didn't I post some Matlab source code on comp.dsp? It's the same code I gave to the PTO in the provisional application, so you can be sure it works, maybe just not as well as some people would prefer. Sorry again... not giving out commercial-quality code at this time: I think I already gave out too much ...