comp.dsp | Pitch Estimation using Autocorrelation

Reply by Carlos Moreno ● September 7, 2005

Rune Allnor wrote:

>>>Where did you find that? I am not aware of any simple relation between
>>>the time-domain autocorrelation function and the pitch of the signal.
>>
>>If the signal does have a sinusoidal component at period T, then when
>>correlating with the version of the signal shifted by T, there will be
>>a peak, corresponding to 1/T and all of the multiples (the harmonics).
>>In fact, when shifted by T/2, there will be a peak with negative value,
>>provided that there are no components of lower frequency.
> 
> 
> OK, I am sure you are right, provided the signal consists of a single
> sinusoidal. If there are more sinusoidals, or noise present...

Say that the signal X(t) consists of a sinusoid, s(t), plus the rest;
let's call the rest y(t).  X(t-T) consists of the same sinusoid s(t)
(since s(t-T) = s(t)), plus y(t-T).

But since y(t-T) is a linear combination (in an infinite-dimensional
basis) of sinusoidal components WITHOUT the frequency component of s(t),
y(t-T) and s(t) are uncorrelated.

So, the correlation between X(t) and X(t-T) is:

    Int[-oo, oo] X(t) * X(t-T) dt

        =  Int[-oo, oo] {s(t) + y(t)} * {s(t) + y(t-T)} dt

        =  Int[-oo, oo] { s^2(t) + s(t)*{y(t) + y(t-T)} + y(t)*y(t-T) } dt

The first term, s^2(t), integrates to some value greater than zero.
The second term integrates to zero, since s(t) and y (at whatever
offset) are uncorrelated.

And I think my mistake/oversight (which I see Jerry partly
clarified) was to assume that because y(t) has no component at
the frequency of s(t), i.e. at period T, the integral
of y(t)*y(t-T) would be zero, which is not the case  (I mean, even
putting aside the harmonics -- let's assume that y(t) has no
frequencies that are a multiple of s(t)'s frequency -- even then,
the integral is *not* zero).

Now, if y(t) is just noise, then of course, all the terms except
s^2(t) become zero when taking their integral from -oo to oo.
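
A quick numerical check of the noise case, as a minimal sketch only: the
infinite integral is replaced by a finite average over the overlapping
samples, and the period (50 samples), the record length, and the unit-variance
Gaussian noise are all assumed values chosen for illustration.

```python
import math
import random

def autocorr(x, lag):
    """Finite-sum stand-in for the correlation integral, averaged over the overlap."""
    n = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n)) / n

random.seed(0)                       # fixed seed so the run is repeatable
period = 50                          # T in samples (assumed value)
n_samples = 5000                     # record length (assumed value)

s = [math.sin(2 * math.pi * i / period) for i in range(n_samples)]
y = [random.gauss(0.0, 1.0) for _ in range(n_samples)]   # y(t): white noise
x = [a + b for a, b in zip(s, y)]

r_s2 = autocorr(s, period)           # the s^2 term alone
r_full = autocorr(x, period)         # X correlated with X shifted by T
r_half = autocorr(x, period // 2)    # X correlated with X shifted by T/2

print(r_s2, r_full, r_half)
```

With these settings the value at lag T stays close to the s^2 term and the
value at lag T/2 comes out clearly negative, even though the noise power is
larger than the sinusoid's. Note that this only holds away from lag 0, where
the noise correlates perfectly with itself.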

If there are no frequencies close to s(t)'s frequency, it seems
intuitive that the integral will be considerably less than the
integral of s^2(t) -- since s^2(t) is non-negative, coming from
the fact that it is the product of two perfectly aligned sinusoids.
If y(t) has no frequencies that are close, then the product will
always be frequencies that are far from aligned, and the integral
will tend to have a very low value.

As Jerry pointed out, also, the pitch is typically determined by, or
associated with, the lowest frequency that is *predominant* -- under
that assumption, all the other terms in the correlation integral
are much lower (possibly negligible) than the term from s^2(t).

Again, this simply justifies (empirically/intuitively, which I
know, is dangerous  :-)) that the correlation should disclose
information about the pitch -- but that doesn't mean that it is
a good idea or that the autocorrelation method is well-suited
for the task.  Computing the DFT (via FFT) sounds definitely
like a better idea.

Carlos
--

Reply by rhnl...@yahoo.com ● September 8, 2005

olivers wrote:
> From what I understand the first minimum of an autocorrelation function
> (say 1200 samples with a lag between 0 and 600) will give me the sample
> value which can be directly mapped to frequency and thus pitch.
>
> I have tried to extract this minima from my autocorrelation result with
> varied results. I tried using a C (third fret 5th string) on my guitar and
> got a sample value for the first minimum which varied between 68 and 75.
> From my conversion chart
> (http://grace.evergreen.edu/~arunc/intro_doc/node12.htm#SECTION00092000000000000000)
>
> this corresponds to a note which varies between d4 and e4 which is clearly
> incorrect.
>
> I thought maybe that I am doing something wrong in the extraction of the
> first minimum. Another article I have just read indicates that the minima
> represents half the period where the waveform is out of phase thus, the
> maxima indicates the period of the waveform and directly relates to the
> pitch. At a guess my value of 75 (half the period) which translates to 150
> is still wrong.
>
> I am identifying the first minimum by searching for the first change in
> sign indicating the first zero crossing. Is this how its done?

If the first minimum is negative, it might be between the first two zero
crossings.  Or you might want to look for a sign change in the first
derivative.
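
A minimal sketch of the derivative-sign-change idea follows. The 8000 Hz
sample rate and the 100 Hz test tone are assumptions for illustration (the
original post does not give a sample rate), and the helper names are
hypothetical. For a near-sinusoidal signal a zero crossing of the
autocorrelation itself falls near a quarter period, so it marks neither the
minimum nor the period.

```python
import math

def autocorr(x, max_lag):
    """Autocorrelation for lags 0..max_lag, each averaged over its overlap."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / (n - k)
            for k in range(max_lag + 1)]

def first_local_min(r):
    """First lag where the first difference goes from negative to non-negative."""
    for k in range(1, len(r) - 1):
        if r[k] - r[k - 1] < 0 and r[k + 1] - r[k] >= 0:
            return k
    return None

def first_local_max_after(r, start):
    """First lag past `start` where the first difference goes from positive to non-positive."""
    for k in range(start + 1, len(r) - 1):
        if r[k] - r[k - 1] > 0 and r[k + 1] - r[k] <= 0:
            return k
    return None

fs = 8000.0                          # sample rate in Hz (assumed)
f0 = 100.0                           # test tone in Hz; period is 80 samples
x = [math.sin(2 * math.pi * f0 * i / fs) for i in range(1200)]
r = autocorr(x, 600)                 # 1200 samples, lags 0..600 as in the question

k_min = first_local_min(r)           # lands at half the period
k_max = first_local_max_after(r, k_min)   # lands at the full period
pitch_hz = fs / k_max

print(k_min, k_max, pitch_hz)
```

On a real guitar note the autocorrelation has many local extrema, so a
practical version would probably smooth the function or require the peak to
exceed some fraction of the lag-0 value before accepting it.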


IMHO. YMMV.
-- 
rhn A.T nicholson D.o.T c-O-m

Reply by Rune Allnor ● September 8, 2005

Carlos Moreno wrote:
> Rune Allnor wrote:
>
> >>>Where did you find that? I am not aware of any simple relation between
> >>>the time-domain autocorrelation function and the pitch of the signal.
> >>
> >>If the signal does have a sinusoidal component at period T, then when
> >>correlating with the version of the signal shifted by T, there will be
> >>a peak, corresponding to 1/T and all of the multiples (the harmonics).
> >>In fact, when shifted by T/2, there will be a peak with negative value,
> >>provided that there are no components of lower frequency.
> >
> >
> > OK, I am sure you are right, provided the signal consists of a single
> > sinusoidal. If there are more sinusoidals, or noise present...

[-- snip --]

> Again, this simply justifies (empirically/intuitively, which I
> know, is dangerous  :-)) that the correlation should disclose
> information about the pitch -- but that doesn't mean that it is
> a good idea or that the autocorrelation method is well-suited
> for the task.  Computing the DFT (via FFT) sounds definitely
> like a better idea.

If you got the impression that I think the autocorrelation
function does not contain information about the pitch, then
I phrased my post poorly. Of course that information is kept
somewhere in there.

I just doubt whether that information can be extracted from
the time-domain representation of the autocorrelation function.
It would make more sense to me to look at it in frequency domain.

Rune

Reply by fizteh89 ● September 8, 2005

Well, well, well...

Nothing unusual... comp.dsp "experts" discussing something they
don't have a clue about and giving wrong answers to a general
public...
I was going to skip this discussion but just couldn't resist...

Just where did you guys go to school?
Do you know anything other than DFT?
(I guess RBJ also knows ASDF).

What the heck are you talking about?
Did it ever occur to you that there is no such thing as a frequency
domain for a general type of (nonlinear) signal?
It's only a useful abstraction that works well for linear systems and
signals theory.
Fundamental frequency is just the inverse of the fundamental, or pitch,
period, which characterizes the smallest distance between repeating
patterns in the time-domain waveform.

The fundamental, or pitch, period (usually) corresponds to the first
maximum in the autocorrelation function, unless your window is too
short, or your signal is too complex (has formants), or it is
non-stationary or not quite periodic, or all of the above.
Same can be said about AMDF, ASDF and whatever similar modification you
can think about (e.g. YIN).
Only with AMDF you are looking for the first minimum instead of first
maximum.
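
To make the max-versus-min relationship concrete, here is a small sketch
comparing the two functions on the same synthetic signal. The sample rate,
the two-harmonic test waveform, and the lag search range are assumed values,
and the sketch takes the global extremum inside that range rather than
strictly the "first" one, which is a simplification.

```python
import math

def acf(x, lag):
    """Autocorrelation at one lag, averaged over the overlapping samples."""
    n = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n)) / n

def amdf(x, lag):
    """Average magnitude difference function at one lag."""
    n = len(x) - lag
    return sum(abs(x[i] - x[i + lag]) for i in range(n)) / n

fs = 8000.0                          # sample rate in Hz (assumed)
f0 = 125.0                           # fundamental in Hz; period is 64 samples
x = [math.sin(2 * math.pi * f0 * i / fs)
     + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / fs + 0.3)
     for i in range(1024)]

lags = range(20, 100)                # search range that excludes lag 0 (assumed)
lag_acf = max(lags, key=lambda k: acf(x, k))     # autocorrelation: look for a maximum
lag_amdf = min(lags, key=lambda k: amdf(x, k))   # AMDF: look for a minimum

print(lag_acf, lag_amdf, fs / lag_acf)
```

Both pickers agree on this clean, stationary signal. The caveats listed above
(short windows, formants, non-stationarity) are exactly the cases where they
stop agreeing with each other or with the true period.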

Better yet, go to http://www.soundmathtech.com/pitch for ICASSP 2002
paper and demo,

and, also, read US Patent Application  20030088401 at
http://www.uspto.gov/patft/

This is a standard reference on pitch estimation nowadays.

Reply by Chip Wood ● September 8, 2005

Looking at the low- or band-passed time-domain
autocorrelation (max) or AMDF (Average Magnitude Difference
Function) (min) for pitch in speech has been part of the tool set for
vocoding since about 1960.  Read some old books, guys!  Source
code for it is included in several of the vocoding standards that
are available free on the net.

Doing it in the spectral domain is pretty much not done, for
several reasons, mostly because it doesn't work very well
for the quick transitions that occur all the time in speech.

-- 
Chip Wood

"Rune Allnor" <allnor@tele.ntnu.no> wrote in message
news:1126167238.026411.242970@g43g2000cwa.googlegroups.com...
>
> Carlos Moreno wrote:
> > Rune Allnor wrote:
> >
> > >>>Where did you find that? I am not aware of any simple relation between
> > >>>the time-domain autocorrelation function and the pitch of the signal.

Reply by robert bristow-johnson ● September 8, 2005

in article 1126189903.897731.112850@o13g2000cwo.googlegroups.com, fizteh89
at dt@soundmathtech.com wrote on 09/08/2005 10:31:

> Well, well, well...
> 
> Nothing unusual... comp.dsp "experts" discussing something they
> don't have a clue about

Dmitry, you live in New Jersey.  why don't you take a drive down US46 to
Little Ferry.  hang a left (north) at Liberty Street and then another left
on Alsan Way.  there's a big building there with a sign that says "Eventide"
on it.  ask them if you can check out any of their Harmonizers (they own the
trademark on that name), post 1994 (DSP4000 and later).  if you like how it
sounds, see if you can talk to the prez there (Richard Factor).  ask him who
did their pitch detection and pitch shifting algorithms.

> and giving wrong answers to a general public...
> I was going to skip this discussion but just couldn't resist...

i'm glad you didn't.  there is still much for us to discuss.

> Just where did you guys go to school?

since high school, University of North Dakota (same place where Nyquist got
his BSEE) and Northwestern.

> Do you know anything other than DFT?

maybe a little.

> (I guess RBJ also knows ASDF).

maybe a little.

> 
> What the heck are you talking about?
> Did it ever occur to you that there is no such thing as frequency
> domain for a general type of (nonlinear) signals ?

signals are signals.  *systems* might be linear vs. non-linear.

> It's only a useful abstraction that works well for linear systems and
> signals theory.
> Fundamental frequency is just the inverse of the fundamental, or pitch,
> period, which characterizes the smallest distance between repeating
> patterns in the time-domain waveform.

and AMDF, ASDF, and autocorrelation gives some information about that.  and
as you point out below, there's some relationship.

> Fundamental, or pitch, period (usually) corresponds to the first
> maximum in the autocorrelation function, unless you window is too
> short, or your signal is too complex (has formants), or it is
> non-stationary or not quite periodic,

that's usually the bugger.  if it's perfectly periodic, pitch detection is
pretty easy with AMDF, ASDF, autocorrelation.  but there are still problems
with PDAs and human perception.  what if a strong 200 Hz waveform has a weak
(say down by 60 dB) 100 Hz waveform added to it?  mathematically, it's 100
Hz, but it sounds like 200 Hz.
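
The 200 Hz plus weak 100 Hz case can be checked numerically. This is only a
sketch under assumed values (8000 Hz sample rate, half a second of signal,
pure sinusoids standing in for the two waveforms); it compares the normalised
autocorrelation at the 5 ms lag with the value at the 10 ms lag.

```python
import math

def acf(x, lag):
    """Autocorrelation at one lag, averaged over the overlapping samples."""
    n = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n)) / n

fs = 8000.0                          # sample rate in Hz (assumed)
weak = 10 ** (-60 / 20)              # -60 dB as an amplitude ratio, i.e. 0.001
x = [math.sin(2 * math.pi * 200 * i / fs)
     + weak * math.sin(2 * math.pi * 100 * i / fs)
     for i in range(4000)]

lag_200 = int(fs / 200)              # 40 samples, period of the strong component
lag_100 = int(fs / 100)              # 80 samples, period of the composite waveform

r0 = acf(x, 0)
r_200 = acf(x, lag_200) / r0         # normalised peak at the 200 Hz period
r_100 = acf(x, lag_100) / r0         # normalised peak at the 100 Hz period
gap = r_100 - r_200

print(r_200, r_100, gap)
```

The peak at the true 10 ms period is higher, but only by a few parts per
million, so a detector that takes the first peak above any practical
threshold reports 200 Hz, which happens to match what is heard.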

> or all of the above.
> Same can be said about AMDF, ASDF and whatever similar modification you
> can think about (e.g. YIN).
> Only with AMDF you are looking for the first minimum instead of first
> maximum.

or with ASDF.  not a lot different.

> Better yet, go to http://www.soundmathtech.com/pitch for ICASSP 2002
> paper and demo,

i could not get the MATLAB demo to work on my Mac.  i need source and i
doubt you're willing to give that up.

> and, also, read US Patent Application  20030088401 at
> http://www.uspto.gov/patft/

i have more trouble decoding the patent app than your ICASSP paper, but it
was clear to me that the method outlined in your paper had, for a variety of
lags, a 3 term AMDF (an exceedingly low number of difference magnitudes to
sum), a decreasing non-linear function (namely u(a-x) where u(x) is the unit
step function), and a histogram to count, for a particular lag, how many of
these AMDF sums were below the parameter, "a".

i said that was some kind of glorified AMDF method, you said it wasn't.  i
also said that the non-linear function actually destroyed information (since
it is not invertible) which means multiple waveforms can map to identical
results.  specifically, that Achilles' heel can be exploited by a demented
waveform to fool your algorithm.

> This is a standard reference on pitch estimation nowadays.

that's premature, IMO.  but it's all in the eye of the beholder.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by robert bristow-johnson ● September 8, 2005

in article 1126167238.026411.242970@g43g2000cwa.googlegroups.com, Rune
Allnor at allnor@tele.ntnu.no wrote on 09/08/2005 04:13:

> I just doubt whether that information can be extracted from
> the time-domain representation of the autocorrelation function.
> It would make more sense to me to look at it in frequency domain.

Rune,

actually, at least in my experience, it's the contrary.  it's in the
time-domain where you get a handle on the period.  other than, maybe for
speed of calculating the autocorrelation, the frequency domain is pretty
useless.
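
For the "speed of calculating the autocorrelation" remark, the usual route is
to transform, take the squared magnitude, and transform back. The sketch below
uses a small hand-written radix-2 FFT so that it runs with the standard
library alone; in practice a library FFT (numpy.fft, for instance) would take
its place. The test signal and its length are assumed values.

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(a):
    """Inverse FFT via the conjugate trick."""
    n = len(a)
    return [v.conjugate() / n for v in fft([u.conjugate() for u in a])]

def autocorr_fft(x):
    """Raw (unnormalised) linear autocorrelation for lags 0..len(x)-1."""
    n = len(x)
    size = 1
    while size < 2 * n:              # zero-pad so the circular result does not wrap
        size *= 2
    spectrum = fft(list(x) + [0.0] * (size - n))
    power = [abs(v) ** 2 for v in spectrum]
    return [v.real for v in ifft(power)[:n]]

def autocorr_direct(x):
    """Same quantity computed by the defining sum, for comparison."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(n)]

x = [math.sin(2 * math.pi * i / 32) + 0.3 * math.sin(2 * math.pi * i / 11)
     for i in range(256)]
r_fft = autocorr_fft(x)
r_dir = autocorr_direct(x)
max_err = max(abs(a - b) for a, b in zip(r_fft, r_dir))

print(max_err)
```

The two results agree to rounding error. The period is still read off the
time-domain (lag) result; the frequency domain is used here purely as a
faster way to get there.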

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by rhnl...@yahoo.com ● September 8, 2005

Rune Allnor wrote:
> If you got the impression that I think the autocorrelation
> function does not contain information about the pitch, then
> I phrased my post poorly. Of course that information is kept
> somewhere in there.
>
> I just doubt whether that information can be extracted from
> the time-domain representation of the autocorrelation function.
> It would make more sense to me to look at it in frequency domain.

To me, the term "pitch" refers to a phenomenon of human perception.

I'm no expert on the subject, but I've run across articles that
seem to suggest that the ear-brain combination resolves pitch as
much or more on time domain than on frequency domain information.
The frequency bandpass filters might be very broad, but the nerve
"firing" rates and/or modulations are more closely aligned to
the relative phases and/or periodicities of the signal bands, and
the brain somehow sorts out all this information.

IMHO. YMMV.
-- 
rhn A.T nicholson d.O.t C-o-M

Reply by robert bristow-johnson ● September 8, 2005

in article 1126216435.080023.89540@g43g2000cwa.googlegroups.com,
rhnlogic@yahoo.com at rhnlogic@yahoo.com wrote on 09/08/2005 17:53:
 
> To me, the term "pitch" refers to a phenomena of human perception.

that is true but for a large class of musical sounds, mostly "tones", the
"pitch" of the note or tone (measured in octaves) is the base 2 log of the
fundamental frequency relative to a standard frequency.
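
As a worked instance of that log relation, here is a short sketch. The 440 Hz
reference and the division of the octave into 12 equal semitones are
assumptions added for illustration, and the function names are hypothetical.

```python
import math

def octaves_from(freq_hz, ref_hz=440.0):
    """Pitch in octaves relative to ref_hz: base-2 log of the frequency ratio."""
    return math.log2(freq_hz / ref_hz)

def semitones_from(freq_hz, ref_hz=440.0):
    """Same pitch expressed in equal-tempered semitones (12 per octave)."""
    return 12.0 * octaves_from(freq_hz, ref_hz)

oct_880 = octaves_from(880.0)        # double the reference frequency
oct_220 = octaves_from(220.0)        # half the reference frequency
semi_c = semitones_from(261.6256)    # middle C relative to A 440

print(oct_880, oct_220, semi_c)
```

Doubling the frequency gives +1 octave and halving gives -1, and middle C
comes out about nine semitones below the 440 Hz reference.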

but not all musical sounds are these nice quasi-periodic tones.  getting the
pitch of, say, a recorded belch or fart might be more difficult for the DSP
than for the human ear.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by fizteh89 ● September 9, 2005

Robert, my comment wasn't aimed at you personally - I am completely
OK with your comments, except, maybe, for a few misconceptions of yours
and a strange attachment to ASDF (just joking). But comments made by
some other people just went astray...

> i said that was some kind of glorified AMDF method, you said it wasn't.  i
> also said that the non-linear function actually destroyed information (since
> it is not invertable) which means multiple waveforms can map to identical
> results.  specifically, that achille's heel can be exploited by a demented
> waveform to fool your algorithm.

This is one of your misconceptions. For periodicity/pitch detection one
needs to lose as much information unrelated to signal periodicity as
possible.
I am quoting from Rabiner & Schafer's "Digital Processing of Speech
Signals" (4.8 sub-chapter):
"One of the major limitations of the autocorrelation representation
is that in a sense it retains too much of the information in the speech
signal... As a result... autocorrelation function has many peaks... "
Then they go on to describe a center-clipping technique, which was
specifically proposed to lose information in speech signal - a
noninvertible transformation.
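
A minimal sketch of center clipping in its common textbook form: samples
inside a band of +/-C around zero are set to zero and the rest are shifted
toward zero by C. Setting C to 30% of the peak magnitude is an assumed value
here, and the function name is hypothetical.

```python
def center_clip(x, fraction=0.3):
    """Zero samples within +/-C and shift the rest toward zero by C,
    where C = fraction * max|x| (the fraction is an assumed setting)."""
    c = fraction * max(abs(v) for v in x)
    out = []
    for v in x:
        if v > c:
            out.append(v - c)
        elif v < -c:
            out.append(v + c)
        else:
            out.append(0.0)
    return out

# Two different inputs that differ only inside the clipping band.
x1 = [0.0, 0.2, 1.0, 0.25, -0.1, -0.8, 0.1]
x2 = [0.05, -0.2, 1.0, 0.0, 0.29, -0.8, -0.3]

y1 = center_clip(x1)
y2 = center_clip(x2)

print(y1)
print(y1 == y2)
```

Both inputs produce the same output, which is the non-invertibility under
discussion: the low-level detail is discarded on purpose and only the large
excursions survive.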

> i could not get the MATLAB demo to work on my Mac.  i need source and i
> doubt you're willing to give that up.

Sorry, I am between Unix and Windoze, no Macs around...
But didn't I post some Matlab source code on comp.dsp?  It's the
same code I gave to PTO in provisional application, so you can be sure
it works, maybe just not as well as some people would prefer. Sorry
again... not giving out commercial-quality code at this time: I think I
already gave out too much ...