comp.dsp | Pitch detection - for a newbi| page 4

Reply by JoergW ●January 10, 2007

Hello,



I'd like to bring this discussion back to the subject (if I may):



Having read many publications about pitch tracking I pretty much fell from
my chair when I read US patent No. 5,973,252. If I'm not mistaken, this
patent describes the Auto Tune algorithm by Antares Audio Tech.



     http://www.uspto.gov/patft/index.html



This paper is definitely worth reading! The Pitch Tracking Method here is
basically ACF but with a couple of very interesting specialties.



  1. The ACF is computed with a nice trick that requires only two MACs per lag
(per sample)
  2. For every lag the accumulated energy is subtracted from the ACF
result


I haven't tried this method, but I'd imagine that it works much more
reliably than any other time-domain solution.



Any thoughts on this one?

Joerg



(true achievements are those that bring more benefits than recognition)



"robert bristow-johnson" <rbj@audioimagination.com> wrote in message
news:1168233377.173310.253460@s80g2000cwa.googlegroups.com...
>
> fizteh89 wrote:
> > > I am not particularly enthusiastic about your solution.
> >
> > Well, you should be if you do speech processing...
>
> always the salesman, no, Dmitry?
>
> > > The most common method for the pitch computation is by normalized
> > > autocorrelation in the time domain. Reason: it is dead simple and good
> > > enough for most of applications.
> >
> > It is NOT good enough, and everybody knows it...
>
> depending on how autocorrelation is done, there are problems with
> attack transients and plosives that AMDF might not have to the same
> extent.  also any pitch detection algorithm suffers, to some degree,
> the bane of the "octave ambiguity".  you can have tones that *sound*
> like some frequency (like 262 Hz for middle-C), yet are mathematically
> more accurately described as a tone an octave (or more) lower (like 131
> Hz).  some of this is unsolvable analytically, but perceptual
> heuristics might be helpful.
>
> > (Otherwise why would they invent all sorts of artificial nonlinear
> > tricks like center-clipping ?)  But again, it depends on your goals.
>
> center-clipping is not always useful.  center-clipping destroys
> information (since it is not an invertible function).  the information
> lost could be used to differentiate one period length from another
> (usually this is revealed as the "octave problem" but not always).
>
> > >
> > > For my tasks, I prefer frequency domain perceptually optimized methods.
> > >
> >
> > Frequency domain pitch detection cannot be used for pitch tracking in
> > highly-nonstationary signals such as speech, and everybody knows it..
> >
> > > Modesty is the virtue of mediocrities, isn't it?
> >
> > Well, in order to criticize and downplay other people's achievements
> > you'd better first show some real contribution of your own. Do you have
> > something to show, Vladimir ?
>
> i do.  and probably so does Vladimir.
>
> r b-j
>

Reply by robert bristow-johnson ●January 10, 2007

fizteh89 wrote:
> Vladimir Vassilevsky wrote:
> > 1. What is new in your method compared to AMDF (which was used in the
> > LPC10 vocoder designed in the 1980s) ?
>
> Problems with elementary math ?

you don't see the hole you're digging for yourself?  some people don't
need others to like them (it's not me, but i can sorta understand
that), but most people *do* desire others' respect.  you're not getting
it that way, Dmitry.

> Here is "periodicity histogram":
>
>   hist(k)=sum H(r - |x(i) - x(i+k)|)    (where H is Heaviside, or unit
> step, function)

a function of  |x(i) - x(i+k)| .

> Here is AMDF function:
>
>   amdf(k)=sum |x(i) - x(i+k)|

another function of  |x(i) - x(i+k)| .

and as long as we're not being too picky about the limits on the
summation...

> and here is autocorrelation:
>
>   corr(k) = sum x(i) * x(i+k)

   corr(k) = sum x(i) * x(i+k)
           = 1/2 sum (|x(i)|^2 + |x(i+k)|^2)  -  1/2 sum |x(i) - x(i+k)|^2

still another function of  |x(i) - x(i+k)| .
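
A quick numerical check of that expansion, as a sketch only. It is written with both energy terms kept explicitly, sum x(i)^2 and sum x(i+k)^2, so that it holds for finite summation limits; the function names and the test signal are made up for illustration.

```python
import random

def corr(x, k, n):
    """Autocorrelation sum over i = 0..n-1 (needs len(x) >= n + k)."""
    return sum(x[i] * x[i + k] for i in range(n))

def expanded(x, k, n):
    """1/2 sum (x(i)^2 + x(i+k)^2) - 1/2 sum (x(i) - x(i+k))^2, same limits."""
    energy = sum(x[i] ** 2 + x[i + k] ** 2 for i in range(n))
    sq_diff = sum((x[i] - x[i + k]) ** 2 for i in range(n))
    return 0.5 * energy - 0.5 * sq_diff

random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(300)]

# Largest disagreement between the two forms over lags 0..99.
max_err = max(abs(corr(x, k, 200) - expanded(x, k, 200)) for k in range(100))
```

When the limits are ignored (or the signal is periodic over the window), the two energy sums are equal and the expression collapses to sum |x(i)|^2 - 1/2 sum |x(i) - x(i+k)|^2.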

> Why don't you start calling AMDF function an autocorrelation and vice
> versa, cause they sorta look the same to you ?

dunno what Vladimir is saying about it, but there is a common theme to
all of these algorithms: find out how similar x(i) is to x(i+k) for a
given value (lag) of k.  for values of k where x(i) and x(i+k) are very
similar waveforms, |x(i) - x(i+k)| is low, amdf(k) is low, corr(k) is
high, and hist(k) is high.
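
To make the common theme concrete, here is a small sketch that evaluates all three functions on a synthetic periodic signal. The signal, the window length n, the lag range and the threshold r = 0.05 are arbitrary choices for illustration, and H(0) is taken as 1.

```python
import math

def amdf(x, k, n):
    return sum(abs(x[i] - x[i + k]) for i in range(n))

def corr(x, k, n):
    return sum(x[i] * x[i + k] for i in range(n))

def hist(x, k, n, r):
    # H(r - |d|) is 1 when |d| <= r and 0 otherwise, so this is a count.
    return sum(1 for i in range(n) if abs(x[i] - x[i + k]) <= r)

period = 50  # samples
x = [math.sin(2 * math.pi * i / period) + 0.5 * math.sin(4 * math.pi * i / period)
     for i in range(400)]
n = 200                # summation length (four periods here)
lags = range(20, 80)   # candidate lags, kept below two periods

best_amdf = min(lags, key=lambda k: amdf(x, k, n))      # lowest difference sum
best_corr = max(lags, key=lambda k: corr(x, k, n))      # highest correlation
best_hist = max(lags, key=lambda k: hist(x, k, n, 0.05))  # most "close" samples
```

On this clean signal all three pick the same lag, the true period of 50 samples. The lag range is deliberately cut off below 100 samples so that the two-period lag does not compete.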

> And BTW, you CANNOT reduce "periodicity histogram" to AMDF. Period.

your "periodicity histogram", hist(k), is a function of this difference
signal |x(i) - x(i+k)| just as amdf(k) or corr(k) is.  it's true, you
cannot "reduce" hist(k) to amdf(k) because hist(k) is applying a
non-linear function to |x(i) - x(i+k)| before summing (this non-linear
function, H(.) also destroys information because it is not one-to-one
or invertible and also requires a threshold parameter, r, that somehow
has to be meaningfully determined).  but the principles of all three
are the same:  postulate a lag, k, and see how similar the lagged
waveform is to the original by subtracting the lagged waveform from the
original.  negative errors count the same as positive errors.  add up
the errors (or some function of the errors) and the lag that gives you
the least sum of error (as reflected through such function before
summing) is a good guess for the period since |x(i) - x(i+k)| would be
small if k is a period.  all three algs must still worry about the
"octave problem" since a lag of 2 periods is expected to be as good as
a lag of 1 period and might be mathematically better (perhaps because
of inaudible sub-harmonics so the "2 periods" are really the "true"
period), even though the waveform still *sounds* like it should be just
the 1 period.  that's where different pitch detection algorithms have
their salient properties or features.
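
A sketch of the "octave problem" under made-up conditions: a tone with a 50-sample period plus a sub-harmonic 20 dB down at half the frequency. The amplitudes, window length and lag range are assumptions chosen for illustration; whether such a sub-harmonic is actually inaudible depends on the signal.

```python
import math

def amdf(x, k, n):
    return sum(abs(x[i] - x[i + k]) for i in range(n))

def corr(x, k, n):
    return sum(x[i] * x[i + k] for i in range(n))

period = 50  # period of the dominant tone, in samples
x = [math.sin(2 * math.pi * i / period)
     + 0.1 * math.sin(2 * math.pi * i / (2 * period))  # weak sub-harmonic
     for i in range(400)]
n = 200
lags = range(20, 121)  # now wide enough to include two periods

amdf_pick = min(lags, key=lambda k: amdf(x, k, n))
corr_pick = max(lags, key=lambda k: corr(x, k, n))

# How close the one-period lag comes to the two-period lag.
ratio = corr(x, period, n) / corr(x, 2 * period, n)
```

Both functions prefer the 100-sample lag, which is the mathematically exact period, even though the 50-sample lag scores within a few percent of it. Choosing between the two is where heuristics come in.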

r b-j

Reply by robert bristow-johnson ●January 10, 2007

JoergW wrote:
>
> Having read many publications about pitch tracking I pretty much fell from
> my chair when I read US patent No. 5,973,252. If I'm not mistaken, this
> patent describes the Auto Tune algorithm by Antares Audio Tech.

it probably is.  the middle initial "A." for the inventor is probably
for "Andy".  i never knew this guy's first name was "Harold".  also the
"Auburn Audio Technologies" must be the original parent corp to Antares
(hadn't previously heard of that either).

> This paper is definitely worth reading! The Pitch Tracking Method here is
> basically ACF but with a couple of very interesting specialties.
>
>
>
>   1. The ACF is computed with a nice trick, requires only two MACs per lag
> (per Sample)

it's not one MAC?

>   2. For every lag the accumulated energy is subtracted from the ACF
> result

he patented that???  prior art exists, at least using this same thing
for AMDF.

this is similar to producing these cross-product terms and running them
into a moving sum (or moving average) filter.  the old terms fall off
the edge of the delay line and you subtract them out of the sum and add
in the new term that pops into the delay line.  there has to be a
separate delay line for each lag's cross-product.  this is effectively
rectangular windowing the data in the sum.
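
A minimal sketch of the moving-sum idea described above, for a single lag. The class name SlidingCorr and its parameters are made up for illustration and are not taken from the patent; a real tracker would keep one such delay line per candidate lag.

```python
from collections import deque

class SlidingCorr:
    """Running sum of x(n) * x(n - lag) over the last `window` products."""

    def __init__(self, lag, window):
        # Last `lag` input samples; history[0] is always x(n - lag).
        self.history = deque([0.0] * lag, maxlen=lag)
        # Delay line of cross-products; products[0] is the oldest term.
        self.products = deque([0.0] * window, maxlen=window)
        self.total = 0.0

    def push(self, sample):
        new = sample * self.history[0]  # new cross-product entering the line
        old = self.products[0]          # term about to fall off the edge
        self.total += new - old         # add the new term, subtract the old
        self.products.append(new)       # maxlen discards `old` automatically
        self.history.append(sample)
        return self.total

# Usage: a period-4 signal tracked at lag 4 with an 8-product window.
tracker = SlidingCorr(lag=4, window=8)
values = [tracker.push(s) for s in [1.0, 0.0, -1.0, 0.0] * 4]
```

Storing the products, as here, costs one multiply and two additions per lag per sample. Recomputing the outgoing product from stored samples instead would cost a second multiply.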

the problem of rectangular windowing, with the discontinuity, applied
to the autocorrelation sum is worse than if it is applied to the AMDF
sum.  if your period has an integer multiple that is slightly longer
than the length of the summation (that delay line), you could have an
autocorrelation peak that is at a slightly larger lag than where the
period really is (and choose that peak location as your period).  even
for highly periodic input.  you would not get that using AMDF for
periodic input.

> Any thoughts on this one?

them's are mine.

> (true achievements are those that bring more benefits than recognition)

unless your name is George W. Bush.

r b-j

Reply by JoergW ●January 10, 2007

Hi Robert,

> it probably is.  the middle initial "A." for the inventor is probably
> for "Andy".  i never knew this guy's first name was "Harold".  also the
> "Auburn Audio Technologies" must be the original parent corp to Antares
> (hadn't previously heard of that either).

Yeah, I was wondering for quite some time how the ATR-1/Auto Tune ticks and
I started clicking about the patent when I read some of the 'company
history' on the Antares website.

> he patented that???  prior art exists, at least using this same thing
> for AMDF.

I agree for the ACF computation trick, but I've never seen the E(L) >= 2H(L)
criterion before, and I've read a huge number of papers and patents (including
ones from you and the IVL gang). It's a pretty amazing idea (instead of looking
for Max/Minima in the ACF directly...).

> it's not one MAC?

Well, one MAC to accumulate the new X(n)*X(n+L) and another one to subtract
the old one from the delay line, right?


> this is similar to producing these cross-product terms and running them
> into a moving sum (or moving average) filter.  the old terms fall off
> the edge of the delay line and you subtract them out of the sum and add
> in the new term that pops into the delay line.  there has to be a
> separate delay line for each lag's cross-product.  this is effectively
> rectangular windowing the data in the sum.

Yes, I actually used that 'rectangular window trick' before to detect energy
peaks for transient detection.

> the problem of rectangular windowing, with the discontinuity, applied
> to the autocorrelation sum is worse than if it is applied to the AMDF
> sum.  if your period has an integer multiple that is slightly longer
> than the length of the summation (that delay line), you could have an
> autocorrelation peak that is at a slightly larger lag than where the
> period really is (and choose that peak location as your period).  even
> for highly periodic input.  you would not get that using AMDF for
> periodic input.

Good to know. It's also interesting that Hildebrand sums up the double
period for the E(L) function. I'm sure there must be a good reason for that.
Of course also for H(L) one could accumulate more than one period...
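
One possible reading of the double-period sum, sketched under assumptions that have not been checked against the patent text: take E(L) as the energy of the last 2L samples and H(L) as the sum of x(n-j) * x(n-j-L) over the last L samples. Then E(L) - 2H(L) equals the sum of (x(n-j) - x(n-j-L))^2 over j = 0..L-1, which is never negative and is zero when L is a period. The function names and the relative threshold eps are hypothetical.

```python
import math

def energy(x, n, lag):
    """E(L): energy of the 2*lag samples ending at index n (assumed definition)."""
    return sum(x[n - j] ** 2 for j in range(2 * lag))

def lagged_sum(x, n, lag):
    """H(L): sum of x(n-j) * x(n-j-lag) over the last `lag` samples (assumed)."""
    return sum(x[n - j] * x[n - j - lag] for j in range(lag))

def looks_periodic(x, n, lag, eps=0.001):
    """True when E(L) - 2H(L) is small relative to E(L); eps is hypothetical."""
    e = energy(x, n, lag)
    return e - 2.0 * lagged_sum(x, n, lag) <= eps * e

period = 50
x = [math.sin(2 * math.pi * i / period) + 0.5 * math.sin(4 * math.pi * i / period)
     for i in range(400)]
n = len(x) - 1

# Scan lags from short to long and stop at the first one that passes.
first_lag = next(lag for lag in range(20, 121) if looks_periodic(x, n, lag))

# E(L) - 2H(L) for every scanned lag; a sum of squares, so never negative.
gaps = [energy(x, n, lag) - 2.0 * lagged_sum(x, n, lag) for lag in range(20, 121)]
```

Scanning from short lags upward and stopping at the first lag that passes the threshold also sidesteps picking the two-period lag in this example, although that is only a property of this clean test signal.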

BTW: That Pitch Tracker must be amazingly good if that pitch shifter of his
is really a WSSNONANK (Wavelength Synchronized Splicing, No Overlap, NO Add,
No kidding :-).

If he really does no overlap/X-fade at all for the splicing, his pitch
tracker must be REALLY good. I fed even polyphonic signals into Auto Tune
and didn't hear any artifacts.

Joerg

Reply by EagerToLearn ●January 12, 2007

Has anyone seen this guy's stuff:

Eartrainer: A Cross-Platform Eartraining Program for Musicians

http://zoo.cs.yale.edu/classes/cs490/02-03b/james.athey/
