Pitch detection - for a newbie

Started by EagerToLearn January 5, 2007
Hello,



I'd like to bring this discussion back to the subject (if I may):



Having read many publications about pitch tracking I pretty much fell from
my chair when I read US patent No. 5,973,252. If I'm not mistaken, this
patent describes the Auto Tune algorithm by Antares Audio Tech.



     http://www.uspto.gov/patft/index.html



This paper is definitely worth reading! The Pitch Tracking Method here is
basically ACF but with a couple of very interesting specialties.



  1. The ACF is computed with a nice trick, requires only two MACs per lag
     (per Sample)
  2. For every lag the accumulated energy is subtracted from the ACF
     result


I haven't tried this method but I'd imagine that this must work much better
(more reliably) than any other time-domain solution.



Any thoughts on this one?

Joerg



(true achievements are those that bring more benefits than recognition)



"robert bristow-johnson" <rbj@audioimagination.com> wrote in message
news:1168233377.173310.253460@s80g2000cwa.googlegroups.com...
> fizteh89 wrote:
> > > I am not particularly enthusiastic about your solution.
> >
> > Well, you should be if you do speech processing...
>
> always the salesman, no, Dmitry?
>
> > > The most common method for the pitch computation is by normalized
> > > autocorrelation in the time domain. Reason: it is dead simple and good
> > > enough for most of applications.
> >
> > It is NOT good enough, and everybody knows it...
>
> depending on how autocorrelation is done, there are problems with
> attack transients and plosives that AMDF might not have to the same
> extent. also any pitch detection algorithm suffers, to some degree,
> the bane of the "octave ambiguity". you can have tones that *sound*
> like some frequency (like 262 Hz for middle-C), yet are mathematically
> more accurately described as a tone an octave (or more) lower (like 131
> Hz). some of this is unsolvable analytically, but perceptual
> heuristics might be helpful.
>
> > (Otherwise why would they invent all sorts of artificial nonlinear
> > tricks like center-clipping ?) But again, it depends on your goals.
>
> center-clipping is not always useful. center-clipping destroys
> information (since it is not an invertable function). the information
> lost could be used to differentiate one period length from another
> (usually this is revealed as the "octave problem" but not always).
>
> > > For my tasks, I prefer frequency domain perceptually optimized
> > > methods.
> >
> > Frequency domain pitch detection cannot be used for pitch tracking in
> > highly-nonstationary signals such as speech, and everybody knows it..
>
> > > Modesty is the virtue of mediocrits, isn't it?
> >
> > Well, in order to critisize and downplay other people's achievements
> > you'd better first show some real contribution of your own. Do you have
> > something to show, Vladimir ?
>
> i do. and probably so does Vladimir.
>
> r b-j
fizteh89 wrote:
> Vladimir Vassilevsky wrote:
> > 1. What is new in your method compared to AMDF (which was used in the
> > LPC10 vocoder designed in 80x) ?
>
> Problems with elementary math ?
you don't see the hole you're digging for yourself? some people don't need others to like them (it's not me, but i can sorta understand that), but most people *do* desire others' respect. you're not getting it that way, Dmitry.
> Here is "periodicity histogram":
>
> hist(k)=sum H(r - |x(i) - x(i+k)|) (where H is Heaviside, or unit
> step, function)
a function of |x(i) - x(i+k)| .
> Here is AMDF function:
>
> amdf(k)=sum |x(i) - x(i+k)|
another function of |x(i) - x(i+k)| . and as long as we're not being too picky about the limits on the summation...
> and here is autocorrelation:
>
> corr(k) = sum x(i) * x(i+k)
corr(k) = sum x(i) * x(i+k) = sum |x(i)|^2 - 1/2 sum |x(i) - x(i+k)|^2
still another function of |x(i) - x(i+k)| .
> Why don't you start calling AMDF function an autocorrelation and vice
> versa, cause they sorta look the same to you ?
dunno what Vladimir is saying about it, but there is a common theme to all of these algorithms: find out how similar x(i) is to x(i+k) for a given value (lag) of k. for values of k where x(i) and x(i+k) are very similar waveforms, |x(i) - x(i+k)| is low, amdf(k) is low, corr(k) is high, and hist(k) is high.
> And BTW, you CANNOT reduce "periodicity histogram" to AMDF. Period.
your "periodicity histogram", hist(k), is a function of this difference signal |x(i) - x(i+k)| just as amdf(k) or corr(k) is. it's true, you cannot "reduce" hist(k) to amdf(k) because hist(k) is applying a non-linear function to |x(i) - x(i+k)| before summing (this non-linear function, H(.), also destroys information because it is not one-to-one or invertible and also requires a threshold parameter, r, that somehow has to be meaningfully determined).

but the principles of all three are the same: postulate a lag, k, and see how similar the lagged waveform is to the original by subtracting the lagged waveform from the original. negative errors count the same as positive errors. add up the errors (or some function of the errors) and the lag that gives you the least sum of error (as reflected through such function before summing) is a good guess for the period since |x(i) - x(i+k)| would be small if k is a period.

all three algs must still worry about the "octave problem" since a lag of 2 periods is expected to be as good as a lag of 1 period and might be mathematically better (perhaps because of inaudible sub-harmonics so the "2 periods" are really the "true" period), even though the waveform still *sounds* like it should be just the 1 period. that's where different pitch detection algorithms have their salient properties or features.

r b-j
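A minimal sketch of the three lag-similarity measures discussed above, in plain Python. The test signal, its 50-sample period, the summation length N and the threshold r = 0.05 are made-up values for illustration; none of them come from the thread. The last few lines check the autocorrelation identity with the summation limits written out exactly, in which case the energy term is 1/2 sum x(i)^2 + 1/2 sum x(i+k)^2.

```python
import math

def amdf(x, k, n):
    """Sum of |x(i) - x(i+k)| over n samples (low = similar)."""
    return sum(abs(x[i] - x[i + k]) for i in range(n))

def corr(x, k, n):
    """Sum of x(i) * x(i+k) over n samples (high = similar)."""
    return sum(x[i] * x[i + k] for i in range(n))

def hist(x, k, n, r):
    """Count of samples with |x(i) - x(i+k)| <= r (high = similar)."""
    return sum(1 for i in range(n) if abs(x[i] - x[i + k]) <= r)

# Hypothetical test signal: fundamental plus one harmonic, period of 50 samples.
PERIOD = 50
x = [math.sin(2 * math.pi * i / PERIOD) + 0.5 * math.sin(4 * math.pi * i / PERIOD)
     for i in range(400)]
N = 200  # summation length (assumed; four whole periods)

lags = range(20, 121)
best_amdf = min(lags, key=lambda k: amdf(x, k, N))
best_corr = max(lags, key=lambda k: corr(x, k, N))
best_hist = max(lags, key=lambda k: hist(x, k, N, r=0.05))
print("picked lags:", best_amdf, best_corr, best_hist)

# One period and two periods score essentially the same (octave ambiguity).
for k in (PERIOD, 2 * PERIOD):
    print(k, amdf(x, k, N), corr(x, k, N), hist(x, k, N, r=0.05))

# Identity check at an arbitrary lag, with exact summation limits.
k = 37
lhs = corr(x, k, N)
rhs = (0.5 * sum(x[i] ** 2 for i in range(N))
       + 0.5 * sum(x[i + k] ** 2 for i in range(N))
       - 0.5 * sum((x[i] - x[i + k]) ** 2 for i in range(N)))
print("identity error:", abs(lhs - rhs))
```

For a perfectly periodic input the lags at one and two periods tie up to rounding, so which of the two an argmin or argmax returns is essentially arbitrary. That is the octave problem in its simplest form, before any sub-harmonics enter the picture.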
JoergW wrote:
> Having read many publications about pitch tracking I pretty much fell from
> my chair when I read US patent No. 5,973,252. If I'm not mistaken, this
> patent describes the Auto Tune algorithm by Antares Audio Tech.
it probably is. the middle initial "A." for the inventor is probably for "Andy". i never knew this guy's first name was "Harold". also the "Auburn Audio Technologies" must be the original parent corp to Antares (hadn't previously heard of that either).
> This paper is definitely worth reading! The Pitch Tracking Method here is
> basically ACF but with a couple of very interesting specialties.
>
> 1. The ACF is computed with a nice trick, requires only two MACs per lag
> (per Sample)
it's not one MAC?
> 2. For every lag the accumulated energy is subtracted from the ACF
> result
he patented that??? prior art exists, at least using this same thing for AMDF.

this is similar to producing these cross-product terms and running them into a moving sum (or moving average) filter. the old terms fall off the edge of the delay line and you subtract them out of the sum and add in the new term that pops into the delay line. there has to be a separate delay line for each lag's cross-product. this is effectively rectangular windowing the data in the sum.

the problem of rectangularly windowing, with the discontinuity, applied to the autocorrelation sum is worse than if it is applied to the AMDF sum. if your period has an integer multiple that is slightly longer than the length of the summation (that delay line), you could have an autocorrelation peak that is at a slightly larger lag than where the period really is (and choose that peak location as your period). even for highly periodic input. you would not get that using AMDF for periodic input.
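The moving-sum idea described above can be sketched roughly as follows. This is only an illustration of the general delay-line scheme, not the patent's actual implementation; the class name SlidingAutocorr, the lag list, the window length of 64 and the test signal are all invented for the example.

```python
import math
from collections import deque

class SlidingAutocorr:
    """Running sum of x[n] * x[n - lag] over the last `window` products, per lag."""

    def __init__(self, lags, window):
        self.lags = list(lags)
        max_lag = max(self.lags)
        # Recent input samples, newest last; long enough to reach the largest lag.
        self.history = deque([0.0] * (max_lag + 1), maxlen=max_lag + 1)
        # One delay line of cross-products per lag, pre-filled with zeros.
        self.products = {L: deque([0.0] * window) for L in self.lags}
        self.sums = {L: 0.0 for L in self.lags}

    def push(self, sample):
        """Feed one input sample and update every lag's windowed sum."""
        self.history.append(sample)
        for L in self.lags:
            new = sample * self.history[-1 - L]   # x[n] * x[n - L]
            old = self.products[L].popleft()      # product falling off the delay line
            self.products[L].append(new)
            self.sums[L] += new - old             # add the new term, subtract the old
        return self.sums

# Hypothetical input, just to exercise the update.
x = [math.sin(0.3 * i) + 0.2 * math.cos(1.1 * i) for i in range(300)]
acf = SlidingAutocorr(lags=[5, 17, 40], window=64)
for s in x:
    acf.push(s)

# Compare against the directly computed rectangular-window sum for one lag.
n = len(x) - 1
direct = sum(x[m] * x[m - 17] for m in range(n - 63, n + 1))
print(acf.sums[17], direct)
```

Per lag and per sample the update is one product added and one old product subtracted, which matches the "two operations" reading of the trick. A running sum like this accumulates rounding error over long runs in floating point, so a real implementation would probably need to recompute or reset it now and then.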
> Any thoughts on this one?
them's are mine.
> (true achievements are those that bring more benefits than recognition)
unless your name is George W. Bush. r b-j
Hi Robert,

> it probably is. the middle initial "A." for the inventor is probably
> for "Andy". i never knew this guy's first name was "Harold". also the
> "Auburn Audio Technologies" must be the original parent corp to Antares
> (hadn't previously heard of that either).
Yeah, I was wondering for quite some time how the ATR-1/Auto Tune ticks and I started clicking about the patent when I read some of the 'company history' on the Antares website.
> he patented that??? prior art exists, at least using this same thing
> for AMDF.
I agree for the ACF computation trick but I've never seen the E(L) >= 2H(L) criterion before and I read a huge number of papers and patents (including from you and the IVL gang). It's a pretty amazing idea (instead of looking for Max/Minima in the ACF directly...).
> it's not one MAC?
Well, one MAC to accumulate the new X(n)*X(n+L) and another one to subtract the old one from the delay line, right?
> this is similar to producing these cross-product terms and running them
> into a moving sum (or moving average) filter. the old terms fall off
> the edge of the delay line and you subtract them out of the sum and add
> in the new term that pops into the delay line. there has to be a
> separate delay line for each lag's cross-product. this is effectively
> rectangular windowing the data in the sum.
Yes, I actually used that 'rectangular window trick' before to detect energy peaks for transient detection.
> the problem of rectangularly windowing, with the discontinuity, applied
> to the autocorrelation sum is worse than if it is applied to the AMDF
> sum. if your period has an integer multiple that is slightly longer
> than the length of the summation (that delay line), you could have an
> autocorrelation peak that is at a slightly larger lag than where the
> period really is (and choose that peak location as your period). even
> for highly periodic input. you would not get that using AMDF for
> periodic input.
Good to know. It's also interesting that Hildebrand sums up the double period for the E(L) function. I'm sure there must be a good reason for that. Of course also for H(L) one could accumulate more than one period...

BTW: That Pitch Tracker must be amazingly good if that pitch shifter of his is really a WSSNONANK (Wavelength Synchronized Splicing, No Overlap, NO Add, No kidding :-). If he really does no overlap/X-fade at all for the splicing, his pitch tracker must be REALLY good. I fed even polyphonic signals into Auto Tune and didn't hear any artifacts.

Joerg
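One way to read the E(L) >= 2H(L) relation mentioned above, as a rough sketch. The definitions here are assumptions based only on what is said in this thread and have not been checked against the patent text: E(L) is taken as the energy summed over the last 2L samples, and H(L) as the lag-L product sum over the last L samples. Under those assumptions E(L) - 2H(L) equals the sum of (x[n] - x[n-L])^2 over the last L samples, so it is never negative and approaches zero when L is a period. The threshold eps, the function names and the test signal are made up.

```python
import math

def energy_and_acf(x, i, L):
    """Assumed definitions: E over the last 2L samples ending at index i,
    H as the lag-L product sum over the last L samples ending at index i."""
    E = sum(x[i - j] ** 2 for j in range(2 * L))
    H = sum(x[i - j] * x[i - j - L] for j in range(L))
    return E, H

def first_period_candidate(x, i, lags, eps):
    """Return the smallest lag with E - 2H <= eps * E, or None if there is none."""
    for L in lags:
        E, H = energy_and_acf(x, i, L)
        if E - 2.0 * H <= eps * E:
            return L
    return None

# Hypothetical test signal: fundamental plus one harmonic, period of 50 samples.
PERIOD = 50
x = [math.sin(2 * math.pi * n / PERIOD) + 0.5 * math.sin(4 * math.pi * n / PERIOD)
     for n in range(400)]
i = len(x) - 1

candidate = first_period_candidate(x, i, range(20, 121), eps=0.001)
print("first lag passing the test:", candidate)
```

Because the search stops at the smallest lag that passes the threshold, a scheme like this tends to prefer one period over two when both match equally well, which may be part of the appeal compared with hunting for maxima or minima directly. How small eps has to be for real signals is an open question here.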
Has anyone seen this guy's stuff:

Eartrainer: A Cross-Platform Eartraining Program for Musicians

http://zoo.cs.yale.edu/classes/cs490/02-03b/james.athey/