DSPRelated.com

Pitch Estimation using Autocorrelation

Started by olivers September 7, 2005
fizteh89 wrote:
> Well, well, well...
>
> Nothing unusual... comp.dsp "experts" discussing something they
> don't have a clue about and giving wrong answers to a general
> public...
> I was going to skip this discussion but just couldn't resist...
>
> Just where did you guys go to school?
> Do you know anything other than DFT?
> (I guess RBJ also knows ASDF).
>
> What the heck are you talking about?
Dear Dmitry!

Your nickname "fizteh89" suggests to me that you graduated from school and university in Russia. But that fact doesn't give you any license to argue in an uncivil manner. The people taking part in this discussion are standing by their own points of view, and it doesn't matter where each of them went to school or university. My impression of the Moscow Institute of Physics and Technology (Fiztech) is that it is too theoretical and pretty far from real engineering. In my opinion, the goal of engineering is solving problems.

However, it's not quite clear to me what kind of problem the person who initiated this discussion wants to solve. I can only guess at a few:
1) That person needs a music transcription tool for his or her music exercises.
2) He or she wants to develop his (her) own music transcription software.
3) He or she needs to develop and test a period estimator for some purpose other than music transcription. It could be student work, or a medical or industrial application, etc.

With regard to music transcription, a musical note is not necessarily tied to pitch. At http://www.mrc-cbu.cam.ac.uk/cnbh/web2005/teaching/sounds_movies/Pitchchroma&height.htm you may find an example of a melody played at a fixed pitch.

Although it intuitively seems that everyone is talking about the same thing, that "thing" really depends on the point of view. The "thing" is pitch. Pitch, as someone wrote, is what we hear. But what do we hear, Dmitry? Everyone in this discussion is correct from his own point of view; those points of view are simply different. Sorry, Dmitry, if you don't understand that.

Truly Yours,
Vladimir Malakhov.
hi Dmitry,

in article 1126275173.443343.84600@g47g2000cwa.googlegroups.com, fizteh89 at
dt@soundmathtech.com wrote on 09/09/2005 10:12:

> Robert, my comment wasn't aimed at you personally - I am completely
> OK with your comments, except, maybe, for a few misconceptions of yours
> and a strange attachment to ASDF (just joking).
on today's DSPs, it's cheaper than AMDF and works approximately the same way. 25 years ago, when it took 50+ machine cycles to do a multiplication, you might say i have a strange attachment to AMDF because, at that time, abs() was cheaper than squaring.
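For reference, the two measures being compared can be written down in a few lines. The sketch below is illustrative and not code from the thread: the helper names, the fixed analysis window `w`, and the 200 Hz test tone with a third harmonic are assumptions. Both the AMDF (mean of |x[n] - x[n+k]|) and the ASDF (mean of (x[n] - x[n+k])^2) bottom out at lags equal to the period; the ASDF spends a multiply where the AMDF spends an abs(). Expanding the square, (a - b)^2 = a^2 + b^2 - 2ab, shows that the ASDF equals the two window energies minus twice the autocorrelation, which is the sense in which autocorrelation is the ASDF "flipped upside-down".

```python
import math

def amdf(x, k, w):
    """Average magnitude difference at lag k over a fixed window of w samples."""
    return sum(abs(x[n] - x[n + k]) for n in range(w)) / w

def asdf(x, k, w):
    """Average squared difference at lag k over the same fixed window."""
    return sum((x[n] - x[n + k]) ** 2 for n in range(w)) / w

def autocorr(x, k, w):
    """Windowed autocorrelation at lag k, averaged over w samples."""
    return sum(x[n] * x[n + k] for n in range(w)) / w

def asdf_via_autocorr(x, k, w):
    """ASDF rebuilt from energies and autocorrelation, using
    (a - b)^2 = a^2 + b^2 - 2ab summed over the window."""
    e0 = sum(x[n] ** 2 for n in range(w)) / w
    ek = sum(x[n + k] ** 2 for n in range(w)) / w
    return e0 + ek - 2.0 * autocorr(x, k, w)

def best_lag(measure, x, min_lag, max_lag, w):
    """Lag in [min_lag, max_lag] that minimizes a difference measure."""
    return min(range(min_lag, max_lag + 1), key=lambda k: measure(x, k, w))

if __name__ == "__main__":
    fs, f0 = 8000.0, 200.0       # assumed test values: period = fs / f0 = 40 samples
    x = [math.sin(2 * math.pi * f0 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 3 * f0 * n / fs) for n in range(400)]
    w = 200                      # window length; needs len(x) >= w + max_lag
    print(best_lag(amdf, x, 20, 60, w), best_lag(asdf, x, 20, 60, w))  # expected: 40 40
```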
> But comments made by some other people just went astray...
>
>> i said that was some kind of glorified AMDF method, you said it wasn't.
i reviewed the ICASSP paper again and, correcting the notation (except i refuse to call the unit step function "H(x)"), you are calculating

$$ f(x,k) = \sum_i u\left( r - \sqrt{ \sum_{j=0}^{m-1} \left( x[i+jd] - x[i+k+jd] \right)^2 } \right) $$

which is equivalent to

$$ f(x,k) = \sum_i u\left( r^2 - \sum_{j=0}^{m-1} \left( x[i+jd] - x[i+k+jd] \right)^2 \right) $$

where u() is the unit step function. you suggest d=12, m=3, and r = 0.15 relative to the peak |x|. essentially you are counting the number of occurrences that

$$ \sum_{j=0}^{m-1} \left( x[i+jd] - x[i+k+jd] \right)^2 < r^2 $$

for a particular value of k. the more the number of occurrences that the sum of squares of differences is less than r^2, the better the fit. this is a form of the ASDF. (sorry for saying it was a glorified AMDF, but my position is that the ASDF is a sorta modified AMDF and also that autocorrelation is equivalent to ASDF, but flipped upside-down.)
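The counting function f(x, k) can be sketched directly from the formula above. This follows rbj's transcription of the ICASSP method, not the author's code (which is not given in the thread); the number of start indices `n`, the convention u(0) = 1, and the 40-sample test sine are assumptions. A lag whose shifted m-sample pattern stays within r^2 of the original at many start positions scores high, so the best lag is the one with the largest count.

```python
import math

def unit_step(v):
    """u(v): 1 for v >= 0, else 0 (assumes the convention u(0) = 1)."""
    return 1 if v >= 0 else 0

def match_count(x, k, n, d=12, m=3, r_rel=0.15):
    """f(x, k) as transcribed above: over n start indices i, compare m samples
    spaced d apart with the same samples shifted by lag k, and count how often
    the sum of squared differences stays within r^2."""
    span = (m - 1) * d + k                  # largest offset from i that gets read
    if n + span > len(x):
        raise ValueError("signal too short for this n, d, m and lag")
    r = r_rel * max(abs(v) for v in x)      # r = 0.15 of the peak |x|
    count = 0
    for i in range(n):
        s = sum((x[i + j * d] - x[i + k + j * d]) ** 2 for j in range(m))
        count += unit_step(r * r - s)       # same test as u(r - sqrt(s))
    return count

if __name__ == "__main__":
    # Assumed test signal: a sine with a 40-sample period.
    x = [math.sin(2 * math.pi * n / 40) for n in range(400)]
    counts = {k: match_count(x, k, n=200) for k in range(20, 61)}
    best = max(counts, key=counts.get)
    print(best, counts[best])               # expected: 40 200
```

With r at 15% of the peak the test is strict: in this example even a one-sample lag error (k = 39 or 41) fails at every start position, so the count is sharply peaked at the true period.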
>> i also said that the non-linear function actually destroyed information
>> (since it is not invertible) which means multiple waveforms can map to
>> identical results. specifically, that Achilles' heel can be exploited
>> by a demented waveform to fool your algorithm.
>
> This is one of your misconceptions. For periodicity/pitch detection one
> needs to lose as much information unrelated to signal periodicity as
> possible.
it's no misconception of mine. it is from the very definition of the waveform x(t) or x[i] that the property of periodicity is derived in the first place and the parameter of the period is estimated. changing that waveform irreversibly (using a non-invertible operator) throws away information. that information could be the very information that differentiates two waveforms that are an octave apart in pitch but appear to be the same after application of the non-invertible operator. e.g.:
> I am quoting from Rabiner & Schafer's "Digital Processing of Speech
> Signals" (4.8 sub-chapter):
> "One of the major limitations of the autocorrelation representation
> is that in a sense it retains too much of the information in the speech
> signal... As a result... autocorrelation function has many peaks... "
> Then they go on to describe a center-clipping technique, which was
> specifically proposed to lose information in speech signal - a
> noninvertible transformation.
it's a good example and one that makes my point. the center-clipping can lose information that is needed to differentiate two different waveforms that are one octave apart, yet appear identical after the center-clipping.

imagine a sine wave at 500 Hz. the waveform repeats every 2 ms. even after center-clipping, the waveform repeats every 2 ms, so any decent PDA should identify the period to be 2 ms and the fundamental frequency to be 500 Hz. the center-clipping will lose information around the zero crossings, which happen every 1 ms.

now suppose i give you a waveform just like the 500 Hz sine, but with two little positive "blips" or pulses added at two adjacent zero crossings and then two little negative blips added at the following two zero crossings. then the whole thing repeats forever. this waveform repeats every 4 ms and is a 250 Hz waveform. but if the zero-crossing blips are small enough to be clipped in the center-clipping process, that differentiating information is lost, and the two post-clipped waveforms (from this one and from the pure sine) will look identical, yet the fundamental frequencies of the two original waveforms are one octave apart.

now the way your PDA loses information is more sophisticated, but that simply means that one has to be more sophisticated in coming up with two different waveforms (one octave apart) that will result in identical histograms. if the period estimation is taken solely from the histogram (which i think is the point of your PDA), then there is no way to differentiate between these two waveforms that are one octave apart.
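The two-waveform argument can be checked with a few lines of code. This is a minimal sketch under assumed numbers (8 kHz sampling, blips of height 0.05 placed exactly on the zero-crossing samples, clipping level C = 0.3); the clipper is the common form that zeroes samples inside [-C, C] and shifts larger samples toward zero by C, as used ahead of autocorrelation pitch detectors.

```python
import math

def center_clip(x, c):
    """Center clipper: samples inside [-c, c] become 0, the rest move toward 0 by c."""
    out = []
    for v in x:
        if v > c:
            out.append(v - c)
        elif v < -c:
            out.append(v + c)
        else:
            out.append(0.0)
    return out

def repeats_every(x, p, tol=1e-9):
    """True if x[n + p] matches x[n] within tol wherever both samples exist."""
    return all(abs(x[n + p] - x[n]) <= tol for n in range(len(x) - p))

FS = 8000      # assumed rate: a 500 Hz sine has a 16-sample period, zero crossings every 8 samples
N = 320        # 40 ms of signal
BLIP = 0.05    # assumed blip height, well under the clipping level
C = 0.3        # assumed clipping level

sine = [math.sin(2 * math.pi * 500 * n / FS) for n in range(N)]

# Same sine plus blips at the zero crossings: +, + on two adjacent crossings,
# then -, - on the next two, so the pattern repeats every 32 samples (4 ms, 250 Hz).
blipped = list(sine)
for n in range(0, N, 8):
    blipped[n] += BLIP if (n // 8) % 4 < 2 else -BLIP

print(repeats_every(sine, 16), repeats_every(blipped, 16), repeats_every(blipped, 32))
# expected: True False True  (500 Hz vs. 250 Hz before clipping)
print(center_clip(sine, C) == center_clip(blipped, C))
# expected: True  (after clipping the two waveforms are sample-for-sample identical)
```

Any period estimator that sees only the clipped output must report the same answer for both inputs, so at least one of them gets the octave wrong.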
>> i could not get the MATLAB demo to work on my Mac. i need source and i
>> doubt you're willing to give that up.
>
> Sorry, I am between Unix and Windoze, no Macs around...
> But didn't I post some Matlab source code on comp.dsp?
i can't claim that i was paying attention at the time. could you repost it or find it at Google Groups?
> It's the same code I gave to PTO in provisional application, so you can
> be sure it works, maybe just not as well as some people would prefer.
i'd be happy to try it out.
> Sorry again... not giving out commercial-quality code at this time: I
> think I already gave out too much ...
that's the risk you take with patenting/publishing. you are probably familiar with the IVL and Brian Gibson's patents. same risk.

--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
> you are probably familiar with the IVL and Brian Gibson's patents.

From US patent 4,688,464:

1. An apparatus, for determining the pitch of a substantially periodic audio input signal having one or more sinusoidal components forming a series of peaks of a given polarity that are separated by at least one peak of opposite polarity and that define at least one cycle of said signal, comprising:

a first means for producing a plurality of overlapping, sample timing intervals from a portion of said input signal;

a second means for producing an indication of the period of said input signal from said plurality of sample timing intervals; and

a third means for converting said indication of said period of said input signal into a determination of the pitch of said input signal.

Are you talking about this?

To me, this pretty much looks like an invalid patent claim. There is no substance to it: it just mentions a few common sense operations, unless there is some hidden meaning to some of these words. But I don't want to pass a judgment on the patent as a whole, just this particular first claim.

Talking about patent quality problems...
Ah, now you are mixing the objective, measurable by machine,
fundamental frequency with the perception, human-measurable
only, pitch.  Please don't do that.  There are many Ph.D
theses that look at the relationship between the two, which
is considerable, but not a one-to-one mapping in
non-sinusoid signals.  I have even tricked the human sensing
pitch, using sinusoidal frequencies with different
modulation and phase.

Since "pitch" is shorter and easier to say than "fundamental
frequency", the term "pitch estimate" came into vogue, but
what you are really doing is "fundamental frequency
estimation".  Experienced engineers do the translation
automatically, but some novices can be confused.

-- 
Chip Wood


"robert bristow-johnson" <rbj@audioimagination.com> wrote in
message news:BF460B15.A3EF%

> that's usually the bugger. if it's perfectly periodic, pitch detection is
> pretty easy with AMDF, ASDF, autocorrelation. but there are still problems
> with PDAs and human perception. what if a strong 200 Hz waveform has a weak
> (say down by 60 dB) 100 Hz waveform added to it? mathematically, it's 100
> Hz, but it sounds like 200 Hz.
in article 1126291325.830575.238960@g47g2000cwa.googlegroups.com, fizteh89
at dt@soundmathtech.com wrote on 09/09/2005 14:42:

>> you are probably familiar with the IVL and Brian Gibson's patents.
>
> From US patent 4,688,464:
>
> 1. An apparatus, for determining the pitch of a substantially periodic
> audio input signal having one or more sinusoidal components forming a
> series of peaks of a given polarity that are separated by at least one
> peak of opposite polarity and that define at least one cycle of said
> signal, comprising:
>
> a first means for producing a plurality of overlapping, sample timing
> intervals from a portion of said input signal;
>
> a second means for producing an indication of the period of said input
> signal from said plurality of sample timing intervals; and
>
> a third means for converting said indication of said period of said
> input signal into a determination of the pitch of said input signal.
>
> Are you talking about this?
yup. there are many other (later) patents. try an advanced search with an/IVL in the search field.
> To me, this pretty much looks like an invalid patent claim. There is no
> substance to it:
i'll bet their lawyers are bigger than your lawyers.
> it just mentions a few common sense operations, unless
> there is some hidden meaning to some of these words. But I don't want
> to pass a judgment on the patent as a whole, just this particular first
> claim.
> Talking about patent quality problems...
that's a whole 'nother discussion. no doubt, i've felt that IVL has patented some stuff that doesn't pass the "obviousness" test or even the prior art test. but there is evidence of them locking horns with Mark of the Unicorn and winning. unfortunately there is some advice i gave to MOTU that i thought would help, yet IVL is listing it in their list of references (pat. #6,336,092), so i s'pose that means they are pretty confident that they have disposed of it. i still believe that IVL is possibly extracting royalties for stuff that is either obvious or prior art or both (kinda like Bill Gates patenting binary numbers and arithmetic), but i wouldn't want to take them on.

--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
in article dfsldd$f3d$1@avnika.corp.mot.com, Chip Wood at
chip.wood@motorola.com wrote on 09/09/2005 14:48:

> Ah, now you are mixing the objective, measurable by machine,
> fundamental frequency with the perception, human-measurable
> only, pitch. Please don't do that.
did you see my response to rhn?

in article BF467255.A41C%rbj@audioimagination.com, robert bristow-johnson at rbj@audioimagination.com wrote on 09/08/2005 22:57:

> in article 1126216435.080023.89540@g43g2000cwa.googlegroups.com,
> rhnlogic@yahoo.com at rhnlogic@yahoo.com wrote on 09/08/2005 17:53:
>
>> To me, the term "pitch" refers to a phenomenon of human perception.
>
> that is true but for a large class of musical sounds, mostly "tones", the
> "pitch" of the note or tone (measured in octaves) is the base 2 log of the
> fundamental frequency relative to a standard frequency.
>
> but not all musical sounds are these nice quasi-periodic tones. getting the
> pitch of, say, a recorded belch or fart might be more difficult for the DSP
> than for the human ear.
in article dfsldd$f3d$1@avnika.corp.mot.com, Chip Wood at chip.wood@motorola.com wrote on 09/09/2005 14:48:
> There are many Ph.D
> theses that look at the relationship between the two, which
> is considerable, but not a one-to-one mapping in
> non-sinusoid signals.
if it's a "tonal" (perhaps a better technical term would be "quasi-periodic") musical note, there is a very large correlation between fundamental frequency of a waveform and perceived pitch unless all of the lower odd harmonics of the waveform are at a very low relative energy. adding a very weak 220 Hz waveform to a 440 Hz waveform will sound like A above middle C (440) but, mathematically, it is a 220 Hz waveform.
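The 440 Hz plus weak 220 Hz case can be checked numerically with the ASDF. The 44 kHz sample rate (chosen so both periods are whole samples), the -60 dB level borrowed from the 200/100 Hz example quoted earlier, and the `first_dip` rule are assumptions for illustration; `first_dip` is a hypothetical "take the first good dip" heuristic, not an algorithm proposed in the thread. The strict minimum lands on the true 200-sample period (220 Hz), while the 100-sample lag (440 Hz) is already a near-perfect fit, which matches what the ear reports.

```python
import math

FS = 44000                         # assumed rate: 440 Hz -> 100 samples, 220 Hz -> 200 samples
P440, P220 = FS // 440, FS // 220
A_WEAK = 10 ** (-60 / 20)          # weak 220 Hz partial, 60 dB down (assumed level)
W = 600                            # analysis window: 6 periods of 440 Hz, 3 of 220 Hz

x = [math.sin(2 * math.pi * 440 * n / FS) + A_WEAK * math.sin(2 * math.pi * 220 * n / FS)
     for n in range(W + 260)]

def asdf(x, k, w):
    """Average squared difference between x[n] and x[n + k] over w samples."""
    return sum((x[n] - x[n + k]) ** 2 for n in range(w)) / w

def first_dip(d, lo, hi, thr):
    """Hypothetical rule: first lag in [lo, hi] whose normalized ASDF drops
    below thr, then follow it downhill to the bottom of that dip."""
    k = lo
    while k <= hi and d[k] >= thr:
        k += 1
    if k > hi:
        return None
    while k < hi and d[k + 1] < d[k]:
        k += 1
    return k

energy = sum(v * v for v in x[:W]) / W
d = {k: asdf(x, k, W) / energy for k in range(20, 251)}   # ASDF normalized by signal energy

global_min = min(d, key=d.get)     # strict best fit: the mathematical period
dip = first_dip(d, 20, 250, 0.01)  # first "good enough" fit
print(FS / global_min, FS / dip)   # expected: 220.0 440.0
print(d[P440])                     # tiny (about 4e-6): 440 Hz is almost a perfect fit
```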
> I have even tricked the human sensing
> pitch, using sinusoidal frequencies with different
> modulation and phase.
was the result quasi-periodic?
> Since "pitch" is shorter and easier to say than "fundamental
> frequency", the term "pitch estimate" came into vogue, but
> what you are really doing is "fundamental frequency
> estimation".
agreed, and there are non-harmonic musical sounds (i dunno if i would apply the term "tones" to them) that hardly have a fundamental frequency, yet we have a sense of pitch to them. and i am not addressing that case. only the quasi-periodic case.
> Experienced engineers do the translation
> automatically, but some novices can be confused.
i guess i'm not too worried about it. most of the musical application of PDAs is so some guy can plug in his guitar to a box and he plays a note and the PDA figures out what the note is. sometimes for pitch to MIDI, sometimes for effects or even synthesis. and what we are doing is trying to determine the fundamental frequency (with caveats) and base 2 log it.

--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
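The "base 2 log it" step can be sketched as follows, assuming the usual A4 = 440 Hz reference and MIDI note numbering (A4 = 69, middle C = 60); `freq_to_note` is a hypothetical helper name, and the f0 it takes is whatever fundamental the PDA estimated.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(f0, a4=440.0):
    """Nearest 12-tone equal-tempered note for a fundamental frequency f0 (Hz).

    log2(f0 / a4) is the pitch in octaves relative to the reference A4;
    multiplying by 12 gives semitones. MIDI numbering puts A4 at 69,
    and octave numbers change at C, so MIDI 60 is C4 (middle C).
    """
    if f0 <= 0:
        raise ValueError("fundamental frequency must be positive")
    semitones = 12.0 * math.log2(f0 / a4)
    midi = 69 + round(semitones)
    return midi, NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

print(freq_to_note(440.0))   # (69, 'A4')
print(freq_to_note(82.41))   # (40, 'E2'), the low E string of a guitar in standard tuning
```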
robert bristow-johnson wrote:
> i guess i'm not too worried about it. most of the musical application of
> PDAs is so some guy can plug in his guitar to a box and he plays a note and
> the PDA figures out what the note is. sometimes for pitch to MIDI,
> sometimes for effects or even synthesis. and what we are doing is trying to
> determine the fundamental frequency (with caveats) and base 2 log it.
Or tell him how far out of tune his instrument is from some selected tuning and/or intonation.

IMHO. YMMV.

--
rhn A.T nicholson d.O.t C-o-M
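The tuner variant needs one more step: the distance, in cents, from the nearest note of the chosen tuning. A minimal sketch, assuming a tuning is given as a list of cents offsets within one octave above a reference frequency; `nearest_in_tuning` and both example tunings are illustrative choices, not anything specified in the thread.

```python
import math

def cents_off(f0, target_hz):
    """Deviation of f0 from target_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f0 / target_hz)

def nearest_in_tuning(f0, ref_hz, offsets_cents):
    """Nearest target pitch in a tuning given as cents offsets within one octave
    above ref_hz. Returns (target_hz, deviation_cents); positive means sharp."""
    base = math.floor(math.log2(f0 / ref_hz))      # octave of f0 relative to ref_hz
    candidates = [ref_hz * 2.0 ** (o + c / 1200.0)
                  for o in (base - 1, base, base + 1) for c in offsets_cents]
    target = min(candidates, key=lambda t: abs(cents_off(f0, t)))
    return target, cents_off(f0, target)

# Two example tunings (illustrative choices):
EQUAL_12 = [100.0 * i for i in range(12)]                   # 12-tone equal temperament
JUST_MAJOR = [1200.0 * math.log2(r)                         # 5-limit just major scale
              for r in (1, 9 / 8, 5 / 4, 4 / 3, 3 / 2, 5 / 3, 15 / 8)]

print(nearest_in_tuning(442.0, 440.0, EQUAL_12))       # about +7.85 cents sharp of 440 Hz
print(nearest_in_tuning(329.63, 261.63, JUST_MAJOR))   # about +13.7 cents sharp of a just E over C
```

Measuring in cents keeps the deviation independent of register, since it is just the base-2 log of the frequency ratio scaled by 1200.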
<malakhov@cas.ru> wrote in message
news:1126282396.391336.145990@o13g2000cwo.googlegroups.com...

> However, it's not quite clear to me what kind of problem the
> person who initiated this discussion wants to solve.
> I can only guess at a few:
> 1) That person needs a music transcription tool for his or her music
> exercises.
> 2) He or she wants to develop his (her) own music transcription software.
> 3) He or she needs to develop and test a period estimator for
> some purpose other than music transcription. It could be student work,
> or a medical or industrial application, etc.
I recently saw a software package advertised for sale by a telemarketer that claimed to measure a singer's pitch in real-time and graphically show the singer where his/her pitch was relative to where it should be. Guaranteed to make scalded cats sound like Caruso.
fizteh89 wrote:
> Well, well, well...
>
> Nothing unusual... comp.dsp "experts" discussing something they
> don't have a clue about and giving wrong answers to a general
> public...
> I was going to skip this discussion but just couldn't resist...
>
> Just where did you guys go to school?
> Do you know anything other than DFT?
I don't think you will find anyone on comp.dsp, regulars or others, who claims to be an expert at anything. Most have an interest in DSP, and some even have some experience from various types of work.

As for the posts I have written in this thread, you are right in both your stated and your implied claims: I haven't gone to one of those high-prestige schools or universities, and I am no expert in speech processing. I have done a little bit of DSP with acoustics over the years, but not speech or audio.

What I *do* know (apart from the DFT) is that anyone who claims to have the One True Answer to a problem is in deep trouble. They might be right in the short term, in that they have provided some trick or algorithm or way to solve some problem. In the long term, however, such people tend to become liabilities to their companies and organizations, since they tend to get absorbed in studying their own Glorious Image in the mirror, not finding it worth their while to keep up with the dilettantes that otherwise swamp their profession.

One way of actually keeping up with the profession is to ask the "silly" questions and hopefully get the elaborate answers. If it is "obvious" to an expert, true or self-proclaimed, in speech processing that the pitch is encoded in the time-domain autocorrelation function, it is not at all so for the generalist or for somebody who has other fields of interest.

So why don't you share your highly valued expertise with the amateurs, perhaps even on comp.dsp, and show where they are wrong and why? How would anybody know about your achievements if you do not share them? How would anyone recognize your genius if they do not understand the complexity of a problem worthy of *your* attention?

There are lots of posers in the world; they are a dime a dozen, if not cheaper. Only the educated are capable of recognizing true genius. It is in your own interest to educate, not mock, the inept.

Rune