Forums

Pitch Estimation using Autocorrelation

Started by olivers September 7, 2005
in article dg7ftb$sc4$1@avnika.corp.mot.com, Chip Wood at
chip.wood@motorola.com wrote on 09/13/2005 17:21:

> Hey guys, it is not rocket science. "Pitch" and "loudness"
> are terms of perception that are measurable only by asking a
> human. "Fundamental frequency" (or its inverse "fundamental
> period"), and "Sound Pressure Level (SPL)" are physical
> entities measurable by machines. There is much correlation
> (basically a log function) between each pair, BUT many other
> elements also factor in to their relationships.
not to disagree with the specific point (multiple physical parameters *can* contribute to the sense of "pitch" or "loudness"), but, Chip, *we* are machines, biological machines. what is humanly perceptual and artificially perceptual differs in the machine, but perhaps not in the programming. if both are programmed the same, they come up with the same answer. not to imply that we have the programming down pat on this, *but* when we in the audio/music engineering discipline say "pitch detection", it is almost always understood to be a parameter that is something like 12*log2(f0/fr) where f0 is the fundamental frequency of a quasi-periodic "tone" or "note" or whatever is the waveform that comes out of 95% of pitched musical instruments including the *singing* human voice. (fr is the reference frequency and the result above gives you the number of semitones of pitch displacement from the reference.)
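r b-j's pitch-displacement formula is easy to sketch in code (a minimal Python illustration; the choice of A4 = 440 Hz as the reference frequency fr is an assumption for the example, not something from the post):

```python
import math

def pitch_semitones(f0, fr=440.0):
    """Pitch displacement in semitones from the reference: 12*log2(f0/fr)."""
    return 12.0 * math.log2(f0 / fr)

# One octave above the reference is exactly 12 semitones.
print(pitch_semitones(880.0))  # 12.0
```

Note that this maps frequency ratios to equal steps, which is why a log measure matches the musical notion of pitch intervals.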
> As I said before, many a Ph.D dissertation have been written and many
> full professorships have been attained studying these other relationships.
not very impressive. Ph.Ds have been written about lotsa stuff. some pretty worthless.
> Want to start another discussion- What is sound "quality"?
probably a few dissertations written about that, also. i know there have been AES preprints about that, never could understand what-the-hell they were talking about.

--
r b-j rbj@audioimagination.com

"Imagination is more important than knowledge."
Chip Wood wrote:

   ...

Right on!

> Want to start another discussion- What is sound "quality"?
Sure! What's to say?

Jerry

--
Engineering is the art of making what you want from things you can get.
fizteh89 wrote:

> For starters I can suggest automatically substituting "pitch" with
> "fundamental frequency" or, better yet, "fundamental period"
> (or "glottal period", if you want), wherever you see a discussion
> related to speech processing.
> This will greatly reduce the amount of confusion.
This is, in fact, why I got confused in the first place. The "fundamental period" is usually (i.e. in most applications other than speech) related to a sinusoidal, most of the time the one at the basic frequency, if harmonics are present. In the frequency domain, the "pitch", understood as "the basic frequency", is more or less easily detected. In the voiced model, there seems to be no sinusoidal as such. The difference between a "pulse train" and "a sinusoidal" is significant here.

Rune
fizteh89 wrote:
> For starters I can suggest automatically substituting "pitch" with
> "fundamental frequency" or, better yet, "fundamental period"
> (or "glottal period", if you want), wherever you see a discussion
> related to speech processing.
> This will greatly reduce the amount of confusion.
No, it will not. If you read my first post in this thread, I based my response on that substitution and got a hell of a beating. There is a vast difference between the fundamental period of a train of pulses and that of one or two sinusoidals. In fact, it is a standard exercise in DSP intro classes to compute the Fourier series of a pulse train. The sinusoidal that "fills in" the whole time domain is clearly visible in the Fourier domain. If there are more than one sinusoidal present (two or three suffice) and some noise, there is little reason to believe that one would be able to see anything useful in a time-domain autocorrelation function. In frequency domain, yes, but not in time domain.

Then consider the pulse train. The pulses are easily detected in time domain, but due to the high number of harmonics, they become diffuse in frequency domain. So this is a case where the model of the signal generator is crucial to selecting the relevant processing tools. This is the kind of thing the high-prestige schools and universities do not teach, and whose relevance one only learns through hands-on experience.

Rune
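Rune's point about the pulse train can be checked numerically (a Python/NumPy sketch; the sample count and the 80-sample period are arbitrary choices for the demo, not from the thread): the time-domain autocorrelation of an impulse train has an unmistakable peak at the pulse period.

```python
import numpy as np

period = 80            # pulse period in samples (arbitrary for the demo)
x = np.zeros(2048)
x[::period] = 1.0      # unit impulse train

def autocorr(x):
    """Autocorrelation of a mean-removed signal, normalized so r[0] == 1."""
    x = x - x.mean()
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    return r / r[0]

r = autocorr(x)
# The largest peak away from lag 0 sits at the pulse period.
lag = int(np.argmax(r[period // 2 : 2 * period])) + period // 2
print(lag)  # 80
```

Replacing the impulse train with two or three sinusoids plus noise makes the same lag search far less reliable, which is exactly the distinction Rune draws between the two signal models.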
Chip Wood wrote:
> Glad you got out your books on speech. Now get them out for
> musical acoustics. The exact same paradigm of a periodic,
> usually full of harmonics (depends which ones on the
> instrument) or aperiodic source (usually whitish noise)
> driving a cavity resonance exists for all acoustical
> instruments whether it be voice, violin, trumpet, drum, or
> piano. It is not unique to voice, it is the fundamental
> principle of all sound created by wo/man.
Sorry, you are wrong. You are right in that resonance is what drives the pitch of most sources, but it is not what drives the pitch of the human voice. As I am sure you know, the (vocal) sound of the human voice is modeled as the convolution of an impulse train with the impulse response of the vocal tract. If you were right that it is the resonant behaviour of the vocal tract that determines the pitch, the vocal tract would have to change volume by orders of magnitude in order to modulate the pitch by a couple of octaves. The volume of the vocal tract does not change much. In fact, it is perfectly possible to modulate the pitch without modulating the vocal tract at all. Just try humming with your mouth shut.

The property of the human voice that I find to be almost unique (apart from voices of other animals) is that it is the excitation period of the impulse train that determines the pitch. Resonance has nothing to do with pitch. The control of the resonant behaviour of the vocal tract helps shape the formants. It does not determine the pitch.

Rune
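The source-filter model Rune describes can be sketched as a convolution (a toy Python/NumPy example; the formant frequency, decay rate, and glottal periods are made-up numbers for illustration): changing the impulse-train period changes the pitch, while the "vocal tract" filter, and hence the formant, stays fixed.

```python
import numpy as np

fs = 8000
# Toy "vocal tract": a damped resonance (a formant) near 500 Hz.
t = np.arange(200) / fs
h = np.exp(-300.0 * t) * np.sin(2 * np.pi * 500.0 * t)

def voiced(period, n=2000):
    """Glottal impulse train convolved with the tract response h."""
    src = np.zeros(n)
    src[::period] = 1.0
    return np.convolve(src, h)[:n]

low = voiced(100)   # f0 = 80 Hz
high = voiced(50)   # f0 = 160 Hz: same tract, double the pitch
```

Humming with the mouth shut corresponds to changing `period` while leaving `h` alone, which is why pitch can move without the tract changing volume.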
Rune Allnor wrote:

   ...

> The property of the human voice that I find to be
> almost unique (apart from voices of other animals)
> is that it is the excitation period of the impulse
> train that determines the pitch.
Unique? I think it is always true that the forcing function determines the frequency. Even with a trumpet. If that needs explaining, I'll try.

...

Jerry

--
Engineering is the art of making what you want from things you can get.
Jerry Avins wrote:
> Rune Allnor wrote:
> > ...
> >
> > The property of the human voice that I find to be
> > almost unique (apart from voices of other animals)
> > is that it is the excitation period of the impulse
> > train that determines the pitch.
>
> Unique? I think it is always true that the forcing function determines
> the frequency. Even with a trumpet. If that needs explaining, I'll try.
Brass instruments are the only other system I can think of where this excitation mechanism has a part in determining the pitch. A good trumpeter can use the tension of his lips to determine the pitch of a signal horn (bugle?) without using valves. As I remember, the "natural tones" of a B trumpet were c, g, c1, e1, g1, c2 ... (details of musical notation may be wrong). These are the tones a trumpeter hits by adjusting his "lip service" with no use of valves. I was able to modulate roughly 1/2 to one full tone to each side of a "natural". Unintentionally, most of the time...

But the lip action only sets up the resonance of the air inside the instrument. The pitch in a trumpet, unlike in the voice, is determined by the physical size of a resonant cavity. Hence the valves that couple the air into various elongations.

Rune
Rune Allnor wrote:
> If there are more than one sinusoidal present (two or
> three suffice) and some noise, there is little reason to believe
> that one would be able to see anything useful in a time-domain
> autocorrelation function. In frequency domain, yes, but not
> in time domain.
Adding to your words I would like to give an example. There is a demo sound (noise with pitch) at: http://www.mrc-cbu.cam.ac.uk/cnbh/web2005/teaching/sounds_movies/Sounds/fixed16.wav

One can detect the repetitive nature of this sound perceptually, and time-domain processing also finds a stable repetition period. The frequency-domain information seems useless for detecting the period of this signal. But if a stable "period" exists, then there should be a "frequency" somewhere. Am I wrong? So, I can find (and I did) the repetition period of this sound in a purely spectral way. I am far from the idea that frequency-domain processing is always better; I only want to say that spectral analysis has good room for development.

Vladimir Malakhov.
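The fixed16.wav file itself is not reproduced here, but the same kind of signal (noise plus a delayed copy of itself, which produces a repetition pitch) can be synthesized, and its period recovered "spectrally" via the Wiener-Khinchin relation (a Python/NumPy sketch; the 100-sample delay and signal length are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 16000, 100          # signal length; repetition delay in samples

# Delay-and-add ("rippled") noise: noise plus a delayed copy of itself.
w = rng.standard_normal(n)
x = w.copy()
x[d:] += w[:-d]

# Wiener-Khinchin: the autocorrelation is the inverse FFT of the power
# spectrum, so the time-domain period can be computed purely spectrally.
X = np.fft.rfft(x)
r = np.fft.irfft(np.abs(X) ** 2)
lag = int(np.argmax(r[d // 2 : 2 * d])) + d // 2
print(lag)  # 100
```

This is one sense in which Vladimir's "pure spectral way" is not in conflict with the time-domain view: the circular autocorrelation used above is itself obtained from the power spectrum.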
Rune Allnor wrote:
> Jerry Avins wrote:
> > Rune Allnor wrote:
> > > ...
> > >
> > > The property of the human voice that I find to be
> > > almost unique (apart from voices of other animals)
> > > is that it is the excitation period of the impulse
> > > train that determines the pitch.
> >
> > Unique? I think it is always true that the forcing function determines
> > the frequency. Even with a trumpet. If that needs explaining, I'll try.
>
> Brass instruments are the only other system I can think of, where
> this excitation mechanism has a part in determining the pitch.
Organ pipe, flute, recorder, clarinet, ...
> A good trumpeter can use the tension of his lips to determine
> the pitch of a signal horn (bugle?) without using valves.
The bore of the mouthpiece of a Baroque trumpet is much larger than the bore in a modern one. That allows the mouth volume to influence the pitch also. Baroque trumpets had a more irregular flare, thus lowering the Q and making the pitch more easily pulled.
> As I remember, the "natural tones" of a B trumpet were c, g, c1, e1,
> g1, c2 ... (details of musical notation may be wrong). These are
> the tones a trumpeter hits by adjusting his "lip service" with no
> use of valves. I was able to modulate roughly 1/2 to one full tone
> to each side of a "natural". Unintentionally, most of the time...
>
> But the lip action only sets up the resonance of the air inside
> the instrument. The pitch in a trumpet, unlike in the voice,
> is determined by the physical size of a resonant cavity.
> Hence the valves that couple the air into various elongations.
Just like other oscillators that are part of a resonant element. To separate formant and pitch, the oscillator must be uncoupled from the resonance. That seems to be the case with vocal cords (tuned by varying the tension) and formants (resonances in acoustically remote cavities).

Jerry

--
Engineering is the art of making what you want from things you can get.