comp.dsp | Pitch Estimation using Autocorrelation| page 5

Reply by robert bristow-johnson ●September 13, 2005

in article dg7ftb$sc4$1@avnika.corp.mot.com, Chip Wood at
chip.wood@motorola.com wrote on 09/13/2005 17:21:

> Hey guys, it is not rocket science.  "Pitch" and "loudness"
> are terms of perception that are measurable only by asking a
> human.  "Fundamental frequency" (or its inverse "Fundamental
> period"), and "Sound Pressure Level (SPL)" are physical
> entities measurable by machines.  There is much correlation
> (basically a log function) between each pair, BUT many other
> elements also factor in to their relationships.

not to disagree with the specific point (multiple physical parameters *can*
contribute to the sense of "pitch" or "loudness"), but, Chip, *we* are
machines, biological machines.  what is humanly perceptual and artificially
perceptual differs in the machine, but perhaps not in the programming.  if
both are programmed the same, they come up with the same answer.

not to imply that we have the programming down pat on this, *but* when we in
the audio/music engineering discipline say "pitch detection", it is almost
always understood to be a parameter that is something like 12*log2(f0/fr)
where f0 is the fundamental frequency of a quasi-periodic "tone" or "note"
or whatever is the waveform that comes out of 95% of pitched musical
instruments including the *singing* human voice.  (fr is the reference
frequency and the result above gives you the number of semitones of pitch
displacement from the reference.)
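
A minimal sketch of the 12*log2(f0/fr) measure described above. The function name and the 440 Hz default reference are illustrative assumptions, not something stated in the thread.

```python
import math

def semitones_from_reference(f0, fr=440.0):
    """Pitch displacement of f0 from fr, in semitones: 12*log2(f0/fr).

    fr defaults to 440 Hz purely for illustration (an assumption).
    """
    if f0 <= 0 or fr <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f0 / fr)

print(semitones_from_reference(880.0))  # one octave above the reference: 12.0
print(semitones_from_reference(220.0))  # one octave below the reference: -12.0
```

Doubling f0 always adds 12 semitones regardless of the reference, which is why the measure is convenient for musical intervals.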

> As I said before, many a Ph.D. dissertation has been written and many
> full professorships have been attained studying these other relationships.

not very impressive.  Ph.Ds have been written about lotsa stuff.  some
pretty worthless.

> Want to start another discussion-  What is sound "quality"?

probably a few dissertations written about that, also.  i know there have
been AES preprints about that, never could understand what-the-hell they
were talking about.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by Jerry Avins ●September 13, 2005

Chip Wood wrote:

   ...

Right on!

> Want to start another discussion-  What is sound "quality"?

Sure! What's to say?

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Reply by Rune Allnor ●September 14, 2005

fizteh89 wrote:

> For starters I can suggest automatically substituting "pitch" with
> "fundamental frequency" or, better yet, "fundamental period"
> (or "glottal period", if you want), wherever you see a discussion
> related to speech processing.
> This will greatly reduce the amount of confusion.

This is, in fact, why I got confused in the first place.
The "fundamental period" is usually (i.e. in most applications
other than speech) related to a sinusoid, most of the time
the one at the basic frequency if harmonics are present.

In the frequency domain, the "pitch", understood as "the basic
frequency", is more or less easily detected.

In the voiced model, there seems to be no sinusoid as such.
The difference between a "pulse train" and "a sinusoid"
is significant here.

Rune

Reply by Rune Allnor ●September 14, 2005

fizteh89 wrote:
> For starters I can suggest automatically substituting "pitch" with
> "fundamental frequency" or, better yet, "fundamental period"
> (or "glottal period", if you want), wherever you see a discussion
> related to speech processing.
> This will greatly reduce the amount of confusion.

No, it will not.

If you read my first post in this thread, I based my response on
that substitution and got a hell of a beating. There is a vast
difference between the fundamental period of a train of pulses
and that of one or two sinusoids.

In fact, it is a standard exercise in DSP intro classes to compute
the Fourier series of a pulse train. The sinusoid that "fills
in" the whole time domain is clearly visible in the Fourier
domain. If there is more than one sinusoid present (two or
three suffice) and some noise, there is little reason to believe
that one would be able to see anything useful in a time-domain
autocorrelation function. In the frequency domain, yes, but not
in the time domain.

Then consider the pulse train. The pulses are easily detected
in the time domain, but due to the high number of harmonics, they
become diffuse in the frequency domain.
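
The pulse-train exercise can be sketched numerically. This is only an illustration under assumed values (a 64-sample frame, one pulse every 8 samples, and a naive DFT written for clarity rather than speed). Every harmonic of the pulse rate comes out with the same magnitude, so the energy is spread over many bins instead of sitting in one.

```python
import cmath

def dft_magnitudes(x):
    """Magnitudes of a naive O(n^2) DFT; fine for short illustrative frames."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

N, P = 64, 8  # frame length and pulse period in samples (assumed values)
pulse_train = [1.0 if t % P == 0 else 0.0 for t in range(N)]

mags = dft_magnitudes(pulse_train)
harmonic_bins = [k for k, m in enumerate(mags) if m > 1e-6]

print(harmonic_bins)  # [0, 8, 16, 24, 32, 40, 48, 56], i.e. multiples of N/P
print([round(mags[k], 6) for k in harmonic_bins])  # every entry is 8.0
```

In the time domain the same signal is just eight isolated samples, which is about as easy to detect as a period can be.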

So this is a case where the model of the signal generator
is crucial for selecting the relevant processing tools. This is
the kind of thing the high-prestige schools and universities
do not teach, and whose relevance one only learns by hands-on
experience.

Rune

Reply by Rune Allnor ●September 14, 2005

Chip Wood wrote:
> Glad you got out your books on speech.  Now get them out for
> musical acoustics.   The exact same paradigm of a periodic,
> usually full of harmonics (depends which ones on the
> instrument) or aperiodic source (usually whitish noise)
> driving a cavity resonance exists for all acoustical
> instruments whether it be voice, violin, trumpet, drum, or
> piano.  It is not unique to voice, it is the fundamental
> principle of all sound created by wo/man.

Sorry, you are wrong.

You are right that resonance is what drives the
pitch of most sources, but it is not what drives the
pitch of the human voice.

As I am sure you know, the (vocal) sound of the human
voice is modeled by the convolution of the impulse
train and the impulse response of the vocal tract.

If you were right that it is the resonant behaviour
of the vocal tract that determines the pitch, the vocal
tract would have to change volume by orders of magnitude
in order to modulate the pitch by a couple of octaves.

The volume of the vocal tract does not change much.
In fact, it is perfectly possible to modulate the
pitch without modulating the vocal tract at all.
Just try humming with your mouth shut.

The property of the human voice that I find to be
almost unique (apart from voices of other animals)
is that it is the excitation period of the impulse
train that determines the pitch.

In the voice, resonance has nothing to do with pitch. The
control of the resonant behaviour of the vocal tract helps
shape the formants. It does not determine the pitch.
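
A toy version of the convolution model above, as a sketch only. The decaying cosine is a crude stand-in for a vocal-tract impulse response, and all names and numbers (a 50-sample excitation period, resonance periods of 7 and 11 samples, the lag search range) are assumptions made for illustration. Changing the resonance changes the waveform, but the autocorrelation peak stays at the excitation period.

```python
import math

def impulse_train(n_samples, period):
    """Unit impulses every `period` samples (the excitation)."""
    return [1.0 if t % period == 0 else 0.0 for t in range(n_samples)]

def resonance(length, res_period, decay=0.9):
    """Decaying cosine: a toy impulse response with one resonance."""
    return [decay ** n * math.cos(2 * math.pi * n / res_period)
            for n in range(length)]

def convolve(x, h):
    """Direct convolution of x with h, truncated to len(x) samples."""
    y = [0.0] * len(x)
    for t, xt in enumerate(x):
        if xt == 0.0:
            continue  # the excitation is sparse, skip empty samples
        for k, hk in enumerate(h):
            if t + k < len(y):
                y[t + k] += xt * hk
    return y

def autocorr_peak_lag(y, min_lag, max_lag):
    """Lag in [min_lag, max_lag] that maximizes the time-domain autocorrelation."""
    window = len(y) - max_lag  # same number of terms for every lag
    def r(lag):
        return sum(y[t] * y[t + lag] for t in range(window))
    return max(range(min_lag, max_lag + 1), key=r)

excitation = impulse_train(400, 50)
for res_period in (7, 11):  # two different "vocal tract" resonances
    voiced = convolve(excitation, resonance(40, res_period))
    print(res_period, autocorr_peak_lag(voiced, 20, 80))  # peak lag is 50 both times
```

The lower search bound of 20 samples is what keeps the resonance ripple itself from being picked as the period; a real detector needs some such range as well.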

Rune

Reply by Jerry Avins ●September 14, 2005

Rune Allnor wrote:

   ...

> The property of the human voice that I find to be
> almost unique (apart from voices of other animals)
> is that it is the excitation period of the impulse
> train that determines the pitch.

Unique? I think it is always true that the forcing function determines 
the frequency. Even with a trumpet. If that needs explaining, I'll try.
   ...

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Reply by Rune Allnor ●September 14, 2005

Jerry Avins wrote:
> Rune Allnor wrote:
>
>    ...
>
> > The property of the human voice that I find to be
> > almost unique (apart from voices of other animals)
> > is that it is the excitation period of the impulse
> > train that determines the pitch.
>
> Unique? I think it is always true that the forcing function determines
> the frequency. Even with a trumpet. If that needs explaining, I'll try.

Brass instruments are the only other system I can think of where
this excitation mechanism plays a part in determining the pitch.
A good trumpeter can use the tension of his lips to determine
the pitch of a signal horn (bugle?) without using valves.

As I remember, the "natural tones" of a B trumpet were c, g, c1, e1,
g1, c2 ... (details of musical notation may be wrong). These are
the tones a trumpeter hits by adjusting his "lip service" with no
use of valves. I was able to modulate roughly 1/2 to one full tone
to each side of a "natural". Unintentionally, most of the time...

But the lip action only sets up the resonance of the air inside
the instrument. The pitch in a trumpet, unlike in the voice,
is determined by the physical size of a resonant cavity.
Hence the valves that couple the air into various elongations.

Rune

Reply by ●September 14, 2005

Rune Allnor wrote:
> If there are more than one sinusoidal present (two or
> three suffice) and some noise, there is little reason to believe
> that one would be able to see anything useful in a time-domain
> autocorrelation function. In frequency domain, yes, but not
> in time domain.
>
Adding to your words, I would like to give an example.
There is a demo sound (noise with pitch) at:
http://www.mrc-cbu.cam.ac.uk/cnbh/web2005/teaching/sounds_movies/Sounds/fixed16.wav
The repetitive nature of this sound can be detected perceptually.
Time-domain processing also finds a stable repetition period.
The frequency-domain information seems to be useless for
detecting the period of this signal. But if a stable "period" exists,
then there should be a "frequency" somewhere. Am I wrong?
So I can find (and I did find) the repetition period of this sound
in a purely spectral way. I am far from claiming that frequency-domain
processing is always better; I only want to say that spectral
analysis has good potential for further development.
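
One possible "purely spectral" route, offered only as a sketch and not necessarily the method used for the file above: the inverse transform of the power spectrum is the circular autocorrelation, so a repetition period can be read off without ever correlating in the time domain. The test signal here is a short fixed sequence repeated eight times, a crude stand-in for repeated noise and not the fixed16.wav sound; the function names and the 0.99 threshold are assumptions.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) DFT (or inverse DFT with 1/n scaling), written for clarity."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def period_from_power_spectrum(x):
    """Shortest lag whose circular autocorrelation is within 1% of the best lag."""
    n = len(x)
    power = [abs(v) ** 2 for v in dft(x)]          # discards all phase
    r = [v.real for v in dft(power, inverse=True)]  # circular autocorrelation
    lags = range(1, n // 2 + 1)
    peak = max(r[lag] for lag in lags)
    # Taking the shortest qualifying lag avoids reporting a multiple of the period.
    return next(lag for lag in lags if r[lag] >= 0.99 * peak)

segment = [3, -1, 4, 1, -5, 2, -2, -2]  # arbitrary fixed pattern (assumption)
signal = segment * 8                    # repeated with a period of 8 samples

print(period_from_power_spectrum(signal))  # 8
```

The power spectrum alone carries the period even though no single bin stands out as "the" fundamental, which is consistent with the point that a stable period implies frequency-domain structure somewhere.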

Vladimir Malakhov.

Reply by Jerry Avins ●September 14, 2005

Rune Allnor wrote:
> Jerry Avins wrote:
> 
>>Rune Allnor wrote:
>>
>>   ...
>>
>>
>>>The property of the human voice that I find to be
>>>almost unique (apart from voices of other animals)
>>>is that it is the excitation period of the impulse
>>>train that determines the pitch.
>>
>>Unique? I think it is always true that the forcing function determines
>>the frequency. Even with a trumpet. If that needs explaining, I'll try.
> 
> 
> Brass instruments are the only other system I can think of where
> this excitation mechanism plays a part in determining the pitch.

Organ pipe, flute, recorder, clarinet, ...

> A good trumpeter can use the tension of his lips to determine
> the pitch of a signal horn (bugle?) without using valves.

The bore of the mouthpiece of a Baroque trumpet is much larger than the 
bore in a modern one. That allows the mouth volume to influence the 
pitch also. Baroque trumpets had a more irregular flare, thus lowering 
the Q and making the pitch more easily pulled.

> As I remember, the "natural tones" of a B trumpet were c, g, c1, e1,
> g1, c2 ... (details of musical notation may be wrong). These are
> the tones a trumpeter hits by adjusting his "lip service" with no
> use of valves. I was able to modulate roughly 1/2 to one full tone
> to each side of a "natural". Unintentionally, most of the time...
> 
> But the lip action only sets up the resonance of the air inside
> the instrument. The pitch in a trumpet, unlike in the voice,
> is determined by the physical size of a resonant cavity.
> Hence the valves that couple the air into various elongations.

Just like other oscillators that are part of a resonant element. To 
separate formant and pitch, the oscillator must be uncoupled from the 
resonance. That seems to be the case with vocal cords (tuned by varying 
the tension) and formants (resonances in acoustically remote cavities).

Jerry
-- 
Engineering is the art of making what you want from things you can get.