
Pitch Estimation using Autocorrelation

Started by olivers September 7, 2005
in article 1127218541.142621.319580@g49g2000cwa.googlegroups.com, Rune
Allnor at allnor@tele.ntnu.no wrote on 09/20/2005 08:15:

> Speedy wrote:
>
>> In a previous mail Rune wrote:
>>> You are right in that the resonance is what drives the
>>> pitch of most sources, it is not what drives the pitch
>>> of the human voice.
>>
>> and in another:
>>
>>> The bow keeps the string resonating, the vibration of
>>> the string being amplified in the resonance cave of the violin
>>> to produce the sound.
>>
>> I think you are contradicting yourself here. The string is the source
>> and it determines the pitch. The resonance cave of the violin is what
>> shapes the source signal and acts as a filter. It is not the resonance
>> that determines the pitch. Unless you are using the term "resonance"
>> in a rather confusing way and I have misunderstood you.
>
> No, I am not confusing anybody. I am sticking to the technical
> terms. The string is not the source in a violin. The bow or
> finger that causes it to vibrate, is.
>
> Now, it seems to me as if the music and speech people find it
> easier (and understandably so) to use a model where the
> string of a violin or membrane of a drum is the source.
>
> This does introduce quite a bit of confusion when they talk
> with people who are used to more elaborate models.
i think you guys might want to change the Subject: header for this. it is not really about Pitch Estimation using Autocorrelation anymore.

-- 
r b-j    rbj@audioimagination.com
"Imagination is more important than knowledge."
Rune Allnor wrote:
> > ... Name me one purely "periodic
> > source" without an accompanying "resonant system",
>
> The human vocal cords. There is nothing that resonates in
> the glottis (if that's the name of where the cords are
> located). "Resonance" is a very specific technical term
> that is based on the idea that energy that is rapidly
> fed to a linear system (like some sort of impulse),
> takes a long time to fade away. The "resonant system"
> has long memory, if you like.
>
> The vocal cords don't satisfy those criteria. They tend
> to restrict pressurized air that is pushed out of the
> lungs. The cords are somewhat rigid, so they don't give
> way just like that. When they do, a small pulse of air
> is released, thus lowering the pressure on the inside
> of the cords just enough for the cords to seal tight.
> This is the same principle that a trumpet player uses
> to get his horn to sound. Ask your local trumpet
> player to demonstrate what happens inside his mouthpiece.
> "Playing without the horn" was a basic part of trumpet
> training in my days.
>
> The tension applied to the vocal cords, as well as the
> lips in the trumpet demonstration, determines how well
> they seal, what force it takes to have them slip,
> and thus the period between each released pulse.
>
> Once the pressure inside the cords is lowered there
> is no vibration whatsoever in the vocal cords or
> lips.
>
> Thus, they are not a resonant system. Nonlinear, not
> resonant.
Interesting. So, in your terminology, a non-linear system with a periodic solution (constantly pressurized spring loaded gateway: vocal cord, lips, reed, safety valve, etc.) is called something other than resonant. Or is this just a bias towards systems with linear solutions (which are often just approximations to the actual real-world systems)?

And (to get back near the original topic) because a system is non-linear but periodic, does that somehow imply that autocorrelation is a better estimator than the frequency spectrum of some fundamental system characteristic?

IMHO. YMMV.
-- 
rhn A.T nicholson d.O.t C-o-M
rhnlogic@yahoo.com wrote:
> Rune Allnor wrote:
...
>> Thus, they are not a resonant system. Nonlinear, not
>> resonant.
>
> Interesting. So, in your terminology, a non-linear system
> with a periodic solution (constantly pressurized spring
> loaded gateway: vocal cord, lips, reed, safety valve, etc.)
> is called something other than resonant. Or is this just
> a bias towards systems with linear solutions (which are often
> just approximations to the actual real-world systems)?
I needed a while to see what Rune was driving at, but I mostly have to agree with him here. A relaxation oscillator doesn't involve resonance as I see it, despite being tunable. A simple example is a Schmitt-trigger inverter with a capacitor from input to ground and a resistor from input to output. The frequency is k/RC, where k is a property of the device, but there is no resonance.
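As a rough illustration of the k/RC relation, here is a minimal Python sketch. It assumes an idealized Schmitt-trigger inverter (rail-to-rail output, no input current), which the post does not specify; the function names and the 5 V / 3 V / 2 V values are made up for the example. Under those assumptions the capacitor simply charges and discharges exponentially between the two thresholds, so k depends only on the supply and threshold voltages.

```python
import math

def schmitt_rc_frequency(r_ohms, c_farads, vdd, vt_plus, vt_minus):
    """Frequency of an idealized Schmitt-trigger inverter RC oscillator.

    Assumption: output is exactly 0 V or vdd and the input draws no current,
    so the capacitor follows a plain RC exponential between the thresholds.
    """
    rc = r_ohms * c_farads
    # Output high: capacitor charges from vt_minus up to vt_plus, aiming at vdd.
    t_charge = rc * math.log((vdd - vt_minus) / (vdd - vt_plus))
    # Output low: capacitor discharges from vt_plus down to vt_minus, aiming at 0 V.
    t_discharge = rc * math.log(vt_plus / vt_minus)
    return 1.0 / (t_charge + t_discharge)

def device_constant_k(vdd, vt_plus, vt_minus):
    """The k in f = k / (R*C); it depends only on supply and thresholds."""
    return schmitt_rc_frequency(1.0, 1.0, vdd, vt_plus, vt_minus)

# Illustrative (made-up) device: 5 V supply, thresholds at 3 V and 2 V.
k = device_constant_k(5.0, 3.0, 2.0)
f = schmitt_rc_frequency(10e3, 100e-9, 5.0, 3.0, 2.0)
f_double_r = schmitt_rc_frequency(20e3, 100e-9, 5.0, 3.0, 2.0)
print(f"k = {k:.4f}, f = {f:.1f} Hz, with R doubled: {f_double_r:.1f} Hz")
```

Doubling R halves the frequency, yet nothing in the model stores energy and rings: the period is set by how long the capacitor takes to travel between two switching thresholds.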
> And (to get back near the original topic) because a system is
> non-linear but periodic, does that somehow imply that
> autocorrelation is a better estimator than frequency spectrum
> of some fundamental system characteristic?

If so, I don't see why.

Jerry
-- 
Engineering is the art of making what you want from things you can get.
in article p-SdnWdQoprsxK3eRVn-vw@rcn.net, Jerry Avins at jya@ieee.org wrote
on 09/20/2005 14:48:

> rhnlogic@yahoo.com wrote:
...
>> And (to get back near the original topic) because a system is
>> non-linear but periodic, does that somehow imply that
>> autocorrelation is a better estimator than frequency spectrum
>> of some fundamental system characteristic?
>
> If so, I don't see why.
i'm not sure of what the details of a "frequency spectrum estimator" of the fundamental frequency are. a disadvantage of any estimator is to make assumptions or to require conditions other than what is fundamental. there should be no assumption of anything other than some concept of periodicity. the AMDF and ASDF methods make no other assumption, and autocorrelation is essentially ASDF turned upside-down and offset upward a little.

it doesn't fix every problem (what if the input is not periodic at all, but somehow we still hear a pitch or what if a -80 dB 220Hz waveform is added to a 0 dB 440 Hz waveform - which is it, 220 or 440Hz?) but it avoids many that come from poorer algorithms (like counting zero-crossings).

-- 
r b-j    rbj@audioimagination.com
"Imagination is more important than knowledge."
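The "ASDF turned upside-down and offset" remark can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses circular (wrap-around) sums over a buffer holding a whole number of periods, so that the identity ASDF[k] = 2 (R[0] - R[k]) holds exactly, and the function names, test signal and search range are invented for the example. With ordinary finite windows the relation is only approximate.

```python
import math

def circular_autocorr(x, max_lag):
    """R[k] = sum over n of x[n] * x[(n + k) mod N], for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[(i + k) % n] for i in range(n))
            for k in range(max_lag + 1)]

def circular_asdf(x, max_lag):
    """ASDF[k] = sum over n of (x[n] - x[(n + k) mod N])**2, for k = 0..max_lag."""
    n = len(x)
    return [sum((x[i] - x[(i + k) % n]) ** 2 for i in range(n))
            for k in range(max_lag + 1)]

def estimate_f0(x, fs, f_min, f_max):
    """Return fs / lag for the lag with the smallest ASDF in [fs/f_max, fs/f_min]."""
    lag_lo = int(math.floor(fs / f_max))
    lag_hi = int(math.ceil(fs / f_min))
    d = circular_asdf(x, lag_hi)
    best_lag = min(range(lag_lo, lag_hi + 1), key=lambda k: d[k])
    return fs / best_lag

fs = 8000.0
num_samples = 2000  # exactly 50 periods of a 200 Hz tone at fs = 8000
# Made-up test signal: 200 Hz fundamental plus two weaker harmonics.
x = [sum(math.sin(2.0 * math.pi * 200.0 * h * i / fs) / h for h in (1, 2, 3))
     for i in range(num_samples)]

r = circular_autocorr(x, 60)
d = circular_asdf(x, 60)
f0 = estimate_f0(x, fs, f_min=150.0, f_max=500.0)
print("estimated f0:", f0)
```

The search range is deliberately narrower than one octave below the true pitch. If it were widened to include lag 80 as well as lag 40, both lags would give an ASDF of essentially zero and the choice between them would come down to rounding noise, which is one face of the octave ambiguity mentioned above.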
In comp.dsp robert bristow-johnson <rbj@audioimagination.com> wrote:
> in article dg9ph1$469$1@avnika.corp.mot.com, Chip Wood at
> chip.wood@motorola.com wrote on 09/14/2005 14:18:

> 1. input is not quasi-periodic. so no fundamental can be determined. then
> the perceived pitch may be very difficult for a silicon based "machine"
> (since you don't grant that we are machines) to be determined, if not
> impossible at the moment. and people may very well disagree as to what the
> pitch is or even if it has a pitch.

> 2. input is quasi-periodic, but there are "octave problems" which happen
> when all of the lower odd harmonics are so weak that the human hears it as
> one octave higher than it really is mathematically.
The input can be quite periodic and the pitch can still have nothing to do with the fundamental or the period. Whereas the pitch of the harmonic complex (1400-1600-1800) Hz is clear and is 200 Hz, the pitch of the harmonic complex made up of (1500-1700-1900) Hz is neither 100 nor 200 Hz. It's an ambiguous pitch with a value of about 188 or 212 Hz.

Didier
-- 
Didier A Depireux          ddepi001@umaryland.edu  didier@isr.umd.edu
20 Penn Str - S218E        http://neurobiology.umaryland.edu/depireux.htm
Anatomy and Neurobiology   Phone: 410-706-1272 (lab)
University of Maryland            -1273 (off)
Baltimore MD 21201 USA     Fax: 1-410-706-2512
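The post gives no source for 188 and 212 Hz. One way to arrive at numbers close to them, offered here as an assumption and not as Didier's stated reasoning, is to look at the autocorrelation of each three-tone complex. For equal-amplitude components the autocorrelation is a sum of cosines, and for the 1500-1700-1900 Hz complex its two tallest peaks near a 5 ms lag fall close to 8 and 9 cycles of the 1700 Hz centre component (1700/8 = 212.5 Hz, 1700/9 ≈ 188.9 Hz). The sketch below also computes the mathematical fundamental as a greatest common divisor. All function names and the 150-260 Hz search range are choices made for this example.

```python
import math
from functools import reduce

def common_fundamental(freqs_hz):
    """Largest frequency that all (integer-valued) components are multiples of."""
    return reduce(math.gcd, freqs_hz)

def autocorr(freqs_hz, lag_s):
    """Autocorrelation of a sum of equal-amplitude sinusoids; phases drop out."""
    return 0.5 * sum(math.cos(2.0 * math.pi * f * lag_s) for f in freqs_hz)

def candidate_pitches(freqs_hz, f_lo, f_hi, steps=20000):
    """Local maxima of the autocorrelation for lags in [1/f_hi, 1/f_lo].

    Returns (pitch_hz, peak_height) pairs, tallest peak first.
    """
    t_lo, t_hi = 1.0 / f_hi, 1.0 / f_lo
    taus = [t_lo + (t_hi - t_lo) * i / steps for i in range(steps + 1)]
    r = [autocorr(freqs_hz, t) for t in taus]
    peaks = [(1.0 / taus[i], r[i]) for i in range(1, steps)
             if r[i - 1] < r[i] >= r[i + 1]]
    return sorted(peaks, key=lambda p: -p[1])

even = candidate_pitches((1400, 1600, 1800), 150.0, 260.0)
odd = candidate_pitches((1500, 1700, 1900), 150.0, 260.0)

print("gcd:", common_fundamental((1400, 1600, 1800)),
      common_fundamental((1500, 1700, 1900)))
print("tallest peak, 1400-1600-1800:", round(even[0][0], 1))
print("two tallest peaks, 1500-1700-1900:",
      sorted(round(p, 1) for p, _ in odd[:2]))
```

In this model the first complex has a single dominant peak at 200 Hz, while the second has two equally tall peaks close to 189 Hz and 212 Hz and a deep negative value at the 5 ms lag in between. Whether the ear actually works this way is a separate question that the sketch does not settle.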
Yes, it's true that our hearing ability (or inability) is vastly
superior to our vocal capability, as far as sound production/perception
is concerned.

While we can hear and (subjectively) interpret all of those strange
computer-generated artificial sounds (harmonic/inharmonic complexes,
periodic/aperiodic series of clicks etc.), we are very limited in our
ability to use our vocal apparatus to produce anything other than a
series of glottal impulses and/or turbulent noise subsequently filtered
by the vocal tract.
After all, our vocal apparatus is just an air-filled tube also used for
eating and drinking, as opposed to the auditory cortex...

Auditory researchers who study psychoacoustics should therefore have no
trouble at all achieving their goals in terms of numbers of published
papers :)

But, as far as speech (and monophonic audio, for the most part) is
concerned, "pitch" MUST be automatically substituted with "fundamental
frequency" (or "fundamental period") in order to avoid confusion,
at least when some kind of objective measurement is discussed (e.g.
"pitch estimation using autocorrelation").

Please cite the source.  Any triplet of harmonics 200Hz
apart, odd or even multiples, should have a fundamental
frequency of 200Hz and the perceived pitch should be
similar.   If anything, I would suspect the odd triplet to
have a pitch near 100Hz, the brain assuming that the even
values of 1600 and 1800 are simply missing since many
instruments produce only odd harmonics and the vocal tract
resonates at the odd multiples of 500, 1500, 2500 formants
for a male neutral vowel, but the listener rarely misses
identifying the pitch at the fundamental frequency of around
100Hz.

-- 
Chip Wood

"Didier A. Depireux" <didier@umd.edu> wrote in message

> The input can be quite periodic and the pitch have nothing to do with the
> fundamental or the period. Whereas the pitch of the harmonic complex
> (1400-1600-1800)Hz is clear and is 200Hz, the pitch of the harmonic complex
> made up of (1500-1700-1900)Hz is neither 100 nor 200 Hz. It's an ambiguous
> pitch with a value of about 188 or 212 Hz.
fizteh89 wrote:
...
> But, as far as speech (and monophonic audio, for the most part) is
> concerned, "pitch" MUST be automatically substituted with "fundamental
> frequency" (or "fundamental period") in order to avoid confusion,
> at least when some kind of objective measurement is discussed (e.g.
> "pitch estimation using autocorrelation").
Not always true, given that much monophonic audio and speech these days is not sent directly from vocal apparatus to human hearing by air, but quite often includes a bunch of transducers, wires, silicon and optical fiber in the path, wherein the filtering might completely eliminate the fundamental frequency energy, and data compression might obscure even the fundamental period information.

One of the interesting problems which I've been studying is that of determining the pitch produced by the lowest keys of a cheap spinet piano, given that the strings produce an inharmonic spectrum, and thus there is no true fundamental period in this spectrum anywhere near the pitch, and through a telco grade microphone which rolls off several octaves above the note in question, so there is no fundamental frequency energy in the data stream.

In both of the above cases, pitch might be something different from the fundamental frequency or period of the data at various points in the channel.

IMHO. YMMV.
-- 
rhn A.T nicholson d.O.t C-o-M
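The first of these two cases (filtering removes the fundamental's energy while the waveform's period survives) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a perfectly harmonic made-up signal with harmonics 2 to 6 of 100 Hz, an invented lag search range, and hypothetical function names. It does not model the inharmonic piano case, where the partials are not exact multiples of any one frequency.

```python
import math

def autocorr(x, lag):
    """Average of x[n] * x[n + lag] over the overlapping samples."""
    n = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n)) / n

def tone_amplitude(x, fs, freq_hz):
    """Amplitude of one sinusoidal component, via a single-frequency DFT sum."""
    re = sum(v * math.cos(2.0 * math.pi * freq_hz * i / fs)
             for i, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * freq_hz * i / fs)
             for i, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

fs = 8000.0
f0 = 100.0
# Hypothetical band-limited channel: only harmonics 2..6 (200 to 600 Hz) survive.
x = [sum(math.sin(2.0 * math.pi * f0 * h * i / fs) for h in range(2, 7))
     for i in range(1600)]

lags = range(40, 121)  # search from 200 Hz down to about 66.7 Hz
best_lag = max(lags, key=lambda k: autocorr(x, k))

print("amplitude at 100 Hz:", tone_amplitude(x, fs, f0))
print("amplitude at 200 Hz:", tone_amplitude(x, fs, 2.0 * f0))
print("autocorrelation pitch estimate:", fs / best_lag)
```

Here the 100 Hz bin holds essentially no energy, yet the autocorrelation still peaks at the 80-sample lag that corresponds to 100 Hz, because the surviving harmonics all repeat with that period.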
in article dhgv8m$pie$1@grapevine.wam.umd.edu, Didier A. Depireux at
didier@umd.edu wrote on 09/29/2005 10:55:

> In comp.dsp robert bristow-johnson <rbj@audioimagination.com> wrote:
>
>> 1. input is not quasi-periodic. so no fundamental can be determined. then
>> the perceived pitch may be very difficult for a silicon based "machine"
>> (since you don't grant that we are machines) to be determined, if not
>> impossible at the moment. and people may very well disagree as to what the
>> pitch is or even if it has a pitch.
>
>> 2. input is quasi-periodic, but there are "octave problems" which happen
>> when all of the lower odd harmonics are so weak that the human hears it as
>> one octave higher than it really is mathematically.
>
> The input can be quite periodic and the pitch have nothing to do with the
> fundamental or the period. Whereas the pitch of the harmonic complex
> (1400-1600-1800)Hz is clear and is 200Hz,
so the 7th 8th and 9th harmonic suffice to identify the fundamental in our perception, but...
> the pitch of the harmonic complex
> made up of (1500-1700-1900)Hz is neither 100 nor 200 Hz.

and the 15th, 17th, and 19th harmonics do not.
> It's an ambiguous pitch with a value of about 188 or 212 Hz.
same as what Chip asks below. where do the 188 and 212 come from?

in article dhh9ce$s5u$1@avnika.corp.mot.com, Chip Wood at
chip.wood@motorola.com wrote on 09/29/2005 13:47:
> Please cite the source.
i agree with that.
> Any triplet of harmonics 200Hz
> apart, odd or even multiples, should have a fundamental
> frequency of 200Hz and the perceived pitch should be
> similar.
i do not think that 300, 500, and 700 Hz will be heard as a 200 Hz tone.
> If anything, I would suspect the odd triplet to
> have a pitch near 100Hz,
i agree with that.
> the brain assuming that the even
> values of 1600 and 1800 are simply missing since many
> instruments produce only odd harmonics and the vocal tract
> resonates at the odd multiples of 500, 1500, 2500 formants
> for a male neutral vowel, but the listener rarely misses
> identifying the pitch at the fundamental frequency of around 100Hz.

and that.

-- 
r b-j    rbj@audioimagination.com
"Imagination is more important than knowledge."
robert bristow-johnson wrote:
> in article dhgv8m$pie$1@grapevine.wam.umd.edu, Didier A. Depireux at
> didier@umd.edu wrote on 09/29/2005 10:55:
...
>> Any triplet of harmonics 200Hz
>> apart, odd or even multiples, should have a fundamental
>> frequency of 200Hz and the perceived pitch should be
>> similar.
>
> i do not think that 300, 500, and 700 Hz will be heard as a 200 Hz tone.
Of course not. They're not harmonics of 200 Hz.
>> If anything, I would suspect the odd triplet to
>> have a pitch near 100Hz,
>
> i agree with that.

Well, it sounds reasonable. (That means it's what I would have guessed.) Try it; you might be as surprised as I was. (You might not. My tinnitus might be confusing me.) I suspect that the frequencies involved are too remote from 100 Hz for my ear to recreate the missing fundamental, and the nearly-but-not-quite 200 Hz pitch comes from the 200-Hz differences.

...

Jerry
-- 
Engineering is the art of making what you want from things you can get.