
Pitch Estimation using Autocorrelation

Started by olivers September 7, 2005
Jerry Avins wrote:
..
>>> the pitch of the harmonic complex
>>> made up of (1500-1700-1900)Hz is neither 100 nor 200 Hz.
>>
>> and the 15th, 17th, and 19th harmonic is not.
>>
>>> It's an ambiguous pitch with a value of about 188 or 212 Hz.
>>
>> same as what Chip asks below. where do the 188 and 212 come from?
It's a subtle psychoacoustic phenomenon, and indeed a very good demonstration of the difference between mathematical periodicity and perceived pitch. On paper, yes, the theoretical fundamental is 100 Hz, but that is not what we hear. Rather, we hear most strongly the two primary difference tones, 1700-1500 and 1900-1700, and the difference tone 1900-1500; i.e. we hear a strong pseudo-fundamental of 200 Hz (sometimes termed the "resultant tone") reinforced by the weaker octave at 400 Hz. All you have to do is synthesise sine tones at the frequencies specified and listen to the result.

Note that the specified frequencies give two somewhat wide major seconds (1500 * 5/4 = 1875, so 1900 is a "sharp major third"), so the resultant tone is not in tune (not harmonically related to the generating tones by a simple ratio). The ear ultimately is not a mathematical thing, but something else!

Thus, I assert that the resultant tone is not an ambiguous pitch; it is exactly 200 Hz. We could equally postulate that the frequencies 1500, 1700 and 1900 are the 30th, 34th and 38th harmonics of a low 50 Hz fundamental. The ear is not this ingenious, however; it simply hears the resultant tone(s) and then decides whether or not it is harmonically true to the "harmonics". Musicians soon learn about resultant tones, and would not claim that the pitch of the sum of those frequencies was 200 Hz (~= low A flat), but simply that the sound is a cluster of two high, wide major seconds.

Note that moving the head even slightly when listening to such a sine cluster will give weird phase effects (especially if you are playing back over two speakers), which can easily make the sounds seem to swim about in pitch a bit. To hear the resultant tone clearly, you need to play the generating tones quite loudly, which can induce that other strange trick of the ear wherein the pitch seems to drop as tones get louder.

Richard Dobson
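The arithmetic in the post above is easy to check. The following is only a sketch: the helper name `cents` and the just-intonation reference ratios (9/8 for a major second, 5/4 for a major third) are assumptions chosen for illustration, not something the poster specified. It lists the pairwise difference tones and measures the intervals between the three partials in cents.

```python
import math

def cents(f_low, f_high):
    """Size of the interval from f_low up to f_high, in cents (1200 per octave)."""
    return 1200.0 * math.log2(f_high / f_low)

tones = [1500.0, 1700.0, 1900.0]

# All pairwise difference tones: 1700-1500, 1900-1500 and 1900-1700.
difference_tones = sorted(
    hi - lo for i, lo in enumerate(tones) for hi in tones[i + 1:]
)

# Interval sizes between the partials.
lower_step = cents(1500.0, 1700.0)   # roughly 216.7 cents
upper_step = cents(1700.0, 1900.0)   # roughly 192.6 cents
outer_third = cents(1500.0, 1900.0)  # roughly 409.2 cents

# Just-intonation references (an assumption; equal temperament would use 200 and 400).
just_second = cents(8.0, 9.0)        # roughly 203.9 cents
just_third = cents(4.0, 5.0)         # roughly 386.3 cents

print(difference_tones)
print(round(lower_step, 1), round(upper_step, 1), round(outer_third, 1))
print(round(just_second, 1), round(just_third, 1))
```

Measured this way, the lower step comes out wider than a 9/8 second and the upper step a little narrower, while the outer interval is sharp of a 5/4 third, which matches the 1875 vs 1900 comparison in the post.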
Richard Dobson wrote:
> It's a subtle psycho-acoustic phenomenon, and indeed a very good
> demonstration of the difference between mathematical periodicity and
> perceived pitch.
Another neat demonstration of the difference is the Risset scale, which mathematically is a repeating sequence of rising fundamentals, but is perceived as a continuously rising (non-repeating) pitch.

The auditory cortex has of course evolved to do something useful - extract articulatory information from speech - and the basics of this task are best served by detecting *changes* in fundamental and resonance rather than absolutes. There are however languages where pitch determines meaning, and speakers of those languages are more likely to possess the ability of perfect pitch, which is usually lost (presumably during the phase of speech acquisition) by speakers of languages where change in pitch is more important than absolutes.

Ben Bridgwater
Ben Bridgwater wrote:

..
> The auditory cortex has of course evolved to do something useful -
> extract articulatory information from speech, and the basics of this
> task is best served by detecting *changes* in fundamental and resonance
> rather than absolutes. There are however languages where pitch
> determines meaning, and speakers of those languages are more likely to
> possess the ability of perfect pitch which is usually lost (presumably
> during the phase of speech acquisition) for speakers of language where
> change in pitch is more important than absolutes.
I find myself incredulous at this: can you specify a language in which ~absolute~ pitch determines meaning? That would imply that the same phrase uttered by male or female, adult or child, will have a different meaning! E.g. "Hello" at A=440 is different from "Hello" at C=256!

Surely you mean "pitch-inflection" (= change in pitch)? I am aware of examples of this, IIRC Mandarin(?). And also, for example:

"Me" (pitch inflects downwards) = "!"
"Me" (pitch inflects upwards) = "?"

Richard Dobson
Richard Dobson wrote:

> Ben Bridgwater wrote:
>
> ..
>
>> The auditory cortex has of course evolved to do something useful -
>> extract articulatory information from speech, and the basics of this
>> task is best served by detecting *changes* in fundamental and
>> resonance rather than absolutes. There are however languages where
>> pitch determines meaning, and speakers of those languages are more
>> likely to possess the ability of perfect pitch which is usually lost
>> (presumably during the phase of speech acquisition) for speakers of
>> language where change in pitch is more important than absolutes.
>
> I find myself incredulous at this: can you specify a language in which
> ~absolute~ pitch determines meaning? That would imply that the same
> phrase uttered by male or female, adult or child, will have a different
> meaning!
No - it just means that you need to control the pitch to get the correct meaning.
> E.g. "Hello" at A=440 is different from "Hello" at C=256!
>
> Surely you mean "pitch-inflection" (= change in pitch)? I am aware of
> examples of this, IIRC Mandarin(?).
Tonal languages use both inflection and absolute (that is to say flat, but perhaps relative to the speaker's base frequency) pitch to convey meaning.

Using an example from Moira Yip's book "Tone", in Cantonese the syllable "yau" (rhymes with "how") has six different meanings depending on pitch:

high level = worry
high rising = paint (noun)
mid level = thin
low level = again
very low level = oil
low rising = have

Ben Bridgwater
Ben Bridgwater wrote:

..
> Tonal languages use both inflection as well as absolute (that is to say
> flat, but perhaps relative to the speakers base frequency) pitch to
> convey meaning.
>
> Using an example from Moira Yip's book "Tone", in Cantonese the syllable
> "yau" (rhymes with how), has six different meanings depending on pitch:
>
> high level = worry
> high rising = paint (noun)
> mid level = thin
> low level = again
> very low level = oil
> low rising = have
I think we have a collision of terminology here. As a musician I would describe your examples above as demonstrating a difference of ~register~ (pitch is both too specific and registrally undefined), and definitely relative to the speaker's base frequency (which may however itself drift on account of a variety of factors).

To a musician "absolute pitch" is fully synonymous with "perfect pitch", defined as the ability to identify the musical name (pitch class) for a single presented tone (and a complete nuisance for someone playing a transposing instrument). You seem to use the term "flat" as a synonym for "absolute". Now, "flat" has a well-known meaning to musicians (I suggest the word "fixed" instead), whereas it seems here to signify a "flat trajectory" (your "level" qualifier above) as distinct from an inflected pitch trajectory.

I would describe your table above as illustrating a register-dependent meaning, which is related to "relative pitch", which recognises the interval between two pitches, but not the specific fundamental frequencies or pitch-class names of the two notes. Strangely enough, I have read reports of musicians with perfect pitch who can correctly name two notes presented in succession, but cannot say for certain which is the higher (octave ambiguity), whereas a musician with good relative pitch would have no such difficulty.

It seems to me therefore from your example that relative pitch is still the critical requirement for Cantonese speakers, to discriminate low/mid/high and inflection up and down, but that knowing that "yau" is pitched on C=523Hz (unlikely for a male!) is not required.

My suggestion therefore is that you confine use of the term "absolute" to those situations, if they exist, where the literal fundamental frequency of a sound is semantically significant. That will avoid any confusion of meaning between speech scientists and musicians!

I have a favourite analogy between pitch cognition and visual cognition. Perfect pitch corresponds to colour perception - we know this is blue and that is yellow, but may not be sure which is brighter; whereas relative pitch corresponds to the greyscale image, where we do not know which is which colour (or even what colour!), but can easily tell which is the brighter.

Richard Dobson
Ben Bridgwater wrote:
> Richard Dobson wrote:
>
>> Ben Bridgwater wrote:
>>
>> ..
>>
>>> The auditory cortex has of course evolved to do something useful -
>>> extract articulatory information from speech, and the basics of this
>>> task is best served by detecting *changes* in fundamental and
>>> resonance rather than absolutes. There are however languages where
>>> pitch determines meaning, and speakers of those languages are more
>>> likely to possess the ability of perfect pitch which is usually lost
>>> (presumably during the phase of speech acquisition) for speakers of
>>> language where change in pitch is more important than absolutes.
>>
>> I find myself incredulous at this: can you specify a language in which
>> ~absolute~ pitch determines meaning? That would imply that the same
>> phrase uttered by male or female, adult or child, will have a
>> different meaning!
>
> No - it just means that you need to control the pitch to get the correct
> meaning.
>
>> E.g. "Hello" at A=440 is different from "Hello" at C=256!
>>
>> Surely you mean "pitch-inflection" (= change in pitch)? I am aware of
>> examples of this, IIRC Mandarin(?).
>
> Tonal languages use both inflection as well as absolute (that is to say
> flat, but perhaps relative to the speakers base frequency) pitch to
> convey meaning.
>
> Using an example from Moira Yip's book "Tone", in Cantonese the syllable
> "yau" (rhymes with how), has six different meanings depending on pitch:
>
> high level = worry
> high rising = paint (noun)
> mid level = thin
> low level = again
> very low level = oil
> low rising = have
>
> Ben Bridgwater
Cantonese doesn't rely on absolute pitch at all. It is completely relative. Depending on who you ask, there can be anywhere from 7 to 11 "tones". However, they are either sliding or flat relative pitches. A Chinese character represents one syllable, and each syllable has a tone associated with it. A single character said in isolation is highly ambiguous, because its relative tone cannot be determined. Thus, a single word is usually expressed in a polysyllabic way, to remove (or at least greatly reduce) the ambiguity.

Tones have a considerable impact on what works well for song lyrics.

Bottom line: no absolute pitch in Cantonese (or Mandarin for that matter).

Regards,
Steve
In comp.dsp Chip Wood <chip.wood@motorola.com> wrote:
> Please cite the source. Any triplet of harmonics 200Hz
> apart, odd or even multiples, should have a fundamental
> frequency of 200Hz and the perceived pitch should be
> similar. If anything, I would suspect the odd triplet to
> have a pitch near 100Hz, the brain assuming that the even
> values of 1600 and 1800 are simply missing since many
> instruments produce only odd harmonics and the vocal tract
> resonates at the odd multiples of 500, 1500, 2500 formants
> for a male neutral vowel, but the listener rarely misses
> identifying the pitch at the fundamental frequency of around
> 100Hz.
It's quite a standard result. As explained in

http://homepage.mac.com/cariani/CarianiWebsite/PitchEquivPlot.gif

and in the paper they published on the subject,

Cariani PA, Delgutte B. "Neural correlates of the pitch of complex tones. I. Pitch and pitch salience." J Neurophysiol. 1996 Sep;76(3):1698-716.

the pitch of most sounds, except for alternating click trains around 200Hz for which there's an ambiguity, can be predicted from the largest peak of the autocorrelation of the half-wave rectified waveform.

I used the same result to find that neurons in the inferior colliculus (part of the auditory pathway) fire in a way that's correlated with the perceived pitch, and that the correlation disappears in the frequency range for which humans lose the percept of pitch too (considering 3 harmonics etc). That poster is at

http://www.isr.umd.edu/CAAR/posters/ARO97.pdf

Finally, the result I quoted is shown as a function of the fundamental in Fig 5.11 of the book by Fastl and Zwicker called "Psychoacoustics". They show how the pitch of a 3-tone complex made up of (f, f+300, f+600)Hz changes as a function of f. Look also at the upper right part of Fig. 2 of the paper at

http://arxiv.org/abs/nlin.CD/0210065

And finally finally, check it out yourself!

Didier

--
Didier A Depireux ddepi001@umaryland.edu didier@isr.umd.edu
20 Penn Str - S218E http://neurobiology.umaryland.edu/depireux.htm
Anatomy and Neurobiology Phone: 410-706-1272 (lab)
University of Maryland -1273 (off)
Baltimore MD 21201 USA Fax: 1-410-706-2512
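As a rough illustration of the estimator described above (largest peak of the autocorrelation of the half-wave rectified waveform), here is a minimal pure-Python sketch. Everything specific in it is an assumption made for the example and is not taken from the post or the cited paper: the function names, equal-amplitude cosine-phase partials, the 48 kHz sample rate, and above all the 150-300 Hz search range. Without some such range restriction, the lag of the full waveform period (10 ms, i.e. 100 Hz, for the 1500/1700/1900 complex) scores at least as high as any shorter lag.

```python
import math

def complex_tone(freqs, n, fs):
    """n samples of a sum of equal-amplitude, cosine-phase partials."""
    return [sum(math.cos(2.0 * math.pi * f * i / fs) for f in freqs)
            for i in range(n)]

def acf_pitch(freqs, fs=48000, n=4800, fmin=150.0, fmax=300.0):
    """Pitch estimate (Hz) from the largest autocorrelation peak of the
    half-wave rectified waveform, searched over lags for fmin..fmax only.

    n should span a whole number of waveform periods so that every lag is
    averaged over the same stretch of signal (4800 samples = 100 ms here).
    """
    lag_lo = int(math.ceil(fs / fmax))   # shortest lag considered
    lag_hi = int(fs // fmin)             # longest lag considered
    x = complex_tone(freqs, n + lag_hi, fs)
    y = [max(v, 0.0) for v in x]         # half-wave rectification
    best_lag, best_val = lag_lo, float("-inf")
    for lag in range(lag_lo, lag_hi + 1):
        r = sum(y[i] * y[i + lag] for i in range(n))
        if r > best_val:                 # first maximum wins on a tie
            best_lag, best_val = lag, r
    return fs / best_lag

if __name__ == "__main__":
    for partials in ([1400, 1600, 1800], [1450, 1650, 1850], [1500, 1700, 1900]):
        print(partials, round(acf_pitch(partials), 1))
```

With these settings the 1500/1700/1900 complex has two candidate lags, one on either side of 5 ms, whose autocorrelation values are equal in theory; which of the two the sketch reports depends on floating-point rounding, and the lag resolution at 48 kHz limits the estimate to roughly 1 Hz.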
Steve Underwood wrote:

> Ben Bridgwater wrote:
>> Using an example from Moira Yip's book "Tone", in Cantonese the
>> syllable "yau" (rhymes with how), has six different meanings depending
>> on pitch:
>>
>> high level = worry
>> high rising = paint (noun)
>> mid level = thin
>> low level = again
>> very low level = oil
>> low rising = have
>>
>> Ben Bridgwater
>
> Cantonese doesn't rely on absolute pitch at all. It is completely
> relative. Depending who you ask there can be anywhere from 7 to 11
> "tones". However, they are either sliding or flat relative pitches. A
> Chinese character represents one syllable, and each syllable has a tone
> associated with it. A single character said in isolation is highly
> ambiguous, because its relative tone cannot be determined. Thus, a
> single word is usually expressed in a polysyllabic way, to remove (or at
> least greatly reduce) the ambiguity.
>
> Tones have a considerable impact on what works well for song lyrics.
>
> Bottom line: no absolute pitch in Cantonese (or Mandarin for that matter).
>
> Regards,
> Steve
Interesting - thanks, Steve. I guess this does support my original point that the auditory cortex has little use for absolutes (fundamental frequency) for speech, and that it's not surprising that the perceptual phenomenon of pitch is distinct from the mathematical one of fundamental frequency.

Ben
In comp.dsp robert bristow-johnson <rbj@audioimagination.com> wrote:

> so the 7th 8th and 9th harmonic suffice to identify the fundamental in our > perception, but...
> > the pitch of the harmonic complex
> > made up of (1500-1700-1900)Hz is neither 100 nor 200 Hz.
> and the 15th, 17th, and 19th harmonic is not.
Look at your numbers, they are so high! The brain prefers to think of (1500,1700,1900)Hz as roughly the 8th, 9th and 10th harmonics of 188Hz, or the 7th, 8th and 9th harmonics of 212Hz, rather than as the 15th, 17th and 19th harmonics of 100Hz, which are very high and non-consecutive harmonic numbers.
> > It's an ambiguous pitch with a value of about 188 or 212 Hz.
> same as what Chip asks below. where do the 188 and 212 come from?
I just looked for the peak of the autocorrelation of the half-wave rectified waveform. Also, I have done the above experiment, so I know from experience the pitch that people perceive (actually something like 212.8Hz, IIRC).

Didier
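The "8th, 9th and 10th harmonics of 188Hz" and "7th, 8th and 9th harmonics of 212Hz" readings can be reproduced with a little arithmetic. The sketch below uses a least-squares fit of a fundamental to assumed harmonic numbers; that fitting rule and the function name `fit_fundamental` are illustrative assumptions, not the method used in the post (which looked at the autocorrelation peak).

```python
def fit_fundamental(partials, harmonic_numbers):
    """Least-squares fundamental f0 minimising sum((f - n * f0) ** 2)
    over the (partial, harmonic number) pairs."""
    num = sum(f * n for f, n in zip(partials, harmonic_numbers))
    den = sum(n * n for n in harmonic_numbers)
    return num / den

partials = [1500.0, 1700.0, 1900.0]

f0_low = fit_fundamental(partials, [8, 9, 10])      # close to 189 Hz
f0_high = fit_fundamental(partials, [7, 8, 9])      # close to 212 Hz
f0_exact = fit_fundamental(partials, [15, 17, 19])  # the true 100 Hz periodicity

print(round(f0_low, 2), round(f0_high, 2), round(f0_exact, 2))
```

Only the 100 Hz reading fits the partials exactly; the two consecutive-harmonic readings are approximate fits, which is consistent with the pitch being described as weak and ambiguous.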
In comp.dsp Richard Dobson <richarddobson@blueyonder.co.uk> wrote:

> Thus, I assert that the resultant tone is not an ambiguous pitch, it is exactly
> 200Hz.
I didn't assert anything from theoretical arguments. I did the experiment and I found 2 ambiguous pitches, as had many others before me.

Just do it yourself, by playing a pure tone at 200Hz in between repetitions of the (1500,1700,1900)Hz complex, and then inserting 212.8Hz instead. You will hear a much better match with 212.8 if you are like 90% of psychophysics subjects. I can't really argue with reality.

Note that the pitch is rather weak and ambiguous, as I said, so instead use (1400,1600,1800)Hz (pitch = 200Hz) and then shift to (1450,1650,1850)Hz. You will get a pitch that's still fairly strong at 205Hz. Add frequencies if you want (2050Hz) and the pitch will get a little stronger.

Didier
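For anyone who wants to try the matching experiment described above, the following sketch builds the stimulus using only the Python standard library. The durations, gaps, fade length, sample rate, output level and all function names are assumptions picked for illustration; the post does not specify any of them.

```python
import io
import math
import struct
import wave

FS = 44100  # sample rate in Hz (an assumption)

def tone(freqs, dur, fs=FS, ramp=0.01):
    """Equal-amplitude sum of sines with raised-cosine fades, scaled to +/-1."""
    n = int(round(dur * fs))
    r = int(round(ramp * fs))
    out = []
    for i in range(n):
        s = sum(math.sin(2.0 * math.pi * f * i / fs) for f in freqs) / len(freqs)
        if i < r:                      # fade in
            g = 0.5 * (1.0 - math.cos(math.pi * i / r))
        elif i >= n - r:               # fade out
            g = 0.5 * (1.0 - math.cos(math.pi * (n - 1 - i) / r))
        else:
            g = 1.0
        out.append(g * s)
    return out

def matching_sequence(complex_freqs, probe_hz, repeats=4, dur=0.5, gap=0.25, fs=FS):
    """Alternate the complex tone with a pure probe tone, separated by silence."""
    silence = [0.0] * int(round(gap * fs))
    seq = []
    for _ in range(repeats):
        seq += tone(complex_freqs, dur, fs) + silence
        seq += tone([probe_hz], dur, fs) + silence
    return seq

def wav_bytes(samples, fs=FS, level=0.5):
    """Encode samples in [-1, 1] as a mono 16-bit WAV file held in memory."""
    pcm = struct.pack("<%dh" % len(samples),
                      *(int(round(level * 32767 * s)) for s in samples))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(pcm)
    return buf.getvalue()

seq_200 = matching_sequence([1500, 1700, 1900], 200.0)
seq_212 = matching_sequence([1500, 1700, 1900], 212.8)
```

To listen, write each result to disk, for example `open("match_200.wav", "wb").write(wav_bytes(seq_200))`, and compare which probe sounds closer to the complex. The same functions can be called with (1400,1600,1800) and (1450,1650,1850) to try the second comparison from the post.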