# Pitch Estimation using Autocorrelation

Started by September 7, 2005
```From what I understand the first minimum of an autocorrelation function
(say 1200 samples with a lag between 0 and 600) will give me the sample
value which can be directly mapped to frequency and thus pitch.

I have tried to extract this minima from my autocorrelation result with
varied results. I tried using a C (third fret 5th string) on my guitar and
got a sample value for the first minimum which varied between 68 and 75.
From my conversion chart
(http://grace.evergreen.edu/~arunc/intro_doc/node12.htm#SECTION00092000000000000000)

this corresponds to a note which varies between d4 and e4 which is clearly
incorrect.

I thought maybe that I am doing something wrong in the extraction of the
first minimum. Another article I have just read indicates that the minima
represents half the period where the waveform is out of phase thus, the
maxima indicates the period of the waveform and directly relates to the
pitch. At a guess my value of 75 (half the period) which translates to 150
is still wrong.

I am identifying the first minimum by searching for the first change in
sign indicating the first zero crossing. Is this how its done?

Any help will be great!

Thanks

This message was sent using the Comp.DSP web interface on
www.DSPRelated.com
```
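For readers following the lag-to-note arithmetic in the post above, here is a minimal sketch of the conversion. The thread never states the sample rate, so the 44100 Hz used in the usage note is an assumption, and the function names are purely illustrative. In standard tuning the C at the third fret of the fifth string is C3, about 130.81 Hz.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def lag_to_frequency(lag_samples, sample_rate):
    """A lag of one full period, in samples, maps to frequency as fs / lag."""
    return sample_rate / lag_samples

def frequency_to_note(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered note name, with A4 = 440 Hz assumed."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"

def note_to_lag(midi_note, sample_rate, a4_hz=440.0):
    """Expected full-period lag, in samples, for a given MIDI note number."""
    freq = a4_hz * 2 ** ((midi_note - 69) / 12)
    return sample_rate / freq
```

With an assumed 44100 Hz sample rate, `note_to_lag(48, 44100)` comes out near 337 samples for C3. A full-period lag of 68 to 75 samples would only match that note at a sample rate of roughly 8.9 to 9.8 kHz, so the sample rate is worth checking before trusting any lag-to-note chart.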
```olivers wrote:
> I am identifying the first minimum by searching for the first change in
> sign indicating the first zero crossing. Is this how its done?

-1 < 0

--
Jim Thomas            Principal Applications Engineer  Bittware, Inc
jthomas@bittware.com  http://www.bittware.com    (603) 226-0404 x536
Sometimes experience is the only teacher that works - Mike Rosing
```
```Autocorrelation measures the degree of similarity of a signal with a
delayed version of itself.  Therefore you should look not for a
minimum, but for the maximum.

This will often, but not always correspond to the fundamental
frequency; sometimes you'll get a harmonic or sub-harmonic, depending
on the shape of the spectrum.  Whitening the signal by center-clipping
is effective in minimizing this problem.  I don't have a reference
handy, but it's commonly done in speech analysis.

cheers,
jerry

```
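The suggestion above (search for the autocorrelation maximum, and center-clip first) can be sketched roughly as follows. This is one possible reading, not the poster's code: the clipping fraction of 0.3, the search range of 60 to 1000 Hz, the 8000 Hz sample rate and all function names are assumptions made for the example.

```python
import numpy as np

def center_clip(x, fraction=0.3):
    """Zero samples whose magnitude is below the threshold; shift the rest toward zero."""
    threshold = fraction * np.max(np.abs(x))
    return np.where(x > threshold, x - threshold,
                    np.where(x < -threshold, x + threshold, 0.0))

def autocorr_pitch(x, sample_rate, fmin=60.0, fmax=1000.0):
    """Pitch from the largest autocorrelation value inside a plausible lag range."""
    x = x - np.mean(x)
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]   # keep lags >= 0
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(acf) - 1)
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / lag

# Demo: a C3-like tone whose second harmonic is stronger than the fundamental.
sample_rate = 8000
f0 = 130.81
t = np.arange(2048) / sample_rate
tone = (0.5 * np.sin(2 * np.pi * f0 * t)
        + 1.0 * np.sin(2 * np.pi * 2 * f0 * t)
        + 0.7 * np.sin(2 * np.pi * 3 * f0 * t))
estimate = autocorr_pitch(center_clip(tone), sample_rate)
```

Restricting the search to a lag range excludes the trivial peak at lag zero. The result is quantized to whole-sample lags, so at this sample rate the estimate can be off by a hertz or two without interpolation.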
```olivers wrote:
> From what I understand the first minimum of an autocorrelation function
> (say 1200 samples with a lag between 0 and 600) will give me the sample
> value which can be directly mapped to frequency and thus pitch.

Where did you find that? I am not aware of any simple relation between
the time-domain autocorrelation function and the pitch of the signal.

> I have tried to extract this minima from my autocorrelation result with
> varied results. I tried using a C (third fret 5th string) on my guitar and
> got a sample value for the first minimum which varied between 68 and 75.
> From my conversion chart
> (http://grace.evergreen.edu/~arunc/intro_doc/node12.htm#SECTION00092000000000000000)
>
> this corresponds to a note which varies between d4 and e4 which is clearly
> incorrect.

Try to compute the DFT of the autocorrelation function, and see if you
can find the peak in the spectrum.

> I thought maybe that I am doing something wrong in the extraction of the
> first minimum. Another article I have just read indicates that the minima
> represents half the period where the waveform is out of phase thus, the
> maxima indicates the period of the waveform and directly relates to the
> pitch. At a guess my value of 75 (half the period) which translates to 150
> is still wrong.
>
> I am identifying the first minimum by searching for the first change in
> sign indicating the first zero crossing. Is this how its done?

IF the pitch can be extracted from the time-domain autocorrelation
function (I am not sure it can, but I may be wrong) it would be based
on the peak in the autocorrelation function. For the guitar string,
try to look at the peaks in the power spectrum, i.e. the DFT of the
autocorrelation function.

Rune

```
```Rune Allnor wrote:
> olivers wrote:
>
>>From what I understand the first minimum of an autocorrelation function
>>(say 1200 samples with a lag between 0 and 600) will give me the sample
>>value which can be directly mapped to frequency and thus pitch.
>
>
> Where did you find that? I am not aware of any simple relation between
> the time-domain autocorrelation function and the pitch of the signal.

If the signal does have a sinusoidal component at period T, then when
correlating with the version of the signal shifted by T, there will be
a peak, corresponding to 1/T and all of the multiples (the harmonics).
In fact, when shifted by T/2, there will be a peak with negative value,
provided that there are no components of lower frequency.

to reply to him suggesting that the DFT would be more appropriate for
that.

Computing a DFT using FFT is much *much* faster than computing the
autocorrelation *function*  (in fact, computing an FFT is faster than
computing *one sample* of the autocorrelation function if you compute
the A.C. the straightforward way)

Carlos
--
```
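On the cost comparison above, a small sketch may help. Counting multiply-adds, one lag of the direct sum costs about N operations and all N lags about N^2, while the FFT route costs on the order of N log N for every lag at once. The FFT therefore wins when many lags are needed; for a single lag the direct sum is the cheaper one. The function names below are illustrative, and the FFT size is chosen as an assumption to avoid circular wrap-around.

```python
import numpy as np

def autocorr_direct(x, max_lag):
    """Straightforward sum of products, one dot product per lag."""
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) for k in range(max_lag + 1)])

def autocorr_fft(x, max_lag):
    """Same values via the power spectrum (Wiener-Khinchin relation)."""
    n = len(x)
    nfft = 1 << (2 * n - 1).bit_length()    # zero-pad so lags do not wrap around
    spectrum = np.fft.rfft(x, nfft)
    acf = np.fft.irfft(spectrum * np.conj(spectrum), nfft)
    return acf[:max_lag + 1]

# Demo with the frame size mentioned in the original question.
rng = np.random.default_rng(0)
frame = rng.standard_normal(1200)
direct = autocorr_direct(frame, 600)
via_fft = autocorr_fft(frame, 600)
```

Both functions return the unnormalized autocorrelation for lags 0 through `max_lag`, so they can be swapped freely in a pitch detector.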
```Carlos Moreno wrote:
> Rune Allnor wrote:
> > olivers wrote:
> >
> >>From what I understand the first minimum of an autocorrelation function
> >>(say 1200 samples with a lag between 0 and 600) will give me the sample
> >>value which can be directly mapped to frequency and thus pitch.
> >
> >
> > Where did you find that? I am not aware of any simple relation between
> > the time-domain autocorrelation function and the pitch of the signal.
>
> If the signal does have a sinusoidal component at period T, then when
> correlating with the version of the signal shifted by T, there will be
> a peak, corresponding to 1/T and all of the multiples (the harmonics).
> In fact, when shifted by T/2, there will be a peak with negative value,
> provided that there are no components of lower frequency.

OK, I am sure you are right, provided the signal consists of a single
sinusoidal. If there are more sinusoidals, or noise present...

Rune

```
```Carlos Moreno wrote:
> Rune Allnor wrote:
> >
> > Where did you find that? I am not aware of any simple relation between
> > the time-domain autocorrelation function and the pitch of the signal.
>
> If the signal does have a sinusoidal component at period T, then when
> correlating with the version of the signal shifted by T, there will be
> a peak, corresponding to 1/T and all of the multiples (the harmonics).
> In fact, when shifted by T/2, there will be a peak with negative value,
> provided that there are no components of lower frequency.

there need be no presumption of having a sinusoidal component with
period T.  there need only be a presumption that the signal is periodic
with period T.

if the window of summation is wide enough, the autocorrelation function
can be directly related to the Average Squared Difference Function
(ASDF) which is a pretty straight-forward approach to determining the
pitch or period of a (quasi)periodic signal.

Rx[k] =  mean{|x|^2}   -   1/2 * ASDF(x, k)

where the ASDF comes to a minimum (say at multiples of T), the
autocorrelation becomes maximum.  since the ASDF can never be less than
zero, the autocorrelation can never be greater than the power or
mean{|x|^2}.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

```
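The relation between the autocorrelation and the ASDF quoted above can be checked numerically. The post does not spell out the ASDF, so the definition used here, ASDF(x, k) = mean{(x[n] - x[n+k])^2}, is an assumption. Expanding the square gives mean{x[n]^2} + mean{x[n+k]^2} - 2 Rx[k], and when both mean-square terms are equal this rearranges to the stated formula. The sketch uses circular shifts so that the two terms are exactly equal; with an ordinary finite window the identity holds only approximately.

```python
import numpy as np

def circular_autocorr(x, k):
    """Rx[k] as the mean of x[n] * x[n+k], with the index wrapping around."""
    return np.mean(x * np.roll(x, -k))

def circular_asdf(x, k):
    """Average squared difference between x and its shifted copy."""
    return np.mean((x - np.roll(x, -k)) ** 2)

rng = np.random.default_rng(1)
x = rng.standard_normal(512)
power = np.mean(x ** 2)

lags = range(0, 64)
lhs = np.array([circular_autocorr(x, k) for k in lags])
rhs = np.array([power - 0.5 * circular_asdf(x, k) for k in lags])
```

Because the ASDF is never negative, `lhs` can never exceed `power`, which is the bound mentioned in the post.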
```Rune Allnor wrote:

...

> OK, I am sure you are right, provided the signal consists of a single
> sinusoidal. If there are more sinusoidals, or noise present...

As I understand it, that should be a single _dominant_ sinusoid (and its
harmonics).

Jerry
--
Engineering is the art of making what you want from things you can get.
```
```Jerry Avins wrote:

> Rune Allnor wrote:
>
>   ...
>
>> OK, I am sure you are right, provided the signal consists of a single
>> sinusoidal. If there are more sinusoidals, or noise present...
>
>
> As I understand it, that should be a single _dominant_ sinusoid (and its
> harmonics).
>
> Jerry

Most instrumental and sung music is quite rich in harmonics -- to the
point where the fundamental cannot be counted on to have the majority of
the energy, or even be there at all (bells, IIRC, have the fundamental
entirely suppressed, yet our brains synthesize it out of the harmonics).

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
```
```Tim Wescott wrote:
> Jerry Avins wrote:
>
>> Rune Allnor wrote:
>>
>>   ...
>>
>>> OK, I am sure you are right, provided the signal consists of a single
>>> sinusoidal. If there are more sinusoidals, or noise present...
>>
>>
>>
>> As I understand it, that should be a single _dominant_ sinusoid (and
>> its harmonics).
>>
>> Jerry
>
>
> Most instrumental and sung music is quite rich in harmonics -- to the
> point where the fundamental cannot be counted on to have the majority of
> the energy, or even be there at all (bells, IIRC, have the fundamental
> entirely suppressed, yet our brains synthesize it out of the harmonics).

Sure, but autocorrelation doesn't necessarily fail to find the missing
fundamental. Imagine (or draw) a square wave from which the fundamental
has been removed. The period of the suppressed fundamental clearly
remains the period of that waveform.

Jerry
--
Engineering is the art of making what you want from things you can get.
```
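The square-wave thought experiment above is easy to try numerically. The sketch below builds harmonics 3, 5, 7 and 9 of a 100 Hz square wave, leaves out the fundamental, and looks for the largest autocorrelation value. The sample rate, frame length, harmonic count and lag search range are all assumptions picked so that one period is exactly 80 samples.

```python
import numpy as np

sample_rate = 8000
f0 = 100.0                       # suppressed fundamental; one period is 80 samples
t = np.arange(1600) / sample_rate

# Square-wave harmonics 3, 5, 7, 9 with amplitude 1/k; the k = 1 term is omitted.
x = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in (3, 5, 7, 9))

acf = np.correlate(x, x, mode="full")[len(x) - 1:]   # keep lags >= 0
lag_min, lag_max = 10, 200
best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
estimated_f0 = sample_rate / best_lag
```

The largest peak lands at the period of the missing 100 Hz component, not at the period of the lowest harmonic actually present (300 Hz), because only a shift of one full period lines up every harmonic at once.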
```in article 1128485829.779840.257640@f14g2000cwb.googlegroups.com,
rhnlogic@yahoo.com at rhnlogic@yahoo.com wrote on 10/05/2005 00:17:

> robert bristow-johnson wrote:
>> rhnlogic@yahoo.com at rhnlogic@yahoo.com wrote on 10/04/2005 18:02:
>>>
>>> Why, in this prediction method, is the first large peak used, instead
>>> of another slightly larger peak higher in frequency if available?
>>
>> i tried to explain it.  imagine two periodic waveforms (they don't have to
>> be a single sinusoid, just periodic) that are exactly one octave apart.  let
>> the lower frequency waveform be, say, -60 dB as loud as the higher frequency
>> waveform.  now add them together.  which waveform do you think you are
>> hearing when you hear the sum?
>>
>> what you have is a new periodic waveform that is at the same fundamental
>> frequency as the lower waveform, but all of the odd harmonics are much, much
>> lower in amplitude than the evens.  if you do autocorrelation, which peak is
>> bigger, the first peak (corresponding to 1/2 the period) or the second peak
>> (corresponding to the period)?
>
> Hmmm...  By "first large peak", do you mean the first in an ascending
> frequency sort, or the first in an ascending period sort?  If the
> latter, then I'm mostly in agreement with you.

well, i guess it's more the latter.  i mean ascending "lag", which is a
measure of time difference between the two signals being correlated.  the
first really good peak corresponds to a lag that is equal to the period of
the periodic tone.  but if there is a very small amplitude subharmonic (so
small in amplitude that you can't hear it) of one octave lower (twice the
period) added, the second peak will be bigger but only slightly.

> In my experiments with resonant sounds, I do not use the chosen
> autocorrelation peak directly, but use it to calculate an integer
> divider for the largest interpolated frequency peak for the pitch
> estimate.  But I haven't yet done experiments to find out which pitch
> would be preferred by a trained ear on any slightly inharmonic tones.

significantly inharmonic tones, i make no claim regarding.  that's another
can of worms.

off to AES show in NYC (it's 4 am).  see you guys there or thereafter.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

```
```robert bristow-johnson wrote:
> rhnlogic@yahoo.com at rhnlogic@yahoo.com wrote on 10/04/2005 18:02:
> > robert bristow-johnson wrote:
> > ...
> >>> Didier A Depireux wrote:
> >>>
> >>>> the pitch of most sounds, except for alternating click trains around 200Hz
> >>>> for which there's an ambiguity, can be predicted from the largest peak
> >>>> of the autocorrelation of the half-wave rectified waveform.
> >
> > Why, in this prediction method, is the waveform half-wave rectified
> > before autocorrelation? (which seems to throw away information...)
>
> i might wonder the same question.
>
> >> ... except for the "octave problem", the pitch of
> >> these sounds can be predicted by the first very large peak of the
> >> autocorrelation, where this "first large peak" is either the largest peak
> >> or within a certain limit of difference in amplitude from the largest peak
> >> (other than the peak at the zero lag).
> >
> > Why, in this prediction method, is the first large peak used, instead
> > of another slightly larger peak higher in frequency if available?
>
> i tried to explain it.  imagine two periodic waveforms (they don't have to
> be a single sinusoid, just periodic) that are exactly one octave apart.  let
> the lower frequency waveform be, say, -60 dB as loud as the higher frequency
> waveform.  now add them together.  which waveform do you think you are
> hearing when you hear the sum?
>
> what you have is a new periodic waveform that is at the same fundamental
> frequency as the lower waveform, but all of the odd harmonics are much, much
> lower in amplitude than the evens.  if you do autocorrelation, which peak is
> bigger, the first peak (corresponding to 1/2 the period) or the second peak
> (corresponding to the period)?

Hmmm...  By "first large peak", do you mean the first in an ascending
frequency sort, or the first in an ascending period sort?  If the
latter, then I'm mostly in agreement with you.  If the former, then
I still don't understand.

In my experiments with resonant sounds, I do not use the chosen
autocorrelation peak directly, but use it to calculate an integer
divider for the largest interpolated frequency peak for the pitch
estimate.  But I haven't yet done experiments to find out which pitch
would be preferred by a trained ear on any slightly inharmonic tones.

IMHO. YMMV.
--
rhn A.T nicholson d.O.t C-o-M

```
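One possible reading of the "integer divider" idea described above: take a rough pitch from the autocorrelation, find the largest spectral peak with sub-bin interpolation, and divide that peak frequency by the nearest whole harmonic number. The poster gave no code, so everything below is an assumption, including the Hann window, the parabolic fit on log magnitude, the function names and the test signal.

```python
import numpy as np

def interpolated_spectral_peak(x, sample_rate):
    """Frequency of the largest spectral peak, refined by a parabolic fit."""
    window = np.hanning(len(x))
    mag = np.abs(np.fft.rfft(x * window))
    k = int(np.argmax(mag[1:-1])) + 1            # skip the DC and Nyquist bins
    a, b, c = np.log(mag[k - 1]), np.log(mag[k]), np.log(mag[k + 1])
    offset = 0.5 * (a - c) / (a - 2 * b + c)     # vertex of the fitted parabola
    return (k + offset) * sample_rate / len(x)

def refine_pitch(x, sample_rate, rough_f0):
    """Divide the strongest partial by its (rounded) harmonic number."""
    f_peak = interpolated_spectral_peak(x, sample_rate)
    divider = max(1, round(f_peak / rough_f0))
    return f_peak / divider

# Demo: third harmonic is the strongest partial of a 130.81 Hz tone.
sample_rate = 8000
f0 = 130.81
t = np.arange(4096) / sample_rate
tone = (0.4 * np.sin(2 * np.pi * f0 * t)
        + 0.6 * np.sin(2 * np.pi * 2 * f0 * t)
        + 1.0 * np.sin(2 * np.pi * 3 * f0 * t))
rough_f0 = sample_rate / 61      # whole-sample autocorrelation lag, about 131.1 Hz
refined = refine_pitch(tone, sample_rate, rough_f0)
```

The autocorrelation only has to be accurate enough to pick the right harmonic number; the final precision comes from the interpolated spectral peak. For inharmonic tones the partials are not exact multiples, so this division would carry that error into the estimate.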
```in article 1128463351.547524.153790@g47g2000cwa.googlegroups.com,
rhnlogic@yahoo.com at rhnlogic@yahoo.com wrote on 10/04/2005 18:02:

> robert bristow-johnson wrote:
> ...
>>> Didier A Depireux wrote:
>>>
>>>> the pitch of most sounds, except for alternating click trains around 200Hz
>>>> for which there's an ambiguity, can be predicted from the largest peak
>>>> of the autocorrelation of the half-wave rectified waveform.
>
> Why, in this prediction method, is the waveform half-wave rectified
> before autocorrelation? (which seems to throw away information...)

i might wonder the same question.

>> ... except for the "octave problem", the pitch of
>> these sounds can be predicted by the first very large peak of the
>> autocorrelation, where this "first large peak" is either the largest peak
>> or within a certain limit of difference in amplitude from the largest peak
>> (other than the peak at the zero lag).
>
> Why, in this prediction method, is the first large peak used, instead
> of another slightly larger peak higher in frequency if available?

i tried to explain it.  imagine two periodic waveforms (they don't have to
be a single sinusoid, just periodic) that are exactly one octave apart.  let
the lower frequency waveform be, say, -60 dB as loud as the higher frequency
waveform.  now add them together.  which waveform do you think you are
hearing when you hear the sum?

what you have is a new periodic waveform that is at the same fundamental
frequency as the lower waveform, but all of the odd harmonics are much, much
lower in amplitude than the evens.  if you do autocorrelation, which peak is
bigger, the first peak (corresponding to 1/2 the period) or the second peak
(corresponding to the period)?

> Thanks.

FWIW.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

```
```robert bristow-johnson wrote:
...
> > Didier A Depireux wrote:
> >
> >> the pitch of most sounds, except for alternating click trains around 200Hz
> >> for which there's an ambiguity, can be predicted from the largest peak
> >> of the autocorrelation of the half-wave rectified waveform.

Why, in this prediction method, is the waveform half-wave rectified
before autocorrelation? (which seems to throw away information...)

> ... except for the "octave problem", the pitch of
> these sounds can be predicted by the first very large peak of the
> autocorrelation, where this "first large peak" is either the largest peak
> or within a certain limit of difference in amplitude from the largest peak
> (other than the peak at the zero lag).

Why, in this prediction method, is the first large peak used, instead
of another slightly larger peak higher in frequency if available?

Thanks.
--
rhn A.T nicholson d.O.t C-o-M

```
```Richard Dobson wrote:
> Didier A. Depireux wrote:
> > As the protocols say, "at comfortable levels", i.e. the levels were
> > randomized, and the loudness of the sounds were perceived to be between
> > about 60 and 70 dB.
>
> I still don't get what your experiment is trying to establish. What is the point
> of randomizing levels? If you want to eliminate pitch-shift effects due to SPL,
> you need to ensure delivery levels at the documented neutral point of 60dBSPL.

As far as some real-world implementations are concerned, this is
close to an excessive level of detail for DSP pitch estimation.
In many voice or music measurement or characterization applications,
the microphone is uncalibrated and/or the gain level is unknown (random
cell phone gate-way'd through some VOIP connection, stage mic that's
been moved by the performer and eq'd by the sound-guy, etc.).  A
statistical distribution of possible pitch perceptions in various
generic situations would seem to be more useful than something that
only applies to 60dB +- 0.5dB.

IMHO. YMMV.
--
rhn A.T nicholson d.O.t C-o-M

```
```Didier A. Depireux wrote:

> In comp.dsp Richard Dobson <richarddobson@blueyonder.co.uk> wrote:
>
>>Didier A. Depireux wrote:
..
> As the protocols say, "at comfortable levels", i.e. the levels were
> randomized, and the loudness of the sounds were perceived to be between
> about 60 and 70 dB.
>

I still don't get what your experiment is trying to establish. What is the point
of randomizing levels? If you want to eliminate pitch-shift effects due to SPL,
you need to ensure delivery levels at the documented neutral point of 60dBSPL.
And this level should not be left to "perception", but measured at the listening
position using a proper SPL meter. Randomising level will merely have the effect
of swamping your experimental data with errors generated by the procedure. With
your SPL range of 10dB, pitch-shift error will be around +-8 cents, according to
the reference I cited.

As I see it, the problem you have is that unless the reference 200Hz tone is the
same level as the resultant tone of the cluster (i.e. much lower level than the
cluster itself), pitch-perception artefacts will arise.
..
> The experiment I described has been performed many times before me, by
> people much better at psychophysics than me. We randomized the levels of the
> tone complex and the pure tone, of course. We also played the sounds on high
> quality headphones. Any non-linearity in your speaker will induce a pitch at
> 200Hz, since the envelope of the sound itself is periodic at 200Hz, quite
> different from the pitch of the sound.
..

Well, I assume we are all assuming pro-quality equipment here! It would take a
pretty ropey speaker to generate such an artefact, whereas it is a given for the
highly non-linear standard human ear. I have Tannoy dual-concentrics here, and
AKG K270 phones. Not the most expensive, but very good!

The main argument I have with all this however is simply: those three tones do
not have "a pitch"! Who/what says they should? They are still low enough in
frequency, and far enough apart, to be clearly audible as three distinct high
tones, though too inharmonic (out of tune) to fit any 12tone ET scale (three
erratic piccolos?). I suppose to a musically untrained ear they might seem to
blend into some quasi-bell-like mass, which would be more likely of course if
they are given a common exponential decay envelope etc. The 200Hz resultant tone
is of course inharmonic to all three high tones; this will nicely reinforce the
bell suggestion, and thus persuade subjects that the sound must have a pitch.
Just deliver a single 200Hz tone successively at 60dBSPL and 70dBSPL and you
will hear a pitch difference; that then defines the maximum accuracy of any
experimental results.

Richard Dobson
```
```in article 1128108957.941068.99840@g47g2000cwa.googlegroups.com, fizteh89 at
dt@soundmathtech.com wrote on 09/30/2005 15:35:

> Didier A Depireux wrote:
>
>> the pitch of most sounds, except for alternating click trains around 200Hz
>> for which there's an ambiguity, can be predicted from the largest peak of
>> the autocorrelation of the half-wave rectified waveform.
>
> This statement of yours is pure nonsense.

i dunno if it is *pure* nonsense, but the statement is not correct for
generalized musical tones.  except for the "octave problem", the pitch of
these sounds can be predicted by the first very large peak of the
autocorrelation, where this "first large peak" is either the largest peak
or within a certain limit of difference in amplitude from the largest peak
(other than the peak at the zero lag).

the "octave problem" can be created when two synchronous waveforms, exactly
one octave apart, are added.  if the lower frequency tone is extremely small
relative to the higher tone, no one will hear it and only the higher tone
will determine what we *think* the pitch is.  but, strictly mathematically,
the period of the combined tones is at the inaudible lower tone and that is
where the highest peak will be unless one does something about it in the
PDA.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

```
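The octave problem and the "first large peak" rule described above can be reproduced with a small sketch. A 200 Hz tone with a few harmonics is added to a 100 Hz sine that is 60 dB weaker, and two peak pickers are compared. The 5 percent tolerance, the lag range, the harmonic amplitudes, the use of a circular autocorrelation over a whole number of periods, and all names are assumptions made for the example.

```python
import numpy as np

def circular_autocorr(x):
    """Circular autocorrelation via the power spectrum, normalized by length."""
    spectrum = np.fft.rfft(x)
    return np.fft.irfft(np.abs(spectrum) ** 2, n=len(x)) / len(x)

def first_large_peak(acf, lag_min, lag_max, tolerance=0.05):
    """First local maximum that comes within `tolerance` of the largest value."""
    segment = acf[lag_min:lag_max + 1]
    threshold = (1.0 - tolerance) * np.max(segment)
    for i in range(1, len(segment) - 1):
        is_peak = segment[i] > segment[i - 1] and segment[i] >= segment[i + 1]
        if is_peak and segment[i] >= threshold:
            return lag_min + i
    return lag_min + int(np.argmax(segment))

sample_rate = 8000
t = np.arange(1600) / sample_rate          # 20 periods of 100 Hz, 40 of 200 Hz
upper = sum(a * np.sin(2 * np.pi * 200.0 * m * t)
            for m, a in ((1, 1.0), (2, 0.5), (3, 0.25)))
lower = 0.001 * np.sin(2 * np.pi * 100.0 * t)   # 60 dB below the upper fundamental
acf = circular_autocorr(upper + lower)

lag_min, lag_max = 10, 100
naive_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
picked_lag = first_large_peak(acf, lag_min, lag_max)
```

The largest value sits at lag 80 (100 Hz) by a margin of roughly one part in a million, while the first peak within tolerance sits at lag 40 (200 Hz), the pitch a listener would report. How wide to make the tolerance is a tuning decision that the thread leaves open.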
```In comp.dsp fizteh89 <dt@soundmathtech.com> wrote:
> Didier A Depireux wrote:

> >the pitch of most sounds, except for alternating click trains around 200Hz
> >for which there's an ambiguity, can be predicted from the largest peak of
> >the autocorrelation of the half-wave rectified waveform.

> This statement of yours is pure nonsense.

> One can easily have two different periodic waveforms with identical
> upper (positive) halfs and different lower (negative) halfs, where the
> period of the first waveform is, for example, twice the period of the
> second waveform, leading to octave difference in pitch perception, but,
> according to your statement above, they should have identical pitches.

the polarity of parts of a waveform, so that for instance the pitch of a
sequence of clicks equals its frequency, but if you alternate the polarity
of the clicks (+-+-) you get a change by an octave for certain frequency
range.

I came to this newsgroup looking for answers about non-linear filtering, and
got distracted when I saw the word "pitch" in the title of one of the

Didier

--
Didier A Depireux         ddepi001@umaryland.edu  didier@isr.umd.edu
20 Penn Str - S218E   http://neurobiology.umaryland.edu/depireux.htm
Anatomy and Neurobiology                   Phone: 410-706-1272 (lab)
University of Maryland                                   -1273 (off)
Baltimore MD 21201 USA                           Fax: 1-410-706-2512
```
```In comp.dsp Richard Dobson <richarddobson@blueyonder.co.uk> wrote:
> Didier A. Depireux wrote:

> I understand what you are at here. But:
> What levels are you playing these two sounds at?  And for how long?

As the protocols say, "at comfortable levels", i.e. the levels were
randomized, and the loudness of the sounds were perceived to be between
about 60 and 70 dB.

> proposition is that the pitch rise you are measuring is really an indirect
> measure of the perceived flattening of the single 200 tone. I suggest therefore

The experiment I described has been performed many times before me, by
people much better at psychophysics than me. We randomized the levels of the
tone complex and the pure tone, of course. We also played the sounds on high
quality headphones. Any non-linearity in your speaker will induce a pitch at
200Hz, since the envelope of the sound itself is periodic at 200Hz, quite
different from the pitch of the sound.

> will be heard by most people (including me) to have something like a
> quarter-tone difference. But it is the louder tone being perceived as lower,
> rather than the quieter tone being perceived as higher.  I was under the
> impression that this is a well-documented phenomenon.

Yes, we know about these things. I don't know that usenet is the best place
to get peer-review, so I definitely didn't give protocol details here.

Didier

--
Didier A Depireux         ddepi001@umaryland.edu  didier@isr.umd.edu
20 Penn Str - S218E   http://neurobiology.umaryland.edu/depireux.htm
Anatomy and Neurobiology                   Phone: 410-706-1272 (lab)
University of Maryland                                   -1273 (off)
Baltimore MD 21201 USA                           Fax: 1-410-706-2512
```
```rhnlogic@yahoo.com wrote:
..
>>I have just done a test with Csound, and it matters a great deal how
>>loud the single 200Hz tone is.
>
>
> The experiment wasn't about whether 200 Hz was a match, but whether
> (at some volume level or duration) something around 218 Hz sounded
> like a better match to most people.
>
> It's the difference between how the brain evaluates an exact match
> in very high harmonics, and an approximate match of assumed much lower
> harmonics.

But without knowledge of the SPL at which these sounds are presented, this
"experiment" tells us virtually nothing, other than proving what is known:
that pitch perception changes with SPL.  This really ~must~ be taken into
account when performing such tests, otherwise the test is invalidated and,
worse, may be used as the basis for an unsound theory of composite pitch
perception. In short, "how the brain evaluates" is affected by SPL. "At some
volume level or duration" is far too vague and lax to be the basis for a
scientific conclusion.

I now have references for the pitch shift phenomenon:

"Acoustics and Psychoacoustics", D.M. Howard, J. Angus, Focal Press 2001
(diagram and text, Page 135)
cites source as:

"The Science of Sound", T.D. Rossing, Addison Wesley, 1989.

In brief: only at 60dBSPL are all pitches heard without distortion.  At 90dBSPL
a 200 Hz tone is heard about 20 cents lower (my "almost a quarter-tone"), and a
4kHz tone is heard about 20 cents higher.

At lower levels, e.g. 40dBSPL, the same 200Hz tone is heard about 12 cents
higher, and the 4kHz tone about 25 cents lower. The printed diagram indicates
that a tone of 2kHz is heard without shift, at all SPLs. For everything below
and above, there is a "bow-tie" pattern centred on 0 cents shift, at 60dBSPL.
This corresponds to a "normal" speech listening level. That is, it is not very
loud at all, and it would be very easy to present sine tones well above 60dBSPL;
this is what I suspect is the case here. We do not even know whether the sounds
are presented over speakers or headphones.

Now as far as I can tell, this experiment has only been done with direct tones,
not with resultant tones from a sinusoid cluster, so there is very likely new
research that can usefully be done here (and highly relevant to orchestral wind
players such as myself!); but without precise consideration of the delivery SPL,
any such experiment is IMO fatally flawed. Once we know that these experiments
have been conducted at a delivery level of 60dBSPL, we can start to draw valid
conclusions from the data.

Richard Dobson
```
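To put the cent figures above into hertz, here is a small conversion sketch. A cent is 1/1200 of an octave, so a shift of c cents multiplies frequency by 2^(c/1200), and a quarter-tone is 50 cents. The function names are illustrative, and the example values are simply the ones mentioned in this thread.

```python
import math

def shift_by_cents(freq_hz, cents):
    """Frequency after moving `cents` up (positive) or down (negative)."""
    return freq_hz * 2.0 ** (cents / 1200.0)

def cents_between(f_ref_hz, f_other_hz):
    """Signed interval from f_ref_hz to f_other_hz, in cents."""
    return 1200.0 * math.log2(f_other_hz / f_ref_hz)

flat_200 = shift_by_cents(200.0, -20.0)     # 200 Hz heard 20 cents lower
gap_200_218 = cents_between(200.0, 218.0)   # the 200 Hz vs 218 Hz match
```

A 20 cent drop at 200 Hz is about 2.3 Hz, while the gap between 200 Hz and 218 Hz is close to 150 cents, so the two effects discussed in the thread differ in size by a wide margin.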