comp.dsp | Pitch Estimation using Autocorrelation | page 12

Reply by fizteh89 ● September 30, 2005

Didier A Depireux wrote:

>the pitch of most sounds, except for alternating click trains around 200Hz
>for which there's an ambiguity, can be predicted from the largest peak of
>the autocorrelation of the half-wave rectified waveform.

This statement of yours is pure nonsense.

One can easily have two different periodic waveforms with identical
upper (positive) halves and different lower (negative) halves, where the
period of the first waveform is, for example, twice the period of the
second waveform, leading to an octave difference in pitch perception, but,
according to your statement above, they should have identical pitches.
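
The mathematical part of this objection can be checked numerically. The sketch below is an illustration only, not either poster's code: the single-sample pulses, the period `P`, and the helper names are assumptions chosen for simplicity. It builds two waveforms whose positive halves are identical but whose periods differ by a factor of two, and shows that after half-wave rectification the two are sample-for-sample identical, so any autocorrelation-based estimate must agree. It says nothing about what a listener would actually hear.

```python
import numpy as np

P = 100        # shorter period in samples (arbitrary, assumed)
N = 20 * P     # whole number of periods of both waveforms


def make_waveforms():
    """Return (a, b): same positive pulses, different negative pulses."""
    n = np.arange(N)
    a = np.zeros(N)
    b = np.zeros(N)
    # Identical positive halves: a +1 pulse every P samples.
    a[n % P == 0] = 1.0
    b[n % P == 0] = 1.0
    # Waveform a: the same negative pulse every P samples -> period P.
    a[n % P == P // 2] = -1.0
    # Waveform b: negative pulses alternate in depth -> period 2 * P.
    b[n % (2 * P) == P // 2] = -1.0
    b[n % (2 * P) == P + P // 2] = -0.5
    return a, b


def half_wave_rectify(x):
    """Keep the positive part of the waveform, zero the rest."""
    return np.maximum(x, 0.0)


def largest_peak_lag(x, max_lag=250):
    """Lag (in samples, 1..max_lag) of the largest circular autocorrelation value."""
    r = np.array([np.dot(x, np.roll(x, -k)) for k in range(1, max_lag + 1)])
    return int(np.argmax(r)) + 1
```

With these definitions the raw waveforms give largest-peak lags of `P` and `2 * P`, while both rectified waveforms give `P`. Ties between equal peaks go to the shortest lag because `np.argmax` returns the first maximum.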

Reply by Richard Dobson ● September 30, 2005

Didier A. Depireux wrote:

> I didn't assert anything from theoretical arguments, I did the experiment
> and I found 2 ambiguous pitches, as had many others before me. Just do it
> yourself, but playing a pure tone at 200Hz in between repetition of the
> (1500,1700,1900)Hz complex,  and then inserting 212.8Hz instead. You will
> hear a much better match with 212.8 if you are like 90% of psychophysics
> subjects. I can't really argue with reality. 
> 

I understand what you are at here. But:
What levels are you playing these two sounds at?  And for how long?

I have just done a test with Csound, and it matters a great deal how loud the 
single 200Hz tone is. If it is the same peak amplitude as the mix of the three, 
it is pretty loud, and this can easily induce the phenomenon of perceived 
flattening (which I mentioned in a previous post). Indeed, if I listen to a 
plain 200Hz tone for about 5 seconds, ear fatigue creates the illusion of a 
slight fall in pitch towards the end!

However, if I scale the 200Hz tone down so that it is about 1/3 the level of 
the sum of the three high-frequency sines:

(Csound score:)
;  time  dur  amp   freq
i1   0   5 [0.5/3] 200  ; 200 sine, amplitude 0.5 / 3 where 1 = 0dBFS
i1   5   5 [0.5/3] 1500 ; 3 sines summed, total amp 0.5
i1   5   5 [0.5/3] 1700
i1   5   5 [0.5/3] 1900

Then the solo sine and the perceived resultant tone sound, to me, pretty much the 
same in pitch. With a loud 200Hz tone, with the flattening I have 
described, the following resultant tone will indeed sound higher. But my 
proposition is that the pitch rise you are measuring is really an indirect 
measure of the perceived flattening of the single 200 tone. I suggest therefore 
that your experiment include as variables the duration of the tones, and 
relative sound pressure levels; I would be most surprised if this does not show 
up some differences such as I have described.

For example, just playing a loud 200 tone followed by a quiet 200Hz tone:

i1 0 5 0.5 200
i1 5 5 [0.5/3] 200 ; about 10dB lower

will be heard by most people (including me) to have something like a 
quarter-tone difference. But it is the louder tone being perceived as lower, 
rather than the quieter tone being perceived as higher.  I was under the 
impression that this is a well-documented phenomenon.
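
The "about 10dB" figure in the score comment follows from the amplitude ratio of 1/3. A minimal sketch of the arithmetic (the function name is an assumption; the amplitudes are the ones in the score above, relative to 0dBFS):

```python
import math


def level_difference_db(amplitude, reference):
    """Level of `amplitude` relative to `reference`, in dB (amplitude ratio)."""
    return 20.0 * math.log10(amplitude / reference)


loud = 0.5        # amplitude of the first 200Hz tone
quiet = 0.5 / 3   # amplitude of the second 200Hz tone
print(round(level_difference_db(quiet, loud), 2))  # -9.54
```

This is a difference in digital signal level only. The SPL at the ear depends on the playback chain, which the score does not specify.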

What I  would also very much like to know is whether a subject exposed to high 
SPLs for a long time (heavy rocker, etc) exhibits the same perception, or 
whether ear damage of that kind affects it.

Richard Dobson

Reply by rhnl...@yahoo.com ● September 30, 2005

Richard Dobson wrote:
> Didier A. Depireux wrote:
> > I didn't assert anything from theoretical arguments, I did the experiment
> > and I found 2 ambiguous pitches, as had many others before me. Just do it
> > yourself, but playing a pure tone at 200Hz in between repetition of the
> > (1500,1700,1900)Hz complex,  and then inserting 212.8Hz instead. You will
> > hear a much better match with 212.8 if you are like 90% of psychophysics
> > subjects. I can't really argue with reality.
> >
>
> I understand what you are at here. But:
> What levels are you playing these two sounds at?  And for how long?
>
> I have just done a test with Csound, and it matters a great deal how
> loud the single 200Hz tone is.

The experiment wasn't about whether 200 Hz was a match, but whether
(at some volume level or duration) something around 218 Hz sounded
like a better match to most people.

It's the difference between how the brain evaluates an exact match
in very high harmonics, and an approximate match of assumed much lower
harmonics.
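
One way to make "exact match" versus "approximate match" concrete is to ask, for a candidate fundamental, how far each partial of the (1500,1700,1900)Hz complex lies from the nearest harmonic of that candidate. The sketch below is an illustration under assumptions: the helper name is made up, and 212.5 Hz is used simply as 1700/8, not as the matched frequency reported in the thread.

```python
def harmonic_mismatch(f0, partials):
    """For each partial, return (nearest harmonic number, offset in Hz from it)."""
    result = []
    for p in partials:
        h = round(p / f0)              # nearest harmonic number of f0
        result.append((h, p - h * f0))
    return result


PARTIALS = (1500.0, 1700.0, 1900.0)

for candidate in (100.0, 200.0, 212.5):
    print(candidate, harmonic_mismatch(candidate, PARTIALS))
```

The partials are exact harmonics 15, 17 and 19 of 100 Hz, sit within 12.5 Hz of harmonics 7, 8 and 9 of 212.5 Hz, and fall midway between harmonics of 200 Hz.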


IMHO. YMMV.
-- 
rhn A.T nicholson d.O.t C-o-M

Reply by Richard Dobson ● October 1, 2005

rhnlogic@yahoo.com wrote:
..
>>I have just done a test with Csound, and it matters a great deal how
>>loud the single 200Hz tone is.
> 
> 
> The experiment wasn't about whether 200 Hz was a match, but whether
> (at some volume level or duration) something around 218 Hz sounded
> like a better match to most people.
> 
> It's the difference between how the brain evaluates an exact match
> in very high harmonics, and an approximate match of assumed much lower
> harmonics.

But without knowledge of the SPL at which these sounds are presented, this 
"experiment" tells us virtually nothing, other than proving what is already 
known: that pitch perception changes with SPL.  This really ~must~ be taken into 
account when performing such tests, otherwise the test is invalidated and, 
worse, may be used as the basis for an unsound theory of composite pitch 
perception. In short, "how the brain evaluates" is affected by SPL. "At some 
volume level or duration" is far too vague and lax to be the basis for a 
scientific conclusion.

I now have references for the pitch shift phenomenon:

"Acoustics and Psychoacoustics", D.M. Howard, J. Angus, Focal Press 2001
   (diagram and text, Page 135)
   cites source as:

"The Science of Sound", T.D. Rossing, Addison Wesley, 1989.

In brief: only at 60dBSPL are all pitches heard without distortion.  At 90dBSPL 
a 200Hz tone is heard about 20 cents lower (my "almost a quarter-tone"),  and a 
4kHz tone is heard about 20 cents higher.

At lower levels, e.g. 40dBSPL, the same 200Hz tone is heard about 12 cents 
higher, and the 4kHz tone about 25 cents lower. The printed diagram indicates 
that a tone of 2kHz is heard without shift, at all SPLs. For everything below 
and above, there is a "bow-tie" pattern centred on 0 cents shift, at 60dBSPL. 
This corresponds to a "normal" speech listening level. That is, it is not very 
loud at all, and it would be very easy to present sine tones well above 60dBSPL; 
this is what I suspect is the case here. We do not even know whether the sounds 
are presented over speakers or headphones.
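
For scale, the cent figures above can be converted to and from frequencies. This is a small sketch of the standard conversion (1200 cents per octave); the function names are assumptions, and the frequencies are the ones mentioned in this thread.

```python
import math


def cents_between(f_ref, f):
    """Interval from f_ref to f in cents (positive if f is higher)."""
    return 1200.0 * math.log2(f / f_ref)


def shift_by_cents(f, cents):
    """Frequency obtained by shifting f by the given number of cents."""
    return f * 2.0 ** (cents / 1200.0)


# A 200Hz tone heard 20 cents flat corresponds to roughly this frequency:
print(round(shift_by_cents(200.0, -20.0), 2))
# Interval between the 200Hz and 212.8Hz comparison tones:
print(round(cents_between(200.0, 212.8), 1))
```

A 20 cent flattening of 200 Hz is a little over 2 Hz, while the interval from 200 Hz to 212.8 Hz is a little over 100 cents.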

Now as far as I can tell, this experiment has only been done with direct tones, 
not with resultant tones from a sinusoid cluster, so there is very likely new 
research that can usefully be done here (and highly relevant to orchestral wind 
players such as myself!); but without precise consideration of the delivery SPL, 
any such experiment is IMO fatally flawed. Once we know that these experiments 
have been conducted at a delivery level of 60dBSPL, we can start to draw valid 
conclusions from the data.

Richard Dobson

Reply by Didier A. Depireux ● October 3, 2005

In comp.dsp Richard Dobson <richarddobson@blueyonder.co.uk> wrote:
> Didier A. Depireux wrote:

> I understand what you are at here. But:
> What levels are you playing these two sounds at?  And for how long?

As the protocols say, "at comfortable levels", i.e. the levels were
randomized, and the loudness of the sounds was perceived to be between
about 60 and 70 dB. 

> proposition is that the pitch rise you are measuring is really an indirect 
> measure of the perceived flattening of the single 200 tone. I suggest therefore 

The experiment I described has been performed many times before me, by
people much better at psychophysics than me. We randomized the levels of the 
tone complex and the pure tone, of course. We also played the sounds on high
quality headphones. Any non-linearity in your speaker will induce a pitch at
200Hz, since the envelope of the sound itself is periodic with a repetition
rate of 200Hz, quite different from the pitch of the sound. 
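
The gap between the 200Hz envelope rate and the fine structure of the (1500,1700,1900)Hz complex can be seen in a plain autocorrelation. The sketch below is a rough illustration, not the experimental protocol or a model of hearing: the sample rate, the 4 to 6 ms lag window around the envelope period, the 0.8 threshold, and the function names are all assumptions, and the waveform is not rectified.

```python
import numpy as np

FS = 48_000                       # sample rate in Hz (assumed)
N = 4_800                         # 0.1 s: a whole number of cycles of every partial
PARTIALS = (1500.0, 1700.0, 1900.0)


def circular_autocorrelation(x):
    """Circular autocorrelation via the FFT, normalised so that r[0] == 1."""
    spectrum = np.fft.rfft(x)
    r = np.fft.irfft(np.abs(spectrum) ** 2, n=len(x))
    return r / r[0]


def candidate_pitches(lo_ms=4.0, hi_ms=6.0, threshold=0.8):
    """Frequencies (Hz) of strong autocorrelation peaks with lags in [lo_ms, hi_ms]."""
    t = np.arange(N) / FS
    x = sum(np.cos(2 * np.pi * f * t) for f in PARTIALS)
    r = circular_autocorrelation(x)
    lo = round(lo_ms * 1e-3 * FS)
    hi = round(hi_ms * 1e-3 * FS)
    lags = [k for k in range(lo, hi + 1)
            if r[k] > threshold and r[k - 1] < r[k] > r[k + 1]]
    return [FS / k for k in lags]


print(candidate_pitches())
```

Within that window there is no peak at the 5 ms envelope period itself. The two strong peaks lie on either side of it, at lags corresponding to roughly 212 Hz and 189 Hz, and they are of essentially equal height. Outside the window, the largest peak of all is at 10 ms, because the three partials are exact harmonics of 100 Hz.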

> will be heard by most people (including me) to have something like a 
> quarter-tone difference. But it is the louder tone being perceived as lower, 
> rather than the quieter tone being perceived as higher.  I was under the 
> impression that this is a well-documented phenomenon.

Yes, we know about these things. I don't know that usenet is the best place
to get peer-review, so I definitely didn't give protocol details here. 

						Didier

-- 
Didier A Depireux         ddepi001@umaryland.edu  didier@isr.umd.edu
20 Penn Str - S218E   http://neurobiology.umaryland.edu/depireux.htm
Anatomy and Neurobiology                   Phone: 410-706-1272 (lab)
University of Maryland                                   -1273 (off)
Baltimore MD 21201 USA                           Fax: 1-410-706-2512

Reply by Didier A. Depireux ● October 3, 2005

In comp.dsp fizteh89 <dt@soundmathtech.com> wrote:
> Didier A Depireux wrote:

> >the pitch of most sounds, except for alternating click trains around 200Hz
> >for which there's an ambiguity, can be predicted from the largest peak of
> >the autocorrelation of the half-wave rectified waveform.

> This statement of yours is pure nonsense.

> One can easily have two different periodic waveforms with identical
> upper (positive) halves and different lower (negative) halves, where the
> period of the first waveform is, for example, twice the period of the
> second waveform, leading to an octave difference in pitch perception, but,
> according to your statement above, they should have identical pitches.

Let me think about this. As you know, you can get funny things by changing
the polarity of parts of a waveform, so that for instance the pitch of a
sequence of clicks equals its frequency, but if you alternate the polarity
of the clicks (+-+-) you get a change by an octave for a certain frequency
range. 

I came to this newsgroup looking for answers about non-linear filtering, and
got distracted when I saw the word "pitch" in the title of one of the
threads!

						Didier

-- 
Didier A Depireux         ddepi001@umaryland.edu  didier@isr.umd.edu
20 Penn Str - S218E   http://neurobiology.umaryland.edu/depireux.htm
Anatomy and Neurobiology                   Phone: 410-706-1272 (lab)
University of Maryland                                   -1273 (off)
Baltimore MD 21201 USA                           Fax: 1-410-706-2512

Reply by robert bristow-johnson ● October 3, 2005

in article 1128108957.941068.99840@g47g2000cwa.googlegroups.com, fizteh89 at
dt@soundmathtech.com wrote on 09/30/2005 15:35:

> Didier A Depireux wrote:
> 
>> the pitch of most sounds, except for alternating click trains around 200Hz
>> for which there's an ambiguity, can be predicted from the largest peak of
>> the autocorrelation of the half-wave rectified waveform.
> 
> This statement of yours is pure nonsense.

i dunno if it is *pure* nonsense, but the statement is not correct for
generalized musical tones.  except for the "octave problem", the pitch of
these sounds can be predicted by the first very large peak of the
autocorrelation, where this "first large peak" is either the largest peak or
within a certain limit of difference in amplitude from the largest peak
(other than the peak at the zero lag).

the "octave problem" can be created when two synchronous waveforms, exactly
one octave apart, are added.  if the lower frequency tone is extremely small
relative to the higher tone, no one will hear it and only the higher tone
will determine what we *think* the pitch is.  but, strictly mathematically,
the period of the combined tones is at the inaudible lower tone and that is
where the highest peak will be unless one does something about it in the
PDA.
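
The "first large peak" rule and the octave problem described above can be sketched as follows. This is only one possible reading of the description, not r b-j's actual PDA: the 1% tolerance, the 150 to 1000 Hz search range, the 0.05 amplitude of the lower tone, and all names are assumptions, and the circular autocorrelation relies on the test signal holding a whole number of periods.

```python
import numpy as np

FS = 48_000   # sample rate in Hz (assumed)
N = 4_800     # 0.1 s: whole number of periods of both tones


def octave_problem_signal(weak=0.05):
    """Strong 400Hz tone plus a much weaker synchronous 200Hz tone."""
    t = np.arange(N) / FS
    return np.cos(2 * np.pi * 400.0 * t) + weak * np.cos(2 * np.pi * 200.0 * t)


def autocorrelation(x):
    """Circular autocorrelation via the FFT, normalised so that r[0] == 1."""
    spectrum = np.fft.rfft(x)
    r = np.fft.irfft(np.abs(spectrum) ** 2, n=len(x))
    return r / r[0]


def peak_lags(r, min_lag, max_lag):
    """Lags of local maxima of r within [min_lag, max_lag] (zero lag excluded)."""
    return [k for k in range(min_lag, max_lag + 1) if r[k - 1] < r[k] > r[k + 1]]


def pitch_from_largest_peak(r, fs, f_min=150.0, f_max=1000.0):
    """Pitch estimate from the single largest peak in the search range."""
    peaks = peak_lags(r, int(fs / f_max), int(fs / f_min))
    return fs / max(peaks, key=lambda k: r[k])


def pitch_from_first_large_peak(r, fs, f_min=150.0, f_max=1000.0, tolerance=0.01):
    """Pitch estimate from the first peak within `tolerance` of the largest one."""
    peaks = peak_lags(r, int(fs / f_max), int(fs / f_min))
    top = max(r[k] for k in peaks)
    first = next(k for k in peaks if r[k] >= (1.0 - tolerance) * top)
    return fs / first
```

For this test signal the largest peak sits at the period of the weak 200Hz tone, while the first-large-peak rule returns 400 Hz. How large the tolerance should be for real signals is not settled by this example.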

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by Richard Dobson ● October 3, 2005

Didier A. Depireux wrote:

> In comp.dsp Richard Dobson <richarddobson@blueyonder.co.uk> wrote:
> 
>>Didier A. Depireux wrote:
..
> As the protocols say, "at comfortable levels", i.e. the levels were
> randomized, and the loudness of the sounds was perceived to be between
> about 60 and 70 dB. 
> 

I still don't get what your experiment is trying to establish. What is the point 
of randomizing levels? If you want to eliminate pitch-shift effects due to SPL, 
you need to ensure delivery levels at the documented neutral point of 60dBSPL. 
And this level should not be left to "perception", but measured at the listening 
position using a proper SPL meter. Randomising level will merely have the effect 
of swamping your experimental data with errors generated by the procedure. With 
your SPL range of 10dB,  pitch-shift error will be around +-8 cents, according 
to the reference I cited.

As I see it, the problem you have is that unless the reference 200Hz tone is the 
same level as the resultant tone of the cluster (i.e. much lower level than the 
cluster itself), pitch-perception artefacts will arise.
..
> The experiment I described has been performed many times before me, by
> people much better at psychophysics than me. We randomized the levels of the 
> tone complex and the pure tone, of course. We also played the sounds on high
> quality headphones. Any non-linearity in your speaker will induce a pitch at
> 200Hz, since the envelope of the sound itself is periodic with a repetition
> rate of 200Hz, quite different from the pitch of the sound. 
..

Well, I assume we are all assuming pro-quality equipment here! It would take a 
pretty ropey speaker to generate such an artefact, whereas it is a given for the 
highly non-linear standard human ear. I have Tannoy dual-concentrics here, and 
AKG K270 phones. Not the most expensive, but very good!
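
How a non-linearity anywhere in the chain (speaker or ear) produces a 200Hz component from the three high tones can be shown with a toy model. The sketch below assumes a memoryless quadratic distortion with an arbitrary coefficient of 0.1; real transducers and ears are not this simple, and the names are made up for illustration.

```python
import numpy as np

FS = 48_000   # sample rate in Hz (assumed)
N = 4_800     # 0.1 s, so FFT bins are spaced 10 Hz apart


def tone_complex():
    """Sum of unit-amplitude cosines at 1500, 1700 and 1900 Hz."""
    t = np.arange(N) / FS
    return sum(np.cos(2 * np.pi * f * t) for f in (1500.0, 1700.0, 1900.0))


def mildly_nonlinear(x, k2=0.1):
    """Toy distortion: the input plus a small quadratic term."""
    return x + k2 * x ** 2


def component_amplitude(x, freq_hz):
    """Amplitude of the sinusoidal component of x at freq_hz (must lie on a bin)."""
    spectrum = np.fft.rfft(x)
    k = int(round(freq_hz * len(x) / FS))
    return 2.0 * abs(spectrum[k]) / len(x)


x = tone_complex()
print(component_amplitude(x, 200.0))                    # essentially zero
print(component_amplitude(mildly_nonlinear(x), 200.0))  # difference-tone level
```

The undistorted complex has no energy at 200 Hz. The quadratic term creates difference tones at 1700 - 1500 and 1900 - 1700, both 200 Hz, which add to an amplitude of about 0.2 with the coefficient assumed here.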

The main argument I have with all this however is simply: those three tones do 
not have "a pitch"! Who/what says they should? They are still low enough in 
frequency, and far enough apart, to be clearly audible as three distinct high 
tones, though too inharmonic (out of tune) to fit any 12tone ET scale (three 
erratic piccolos?). I suppose to a musically untrained ear they might seem to 
blend into some quasi-bell-like mass, which would be more likely of course if 
they are given a common exponential decay envelope etc. The 200Hz resultant tone 
is of course inharmonic to all three high tones; this will nicely reinforce the 
bell suggestion, and thus persuade subjects that the sound must have a pitch.
Just deliver a single 200Hz tone successively at 60dBSPL and 70dBSPL and you 
will hear a pitch difference; that then defines the maximum accuracy of any 
experimental results.

Richard Dobson

Reply by rhnl...@yahoo.com ● October 3, 2005

Richard Dobson wrote:
> Didier A. Depireux wrote:
> > As the protocols say, "at comfortable levels", i.e. the levels were
> > randomized, and the loudness of the sounds was perceived to be between
> > about 60 and 70 dB.
>
> I still don't get what your experiment is trying to establish. What is the point
> of randomizing levels? If you want to eliminate pitch-shift effects due to SPL,
> you need to ensure delivery levels at the documented neutral point of 60dBSPL.

As far as some real-world implementations are concerned, this is
close to an excessive level of detail for DSP pitch estimation.
In many voice or music measurement or characterization applications,
the microphone is uncalibrated and/or the gain level is unknown (random
cell phone gate-way'd through some VOIP connection, stage mic that's
been moved by the performer and eq'd by the sound-guy, etc.).  A
statistical distribution of possible pitch perceptions in various
generic situations would seem to be more useful than something that
only applies to 60dB +- 0.5dB.

IMHO. YMMV.
-- 
rhn A.T nicholson d.O.t C-o-M

Reply by rhnl...@yahoo.com ● October 4, 2005

robert bristow-johnson wrote:
...
> > Didier A Depireux wrote:
> >
> >> the pitch of most sounds, except for alternating click trains around
> >> 200Hz
> >> for which there's an ambiguity, can be predicted from the largest peak
> >> of the autocorrelation of the half-wave rectified waveform.

Why, in this prediction method, is the waveform half-wave rectified
before autocorrelation? (which seems to throw away information...)

> ... except for the "octave problem", the pitch of
> these sounds can be predicted by the first very large peak of the
> autocorrelation, where this "first large peak" is either the largest peak
> or
> within a certain limit of difference in amplitude from the largest peak
> (other than the peak at the zero lag).

Why, in this prediction method, is the first large peak used, instead
of another slightly larger peak higher in frequency if available?


Thanks.
-- 
rhn A.T nicholson d.O.t C-o-M
