comp.dsp | Pitch Estimation using Autocorrelation| page 3

Reply by ●September 9, 2005

fizteh89 wrote:
> Well, well, well...
>
> Nothing unusual... comp.dsp "experts" discussing something they
> don't have a clue about and giving wrong answers to a general
> public...
> I was going to skip this discussion but just couldn't resist...
>
> Just where did you guys go to school?
> Do you know anything other than DFT?
> (I guess RBJ also knows ASDF).
>
> What the heck are you talking about?

Dear Dmitry!
Your nickname "fizteh89" suggests to me that you went to school and
university in Russia. But that does not entitle you to argue in an
improper manner. The people taking part in this discussion are
standing by their points of view, and it doesn't matter where any of
them went to school or university.

My impression of the Moscow Institute of Physics and Technology
(Fiztech) is that it is too theoretical and pretty far from
real engineering.
In my opinion the goal of engineering is solving problems.
However, it's not quite clear to me what kind of problem the
person who started this discussion wants to solve.
I can only guess at a few possibilities.
1) That person needs a music transcription tool for his or her music
exercises.
2) He or she wants to develop his or her own music transcription
software.
3) He or she needs to develop and test a period estimator for
some purpose other than music transcription. It could be student
work, or a medical or industrial application, etc.

As for music transcription, a musical note is not necessarily
tied to pitch. At
http://www.mrc-cbu.cam.ac.uk/cnbh/web2005/teaching/sounds_movies/Pitchchroma&height.htm
you may find an example of a melody played with fixed pitch.

Although it intuitively seems that everyone is talking about the same
thing, that "thing" really depends on the point of view. The "thing"
is pitch. Pitch, as someone wrote, is what we hear.
What do we hear, Dmitry?
Everyone in this discussion is correct from his own point of view.
Those points of view are different. Sorry, Dmitry, if you don't
understand that.

Truly Yours,
Vladimir Malakhov.

Reply by robert bristow-johnson ●September 9, 2005

hi Dmitry,

in article 1126275173.443343.84600@g47g2000cwa.googlegroups.com, fizteh89 at
dt@soundmathtech.com wrote on 09/09/2005 10:12:

> Robert, my comment wasn't aimed at you personally - I am completely
> OK with your comments, except, maybe, for a few misconceptions of yours
> and a strange attachment to ASDF (just joking).

on today's DSPs, ASDF is cheaper than AMDF and works approximately the same
way.  25 years ago, when it took 50+ machine cycles to do a multiplication,
you might have said i had a strange attachment to AMDF because, at that time,
abs() was cheaper than squaring.
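
For readers comparing the two, here is a minimal sketch of AMDF and ASDF side by side. The function names, the sample rate, and the two-harmonic test tone are assumptions made for illustration; nothing here is taken from the paper under discussion.

```python
import numpy as np

def amdf(x, k):
    """Average magnitude difference at lag k (abs, no multiply)."""
    return np.mean(np.abs(x[:-k] - x[k:]))

def asdf(x, k):
    """Average squared difference at lag k (one multiply per sample)."""
    return np.mean((x[:-k] - x[k:]) ** 2)

def best_lag(fn, x, kmin, kmax):
    """Lag in [kmin, kmax] that minimizes the given difference function."""
    lags = np.arange(kmin, kmax + 1)
    return int(lags[np.argmin([fn(x, k) for k in lags])])

fs = 8000                      # assumed sample rate
n = np.arange(2048)
# 200 Hz fundamental plus its second harmonic: period is 40 samples
x = np.sin(2 * np.pi * 200 * n / fs) + 0.5 * np.sin(2 * np.pi * 400 * n / fs)

print(best_lag(amdf, x, 20, 60), best_lag(asdf, x, 20, 60))
```

On a clean periodic input like this, both functions dip to (numerically) zero at the true period, so they pick the same lag; they differ mainly in cost and in how heavily large differences are weighted.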

> But comments made by some other people just went astray...
> 
>> i said that was some kind of glorified AMDF method, you said it wasn't.

i reviewed the ICASSP paper again and, correcting the notation (except i
refuse to call the unit step function "H(x)"), you are calculating


                                m-1
    f(x, k) = SUM{ u( r - sqrt( SUM{ (x[i+j*d] - x[i+k+j*d])^2 } ) ) }
               i                j=0


which is equivalent to

                            m-1
    f(x, k) = SUM{ u( r^2 - SUM{ (x[i+j*d] - x[i+k+j*d])^2 } ) }
               i            j=0


where u() is the unit step function.  you suggest d=12, m=3, and r = 0.15
relative to the peak |x|.  essentially you are counting the number of
occurrences that


     m-1
     SUM{ (x[i+j*d] - x[i+k+j*d])^2 }  <  r^2
     j=0


for a particular value of k.  the more the number of occurrences that the
sum of squares of differences is less than r^2, the better the fit.  this is
a form of the ASDF.  (sorry for saying it was a glorified AMDF, but my
position is that the ASDF is a sorta modified AMDF and also that
autocorrelation is equivalent to ASDF, but flipped upside-down.)
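
The counting function written out above can be sketched roughly as follows. The parameter values d=12, m=3, r=0.15 are the ones quoted from the paper; the function name match_count and the sine test signal are assumptions for illustration, and this is only a reading of the formula as posted, not the author's code.

```python
import numpy as np

def match_count(x, k, d=12, m=3, r=0.15):
    """Count start indices i for which
    sum_{j=0}^{m-1} (x[i+j*d] - x[i+k+j*d])^2 < r^2,
    with r taken relative to the peak |x|."""
    x = np.asarray(x, dtype=float)
    thresh = (r * np.max(np.abs(x))) ** 2
    n_i = len(x) - k - (m - 1) * d      # number of valid start indices
    if n_i <= 0:
        return 0
    i = np.arange(n_i)
    total = np.zeros(n_i)
    for j in range(m):
        diff = x[i + j * d] - x[i + k + j * d]
        total += diff ** 2
    return int(np.sum(total < thresh))

n = np.arange(1000)
x = np.sin(2 * np.pi * n / 50)          # assumed test tone, period 50 samples

print(match_count(x, 50), match_count(x, 25))
```

At the true period (k=50) every valid index counts as a match, while at half the period (k=25) none do. As for "flipped upside-down": expanding the square gives SUM (x[i] - x[i+k])^2 = SUM x[i]^2 + SUM x[i+k]^2 - 2 SUM x[i]*x[i+k], so whenever the two energy terms are roughly constant over k, the ASDF is a constant minus twice the autocorrelation.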

>> i also said that the non-linear function actually destroyed information
>> (since it is not invertible) which means multiple waveforms can map to
>> identical results.  specifically, that Achilles' heel can be exploited
>> by a demented waveform to fool your algorithm.
> 
> This is one of your misconceptions. For periodicity/pitch detection one
> needs to lose as much information unrelated to signal periodicity as
> possible.

it's no misconception of mine.  it is in the very definition of the waveform
x(t) or x[i] that the property of periodicity is derived in the first place
and the parameter of the period is estimated.  changing that waveform
irreversibly (using a non-invertible operator) throws away information.
that information could be the very information that differentiates two
waveforms that are an octave apart in pitch, but appear to be the same after
application of the non-invertible operator.

e.g. :

> I am quoting from Rabiner & Schafer's "Digital Processing of Speech
> Signals" (4.8 sub-chapter):
> "One of the major limitations of the autocorrelation representation
> is that in a sense it retains too much of the information in the speech
> signal... As a result... autocorrelation function has many peaks... "
> Then they go on to describe a center-clipping technique, which was
> specifically proposed to lose information in speech signal - a
> noninvertible transformation.

it's a good example and one that makes my point.  the center-clipping can
lose information that is needed to differentiate two different waveforms
that are one octave apart, yet appear identical after the center-clipping.

imagine a sine wave at 500 Hz.  waveform repeats every 2 ms.  even after
center-clipping the waveform repeats every 2 ms.  so any decent PDA should
identify the period to be 2 ms and the fundamental frequency to be 500 Hz.

the center-clipping will lose information around the zero crossings that
happen every 1 ms.  now suppose i give you a waveform, just like the 500 Hz
sine but where there are two little positive "blips" or pulses added to the
waveform at two adjacent zero crossings and then two little negative blips
added at the following two zero crossings.  then the whole thing repeats
forever.  this waveform repeats every 4 ms and is a 250 Hz waveform.  but if
the zero crossing blips are small enough to be clipped in the center-
clipping process, that differentiating information is lost and the two
post-clipped waveforms (from this one and from the pure sine) will look
identical yet the fundamental frequencies of the two original waveforms are
one octave apart.
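
The blip construction can be checked numerically. This is a sketch under assumed values: a 48 kHz sample rate, blips of height 0.05 that are three samples wide, and a clipping level of 0.3. The helper names center_clip and repeats_with are made up for the example.

```python
import numpy as np

fs = 48000                                  # assumed sample rate
n = np.arange(4 * 192)                      # four 4 ms cycles
sine = np.sin(2 * np.pi * 500 * n / fs)     # 500 Hz: period 96 samples (2 ms)

# One 192-sample (4 ms) cycle of blips on successive zero crossings: +, +, -, -
cycle = np.zeros(192)
for z, sign in zip((0, 48, 96, 144), (1, 1, -1, -1)):
    cycle[[(z - 1) % 192, z, (z + 1) % 192]] += 0.05 * sign
blippy = sine + np.tile(cycle, 4)           # repeats every 4 ms, i.e. 250 Hz

def center_clip(x, c):
    """Zero everything with |x| <= c and shift the rest toward zero by c."""
    return np.where(x > c, x - c, np.where(x < -c, x + c, 0.0))

def repeats_with(x, p, tol=1e-9):
    """True if x[n + p] matches x[n] (within tol) over the whole buffer."""
    return bool(np.max(np.abs(x[p:] - x[:-p])) < tol)

c = 0.3                                     # assumed clipping level
print(repeats_with(blippy, 96), repeats_with(blippy, 192))
print(np.array_equal(center_clip(sine, c), center_clip(blippy, c)))
```

The blip waveform does not repeat at 96 samples but does at 192, so its fundamental is an octave below the sine's, yet the two center-clipped outputs come out sample-for-sample identical.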

now the way your PDA loses information is more sophisticated, but that
simply means that one has to be more sophisticated in coming up with two
different waveforms (one octave apart) that will result in identical
histograms.  if the period estimation is taken solely from the histogram
(which i think is the point of your PDA), then there is no way to
differentiate between these two waveforms that are one octave apart.



>> i could not get the MATLAB demo to work on my Mac.  i need source and i
>> doubt you're willing to give that up.
> 
> Sorry, I am between Unix and Windoze, no Macs around...
> But didn't I post some Matlab source code on comp.dsp?

i can't claim that i was paying attention at the time.  could you repost it
or find it at Google Groups?

> It's the same code I gave to PTO in provisional application, so you can
> be sure it works, maybe just not as well as some people would prefer.

i'd be happy to try it out.

> Sorry again... not giving out commercial-quality code at this time: I
> think I already gave out too much ...

that's the risk you take with patenting/publishing.  you are probably
familiar with the IVL and Brian Gibson's patents.  same risk.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by fizteh89 ●September 9, 2005

>you are probably familiar with the IVL and Brian Gibson's patents.

From US patent 4,688,464:

1.	An apparatus, for determining the pitch of a substantially periodic
audio input signal having one or more sinusoidal components forming a
series of peaks of a given polarity that are separated by at least one
peak of opposite polarity and that define at least one cycle of said
signal, comprising:

a first means for producing a plurality of overlapping, sample timing
intervals from a portion of said input signal;

a second means for producing an indication of the period of said input
signal from said plurality of sample timing intervals; and

a third means for converting said indication of said period of said
input signal into a determination of the pitch of said input signal.

Are you talking about this?
To me, this pretty much looks like an invalid patent claim. There is no
substance to it: it just mentions a few common sense operations, unless
there is some hidden meaning to some of these words. But I don't want
to pass a judgment on the patent as a whole, just this particular first
claim.
Talking about patent quality problems...

Reply by Chip Wood ●September 9, 2005

Ah, now you are mixing the objective, measurable by machine,
fundamental frequency with the perception, human-measurable
only, pitch.  Please don't do that.  There are many Ph.D
theses that look at the relationship between the two, which
is considerable, but not a one-to-one mapping in
non-sinusoid signals.  I have even tricked the human sensing
pitch, using sinusoidal frequencies with different
modulation and phase.

Since "pitch" is shorter and easier to say than "fundamental
frequency", the term "pitch estimate" came into vogue, but
what you are really doing is "fundamental frequency
estimation".  Experienced engineers do the translation
automatically, but some novices can be confused.

-- 
Chip Wood


"robert bristow-johnson" <rbj@audioimagination.com> wrote in
message news:BF460B15.A3EF%

> that's usually the bugger.  if it's perfectly periodic, pitch detection is
> pretty easy with AMDF, ASDF, autocorrelation.  but there are still problems
> with PDAs and human perception.  what if a strong 200 Hz waveform has a weak
> (say down by 60 dB) 100 Hz waveform added to it?  mathematically, it's 100
> Hz, but it sounds like 200 Hz.
>

Reply by robert bristow-johnson ●September 9, 2005

in article 1126291325.830575.238960@g47g2000cwa.googlegroups.com, fizteh89
at dt@soundmathtech.com wrote on 09/09/2005 14:42:

>> you are probably familiar with the IVL and Brian Gibson's patents.
> 
> From US patent 4,688,464:
> 
> 1.    An apparatus, for determining the pitch of a substantially periodic
> audio input signal having one or more sinusoidal components forming a
> series of peaks of a given polarity that are separated by at least one
> peak of opposite polarity and that define at least one cycle of said
> signal, comprising:
> 
> a first means for producing a plurality of overlapping, sample timing
> intervals from a portion of said input signal;
> 
> a second means for producing an indication of the period of said input
> signal from said plurality of sample timing intervals; and
> 
> a third means for converting said indication of said period of said
> input signal into a determination of the pitch of said input signal.
> 
> Are you talking about this?

yup.

there are many other (later) patents.  try an advanced search with an/IVL in
the search field.

> To me, this pretty much looks like an invalid patent claim. There is no
> substance to it:

i'll bet their lawyers are bigger than your lawyers.

> it just mentions a few common sense operations, unless
> there is some hidden meaning to some of these words. But I don't want
> to pass a judgment on the patent as a whole, just this particular first
> claim.
> Talking about patent quality problems...

that's a whole 'nother discussion.  no doubt, i've felt that IVL has
patented some stuff that doesn't pass the "obviousness" test or even the
prior art test.  but there is evidence of them locking horns with Mark of
the Unicorn and winning.  unfortunately there is some advice i gave to MOTU
that i thought would help, yet IVL is listing it in their list of references
(pat. #6,336,092), so i s'pose that means they are pretty confident that
they have disposed of it.  i still believe that IVL is possibly extracting
royalties for stuff that is either obvious or prior art or both (kinda like
Bill Gates patenting binary numbers and arithmetic), but i wouldn't want to
take them on.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by robert bristow-johnson ●September 9, 2005

in article dfsldd$f3d$1@avnika.corp.mot.com, Chip Wood at
chip.wood@motorola.com wrote on 09/09/2005 14:48:

> Ah, now you are mixing the objective, measurable by machine,
> fundamental frequency with the perception, human-measurable
> only, pitch.  Please don't do that.

did you see my response to rhn?:

in article BF467255.A41C%rbj@audioimagination.com, robert bristow-johnson at
rbj@audioimagination.com wrote on 09/08/2005 22:57:

> in article 1126216435.080023.89540@g43g2000cwa.googlegroups.com,
> rhnlogic@yahoo.com at rhnlogic@yahoo.com wrote on 09/08/2005 17:53:
> 
>> To me, the term "pitch" refers to a phenomenon of human perception.
> 
> that is true but for a large class of musical sounds, mostly "tones", the
> "pitch" of the note or tone (measured in octaves) is the base 2 log of the
> fundamental frequency relative to a standard frequency.
> 
> but not all musical sounds are these nice quasi-periodic tones.  getting the
> pitch of, say, a recorded belch or fart might be more difficult for the DSP
> than for the human ear.



in article dfsldd$f3d$1@avnika.corp.mot.com, Chip Wood at
chip.wood@motorola.com wrote on 09/09/2005 14:48:

>  There are many Ph.D
> theses that look at the relationship between the two, which
> is considerable, but not a one-to-one mapping in
> non-sinusoid signals.

if it's a "tonal" (perhaps a better technical term would be
"quasi-periodic") musical note, there is a very large correlation between
fundamental frequency of a waveform and perceived pitch unless all of the
lower odd harmonics of the waveform are at a very low relative energy.
adding a very weak 220 Hz waveform to a 440 Hz waveform will sound like A
above middle C (440) but, mathematically, it is a 220 Hz waveform.
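
A small numerical sketch of that 440/220 case, assuming a 44 kHz sample rate (chosen only so that both periods are whole numbers of samples) and a 220 Hz component 60 dB down; the asdf helper is an illustrative name:

```python
import numpy as np

fs = 44000                                        # assumed sample rate
n = np.arange(4000)
strong = np.sin(2 * np.pi * 440 * n / fs)         # period 100 samples
weak = 1e-3 * np.sin(2 * np.pi * 220 * n / fs)    # 60 dB down, period 200 samples
x = strong + weak

def asdf(x, k):
    """Average squared difference between x and x delayed by k samples."""
    return np.mean((x[:-k] - x[k:]) ** 2)

print(asdf(x, 100), asdf(x, 200))
```

The difference function is exactly (to rounding) zero only at the 200-sample lag, which is the mathematical period, but its value at the 100-sample lag is on the order of 1e-6 of the signal power, so any practical threshold or noise floor will favor 440 Hz, which also matches what is heard.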

>  I have even tricked the human sensing
> pitch, using sinusoidal frequencies with different
> modulation and phase.

was the result quasi-periodic?

> Since "pitch" is shorter and easier to say than "fundamental
> frequency", the term "pitch estimate" came into vogue, but
> what you are really doing is "fundamental frequency
> estimation".

agreed, and there are non-harmonic musical sounds (i dunno if i would apply
the term "tones" to them) that hardly have a fundamental frequency, yet we
have a sense of pitch to them.  and i am not addressing that case.  only the
quasi-periodic case.

>  Experienced engineers do the translation
> automatically, but some novices can be confused.

i guess i'm not too worried about it.  most of the musical application of
PDAs is so some guy can plug in his guitar to a box and he plays a note and
the PDA figures out what the note is.  sometimes for pitch to MIDI,
sometimes for effects or even synthesis.  and what we are doing is trying to
determine the fundamental frequency (with caveats) and base 2 log it.
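
The "base 2 log it" step, as a short sketch. The A440 reference and the MIDI convention that note 69 is A above middle C are standard; the function names are made up for illustration.

```python
import math

def octaves_above(f0_hz, ref_hz=440.0):
    """Pitch in octaves relative to the reference frequency."""
    return math.log2(f0_hz / ref_hz)

def midi_note(f0_hz, ref_hz=440.0):
    """Nearest equal-tempered MIDI note number (69 = A above middle C)."""
    return round(69 + 12 * octaves_above(f0_hz, ref_hz))

print(octaves_above(220.0), midi_note(440.0), midi_note(82.41))
```

Here 82.41 Hz is roughly a guitar's low E string, which maps to MIDI note 40.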

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by rhnl...@yahoo.com ●September 9, 2005

robert bristow-johnson wrote:
> i guess i'm not too worried about it.  most of the musical application of
> PDAs is so some guy can plug in his guitar to a box and he plays a note and
> the PDA figures out what the note is.  sometimes for pitch to MIDI,
> sometimes for effects or even synthesis.  and what we are doing is trying to
> determine the fundamental frequency (with caveats) and base 2 log it.

Or tell him how far out of tune his instrument is from some selected
tuning and/or intonation.
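
A minimal tuner-style sketch of that idea, assuming 12-tone equal temperament referenced to A = 440 Hz; other tunings would need a different note table, and the function name cents_off is made up for illustration.

```python
import math

def cents_off(f0_hz, ref_hz=440.0):
    """Deviation in cents from the nearest equal-tempered note
    (positive means sharp, negative means flat)."""
    semitones = 12 * math.log2(f0_hz / ref_hz)
    return 100 * (semitones - round(semitones))

print(round(cents_off(445.0), 2), cents_off(440.0))
```

A 445 Hz input, for example, reads as a bit under 20 cents sharp of A.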


IMHO. YMMV.
-- 
rhn A.T nicholson d.O.t C-o-M

Reply by John E. Hadstate ●September 10, 2005

<malakhov@cas.ru> wrote in message
news:1126282396.391336.145990@o13g2000cwo.googlegroups.com...

> However, it's not quite clear to me what kind of problem the
> person who started this discussion wants to solve.
> I can only guess at a few possibilities.
> 1) That person needs a music transcription tool for his or her music
> exercises.
> 2) He or she wants to develop his or her own music transcription
> software.
> 3) He or she needs to develop and test a period estimator for
> some purpose other than music transcription. It could be student
> work, or a medical or industrial application, etc.
>

I recently saw a software package advertised for sale by a
telemarketer that claimed to measure a singer's pitch in
real-time and graphically show the singer where his/her
pitch was relative to where it should be.  Guaranteed to
make scalded cats sound like Caruso.

Reply by Rune Allnor ●September 10, 2005

fizteh89 wrote:
> Well, well, well...
>
> Nothing unusual... comp.dsp "experts" discussing something they
> don't have a clue about and giving wrong answers to a general
> public...
> I was going to skip this discussion but just couldn't resist...
>
> Just where did you guys go to school?
> Do you know anything other than DFT?

I don't think you will find anyone on comp.dsp, regulars or
others, who claim they are experts at anything. Most have an
interest in DSP, and some even have some experience from
various types of work.

As for the posts I have come up with in this thread, you are
right in both your stated and implicit accounts: I haven't
gone to one of those high-prestige schools or universities,
and I am no expert in speech processing. I have done a little
bit of DSP with acoustics over the years, but not speech or
audio.

What I *do* know (apart from the DFT), is that anyone who
claims to have the One True Answer to a problem is in deep
trouble. They might be right in the short term, that they
have provided some trick or algorithm or way to solve some
problem. In the long term, however, such people tend to become
liabilities to their companies and organizations, since they
tend to get involved with studying their own Glorious Image
in the mirror, not finding it worth their while to keep up
with the dilettantes that otherwise swamp their profession.

One way of actually keeping up with the profession is to
ask the "silly" questions and hopefully get the elaborate
answers. If it is "obvious" to an expert, true or self-
proclaimed, on speech processing that the pitch is encoded
in the time-domain autocorrelation function, it is not at
all so for the generalist or somebody who has other fields
of interest.

So why don't you share your highly valued expertise with
the amateurs, perhaps even on comp.dsp, and show where
they are wrong and why? How would anybody know about
your achievements if you do not share them? How would
anyone recognize your genius if they do not understand
the complexity of a problem worthy of *your* attention?

There are lots of posers in the world; 13 on the dozen,
if not more. Only the educated are capable of recognizing
true genius. It is in your own interest to educate,
not mock, the inept.

Rune
