DSPRelated.com
Forums

Pitch detection

Started by altmeyermartin March 21, 2005
Jerry Avins wrote:
> dt@soundmathtech.com wrote:
>
>> Didn't I tell everybody here that pitch detection problem is solved?
>
> Don't hold your breath. PID loops have been obsolete since about 1975, but
> nobody seems to have noticed.
>
>> It seems that some people are unable to learn, or they are deliberately
>> trying to mislead the general public asking questions here.
>>
>> Go to http://www.soundmathtech.com/p�itch for more information.
>
> Not Found
> The requested URL /p�itch was not found on this server
>
> Something funny with the link as it appeared.
Clicking on it sends you to http://www.soundmathtech.com/p%C2%ADitch *NOT* http://www.soundmathtech.com/pitch/ :{
> http://www.soundmathtech.com/pitch/ works.
>
>> Also, US Patent Application at http://www.uspto.gov/patft
>> (Pub. No. 20030088401)
>
> http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220030088401%22.PGNR.&OS=DN/20030088401&RS=DN/20030088401
>
>> And one more time: FORGET ABOUT USING FFT for pitch detection !!!
>>
>> Dmitry Terez
>
> Jerry
In article <BE78340D.5EE0%rbj@audioimagination.com>,
robert bristow-johnson  <rbj@audioimagination.com> wrote:
> in article d2tb8d$fvh$1@blue.rahul.net, Ronald H. Nicholson Jr. at
> rhn@mauve.rahul.net wrote on 04/05/2005 02:29:
>
> i do not entirely understand why some weaker harmonic should be the one to
> care about in a pitch-detection problem. can you explain?
If the lower harmonics are below the range of normal human hearing (or the microphone or speakers used to transmit the signal), why should they be relevant to pitch detection?
> that i don't understand. the ASDF or AMDF (or an auto-correlation derived
> from it) will be minimum (or max for autocorrelation) at what looks like the
> period.
The period might be irrelevant to the pitch.
> this sounds like the old "octave problem". take a perfectly periodic 200 Hz
> waveform (doesn't need to be a sine). the pitch detector says it's 200 Hz.
> now add to it a 100 Hz waveform that is -80 dB relative to the 200 Hz
> waveform. what's the fundamental frequency? mathematically? or
> perceptually?
That's a perfectly harmonic example, not an inharmonic one. Try again with a 99.9 Hz 2nd harmonic added to the 200 Hz 4th harmonic (assuming a missing fundamental somewhere just below 50 Hz). Won't the AMDF hunt for some inaudible and nonsensical period somewhere around 0.00005005005 Hz?
>> Others here have mentioned using long zero-padded fft's as one alternative
>> for certain types of pitch detection/measurement.
>
> i wouldn't do that. you still need to deal with the possibility of missing
> or weak harmonics (inc. fundamental). i agree with Dmitry Terez about not
> using FFT.
That assumes the pitch (even with a missing fundamental) is related to the period, and not to some other near sub-multiple of the more audible higher harmonics. An FFT might be better at accurately finding one of the higher harmonics, and one can then divide down to get the missing fundamental.

IMHO. YMMV.

--
Ron Nicholson   rhn AT nicholson DOT com   http://www.nicholson.com/rhn/
#include <canonical.disclaimer>   // only my own opinions, etc.
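Ron's divide-down idea can be sketched quickly (a hypothetical NumPy illustration, not code from this thread; the harmonic number 4 is assumed known here, whereas a real detector would have to estimate it from the spacing of the spectral peaks):

```python
import numpy as np

fs = 8000
t = np.arange(0, 1.0, 1 / fs)
# only the 4th, 5th and 6th harmonics of a missing ~50 Hz fundamental
x = (np.sin(2 * np.pi * 200 * t)
     + 0.5 * np.sin(2 * np.pi * 250 * t)
     + 0.3 * np.sin(2 * np.pi * 300 * t))

# long zero-padded FFT for fine bin spacing (~0.12 Hz here)
N = 1 << 16
spec = np.abs(np.fft.rfft(x * np.hanning(len(x)), N))
freqs = np.fft.rfftfreq(N, 1 / fs)

peak = freqs[np.argmax(spec)]   # strongest harmonic, close to 200 Hz
f0 = peak / 4                   # divide down by the (assumed) harmonic number
```

The zero padding only interpolates the spectrum; the real frequency resolution still comes from the 1-second analysis length.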
Ronald H. Nicholson Jr. wrote:
> In article <BE78340D.5EE0%rbj@audioimagination.com>,
> robert bristow-johnson <rbj@audioimagination.com> wrote:
> > in article d2tb8d$fvh$1@blue.rahul.net, Ronald H. Nicholson Jr. at
> > rhn@mauve.rahul.net wrote on 04/05/2005 02:29:
> > i do not entirely understand why some weaker harmonic should be the one to
> > care about in a pitch-detection problem. can you explain?
>
> If the lower harmonics are below the range of normal human hearing (or
> the microphone or speakers used to transmit the signal), why should they
> be relevant to pitch detection?
i think your brain sorta fills in the missing fundamental if there is a 2nd, 3rd, 4th, etc harmonic of a tone. try it with MATLAB or the code of your choice.
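that "try it" can be sketched in a few lines (a hypothetical NumPy illustration, not code from the post): synthesize only the 2nd through 5th harmonics of 100 Hz and note that the waveform still repeats every 1/100 s, which is where the autocorrelation peaks even though there is no energy at 100 Hz:

```python
import numpy as np

fs = 16000
t = np.arange(0, 0.2, 1 / fs)
f0 = 100.0
# harmonics 2..5 only -- the 100 Hz fundamental itself is absent
x = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(2, 6))

# the waveform still repeats every 1/f0 seconds, so the
# autocorrelation peaks at a lag of fs/f0 = 160 samples
ac = np.correlate(x, x, mode='full')[len(x) - 1:]
lag = np.argmax(ac[50:400]) + 50   # search past the zero-lag peak
print(fs / lag)                    # ~100 Hz, despite no energy at 100 Hz
```

playing x through speakers is the classic "missing fundamental" demo: most listeners report a 100 Hz pitch.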
> > that i don't understand. the ASDF or AMDF (or an auto-correlation derived
> > from it) will be minimum (or max for autocorrelation) at what looks like the
> > period.
>
> The period might be irrelevant to the pitch.
for quasi-periodic tones, i don't think it is irrelevant.
> > this sounds like the old "octave problem". take a perfectly periodic 200 Hz
> > waveform (doesn't need to be a sine). the pitch detector says it's 200 Hz.
> > now add to it a 100 Hz waveform that is -80 dB relative to the 200 Hz
> > waveform. what's the fundamental frequency? mathematically? or
> > perceptually?
>
> That's a perfectly harmonic example, not an inharmonic one. Try again
> with a 99.9 Hz 2nd harmonic added to the 200 Hz 4th harmonic (assuming
> a missing fundamental somewhere just below 50 Hz). Won't the AMDF hunt
> for some inaudible and nonsensical period somewhere around
> 0.00005005005 Hz?

no, because you won't be looking down there (or at that long of a lag for the AMDF). there will be a pretty nice null at around 1/100 sec (depends exactly on where the other harmonics are) and the AMDF will call that the period. you said 99.9 was the 2nd and 200 was the 4th, implying f0 = 50 Hz. but if there is no 150 or 250, 350, etc., there is no reason for the AMDF or your hearing to interpret that as 50 Hz, and your 99.9 will be viewed as the fundamental.
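this is easy to check numerically (a hypothetical NumPy sketch, not code from the thread): with only the 99.9 Hz and 200 Hz components present, the AMDF minimum over a sensible lag range sits near 1/100 sec, not at some enormous common period:

```python
import numpy as np

fs = 16000
t = np.arange(0, 0.2, 1 / fs)
x = np.sin(2 * np.pi * 99.9 * t) + np.sin(2 * np.pi * 200.0 * t)

def amdf(x, lag):
    """Average magnitude difference at a given lag (in samples)."""
    return np.mean(np.abs(x[lag:] - x[:-lag]))

lags = np.arange(40, 320)   # candidate periods: roughly 400 Hz down to 50 Hz
best = lags[np.argmin([amdf(x, L) for L in lags])]
print(fs / best)            # ~100 Hz: the 99.9 Hz component acts as f0
```

the deepest null lands near lag 160 (about 1/99.9 sec), where both components come back almost exactly in phase.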
> > > Others here have mentioned using long zero-padded fft's as one alternative
> > > for certain types of pitch detection/measurement.
> >
> > i wouldn't do that. you still need to deal with the possibility of missing
> > or weak harmonics (inc. fundamental). i agree with Dmitry Terez about not
> > using FFT.
>
> That assumes the pitch (even with a missing fundamental) is related
> to the period, and not some other near sub-multiple of the more audible
> higher harmonics. An FFT might be better at accurately finding one
> of the higher harmonics, and then one can then divide down to get the
> missing fundamental.
i don't get it. why would finding the higher harmonics and then dividing the frequency down to an implied missing fundamental be better than determining the period of some quasi-periodic signal as a measure of pitch?
> IMHO. YMMV.
yeah, i don't think i would get as much mileage in this.

r b-j
"robert bristow-johnson" <rbj@audioimagination.com> wrote in message
news:1112764006.059594.279510@z14g2000cwz.googlegroups.com...
> Ronald H. Nicholson Jr. wrote:
> > In article <BE78340D.5EE0%rbj@audioimagination.com>,
> > robert bristow-johnson <rbj@audioimagination.com> wrote:
> > > in article d2tb8d$fvh$1@blue.rahul.net, Ronald H. Nicholson Jr. at
> > > rhn@mauve.rahul.net wrote on 04/05/2005 02:29:
> > > i do not entirely understand why some weaker harmonic should be the one to
> > > care about in a pitch-detection problem. can you explain?
> >
> > If the lower harmonics are below the range of normal human hearing (or
> > the microphone or speakers used to transmit the signal), why should they
> > be relevant to pitch detection?
>
> i think your brain sorta fills in the missing fundamental if there is a
> 2nd, 3rd, 4th, etc harmonic of a tone. try it with MATLAB or the code
> of your choice.
Or check out these examples on the web:

http://www.ee.calpoly.edu/~jbreiten/audio/missfund/
http://physics.mtsu.edu/~wmr/julianna.html
In article <BE78340D.5EE0%rbj@audioimagination.com>, robert
bristow-johnson <rbj@audioimagination.com> wrote:

> i wouldn't do that. you still need to deal with the possibility of missing
> or weak harmonics (inc. fundamental). i agree with Dmitry Terez about not
> using FFT.
...[to detect pitch]

is this correct? when you say not to use fft i take it you mean fourier transforms in general, not just the fast fourier transform specifically?

it's a little bit of a shock to me, this -- i asked elsewhere about pitch detection (which is the same thing as frequency or tone analysis / extraction, right?) and was told 'fourier transform' by numerous people -- that seemed to be *the* one and only answer. i don't know much about fourier transforms, but extracting the frequency / tone is the reason i've started reading and learning about them -- am i wasting my time learning about fourier transforms, bearing in mind i'd like to be able to take some recorded sound and split it up into its notes/tones/frequencies, whatever the correct word is, particularly for speech recognition? (i'm told that speech is much like a song -- there's multiple tones occurring at the same time)

any info much appreciated (not so much specifically about speech recognition but generally about splitting sound into its frequencies), thanks very much,

ben
ben wrote:

> it's a little bit of a shock to me this -- i asked elsewhere about
> pitch detection (which is the same thing as frequency or tone
> alanlysis / extraction right?)
No. The pitch detection problem is different from the frequency estimation problem. The main difference (as far as I know; others probably have a different opinion) is the psycho-acoustic effects that need to be taken into account.

Otherwise, you might just as well do what Julius Smith recommends:

http://www-ccrma.stanford.edu/~jos/pasp/Plucked_Struck_String_Pitch_Estimation.html

The "Optional" section is just an approximation to the maximum likelihood estimator of the fundamental frequency (as you may be alluding to).
> and was told 'fourier transform' by numerous
> people -- that seemed to be *the* one and only answer. i don't know
> much about fourier transforms but extracting the frequency / tone is
> the reason i've started reading and learning about them -- am i
> wasting my time learing about fourier transforms bearing in mind i'd
> like to be able to take some recorded sound and split it up into it's
> notes/tones/frequencies whatever the correct word is, particularly
> for speech recognition? (i'm told that speech is much like a song --
> there's multiple tones occuring at the same time)
>
> any info much appreciated (not so much specifically about speech
> recognition but generally about splitting sound into its
> frequencies),
There are several others on this newsgroup who know far more than I do about pitch estimation/detection. I'd suggest you formulate some sensible questions (this post is a good start) and educate yourself! Ciao, Peter K.
In article <1112872929.385589.268080@z14g2000cwz.googlegroups.com>,
Peter K. <p.kootsookos@iolfree.ie> wrote:

> ben wrote:
>
> > it's a little bit of a shock to me this -- i asked elsewhere about
> > pitch detection (which is the same thing as frequency or tone
> > alanlysis / extraction right?)
>
> No. The pitch detection problem is different from the frequency
> estimation problem. The main difference (as far as I know; others
> probably have a different opinion) is the psycho-acoustic effects that
> need to be taken into account.
>
> Otherwise, you might just as well do what Julius Smith recommends:
>
> http://www-ccrma.stanford.edu/~jos/pasp/Plucked_Struck_String_Pitch_Estimation.html
>
> The "Optional" section is just an approximation to the maximum
> likelihood estimator of the fundamental frequency (as you may be
> alluding to).
>
> > and was told 'fourier transform' by numerous
> > people -- that seemed to be *the* one and only answer. i don't know
> > much about fourier transforms but extracting the frequency / tone is
> > the reason i've started reading and learning about them -- am i
> > wasting my time learing about fourier transforms bearing in mind i'd
> > like to be able to take some recorded sound and split it up into it's
> > notes/tones/frequencies whatever the correct word is, particularly
> > for speech recognition? (i'm told that speech is much like a song --
> > there's multiple tones occuring at the same time)
> >
> > any info much appreciated (not so much specifically about speech
> > recognition but generally about splitting sound into its
> > frequencies),
>
> There are several others on this newsgroup who know far more than I do
> about pitch estimation/detection. I'd suggest you formulate some
> sensible questions (this post is a good start) and educate yourself!
>
> Ciao,
>
> Peter K.
thanks very much for the reply and info. i think what you're saying is that the difference between frequency detection and pitch detection is the difference between the technical, actual (mechanically measured) frequencies and the frequencies as perceived by humans (pitches) -- after complicated brain processing -- our perception of sounds. i think the technical, actual frequencies, not the human-perception frequencies (if that makes sense / is correct), will do fine for what i want.

what i want to do is the very first stage of analysing sound to go about speech recognition (and i know there's a hell of a lot to speech recognition). this first stage, which will provide data for much more analysis after it, needs to split the sound, a series of amplitudes, into the frequencies that are occurring (and when they're occurring -- i know that's a bit of an issue but anyway...)

without going into details, should i be looking into fourier transforms to do this first stage that i've described, or something else? until i saw this thread i didn't think there was any doubt, but now i'm not sure. (it looks like fourier transforms are still the way but i want to check.)

so i just want to split the sounds up into the various frequencies, to be able to provide the later analysis steps with appropriate data to process. fourier transforms or something else needed/best for this first stage?

thanks, ben.
ben wrote:

   ...

> is this correct? when you say not to use fft i take it you mean fourier
> transforms in general not just the fast fourier transform specifically?
To touch on one point only. An FFT is just a fast way to compute a Fourier transform. If its result weren't identical to all the other ways of computing it, it wouldn't be a Fourier transform.

"Don't walk there."
"Does that mean in any shoes, or just in running shoes?"

Jerry
--
Engineering is the art of making what you want from things you can get.
ben wrote:
...
> what i want to do is the very first stage of analysing sound to go
> about speech recognition (and i know there's a hell of a lot to speech
> recognition). this first stage, which will provide data for much much
> more further analysis after this stage, needs to split the sound, a
> series of amplitudes, into the frequencies that are occuring (and when
> they're occuring -- i know that's a bit of an issue but anyway...)
It's good to keep an eye on that issue.
> without going into details, should i be looking into fourier transforms
> to do this first stage that i've described or something else? until i
> saw this thread i didn't think there was any doubt but i'm not sure.
> (it looks like fourier transforms are still the way but want to check).
>
> so i just want to split the sounds up into the various frequencies to
> be able to provide the later in the line of analysis steps appropriate
> data to process. fourier transforms or something else needed/best for
> this first stage?
I'm also no speech processing guy (perhaps you should go over to comp.speech or comp.speech.research for more qualified responses). I know that speech processing people like to estimate the frequency components of speech indirectly by computing LPC coefficients (that is what vocoders such as MELP, CELP etc. do). These coefficients do not give you frequency information directly (you have to factor the polynomial first), but it seems that certain vocal tract parameters can be deduced directly from these coefficients (without having to go through the frequency domain).

I think modeling the vocal tract should be more interesting than frequency estimation for speech recognition. If that is not the case, pure frequency estimation can be done via FFT -- consider windowing, averaging and overlapping to improve the raw FFT data.

FWIW.

Regards,
Andor
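Andor's last suggestion (windowing, overlapping, averaging) is essentially Welch's method; a minimal NumPy sketch (a hypothetical illustration, not code from the thread) might look like:

```python
import numpy as np

def averaged_spectrum(x, fs, nfft=512, hop=256):
    """Windowed, 50%-overlapped, averaged FFT magnitudes (Welch-style)."""
    win = np.hanning(nfft)
    mags = [np.abs(np.fft.rfft(x[i:i + nfft] * win))
            for i in range(0, len(x) - nfft + 1, hop)]
    freqs = np.fft.rfftfreq(nfft, 1 / fs)
    return freqs, np.mean(mags, axis=0)

# a 440 Hz tone buried in a little noise
np.random.seed(0)
fs = 8000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(len(t))

freqs, mag = averaged_spectrum(x, fs)
print(freqs[np.argmax(mag)])   # near 440 Hz (bin spacing fs/nfft = 15.625 Hz)
```

averaging over overlapped frames trades frequency resolution for a much less noisy spectral estimate, which is usually what you want as a speech front end.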
in article 070420051129510384%x@x.x, ben at x@x.x wrote on 04/07/2005 06:30:

> In article <BE78340D.5EE0%rbj@audioimagination.com>, robert
> bristow-johnson <rbj@audioimagination.com> wrote:
>
> > i wouldn't do that. you still need to deal with the possibility of missing
> > or weak harmonics (inc. fundamental). i agree with Dmitry Terez about not
> > using FFT.
>
> ...[to detect pitch]
>
> is this correct? when you say not to use fft i take it you mean fourier
> transforms in general not just the fast fourier transform specifically?
> it's a little bit of a shock to me this -- i asked elsewhere about
> pitch detection (which is the same thing as frequency or tone alanlysis
> / extraction right?) and was told 'fourier transform' by numerous
> people -- that seemed to be *the* one and only answer.
whatever linear operation you do in the frequency domain can be constructed as an equivalent time-domain operation. i guess we need to talk a little about what "pitch detection" means.

we have a perceptual meaning that is hard to describe for sounds in general. if i recorded a fart into a sampling keyboard and then played on the keys a recognizable melody (say "Mary had a little lamb"), you might likely hear a sense of pitch for each note, but i would have trouble defining clearly how that fart gives you a sense of pitch for that note.

but highly tonal instruments are different. *then* we are pretty clear that the pitch of the note is directly related to the fundamental frequency, f0, of the quasi-periodic function that is the note's waveform (f0 being the reciprocal of the period). now, in the spectrum, you will see spikes that are equally spaced at integer multiples of that fundamental frequency. each spike represents a harmonic, and the height of it is the strength of that harmonic.

now, we could use a comb filter to isolate those spikes. there are two basic kinds of comb filters: one that puts in a null every f1 Hz and one that puts in a peak every f1 Hz. if we use the first one and vary f1 until it happens upon f0 or a submultiple of f0, then the output of that comb filter will be minimum. that is essentially what the AMDF or ASDF algorithm does in the time domain. it's the same thing but in two different domains. and autocorrelation is directly related to the ASDF.
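the notch-comb idea can be sketched in a few lines (a hypothetical NumPy illustration, not code from the post): y[n] = x[n] - x[n-L] puts a null at every multiple of fs/L Hz, and sweeping L for minimum output power is exactly the ASDF search for the period:

```python
import numpy as np

fs = 16000
t = np.arange(0, 0.2, 1 / fs)
# quasi-periodic tone with a 200 Hz fundamental (period = 80 samples)
x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)

def comb_power(x, L):
    """Mean output power of the notch comb y[n] = x[n] - x[n-L]."""
    return np.mean((x[L:] - x[:-L]) ** 2)

lags = np.arange(40, 120)   # candidate periods: 400 Hz down to ~133 Hz
best = lags[np.argmin([comb_power(x, L) for L in lags])]
print(fs / best)            # 200.0 -- the comb nulls land on every harmonic
```

when L matches the period, every harmonic falls on a null of the comb and the output power collapses, which is why the minimum identifies the fundamental.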
> i don't know much about fourier transforms but extracting the frequency
> / tone is the reason i've started reading and learning about them -- am
> i wasting my time learing about fourier transforms bearing in mind i'd
> like to be able to take some recorded sound and split it up into it's
> notes/tones/frequencies whatever the correct word is, particularly for
> speech recognition? (i'm told that speech is much like a song --
> there's multiple tones occuring at the same time)
>
> any info much appreciated (not so much specifically about speech
> recognition but generally about splitting sound into its frequencies),
well, splitting a sound into its frequencies certainly *is* a topic regarding the Fourier Transform (in one of its forms).

--
r b-j    rbj@audioimagination.com

"Imagination is more important than knowledge."