DSPRelated.com
Forums

Pitch detection

Started by altmeyermartin March 21, 2005
Jerry Avins wrote:
> dt@soundmathtech.com wrote:
>
>> Didn't I tell everybody here that pitch detection problem is solved?
>
> Don't hold your breath. PID loops have been obsolete since about 1975, but
> nobody seems to have noticed.
>
>> It seems that some people are unable to learn, or they are deliberately
>> trying to mislead the general public asking questions here.
>>
>> Go to http://www.soundmathtech.com/p�itch for more information.
>
> Not Found
> The requested URL /p�itch was not found on this server
>
> Something funny with the link as it appeared.
Clicking on it sends you to http://www.soundmathtech.com/p%C2%ADitch *NOT* http://www.soundmathtech.com/pitch/ :{
> http://www.soundmathtech.com/pitch/ works.
>
>> Also, US Patent Application at http://www.uspto.gov/patft
>> (Pub. No. 20030088401)
>
> http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220030088401%22.PGNR.&OS=DN/20030088401&RS=DN/20030088401
>
>> And one more time: FORGET ABOUT USING FFT for pitch detection !!!
>>
>> Dmitry Terez
>
> Jerry
In article <BE78340D.5EE0%rbj@audioimagination.com>,
robert bristow-johnson  <rbj@audioimagination.com> wrote:
> in article d2tb8d$fvh$1@blue.rahul.net, Ronald H. Nicholson Jr. at
> rhn@mauve.rahul.net wrote on 04/05/2005 02:29:
>
> i do not entirely understand why some weaker harmonic should be the one to
> care about in a pitch-detection problem. can you explain?
If the lower harmonics are below the range of normal human hearing (or the microphone or speakers used to transmit the signal), why should they be relevant to pitch detection?
> that i don't understand. the ASDF or AMDF (or an auto-correlation derived
> from it) will be minimum (or max for autocorrelation) at what looks like the
> period.
The period might be irrelevant to the pitch.
> this sounds like the old "octave problem". take a perfectly periodic 200 Hz
> waveform (doesn't need to be a sine). the pitch detector says it's 200 Hz.
> now add to it a 100 Hz waveform that is -80 dB relative to the 200 Hz
> waveform. what's the fundamental frequency? mathematically? or
> perceptually?
That's a perfectly harmonic example, not an inharmonic one. Try again with a 99.9 Hz 2nd harmonic added to the 200 Hz 4th harmonic (assuming a missing fundamental somewhere just below 50 Hz). Won't the AMDF hunt for some inaudible and nonsensical period somewhere around 0.00005005005 Hz?
>> Others here have mentioned using long zero-padded fft's as one alternative
>> for certain types of pitch detection/measurement.
>
> i wouldn't do that. you still need to deal with the possibility of missing
> or weak harmonics (inc. fundamental). i agree with Dmitry Terez about not
> using FFT.
That assumes the pitch (even with a missing fundamental) is related to the period, and not to some other near sub-multiple of the more audible higher harmonics. An FFT might be better at accurately finding one of the higher harmonics, and one can then divide down to get the missing fundamental.

IMHO. YMMV.

--
Ron Nicholson   rhn AT nicholson DOT com   http://www.nicholson.com/rhn/
#include <canonical.disclaimer>   // only my own opinions, etc.
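Ron's divide-down idea can be sketched quickly (a hypothetical NumPy illustration, not code from this thread; the harmonic number 4 is assumed known here, whereas a real detector would have to estimate it from the spacing of the spectral peaks):

```python
import numpy as np

fs = 8000
t = np.arange(0, 1.0, 1 / fs)
# only the 4th, 5th and 6th harmonics of a missing ~50 Hz fundamental
x = (np.sin(2 * np.pi * 200 * t)
     + 0.5 * np.sin(2 * np.pi * 250 * t)
     + 0.3 * np.sin(2 * np.pi * 300 * t))

# long zero-padded FFT for fine bin spacing (~0.12 Hz here)
N = 1 << 16
spec = np.abs(np.fft.rfft(x * np.hanning(len(x)), N))
freqs = np.fft.rfftfreq(N, 1 / fs)

peak = freqs[np.argmax(spec)]   # strongest harmonic, close to 200 Hz
f0 = peak / 4                   # divide down by the (assumed) harmonic number
```

The zero padding only interpolates the spectrum; the real frequency resolution still comes from the 1-second analysis length.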
Ronald H. Nicholson Jr. wrote:
> In article <BE78340D.5EE0%rbj@audioimagination.com>,
> robert bristow-johnson <rbj@audioimagination.com> wrote:
> > in article d2tb8d$fvh$1@blue.rahul.net, Ronald H. Nicholson Jr. at
> > rhn@mauve.rahul.net wrote on 04/05/2005 02:29:
> > i do not entirely understand why some weaker harmonic should be the one to
> > care about in a pitch-detection problem. can you explain?
>
> If the lower harmonics are below the range of normal human hearing (or
> the microphone or speakers used to transmit the signal), why should they
> be relevant to pitch detection?
i think your brain sorta fills in the missing fundamental if there is a 2nd, 3rd, 4th, etc harmonic of a tone. try it with MATLAB or the code of your choice.
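that "try it" can be sketched in a few lines (a hypothetical NumPy illustration, not code from the post): synthesize only the 2nd through 5th harmonics of 100 Hz and note that the waveform still repeats every 1/100 s, which is where the autocorrelation peaks even though there is no energy at 100 Hz:

```python
import numpy as np

fs = 16000
t = np.arange(0, 0.2, 1 / fs)
f0 = 100.0
# harmonics 2..5 only -- the 100 Hz fundamental itself is absent
x = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(2, 6))

# the waveform still repeats every 1/f0 seconds, so the
# autocorrelation peaks at a lag of fs/f0 = 160 samples
ac = np.correlate(x, x, mode='full')[len(x) - 1:]
lag = np.argmax(ac[50:400]) + 50   # search past the zero-lag peak
print(fs / lag)                    # ~100 Hz, despite no energy at 100 Hz
```

playing x through speakers is the classic "missing fundamental" demo: most listeners report a 100 Hz pitch.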
> > that i don't understand. the ASDF or AMDF (or an auto-correlation derived
> > from it) will be minimum (or max for autocorrelation) at what looks like the
> > period.
>
> The period might be irrelevant to the pitch.
for quasi-periodic tones, i don't think it is irrelevant.
> > this sounds like the old "octave problem". take a perfectly periodic 200 Hz
> > waveform (doesn't need to be a sine). the pitch detector says it's 200 Hz.
> > now add to it a 100 Hz waveform that is -80 dB relative to the 200 Hz
> > waveform. what's the fundamental frequency? mathematically? or
> > perceptually?
>
> That's a perfectly harmonic example, not an inharmonic one. Try again
> with a 99.9 Hz 2nd harmonic added to the 200 Hz 4th harmonic (assuming
> a missing fundamental somewhere just below 50 Hz). Won't the AMDF hunt
> for some inaudible and nonsensical period somewhere around
> 0.00005005005 Hz?

no, because you won't be looking down there (or at that long of a lag for the AMDF). there will be a pretty nice null at around 1/100 sec (depends exactly on where the other harmonics are) and the AMDF will call that the period. you said 99.9 was the 2nd and 200 was the 4th, implying f0 = 50 Hz. but if there is no 150 or 250, 350, etc., there is no reason for the AMDF or your hearing to interpret that as 50 Hz, and your 99.9 will be viewed as the fundamental.
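this is easy to check numerically (a hypothetical NumPy sketch, not code from the thread): with only the 99.9 Hz and 200 Hz components present, the AMDF minimum over a sensible lag range sits near 1/100 sec, not at some enormous common period:

```python
import numpy as np

fs = 16000
t = np.arange(0, 0.2, 1 / fs)
x = np.sin(2 * np.pi * 99.9 * t) + np.sin(2 * np.pi * 200.0 * t)

def amdf(x, lag):
    """Average magnitude difference at a given lag (in samples)."""
    return np.mean(np.abs(x[lag:] - x[:-lag]))

lags = np.arange(40, 320)   # candidate periods: roughly 400 Hz down to 50 Hz
best = lags[np.argmin([amdf(x, L) for L in lags])]
print(fs / best)            # ~100 Hz: the 99.9 Hz component acts as f0
```

the deepest null lands near lag 160 (about 1/99.9 sec), where both components come back almost exactly in phase.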
> > > Others here have mentioned using long zero-padded fft's as one alternative
> > > for certain types of pitch detection/measurement.
> >
> > i wouldn't do that. you still need to deal with the possibility of missing
> > or weak harmonics (inc. fundamental). i agree with Dmitry Terez about not
> > using FFT.
>
> That assumes the pitch (even with a missing fundamental) is related
> to the period, and not some other near sub-multiple of the more audible
> higher harmonics. An FFT might be better at accurately finding one
> of the higher harmonics, and then one can then divide down to get the
> missing fundamental.
i don't get it. why would finding the higher harmonics and then dividing the frequency down to an implied missing fundamental be better than determining the period of some quasi-periodic signal as a measure of pitch?
> IMHO. YMMV.
yeah, i don't think i would get as much mileage in this.

r b-j
"robert bristow-johnson" <rbj@audioimagination.com> wrote in message
news:1112764006.059594.279510@z14g2000cwz.googlegroups.com...
> Ronald H. Nicholson Jr. wrote:
> > In article <BE78340D.5EE0%rbj@audioimagination.com>,
> > robert bristow-johnson <rbj@audioimagination.com> wrote:
> > > in article d2tb8d$fvh$1@blue.rahul.net, Ronald H. Nicholson Jr. at
> > > rhn@mauve.rahul.net wrote on 04/05/2005 02:29:
> > > i do not entirely understand why some weaker harmonic should be the one to
> > > care about in a pitch-detection problem. can you explain?
> >
> > If the lower harmonics are below the range of normal human hearing (or
> > the microphone or speakers used to transmit the signal), why should they
> > be relevant to pitch detection?
>
> i think your brain sorta fills in the missing fundamental if there is a
> 2nd, 3rd, 4th, etc harmonic of a tone. try it with MATLAB or the code
> of your choice.
Or check out these examples on the web:

http://www.ee.calpoly.edu/~jbreiten/audio/missfund/
http://physics.mtsu.edu/~wmr/julianna.html
In article <BE78340D.5EE0%rbj@audioimagination.com>, robert
bristow-johnson <rbj@audioimagination.com> wrote:

> i wouldn't do that. you still need to deal with the possibility of missing
> or weak harmonics (inc. fundamental). i agree with Dmitry Terez about not
> using FFT.
...[to detect pitch]

is this correct? when you say not to use fft i take it you mean fourier transforms in general, not just the fast fourier transform specifically?

it's a little bit of a shock to me, this -- i asked elsewhere about pitch detection (which is the same thing as frequency or tone analysis / extraction, right?) and was told 'fourier transform' by numerous people -- that seemed to be *the* one and only answer. i don't know much about fourier transforms, but extracting the frequency / tone is the reason i've started reading and learning about them -- am i wasting my time learning about fourier transforms, bearing in mind i'd like to be able to take some recorded sound and split it up into its notes/tones/frequencies, whatever the correct word is, particularly for speech recognition? (i'm told that speech is much like a song -- there's multiple tones occurring at the same time)

any info much appreciated (not so much specifically about speech recognition but generally about splitting sound into its frequencies), thanks very much,

ben
ben wrote:

> it's a little bit of a shock to me this -- i asked elsewhere about
> pitch detection (which is the same thing as frequency or tone
> alanlysis / extraction right?)
No. The pitch detection problem is different from the frequency estimation problem. The main difference (as far as I know; others probably have a different opinion) is the psycho-acoustic effects that need to be taken into account.

Otherwise, you might just as well do what Julius Smith recommends:

http://www-ccrma.stanford.edu/~jos/pasp/Plucked_Struck_String_Pitch_Estimation.html

The "Optional" section is just an approximation to the maximum likelihood estimator of the fundamental frequency (as you may be alluding to).
> and was told 'fourier transform' by numerous
> people -- that seemed to be *the* one and only answer. i don't know
> much about fourier transforms but extracting the frequency / tone is
> the reason i've started reading and learning about them -- am i
> wasting my time learing about fourier transforms bearing in mind i'd
> like to be able to take some recorded sound and split it up into it's
> notes/tones/frequencies whatever the correct word is, particularly
> for speech recognition? (i'm told that speech is much like a song --
> there's multiple tones occuring at the same time)
>
> any info much appreciated (not so much specifically about speech
> recognition but generally about splitting sound into its
> frequencies),
There are several others on this newsgroup who know far more than I do about pitch estimation/detection. I'd suggest you formulate some sensible questions (this post is a good start) and educate yourself! Ciao, Peter K.
In article <1112872929.385589.268080@z14g2000cwz.googlegroups.com>,
Peter K. <p.kootsookos@iolfree.ie> wrote:

> ben wrote:
>
> > it's a little bit of a shock to me this -- i asked elsewhere about
> > pitch detection (which is the same thing as frequency or tone
> > alanlysis / extraction right?)
>
> No. The pitch detection problem is different from the frequency
> estimation problem. The main difference (as far as I know; others
> probably have a different opinion) is the psycho-acoustic effects that
> need to be taken into account.
>
> Otherwise, you might just as well do what Julius Smith recommends:
>
> http://www-ccrma.stanford.edu/~jos/pasp/Plucked_Struck_String_Pitch_Estimation.html
>
> The "Optional" section is just an approximation to the maximum
> likelihood estimator of the fundamental frequency (as you may be
> alluding to).
>
> > and was told 'fourier transform' by numerous
> > people -- that seemed to be *the* one and only answer. i don't know
> > much about fourier transforms but extracting the frequency / tone is
> > the reason i've started reading and learning about them -- am i
> > wasting my time learing about fourier transforms bearing in mind i'd
> > like to be able to take some recorded sound and split it up into it's
> > notes/tones/frequencies whatever the correct word is, particularly
> > for speech recognition? (i'm told that speech is much like a song --
> > there's multiple tones occuring at the same time)
> >
> > any info much appreciated (not so much specifically about speech
> > recognition but generally about splitting sound into its
> > frequencies),
>
> There are several others on this newsgroup who know far more than I do
> about pitch estimation/detection. I'd suggest you formulate some
> sensible questions (this post is a good start) and educate yourself!
>
> Ciao,
>
> Peter K.
thanks very much for the reply and info. i think what you're saying is that the difference between frequency detection and pitch detection is the difference between the technical, actual (mechanically measured) frequencies and the frequencies as perceived by humans (pitches) -- after complicated brain processing -- our perception of sounds. i think the technical, actual frequencies, not the human-perception frequencies (if that makes sense / is correct), will do fine for what i want.

what i want to do is the very first stage of analysing sound to go about speech recognition (and i know there's a hell of a lot to speech recognition). this first stage, which will provide data for much more analysis after it, needs to split the sound, a series of amplitudes, into the frequencies that are occurring (and when they're occurring -- i know that's a bit of an issue but anyway...)

without going into details, should i be looking into fourier transforms to do this first stage that i've described, or something else? until i saw this thread i didn't think there was any doubt, but now i'm not sure. (it looks like fourier transforms are still the way but i want to check.)

so i just want to split the sounds up into the various frequencies, to be able to provide the later analysis steps with appropriate data to process. fourier transforms or something else needed/best for this first stage?

thanks, ben.
ben wrote:

   ...

> is this correct? when you say not to use fft i take it you mean fourier
> transforms in general not just the fast fourier transform specifically?
To touch on one point only. An FFT is just a fast way to compute a Fourier transform. If its result weren't identical to all the other ways of computing it, it wouldn't be a Fourier transform.

"Don't walk there."
"Does that mean in any shoes, or just in running shoes?"

Jerry
--
Engineering is the art of making what you want from things you can get.
ben wrote:
...
> what i want to do is the very first stage of analysing sound to go
> about speech recognition (and i know there's a hell of a lot to speech
> recognition). this first stage, which will provide data for much much
> more further analysis after this stage, needs to split the sound, a
> series of amplitudes, into the frequencies that are occuring (and when
> they're occuring -- i know that's a bit of an issue but anyway...)
It's good to keep an eye on that issue.
> without going into details, should i be looking into fourier transforms
> to do this first stage that i've described or something else? until i
> saw this thread i didn't think there was any doubt but i'm not sure.
> (it looks like fourier transforms are still the way but want to check).
>
> so i just want to split the sounds up into the various frequencies to
> be able to provide the later in the line of analysis steps appropriate
> data to process. fourier transforms or something else needed/best for
> this first stage?
I'm also no speech processing guy (perhaps you should go over to comp.speech or comp.speech.research for more qualified responses). I know that speech processing people like to estimate the frequency components of speech indirectly by computing LPC coefficients (that is what vocoders such as MELP, CELP etc. do). These coefficients do not give you frequency information directly (you have to factor the polynomial first), but it seems that certain vocal tract parameters can be deduced directly from these coefficients (without having to go through the frequency domain).

I think modeling the vocal tract should be more interesting than frequency estimation for speech recognition. If that is not the case, pure frequency estimation can be done via FFT -- consider windowing, averaging and overlapping to improve the raw FFT data.

FWIW.

Regards,
Andor
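Andor's last suggestion (windowing, overlapping, averaging) is essentially Welch's method; a minimal NumPy sketch (a hypothetical illustration, not code from the thread) might look like:

```python
import numpy as np

def averaged_spectrum(x, fs, nfft=512, hop=256):
    """Windowed, 50%-overlapped, averaged FFT magnitudes (Welch-style)."""
    win = np.hanning(nfft)
    mags = [np.abs(np.fft.rfft(x[i:i + nfft] * win))
            for i in range(0, len(x) - nfft + 1, hop)]
    freqs = np.fft.rfftfreq(nfft, 1 / fs)
    return freqs, np.mean(mags, axis=0)

# a 440 Hz tone buried in a little noise
np.random.seed(0)
fs = 8000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(len(t))

freqs, mag = averaged_spectrum(x, fs)
print(freqs[np.argmax(mag)])   # near 440 Hz (bin spacing fs/nfft = 15.625 Hz)
```

averaging over overlapped frames trades frequency resolution for a much less noisy spectral estimate, which is usually what you want as a speech front end.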
in article 070420051129510384%x@x.x, ben at x@x.x wrote on 04/07/2005 06:30:

> In article <BE78340D.5EE0%rbj@audioimagination.com>, robert
> bristow-johnson <rbj@audioimagination.com> wrote:
>
> > i wouldn't do that. you still need to deal with the possibility of missing
> > or weak harmonics (inc. fundamental). i agree with Dmitry Terez about not
> > using FFT.
>
> ...[to detect pitch]
>
> is this correct? when you say not to use fft i take it you mean fourier
> transforms in general not just the fast fourier transform specifically?
> it's a little bit of a shock to me this -- i asked elsewhere about
> pitch detection (which is the same thing as frequency or tone alanlysis
> / extraction right?) and was told 'fourier transform' by numerous
> people -- that seemed to be *the* one and only answer.
whatever linear operation you do in the frequency domain can be constructed as an equivalent time-domain operation. i guess we need to talk a little about what "pitch detection" means.

we have a perceptual meaning that is hard to describe for sounds in general. if i recorded a fart into a sampling keyboard and then played on the keys a recognizable melody (say "Mary had a little lamb"), you might likely hear a sense of pitch for each note, but i would have trouble defining clearly how that fart gives you a sense of pitch for that note.

but highly tonal instruments are different. *then* we are pretty clear that the pitch of the note is directly related to the fundamental frequency, f0, of the quasi-periodic function that is the note's waveform (f0 being the reciprocal of the period). now, in the spectrum, you will see spikes that are equally spaced at integer multiples of that fundamental frequency. each spike represents a harmonic, and the height of it is the strength of that harmonic.

now, we could use a comb filter to isolate those spikes. there are two basic kinds of comb filters: one that puts in a null every f1 Hz and one that puts in a peak every f1 Hz. if we use the first one and vary f1 until it happens upon f0 or a submultiple of f0, then the output of that comb filter will be minimum. that is essentially what the AMDF or ASDF algorithm does in the time domain. it's the same thing but in two different domains. and autocorrelation is directly related to the ASDF.
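the notch-comb idea can be sketched in a few lines (a hypothetical NumPy illustration, not code from the post): y[n] = x[n] - x[n-L] puts a null at every multiple of fs/L Hz, and sweeping L for minimum output power is exactly the ASDF search for the period:

```python
import numpy as np

fs = 16000
t = np.arange(0, 0.2, 1 / fs)
# quasi-periodic tone with a 200 Hz fundamental (period = 80 samples)
x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)

def comb_power(x, L):
    """Mean output power of the notch comb y[n] = x[n] - x[n-L]."""
    return np.mean((x[L:] - x[:-L]) ** 2)

lags = np.arange(40, 120)   # candidate periods: 400 Hz down to ~133 Hz
best = lags[np.argmin([comb_power(x, L) for L in lags])]
print(fs / best)            # 200.0 -- the comb nulls land on every harmonic
```

when L matches the period, every harmonic falls on a null of the comb and the output power collapses, which is why the minimum identifies the fundamental.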
> i don't know much about fourier transforms but extracting the frequency
> / tone is the reason i've started reading and learning about them -- am
> i wasting my time learing about fourier transforms bearing in mind i'd
> like to be able to take some recorded sound and split it up into it's
> notes/tones/frequencies whatever the correct word is, particularly for
> speech recognition? (i'm told that speech is much like a song --
> there's multiple tones occuring at the same time)
>
> any info much appreciated (not so much specifically about speech
> recognition but generally about splitting sound into its frequencies),
well, splitting a sound into its frequencies certainly *is* a topic regarding the Fourier Transform (in one of its forms).

--
r b-j    rbj@audioimagination.com

"Imagination is more important than knowledge."