DSPRelated.com
Forums

Realtime pitch detection - How do the different algorithms compare

Started by PaulWiik May 23, 2005
Hi,

What I'm trying to do:
Type of audio: Solo vocal music.
Samplerate : 44100
Desired Output : Pitch (frequency), hopefully >=10 detections per second.

I've read a lot of discussions on this subject on the forum. As you will
probably see in my post, my knowledge of maths is limited. I work as a
software developer, but mainly GUI/DB.

I started out creating an algorithm without doing much research. So I
guess I have made all the stupid mistakes (but hopefully gained knowledge
by doing them) :)

Started out with FFT. Because I have heard the FFT buzzword a million
times over the last decade, I immediately thought that must be it. I
implemented it successfully, but found that it did not provide good
enough frequency resolution in the bins for my buffer sizes.
Tried correcting the frequency by looking at the neighbours of the bin I
decided to be the peak, weighting the neighbours' frequencies in by their
relative magnitudes to get a better reading. Gave up on FFT.

Googled a bit and found the Goertzel algorithm. Implemented that, and ran
it for 48 frequencies 
generated from : 110*power(power(2,1/12),k) for k = 1..48
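For reference, that Goertzel scan can be sketched in a few lines of Python (the function name and demo values below are illustrative, not from the original post; Goertzel accepts any target frequency, not just FFT bin centres):

```python
import math

def goertzel_power(samples, sample_rate, freq):
    # Squared magnitude of one target frequency via the Goertzel recurrence.
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

# the 48 semitone frequencies from the post: 110 * (2^(1/12))^k, k = 1..48
freqs = [110.0 * 2.0 ** (k / 12.0) for k in range(1, 49)]
```

Calling goertzel_power once per candidate frequency over a buffer reproduces the 48-frequency scan described above.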

I had expected to get readings with almost no spectral leakage for low
frequencies in Goertzel, but this was not the case (and that is obvious
with my short buffers when I think about it).
Anyway, the Goertzel gave quite good results, so I started considering
adding in harmonics (if you understand what I mean by that) to hopefully
get rid of noise and get clearer readings on real tones with harmonics.
Perhaps it would also help getting rid of octave errors?

But the thing is that the Goertzel approach is not ideal, as I would like
to also be able to detect how much the tones are off pitch.

Now I have read a few of the discussions on AMDF and ASDF for pitch
detection on this forum.

And to the questions:
Can I expect much better results than Goertzel with ASDF/AMDF, or is it
just another way of getting to the same results?
 
I've been thinking of doing ASDF/AMDF that includes some of the harmonics
in the calculation. I envision that it will help in optimizing and at the
same time reduce the error level.
Something like (I hope I get some of this right):
  
  value = value+(Buffer[n]*Buffer[n+t])+(Buffer[n]*Buffer[n+(t/2)])
         +(Buffer[n]*Buffer[n+(t/3)])
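Rendered as a runnable Python sketch (the function name is mine, and integer division is one possible choice for the fractional lags t/2 and t/3; the caller must keep n_terms + t inside the buffer):

```python
def harmonic_ac(buf, t, n_terms):
    # Autocorrelation at lag t, with the 2nd- and 3rd-harmonic lags
    # (t/2, t/3) summed in, as in the snippet above.
    value = 0.0
    for n in range(n_terms):
        value += (buf[n] * buf[n + t]
                  + buf[n] * buf[n + t // 2]   # 2nd harmonic lag
                  + buf[n] * buf[n + t // 3])  # 3rd harmonic lag
    return value
```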

Am I barking up the wrong tree here?

Sorry for the long post,
Paul


		
This message was sent using the Comp.DSP web interface on
www.DSPRelated.com
in article 28idnaiP6_DC9Q_fRVn-hA@giganews.com, PaulWiik at paul@wiik.net
wrote on 05/23/2005 19:33:
...
> But the thing is that the Goertzel approach is not ideal, as I would like
> to also be able to detect how much the tones are off pitch.
Goertzel is useful for computing the magnitude of one or a very few frequencies. it's not really a pitch detector (but if you knew the pitch or fundamental frequency, Goertzel could tell you how much energy is there at that frequency or any of the harmonics).
> Now I have read a few of the discussions on AMDF and ASDF for pitch
> detection on this forum.
>
> And to the questions:
> Can I expect much better results than Goertzel with ASDF/AMDF, or is it
> just another way of getting to the same results?
they're different (and not comparable) algorithms to get you different parameters.
> I've been thinking of doing ASDF/AMDF that includes some of the harmonics
> in the calculation.
AMDF or ASDF inherently do include harmonics. they really are methods to
measure the period (which is the reciprocal of the fundamental frequency)
of a periodic function. doesn't matter how much any particular harmonic is.

--
r b-j    rbj@audioimagination.com
"Imagination is more important than knowledge."
In article <28idnaiP6_DC9Q_fRVn-hA@giganews.com>,
PaulWiik <paul@wiik.net> wrote:
...
>Started out with FFT. Because I have heard the fft buzz-word a million
>times the last decade I immediately thought that must be it. Successfully
>implemented it, but found it to not provide good enough resolution of
>frequencies in the bins/time.
Have you looked up phase vocoder techniques? These algorithms use complex
FFT results plus historical phase data to resolve single frequencies
between the FFT bins even when using short buffers compared to the tonal
period(s) of interest. They seem to provide decent resolution quickly,
even in the presence of some types of noise. IMHO. YMMV.

--
Ron Nicholson   rhn AT nicholson DOT com   http://www.nicholson.com/rhn/
#include <canonical.disclaimer>        // only my own opinions, etc.
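A rough Python sketch of the idea Ron describes: take two frames a fixed hop apart, and use the extra phase advance at the peak bin to place the frequency between bins. The naive DFT and all names here are illustrative stand-ins, not Ron's code:

```python
import cmath
import math

def refine_peak_freq(frame1, frame2, hop, sample_rate):
    # Phase-vocoder refinement: the phase of the peak bin advances by
    # 2*pi*k*hop/n per hop if the tone sits exactly on bin k; any extra
    # advance measures how far off-bin the true frequency is.
    n = len(frame1)

    def dft(x):  # naive DFT, O(n^2); use a real FFT library in practice
        return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)) for k in range(n // 2)]

    spec1, spec2 = dft(frame1), dft(frame2)
    k = max(range(1, n // 2), key=lambda i: abs(spec2[i]))  # peak bin
    expected = 2.0 * math.pi * k * hop / n
    dphi = cmath.phase(spec2[k]) - cmath.phase(spec1[k]) - expected
    dphi -= 2.0 * math.pi * round(dphi / (2.0 * math.pi))   # wrap to [-pi, pi]
    return (k + dphi * n / (2.0 * math.pi * hop)) * sample_rate / n
```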
Thanks for your quick response.

>they're different (and not comparable) algorithms to get you different
>parameters.
You are right. I was thinking more in the sense that if I were to
calculate the energy with Goertzel of all the frequencies in ASDF/AMDF
(with results in bins), would I come up with basically the same info? I
think I understand how ASDF/AMDF work (if I'm not mixing them up with
autocorrelation), but not the Goertzel.
>AMDF or ASDF inherently do include harmonics. they really are methods to
>measure the period (which is the reciprocal of the fundamental frequency)
>of a periodic function. doesn't matter how much any particular harmonic
>is.

I'm not sure I understand how they include harmonics? Yes, I can see that
harmonics are included in the way that if you resolve them over the
complete range and look at all the lows, the harmonics would also have
their separate lows in the result.

My first thought was: locate the first low, but to be sure this is not
noise, I would check that there is also a low in the results where the
harmonics should mathematically end up. Is this what you mean by harmonics
being included, or do you actually say that the first low will be
influenced by harmonics?

Sorry, I'm sure I'm asking stupid questions here. I do really appreciate
that you professional guys take the time to help me, and others like me.
>Have you looked up phase vocoder techniques? These algorithms use complex
>FFT results plus historical phase data to resolve single frequencies
>between the FFT bins even when using short buffers compared to the tonal
>period(s) of interest. They seem to provide decent resolution quickly,
>even in the presence of some types of noise.
Thanks for the tip. I will try looking it up, and see if I can understand
how to implement it.

BTW: I used a Blackman-Harris window on the data before the FFT and
Goertzel algorithms.

I should have included the following question in my previous follow-up:
I don't need to use Blackman-Harris or another window for AMDF/ASDF,
right? As far as I can figure, there would be no use for it in those
algorithms, but I thought I'd ask anyway.
PaulWiik wrote:

...
>I should have included the following question in my previous Follow-up:
>I don't need to use Blackman-Harris or other window for AMDF/ASDF, right?
Using a window would simply spoil the results of AMDF, ASDF or autocorrelation. These algorithms are looking for the lag which gives the maximum similarity between two equal sized chunks of the signal. Altering the amplitude of some of the samples would reduce the similarity. Not good.
>As far as I can figure, there would be no use for it in those algorithms,
>but I thought I'd ask anyway.
Regards, Steve
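Steve's one-sentence description translates almost directly into code. A minimal ASDF-style lag search in Python (the function name and the demo buffer are illustrative, not from the thread):

```python
import math

def asdf_pitch(buf, sample_rate, min_lag, max_lag):
    # Pick the lag that minimises the summed squared difference between
    # the buffer and a lagged copy of itself, then convert it to Hz.
    span = len(buf) - max_lag  # number of samples comparable at every lag
    best_lag = min(range(min_lag, max_lag + 1),
                   key=lambda lag: sum((buf[i] - buf[i + lag]) ** 2
                                       for i in range(span)))
    return sample_rate / best_lag

# demo: a sine with an exact 100-sample period, i.e. 441 Hz at 44.1 kHz
demo = [math.sin(2.0 * math.pi * i / 100.0) for i in range(512)]
```

Note that no window is applied to the samples, in line with Steve's advice; the two chunks are compared as-is.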
>Using a window would simply spoil the results of AMDF, ASDF or
>autocorrelation.
Thanks, that's what I thought, but I see it all more clearly now.
> These algorithms are looking for the lag which gives the maximum
> similarity between two equal sized chunks of the signal.
I find your simple description of the algorithms incredibly clear. I think
I could actually have implemented such an algorithm, with no prior
knowledge, on the basis of that one sentence.

And, come to think of it, I now realize what Robert Bristow-Johnson meant
when he said they inherently include harmonics. Lesson learnt: think twice
before posting stupid questions.

I'll give this stuff (and I believe I read about combinations of them) a
good chance before I start looking at the phase vocoder stuff. It feels
better using stuff whose workings I fully understand.
in article d6ulbc$10o$1@home.itg.ti.com, Steve Underwood at steveu@dis.org
wrote on 05/24/2005 03:32:

> PaulWiik wrote:
...
>> I should have included the following question in my previous Follow-up:
>> I don't need to use Blackman-Harris or other window for AMDF/ASDF, right?
>
> Using a window would simply spoil the results of AMDF, ASDF or
> autocorrelation.
i fully disagree with that. first of all, even if a window isn't "used", there is still an inherent rectangular window unless you are planning on summing an infinite number of terms.
> These algorithms are looking for the lag which gives the maximum
> similarity between two equal sized chunks of the signal. Altering the
> amplitude of some of the samples would reduce the similarity. Not good.
there are two different ways of using the window:

1. applying it directly to the samples before the traditional way of doing
autocorrelation creates a sorta "envelope" on the results (appears like
the window correlated against itself) that these peaks (representing a
similarity) can be compared against.

   Rx(tau, N) = SUM{ (x[n-tau/2]*w[n-tau/2-N]) * (x[n+tau/2]*w[n+tau/2-N]) }
              = SUM{ (x[n]*w[n-N]) * (x[n+tau]*w[n+tau-N]) }

w(n-N) is a window symmetrical around sample number N which means you are
trying to estimate pitch around sample N. if tau happens to take on the
value of the period (or a multiple of it), that is tau = P where
x[n+P] = x[n], then

   Rx(P, N) = SUM{ (x[n]*w[n-N]) * (x[n+P]*w[n+P-N]) }
            = SUM{ (x[n]*w[n-N]) * (x[n]*w[n+P-N]) }
            = SUM{ (x[n])^2 * (w[n-N]*w[n+P-N]) }
            = SUM{ (x[n+N])^2 * (w[n]*w[n+P]) }

if the window is much wider than any anticipated period, a good
approximation can be made by replacing (x[n])^2 with its mean.

   Rx(P, N)  = SUM{ (x[n+N])^2 * (w[n]*w[n+P]) }
            ~= mean{ (x[n+N])^2 } * SUM{ (w[n]*w[n+P]) }
            ~= Rx(0,N)/SUM{(w[n])^2} * SUM{ (w[n]*w[n+P]) }
            ~= Rx(0,N) * W(P)

   where  W(P) = SUM{ (w[n]*w[n+P]) } / SUM{ (w[n])^2 }

and now, what you have is a nearly deterministic function that is
proportional to a function solely of P. if Rx(tau, N) gets close to
Rx(0, N) * W(tau), you know that tau is close to the period, P, or a
multiple of it.

2. alternatively, the window can be applied *after* the two signals (one
is lagged) are compared in all three methods:

   Rx(tau, N)   = SUM{ (x[n-tau/2] * x[n+tau/2])   * w[n-N] }
   AMDF(tau, N) = SUM{ |x[n-tau/2] - x[n+tau/2]|   * w[n-N] }
   ASDF(tau, N) = SUM{ (x[n-tau/2] - x[n+tau/2])^2 * w[n-N] }

in this case, the window serves to smooth out the result of the comparison
from the portion of the signal where you happen to land this estimate
(that is around sample N). it approximately decouples the result from
small variations of N.
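A small Python sketch of option 2 (window applied after the comparison), shown for ASDF with an assumed Hann window and with the lag applied one-sided (x[n] against x[n+tau]) rather than split symmetrically as in the equations above; the names are mine:

```python
import math

def windowed_asdf(x, tau, center, half_width):
    # ASDF(tau, N) = SUM{ (x[n] - x[n+tau])^2 * w[n-N] }, with w a Hann
    # window of the given half-width centred on sample `center` (= N).
    total = 0.0
    for n in range(center - half_width, center + half_width):
        if 0 <= n and n + tau < len(x):
            w = 0.5 * (1.0 + math.cos(math.pi * (n - center) / half_width))
            total += (x[n] - x[n + tau]) ** 2 * w
    return total
```

For a periodic signal this dips toward zero when tau hits the period (or a multiple of it), while the window keeps the estimate localised around `center`.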
>> As far as I can figure, there would be no use for it in those
>> algorithms, but I thought I'd ask anyway.
now you have a second opinion.

--
r b-j    rbj@audioimagination.com
"Imagination is more important than knowledge."
"PaulWiik" <paul@wiik.net> wrote in message
news:cr2dnT0II-34dQ_fRVn-1A@giganews.com...
...
> And, come to think of it, I now realize what Robert Bristow meant when he
> said they inherently include harmonics. Lesson learnt: Think twice before
> posting stupid questions.
Actually PaulWiik, I have found your posts to be among the better ones in this group! You present your past work and questions clearly and respond quickly to follow-up questions. Keep up the good work.
Robert bristow-johnson wrote:
> now you have a second opinion.
OK, I think I see one of the advantages of the first use you describe:
> w(n-N) is a window symmetrical around sample number N which
> means you are trying to estimate pitch around sample N.
I think this is what I stumbled across during my testing tonight, but
instead of applying the window, I narrowed the scan of n so as to avoid
partial cancellation of higher frequencies with small frequency variations
in a big buffer.

I have a hard time understanding all these mathematical expressions, so if
you feel I haven't listened to your advice, it's because I don't
understand, rather than because I don't want to. I have a 5-year
electronics/telecom education where I'm sure I was supposed to learn all
this maths, but it's more than 10 years ago, and I have never used it
since.

Jon Harris wrote:
> You present your past work and questions clearly and respond quickly to
> follow-up questions. Keep up the good work.
Thanks!

Here is what I did so far:

- I implemented a combination of autocorrelation and AMDF, something like:
  combined = AC/(AMDF+1)
  Come to think of it, I forgot to try replacing AutoCorr with ASDF.
- I did this in a function where I also throw in the frequency I want to
  look for (the same way as I used Goertzel). I then call this function
  for all the frequencies in the musical scale I'm interested in. From a
  CPU load point of view I very much like this approach, since I hope to
  detect roughly the whereabouts of the fundamental by doing 40 scans (my
  testing tone range, >3 octaves). It could definitely save a lot of CPU.
- The function calculates tau on the basis of the sampling rate and the
  frequency parameter. What I discovered next was that I could easily
  reduce down to scanning n from 0 to tau*3 (only if tau*3 < (N/2), to
  stay in the buffer). This not only saved a lot of computing, but also
  seemed to help detection! To me this proves, probably in a low-fi way,
  that what Robert Bristow-Johnson pointed out about using windows even
  with AMDF/ASDF has advantages.

My plan is that after I've detected the "rough" fundamental, I'll do a
full scan with tau from the period of the tone below, and up to the tone
above, to try to find the "exact pitch".

Current main concerns: I still struggle with "shhh" sounds having lower
harmonics that end up detected (a tendency to occur around 700 Hz), and
with the female vocalist's voice "breaking" (by intention). I'm not sure
what the English term is, but it's the kind of "sounding sexy" on low
tones thing they do.

I'm displaying the magnitudes in a plot, and there is one thing I do not
understand. A lot of descriptions of pitch detection say you should pick
the first "peak" over a certain threshold. In my case, with this female
singer, I always get a powerful peak one octave below what definitely
must be the musical pitch. Does this mean I've messed up my calculation
of tau? I use tau = samplerate/frequency. I believe I get the correct
results by using (samplerate*2)/frequency? But this creates a
tau > (N/2) for 110 Hz (N=1024, samplerate=44100).

I would appreciate any hints on the "shhh sounds" and the yet very
undefined "sexy voice" problems. On "shhh": I guess it won't help
filtering out high frequencies, since the problem is in sub-harmonics
(700 Hz etc.)? Is this an issue where it could help running a new
function (correlation?) on my set of magnitudes to try to detect the
periods of the harmonics?
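For what it's worth, the combined AC/(AMDF+1) score at one candidate frequency might be sketched like this in Python (the AMDF normalisation and the 3*tau scan limit are my guesses at what the post describes, not a confirmed implementation):

```python
import math

def combined_score(buf, sample_rate, freq):
    # Autocorrelation divided by (AMDF + 1) at the single lag
    # tau = sample_rate / freq, scanning n over at most 3*tau samples.
    tau = int(round(sample_rate / freq))
    span = min(3 * tau, len(buf) - tau)
    ac = sum(buf[n] * buf[n + tau] for n in range(span))
    amdf = sum(abs(buf[n] - buf[n + tau]) for n in range(span)) / span
    return ac / (amdf + 1.0)

# demo: a sine with a 100-sample period (441 Hz at 44.1 kHz)
demo = [math.sin(2.0 * math.pi * i / 100.0) for i in range(512)]
```

A lag matching the true period gives a large positive score (autocorrelation high, AMDF near zero); a lag at half the period gives a negative one.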