DSPRelated.com
Forums

Realtime pitch detection - How do the different algorithms compare

Started by PaulWiik May 23, 2005
Hi,

What I'm trying to do:
Type of audio: Solo vocal music.
Samplerate : 44100
Desired Output : Pitch (frequency), hopefully >=10 detections per second.

I've read a lot of discussions on this subject on the forum. As you will
probably see in my post, my knowledge of maths is limited. I work as a
software developer, but mainly GUI/DB.

I started out creating an algorithm without doing much research. So I
guess I have made all the stupid mistakes (but hopefully gained knowledge
by doing them) :)

Started out with FFT. Because I have heard the FFT buzzword a million
times over the last decade, I immediately thought that must be it. I
implemented it successfully, but found that it did not provide good
enough frequency resolution in the bins for my buffer sizes.
Tried correcting the frequency by looking at the neighbours of the bin I
decided to be the peak, weighting the neighbours' frequencies in by their
relative magnitudes to get a better reading. Gave up on FFT.

Googled a bit and found the Goertzel algorithm. Implemented that, and ran
it for 48 frequencies 
generated from : 110*power(power(2,1/12),k) for k = 1..48
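For reference, that Goertzel scan can be sketched in a few lines of Python (the function name and demo values below are illustrative, not from the original post; Goertzel accepts any target frequency, not just FFT bin centres):

```python
import math

def goertzel_power(samples, sample_rate, freq):
    # Squared magnitude of one target frequency via the Goertzel recurrence.
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

# the 48 semitone frequencies from the post: 110 * (2^(1/12))^k, k = 1..48
freqs = [110.0 * 2.0 ** (k / 12.0) for k in range(1, 49)]
```

Calling goertzel_power once per candidate frequency over a buffer reproduces the 48-frequency scan described above.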

I had expected to get readings with almost no spectral leakage for low
frequencies in Goertzel, but this was not the case (and that is obvious
with my short buffers when I think about it).
Anyway, the Goertzel gave quite good results, so I started considering
adding in harmonics (if you understand what I mean by that) to hopefully
get rid of noise and get clearer readings on real tones with harmonics.
Perhaps it would also help getting rid of octave errors?

But the thing is that the Goertzel approach is not ideal, as I would like
to also be able to detect how much the tones are off pitch.

Now I have read a few of the discussions on AMDF and ASDF for pitch
detection on this forum.

And to the questions:
Can I expect much better results than Goertzel with ASDF/AMDF, or is it
just another way of getting to the same results?
 
I've been thinking of doing ASDF/AMDF that includes some of the harmonics
in the calculation. I envision that it will help in optimizing and at the
same time reduce the error level.
Something like (I hope I get some of this right):
  
  value = value+(Buffer[n]*Buffer[n+t])+(Buffer[n]*Buffer[n+(t/2)])
         +(Buffer[n]*Buffer[n+(t/3)])
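Rendered as a runnable Python sketch (the function name is mine, and integer division is one possible choice for the fractional lags t/2 and t/3; the caller must keep n_terms + t inside the buffer):

```python
def harmonic_ac(buf, t, n_terms):
    # Autocorrelation at lag t, with the 2nd- and 3rd-harmonic lags
    # (t/2, t/3) summed in, as in the snippet above.
    value = 0.0
    for n in range(n_terms):
        value += (buf[n] * buf[n + t]
                  + buf[n] * buf[n + t // 2]   # 2nd harmonic lag
                  + buf[n] * buf[n + t // 3])  # 3rd harmonic lag
    return value
```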

Am I barking up the wrong tree here?

Sorry for the long post,
Paul


		
This message was sent using the Comp.DSP web interface on
www.DSPRelated.com
in article 28idnaiP6_DC9Q_fRVn-hA@giganews.com, PaulWiik at paul@wiik.net
wrote on 05/23/2005 19:33:
...
> But the thing is that the Goertzel approach is not ideal, as I would like
> to also be able to detect how much the tones are off pitch.
Goertzel is useful for computing the magnitude of one or a very few frequencies. it's not really a pitch detector (but if you knew the pitch or fundamental frequency, Goertzel could tell you how much energy is there at that frequency or any of the harmonics).
> Now I have read a few of the discussions on AMDF and ASDF for pitch
> detection on this forum.
>
> And to the questions:
> Can I expect much better results than Goertzel with ASDF/AMDF, or is it
> just another way of getting to the same results?
they're different (and not comparable) algorithms to get you different parameters.
> I've been thinking of doing ASDF/AMDF that includes some of the harmonics
> in the calculation.
AMDF or ASDF inherently do include harmonics. they really are methods to
measure the period (which is the reciprocal of the fundamental frequency)
of a periodic function. doesn't matter how much any particular harmonic is.

--
r b-j    rbj@audioimagination.com
"Imagination is more important than knowledge."
In article <28idnaiP6_DC9Q_fRVn-hA@giganews.com>,
PaulWiik <paul@wiik.net> wrote:
...
>Started out with FFT. Because I have heard the fft buzz-word a million
>times the last decade I immediately thought that must be it. Successfully
>implemented it, but found it to not provide good enough resolution of
>frequencies in the bins/time.
Have you looked up phase vocoder techniques? These algorithms use complex
FFT results plus historical phase data to resolve single frequencies
between the FFT bins even when using short buffers compared to the tonal
period(s) of interest. They seem to provide decent resolution quickly,
even in the presence of some types of noise. IMHO. YMMV.

--
Ron Nicholson   rhn AT nicholson DOT com   http://www.nicholson.com/rhn/
#include <canonical.disclaimer>        // only my own opinions, etc.
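A rough Python sketch of the idea Ron describes: take two frames a fixed hop apart, and use the extra phase advance at the peak bin to place the frequency between bins. The naive DFT and all names here are illustrative stand-ins, not Ron's code:

```python
import cmath
import math

def refine_peak_freq(frame1, frame2, hop, sample_rate):
    # Phase-vocoder refinement: the phase of the peak bin advances by
    # 2*pi*k*hop/n per hop if the tone sits exactly on bin k; any extra
    # advance measures how far off-bin the true frequency is.
    n = len(frame1)

    def dft(x):  # naive DFT, O(n^2); use a real FFT library in practice
        return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)) for k in range(n // 2)]

    spec1, spec2 = dft(frame1), dft(frame2)
    k = max(range(1, n // 2), key=lambda i: abs(spec2[i]))  # peak bin
    expected = 2.0 * math.pi * k * hop / n
    dphi = cmath.phase(spec2[k]) - cmath.phase(spec1[k]) - expected
    dphi -= 2.0 * math.pi * round(dphi / (2.0 * math.pi))   # wrap to [-pi, pi]
    return (k + dphi * n / (2.0 * math.pi * hop)) * sample_rate / n
```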
Thanks for your quick response.

>they're different (and not comparable) algorithms to get you different
>parameters.
You are right. I was thinking more in the sense that if I were to
calculate the energy with Goertzel of all the frequencies in ASDF/AMDF
(with results in bins), would I come up with basically the same info? I
think I understand how ASDF/AMDF work (if I'm not mixing them up with
autocorrelation), but not the Goertzel.
>AMDF or ASDF inherently do include harmonics. they really are methods to
>measure the period (which is the reciprocal of the fundamental frequency)
>of a periodic function. doesn't matter how much any particular harmonic
>is.

I'm not sure I understand how they include harmonics? Yes, I can see that
harmonics are included in the way that if you resolve them over the
complete range and look at all the lows, the harmonics would also have
their separate lows in the result.

My first thought was: locate the first low, but to be sure this is not
noise, I would check that there is also a low in the results where the
harmonics should mathematically end up. Is this what you mean by harmonics
being included, or do you actually say that the first low will be
influenced by harmonics?

Sorry, I'm sure I'm asking stupid questions here. I do really appreciate
that you professional guys take the time to help me, and others like me.
>Have you looked up phase vocoder techniques? These algorithms use complex
>FFT results plus historical phase data to resolve single frequencies
>between the FFT bins even when using short buffers compared to the tonal
>period(s) of interest. They seem to provide decent resolution quickly,
>even in the presence of some types of noise.
Thanks for the tip. I will try looking it up, and see if I can understand
how to implement it.

BTW: I used a Blackman-Harris window on the data before the FFT and
Goertzel algorithms.

I should have included the following question in my previous follow-up:
I don't need to use Blackman-Harris or another window for AMDF/ASDF,
right? As far as I can figure, there would be no use for it in those
algorithms, but I thought I'd ask anyway.
PaulWiik wrote:

...
>I should have included the following question in my previous Follow-up:
>I don't need to use Blackman-Harris or other window for AMDF/ASDF, right?
Using a window would simply spoil the results of AMDF, ASDF or autocorrelation. These algorithms are looking for the lag which gives the maximum similarity between two equal sized chunks of the signal. Altering the amplitude of some of the samples would reduce the similarity. Not good.
>As far as I can figure, there would be no use for it in those algorithms,
>but I thought I'd ask anyway.
Regards, Steve
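Steve's one-sentence description translates almost directly into code. A minimal ASDF-style lag search in Python (the function name and the demo buffer are illustrative, not from the thread):

```python
import math

def asdf_pitch(buf, sample_rate, min_lag, max_lag):
    # Pick the lag that minimises the summed squared difference between
    # the buffer and a lagged copy of itself, then convert it to Hz.
    span = len(buf) - max_lag  # number of samples comparable at every lag
    best_lag = min(range(min_lag, max_lag + 1),
                   key=lambda lag: sum((buf[i] - buf[i + lag]) ** 2
                                       for i in range(span)))
    return sample_rate / best_lag

# demo: a sine with an exact 100-sample period, i.e. 441 Hz at 44.1 kHz
demo = [math.sin(2.0 * math.pi * i / 100.0) for i in range(512)]
```

Note that no window is applied to the samples, in line with Steve's advice; the two chunks are compared as-is.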
>Using a window would simply spoil the results of AMDF, ASDF or
>autocorrelation.
Thanks, that's what I thought, but I see it all more clearly now.
> These algorithms are looking for the lag which gives the maximum
> similarity between two equal sized chunks of the signal.
I find your simple description of the algorithms incredibly clear. I think
I could actually have implemented such an algorithm, with no prior
knowledge, on the basis of that one sentence.

And, come to think of it, I now realize what Robert Bristow-Johnson meant
when he said they inherently include harmonics. Lesson learnt: think twice
before posting stupid questions.

I'll give this stuff (and I believe I read about combinations of them) a
good chance before I start looking at the phase vocoder stuff. It feels
better using stuff whose workings I fully understand.
in article d6ulbc$10o$1@home.itg.ti.com, Steve Underwood at steveu@dis.org
wrote on 05/24/2005 03:32:

> PaulWiik wrote:
...
>> I should have included the following question in my previous Follow-up:
>> I don't need to use Blackman-Harris or other window for AMDF/ASDF, right?
>
> Using a window would simply spoil the results of AMDF, ASDF or
> autocorrelation.
i fully disagree with that. first of all, even if a window isn't "used", there is still an inherent rectangular window unless you are planning on summing an infinite number of terms.
> These algorithms are looking for the lag which gives the maximum
> similarity between two equal sized chunks of the signal. Altering the
> amplitude of some of the samples would reduce the similarity. Not good.
there are two different ways of using the window:

1. applying it directly to the samples before the traditional way of doing
autocorrelation creates a sorta "envelope" on the results (appears like
the window correlated against itself) that these peaks (representing a
similarity) can be compared against.

   Rx(tau, N) = SUM{ (x[n-tau/2]*w[n-tau/2-N]) * (x[n+tau/2]*w[n+tau/2-N]) }
              = SUM{ (x[n]*w[n-N]) * (x[n+tau]*w[n+tau-N]) }

w(n-N) is a window symmetrical around sample number N which means you are
trying to estimate pitch around sample N. if tau happens to take on the
value of the period (or a multiple of it), that is tau = P where
x[n+P] = x[n], then

   Rx(P, N) = SUM{ (x[n]*w[n-N]) * (x[n+P]*w[n+P-N]) }
            = SUM{ (x[n]*w[n-N]) * (x[n]*w[n+P-N]) }
            = SUM{ (x[n])^2 * (w[n-N]*w[n+P-N]) }
            = SUM{ (x[n+N])^2 * (w[n]*w[n+P]) }

if the window is much wider than any anticipated period, a good
approximation can be made by replacing (x[n])^2 with its mean.

   Rx(P, N)  = SUM{ (x[n+N])^2 * (w[n]*w[n+P]) }
            ~= mean{ (x[n+N])^2 } * SUM{ (w[n]*w[n+P]) }
            ~= Rx(0,N)/SUM{(w[n])^2} * SUM{ (w[n]*w[n+P]) }
            ~= Rx(0,N) * W(P)

   where  W(P) = SUM{ (w[n]*w[n+P]) } / SUM{ (w[n])^2 }

and now, what you have is a nearly deterministic function that is
proportional to a function solely of P. if Rx(tau, N) gets close to
Rx(0, N) * W(tau), you know that tau is close to the period, P, or a
multiple of it.

2. alternatively, the window can be applied *after* the two signals (one
is lagged) are compared in all three methods:

   Rx(tau, N)   = SUM{ (x[n-tau/2] * x[n+tau/2])   * w[n-N] }
   AMDF(tau, N) = SUM{ |x[n-tau/2] - x[n+tau/2]|   * w[n-N] }
   ASDF(tau, N) = SUM{ (x[n-tau/2] - x[n+tau/2])^2 * w[n-N] }

in this case, the window serves to smooth out the result of the comparison
from the portion of the signal where you happen to land this estimate
(that is around sample N). it approximately decouples the result from
small variations of N.
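A small Python sketch of option 2 (window applied after the comparison), shown for ASDF with an assumed Hann window and with the lag applied one-sided (x[n] against x[n+tau]) rather than split symmetrically as in the equations above; the names are mine:

```python
import math

def windowed_asdf(x, tau, center, half_width):
    # ASDF(tau, N) = SUM{ (x[n] - x[n+tau])^2 * w[n-N] }, with w a Hann
    # window of the given half-width centred on sample `center` (= N).
    total = 0.0
    for n in range(center - half_width, center + half_width):
        if 0 <= n and n + tau < len(x):
            w = 0.5 * (1.0 + math.cos(math.pi * (n - center) / half_width))
            total += (x[n] - x[n + tau]) ** 2 * w
    return total
```

For a periodic signal this dips toward zero when tau hits the period (or a multiple of it), while the window keeps the estimate localised around `center`.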
>> As far as I can figure, there would be no use for it in those
>> algorithms, but I thought I'd ask anyway.
now you have a second opinion.

--
r b-j    rbj@audioimagination.com
"Imagination is more important than knowledge."
"PaulWiik" <paul@wiik.net> wrote in message
news:cr2dnT0II-34dQ_fRVn-1A@giganews.com...
...
> And, come to think of it, I now realize what Robert Bristow meant when he
> said they inherently include harmonics. Lesson learnt: Think twice before
> posting stupid questions.
Actually PaulWiik, I have found your posts to be among the better ones in this group! You present your past work and questions clearly and respond quickly to follow-up questions. Keep up the good work.
Robert bristow-johnson wrote:
> now you have a second opinion.
OK, I think I see one of the advantages of the first use you describe:
> w(n-N) is a window symmetrical around sample number N which
> means you are trying to estimate pitch around sample N.
I think this is what I stumbled across during my testing tonight, but
instead of applying the window, I narrowed the scan of n so as to avoid
partial cancellation of higher frequencies with small frequency variations
in a big buffer.

I have a hard time understanding all these mathematical expressions, so if
you feel I haven't listened to your advice, it's because I don't
understand, rather than because I don't want to. I have a 5-year
electronics/telecom education where I'm sure I was supposed to learn all
this maths, but it's more than 10 years ago, and I have never used it
since.

Jon Harris wrote:
> You present your past work and questions clearly and respond quickly to
> follow-up questions. Keep up the good work.
Thanks!

Here is what I did so far:

- I implemented a combination of autocorrelation and AMDF, something like:
  combined = AC/(AMDF+1)
  Come to think of it, I forgot to try replacing AutoCorr with ASDF.
- I did this in a function where I also throw in the frequency I want to
  look for (the same way as I used Goertzel). I then call this function
  for all the frequencies in the musical scale I'm interested in. From a
  CPU load point of view I very much like this approach, since I hope to
  detect roughly the whereabouts of the fundamental by doing 40 scans (my
  testing tone range, >3 octaves). It could definitely save a lot of CPU.
- The function calculates tau on the basis of the sampling rate and the
  frequency parameter. What I discovered next was that I could easily
  reduce down to scanning n from 0 to tau*3 (only if tau*3 < (N/2), to
  stay in the buffer). This not only saved a lot of computing, but also
  seemed to help detection! To me this proves, probably in a low-fi way,
  that what Robert Bristow-Johnson pointed out about using windows even
  with AMDF/ASDF has advantages.

My plan is that after I've detected the "rough" fundamental, I'll do a
full scan with tau from the period of the tone below, and up to the tone
above, to try to find the "exact pitch".

Current main concerns: I still struggle with "shhh" sounds having lower
harmonics that end up detected (a tendency to occur around 700 Hz), and
with the female vocalist's voice "breaking" (by intention). I'm not sure
what the English term is, but it's the kind of "sounding sexy" on low
tones thing they do.

I'm displaying the magnitudes in a plot, and there is one thing I do not
understand. A lot of descriptions of pitch detection say you should pick
the first "peak" over a certain threshold. In my case, with this female
singer, I always get a powerful peak one octave below what definitely
must be the musical pitch. Does this mean I've messed up my calculation
of tau? I use tau = samplerate/frequency. I believe I get the correct
results by using (samplerate*2)/frequency? But this creates a
tau > (N/2) for 110 Hz (N=1024, samplerate=44100).

I would appreciate any hints on the "shhh sounds" and the yet very
undefined "sexy voice" problems. On "shhh": I guess it won't help
filtering out high frequencies, since the problem is in sub-harmonics
(700 Hz etc.)? Is this an issue where it could help running a new
function (correlation?) on my set of magnitudes to try to detect the
periods of the harmonics?
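For what it's worth, the combined AC/(AMDF+1) score at one candidate frequency might be sketched like this in Python (the AMDF normalisation and the 3*tau scan limit are my guesses at what the post describes, not a confirmed implementation):

```python
import math

def combined_score(buf, sample_rate, freq):
    # Autocorrelation divided by (AMDF + 1) at the single lag
    # tau = sample_rate / freq, scanning n over at most 3*tau samples.
    tau = int(round(sample_rate / freq))
    span = min(3 * tau, len(buf) - tau)
    ac = sum(buf[n] * buf[n + tau] for n in range(span))
    amdf = sum(abs(buf[n] - buf[n + tau]) for n in range(span)) / span
    return ac / (amdf + 1.0)

# demo: a sine with a 100-sample period (441 Hz at 44.1 kHz)
demo = [math.sin(2.0 * math.pi * i / 100.0) for i in range(512)]
```

A lag matching the true period gives a large positive score (autocorrelation high, AMDF near zero); a lag at half the period gives a negative one.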