
Realtime pitch detection - How do the different algorithms compare

Started by PaulWiik May 23, 2005
After I sent my last post late last night (Norway time) I was lying in
bed and couldn't sleep, because I think I figured out why I had an
octave error in my pitch detection.

Given the way the signal and the lagged signal are compared, the "first
peak" occurs when tau is half the period of the detected frequency, not
the full period.

So, to look for the energy of a specific frequency, I should use:
Tau = 1/2*(samplerate/frequency);
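
For example, with the 44.1 kHz sample rate I use: looking for 440 Hz gives
Tau = 0.5*(44100/440), which is about 50 samples, and going the other way a
peak at Tau = 50 corresponds to 44100/(2*50) = 441 Hz.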

My application was comparing the period with the next period, so that's
why I would get peaks one octave below the main formant.

The normal approach of "sweeping the tau" to find the main formant
probably uses a formula like:
freq = samplerate/(2*tau) - to calculate the frequency.

I think this might help with my "Shhh" problem, because I think it will be
shifted one octave up (what I used to detect at around 700 Hz will now be
1400 Hz, and outside my range).

Now to my newly formulated question on Goertzel/Autocorrelation.

The way I see it (from a programming point of view), I divide the
AMDF/ASDF into two loops.
The outer loop: what I think of as sweeping the tau.
The inner loop: sweeping n to sum up the magnitude for the current tau.

I'm thinking:
In the outer loop, I can calculate the coefficient required for the
Goertzel (on the basis that we have a tau, which basically represents a
frequency).
In the inner loop, since we are already reading x(n), we might as well
feed it into the Goertzel algorithm.
Then, when the inner loop is done, we can calculate the absolute magnitude
of the Goertzel and add it to the equation, along with maybe a combined
AMDF/ASDF.
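
Very roughly, per tone I'm picturing something like this (just a sketch of
the structure, with made-up names and no attempt at proper scaling between
the two measures):

procedure CombinedToneMag(const Buffer: array of Double;
                          SampleRate, Frequency: Double;
                          out AsdfMag, GoertzelMag: Double);
var
  i, tau, len: Integer;
  coeff, val, diff, skn, skn1, skn2: Double;
begin
  len   := Length(Buffer);
  tau   := Round(0.5*SampleRate/Frequency);   // half-period lag for this tone
  coeff := 2*Cos(2*Pi*Frequency/SampleRate);  // Goertzel coefficient, once per tone
  AsdfMag := 0; skn1 := 0; skn2 := 0;
  for i := 0 to len-1 do                      // single pass over the samples
  begin
    val := Buffer[i];
    if i + tau < len then
    begin
      diff := Buffer[i+tau] - val;
      AsdfMag := AsdfMag + Sqr(diff);         // ASDF term
    end;
    skn  := coeff*skn1 - skn2 + val;          // Goertzel recursion fed with the same sample
    skn2 := skn1;
    skn1 := skn;
  end;
  AsdfMag     := AsdfMag/(len - tau);                            // normalise by number of terms
  GoertzelMag := Sqrt(Sqr(skn1) + Sqr(skn2) - coeff*skn1*skn2);  // Goertzel magnitude
end;

The outer loop would then just call this once per candidate tone.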

So again, back to a question I asked previously:
Will the Goertzel pick up other information than the ASDF/AMDF?

The Goertzel seems to detect the energy of a frequency in the current
period, while the ASDF/AMDF seems to detect the energy of a frequency in a
combination of the current period and the current period shifted by half a
period.

If this is the case, I think a combination of the two could help reduce
the error rate while not adding that much computation.

Best Regards,
Paul
		
> Current main concerns:
> I still struggle with "Shhh" sounds having lower harmonies that end up
> detected (a tendency to occur around 700hz), and when the female vocalists
> voice "break" (by intention). I'm not sure what the english term is, but
> it's the kind of "sounding sexy" on low tones thing they do.
The "breaking" voice sound (laid-back & sultry) amounts to approx. 20Hz amplitude modulation (gating) of the voice pitch. Effectively this adds sidebands to the signal, plus likely start & stop artefacts. This can confuse things, particularly if your (possibly short) sound buffer holds only the gaps or the start & end of bursts, but not the body of the tone. You may have to ensure that your buffer is long enough to bridge the gap & hold enough samples of the "on" voice burst for it to dominate the pitch detection result. However, a long buffer increases detection lag. This may be reduced by acquiring one buffer while doing your pitch-detecting on the previous buffer. Better still, if you have enough CPU horsepower, use short buffers. Append the most recently tested buffer(s) contents onto the end of your new incoming buffer & run your pitch detection over the lot. A given buffer will thus be used in the pitch detection process several times. This way you get short-buffer lag, but long-buffer gap bridging. Jim Adamthwaite
>The "breaking" voice sound (laid-back & sultry) amounts to approx. 20Hz >amplitude modulation (gating) of the voice pitch. Effectively this adds >sidebands to the signal, plus likely start & stop artefacts. This can >confuse things, particularly if your (possibly short) sound buffer holds >only the gaps or the start & end of bursts, but not the body of the
tone. Wow, thanks! What you say makes perfect sense, but it would probably have taken me weeks of analysis to find out by myself. Not to mention the time I would have spent on trying to fix it without digging into what happened to the signal in the first place.
>Better still, if you have enough CPU horsepower, use short buffers. Append
>the most recently tested buffer(s) contents onto the end of your new
>incoming buffer & run your pitch detection over the lot. A given buffer
>will thus be used in the pitch detection process several times. This way
>you get short-buffer lag, but long-buffer gap bridging.
Thanks for the idea. If I store the magnitudes for the different taus in
bins, I guess I could re-use these magnitudes from the previous buffer(s)
and weigh them into the consideration of the "current buffer".

If that works, it wouldn't require that much more CPU, since the inner
loop won't grow. I think it would add just one memory fetch and one
multiplication for each tau in the outer loop. That way, I can also
experiment with different weights for historical magnitudes.
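
Something like this is what I'm picturing (just a sketch; PrevMagnitude
would be a per-tone array, sized like ToneMagnitude, that I keep between
buffers, and the weight is something I'd have to tune):

// Blend the current buffer's magnitude for one tone with the value from
// the previous buffer. Prev is updated so it becomes the history for the
// next buffer. Costs one fetch and a couple of multiplies per tone.
function Smoothed(Current: Double; var Prev: Double;
                  HistoryWeight: Double): Double;
begin
  Result := (1 - HistoryWeight)*Current + HistoryWeight*Prev;
  Prev   := Result;
end;

and then at the end of the outer loop, for each tone:

  ToneMagnitude[toneIndex] :=
    Smoothed(ToneMagnitude[toneIndex], PrevMagnitude[toneIndex], 0.3);
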
Hi Paul!

"PaulWiik" <paul@wiik.net> schrieb im Newsbeitrag 
news:AIednYMGesdmUg_fRVn-hw@giganews.com...
>>Have you looked up phase vocoder techniques? These algorithms use complex
>>FFT results plus historical phase data to resolve single frequencies
>>between the FFT bins even when using short buffers compared to the tonal
>>period(s) of interest. They seem to provide decent resolution quickly,
>>even in the presence of some types of noise.
>
> Thanks for the tip. I will try looking it up, and see if I can understand
> how to implement it.
If you are interested in the phase vocoder techniques I would recommend
reading the tutorials on Stephan Bernsee's great page www.dspdimension.com!
He is really good at explaining things, so have a look! (The page is mainly
about pitch shifting, but most pitch shifting algorithms use the phase
vocoder!)

And of course there is the tutorial by Mark Dolson!
http://www.panix.com/~jens/pvoc-dolson.par

Anyway, you don't need to implement the phase vocoder, there is a lot of
source code freely available :-)

Good luck!
Karin
Karin wrote:
>If you are interested in the phase vocoder techniques I would recommend
>reading the tutorials on Stephan Bernsee's great page www.dspdimension.com!
>He is really good at explaining things, so have a look! (The page is mainly
>about pitch shifting, but most pitch shifting algorithms use the phase
>vocoder!)
>
>And of course there is the tutorial by Mark Dolson!
>http://www.panix.com/~jens/pvoc-dolson.par
>
>Anyway, you don't need to implement the phase vocoder, there is a lot of
>source code freely available :-)
Thank you for the links! I've briefly looked at them and they look
informative, so I will definitely read them.
Hi again,

I feel I have made some major progress now. I had a lot of errors in my
code prior to the last post. I believe I've been able to correct most of
them now (some of them actually gave quite good results though...)

In case someone is interested, I've posted some images showing how the
different functions compare (in a diagram). Could be interesting for
others considering the different functions and combinations, I guess.
NOTE: There is a typo in the images. The white bars (Combo) are really an
average between Goertzel, AMDF and ASDF.

Voice signals:
http://photos11.flickr.com/15818874_ad192fb4d1_o.jpg
http://photos14.flickr.com/15818868_3b02a21d49_o.jpg

Synthetic 880hz:
http://photos10.flickr.com/15822663_b9806d9f34_o.jpg
Synthetic 440hz:
http://photos14.flickr.com/15822677_1371e8438c_o.jpg

The waveform (after Blackman-Harris windowing) is displayed in silver
below the magnitudes.
The blue line marks the threshold.

I tried the AMDF/ASDF and autocorrelation without the Blackman-Harris
window and got massive phase issues, resulting in "waves flowing over my
result bins". This was very visible at frequencies where the phase only
shifted slightly from buffer to buffer.

So again: Robert B-J was completely right in what he said about the
windows for these algorithms.
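
For reference, the windowing step itself is only a few lines; a sketch
(assuming what I believe are the standard 4-term Blackman-Harris
coefficients, worth double-checking against a reference) looks like this:

// 4-term Blackman-Harris window, applied in place before the sweep.
procedure ApplyBlackmanHarris(var Buffer: array of Double);
const
  a0 = 0.35875; a1 = 0.48829; a2 = 0.14128; a3 = 0.01168;
var
  i, n: Integer;
  w: Double;
begin
  n := Length(Buffer);
  for i := 0 to n - 1 do
  begin
    w := a0 - a1*Cos(2*Pi*i/(n - 1))
            + a2*Cos(4*Pi*i/(n - 1))
            - a3*Cos(6*Pi*i/(n - 1));
    Buffer[i] := Buffer[i]*w;
  end;
end;

In practice the window values could be precomputed once per buffer length
rather than calling Cos three times per sample.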

Here is my master function calculating the magnitudes. It's in
Pascal/Delphi, and I would welcome any comments.
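(The power() calls need the Math unit in the uses clause.)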

sf := power(2,1/12);
len := BufLength;
for toneIndex := 1 to NumTones-1 do  // Iterate all the tones
begin
  // Calc frequency for this tone
  Frequency := round(110*power(sf,toneIndex)); 

  tau := Round(0.5*(Samplerate)/Frequency);
  ACmag := 0;  // AutoCorrelation
  dmag := 0;   // AMDF
  smag := 0;   // ASDF

  coeff := 2*cos(2*pi*frequency/samplerate);  //For Goertzel

  skn:=0;
  skn1:=0;
  skn2:=0;

  for i := 0 to len-1 do
  begin
    Val :=Buffer[i];
    if (i+tau<len) then  // Ooops, Hints on handling this are welcome :)
    begin
      ACmag := ACmag+(val*Buffer[i+tau]);  //Autocorrelation
      diff := Buffer[i+tau]-val;
      dmag := dmag+(abs(diff));            //AMDF
      smag := smag+power(diff,2);          //ASDF
    end;
    // Goertzel
    Skn := coeff*Skn1 - Skn2 + val;
    Skn2 := Skn1;
    Skn1 := Skn;
  end;


  NumIterations := len-tau;
  toneACmag[toneIndex] := -1* (ACmag /(NumIterations*2));
  toneDmag[toneIndex]  := dmag /(NumIterations*2);
  toneSmag[toneIndex] := smag /(NumIterations*2);
  GoertzelMag[toneIndex] := 0.003*
          sqrt(power(skn1,2)+power(skn2,2)-skn1*skn2*coeff);

  // Combo
  ToneMagnitude[toneIndex] := (toneDMag[toneIndex]
                               +toneSMag[toneIndex]
                               +GoertzelMag[toneIndex]
                               )/3;
end;

I haven't even considered further optimization yet, and it doesn't look
like I need to worry. Calculating this (44.1 kHz sample rate, buffer
length = 1024) for 40 tones, plus displaying the results (every 100 msec),
plus some other functionality, consumes 0.8% of one of my CPUs (my
computer is a dual AMD P1800+, but the application is not multithreaded).

So I think I can do rough pitch detection this way, and then start digging
into the details of finding the exact frequency once I know what area to
look in.
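
One standard refinement for that last step (not something from this
thread, just a common trick) is to fit a parabola through the winning bin
and its two neighbours and read off a fractional bin position:

// Fractional peak offset by fitting a parabola through three neighbouring
// bins. magL, magC, magR are the magnitudes at bins k-1, k, k+1; the
// result is an offset in bins (roughly -0.5..+0.5) to add to k.
function ParabolicOffset(magL, magC, magR: Double): Double;
var
  denom: Double;
begin
  denom := magL - 2*magC + magR;
  if Abs(denom) < 1e-12 then
    Result := 0   // flat top, nothing to refine
  else
    Result := 0.5*(magL - magR)/denom;
end;

With the semitone-spaced bins above, the bin k with the largest combined
magnitude plus that offset p would map back to roughly 110*power(sf, k+p).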

I would appreciate ideas on how I could combine the ASDF/AMDF with the
Goertzel to reduce the error rate.

		
Here is a screenshot of one of the problematic areas.
http://photos11.flickr.com/15832939_12ba6f6a0a_o.jpg

This is from a portion where I think the voice is what Jim Adamthwaite
referred to as "laid-back & sultry".
He said:
>The "breaking" voice sound (laid-back & sultry) amounts to approx. 20Hz >amplitude modulation (gating) of the voice pitch. >Effectively this adds sidebands to the signal, plus likely start & stop >
artefacts. This can confuse things, particularly if your (possibly
> short) sound buffer holds only the gaps or the start & end of > bursts, but not the body of the tone.
I've not yet implemented any correction like he suggested. Longer buffers
would most certainly fix the amplitude modulation, and then I just have to
hope that even after the magnitude-reduction effects of the modulation,
the main tone is strong enough to be detected through the sidebands and
start / stop artefacts.

Notice how the Goertzel and the ASDF differ between D and D# at the first
Goertzel peak in this image. I wish I had included higher frequencies,
because that would maybe have justified the erratic results of the ASDF to
the far right (maybe a peak on the next D there?). However, I don't want
pitches that high in the musical score (unless they could be used
indirectly, maybe along with the Goertzel, to justify that lower D).

From experience so far I think:
1. If the max Goertzel peak is <500 Hz it's almost always what I'm looking
for.
2. If there is a big number of peaks in the ASDF, more than 10 perhaps,
it's likely to be a Shhh-sound, and the results can be disregarded (see
the sketch below).

Problems in the ASDF could maybe be due to my less than ideal handling of
the end of the buffer, where (n+tau) > N. Currently I just stop. Although
this should be minimal because of the Blackman-Harris window, I guess, at
least for small taus. And for bigger taus I don't seem to have any
relevant problems.
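
The "too many peaks" test could be as simple as counting local maxima
above the threshold (a rough sketch; the 10 is just the guess above and
would need tuning):

// Count bins that stick out above the threshold as local maxima.
function CountPeaks(const Mags: array of Double; Threshold: Double): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(Mags) - 2 do
    if (Mags[i] > Threshold) and
       (Mags[i] > Mags[i-1]) and (Mags[i] >= Mags[i+1]) then
      Inc(Result);
end;

and then something like: if CountPeaks(toneSmag, Threshold) > 10 then
treat the buffer as a "Shhh" and skip it.
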
in article 6N-dnXYkfZuwpQvfRVn-qg@giganews.com, PaulWiik at paul@wiik.net
wrote on 05/26/2005 16:57:

> I feel I have made some major progress now. I had a lot of errors in my
> code prior to the last post. I believe I've been able to correct most of
> them now (some of them actually gave quite good results though...)
>
> In case someone is interested, I've posted some images showing how the
> different functions compare (in a diagram). Could be interesting for
> others considering the different functions and combinations, I guess.
> NOTE: There is a typo in the images. The white bars (Combo) are really an
> average between Goertzel, AMDF and ASDF.
>
> Voice signals:
> http://photos11.flickr.com/15818874_ad192fb4d1_o.jpg
> http://photos14.flickr.com/15818868_3b02a21d49_o.jpg
>
> Synthetic 880hz:
> http://photos10.flickr.com/15822663_b9806d9f34_o.jpg
> Synthetic 440hz:
> http://photos14.flickr.com/15822677_1371e8438c_o.jpg
one possibility to consider is that you can have a periodic function with
fundamental frequency of f0 (period is 1/f0), but have no energy at the
fundamental f0. i don't think Goertzel will be of much help then. Goertzel
is essentially a tuned circuit that is resonant at your trial frequency.

perhaps instead of a filter with a single resonant frequency, you might
try a comb filter instead and have the harmonics pitch in, but then you'll
find it equivalent to AMDF or ASDF (depending on how you measure the
output amplitude of the comb filter).
> I would appreciate ideas on how I could combine the ASDF/AMDF with the
> Goertzel to reduce the error rate.
comb filters with teeth spaced at multiples of f0 where 1/f0 is not an
integer number of samples require interpolation.

--
r b-j    rbj@audioimagination.com

"Imagination is more important than knowledge."
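
The simplest version of that interpolation is linear: read the lagged
sample as a blend of its two integer-sample neighbours. A sketch, with
made-up names (higher-order interpolators would be more accurate):

// Lagged sample x(i + tau) for non-integer tau, by linear interpolation
// between the two neighbouring samples. The caller must keep
// i + Trunc(tau) + 1 inside the buffer, the same end-of-buffer issue
// discussed earlier in the thread.
function LaggedSample(const Buffer: array of Double;
                      i: Integer; tau: Double): Double;
var
  k: Integer;
  frac: Double;
begin
  k    := Trunc(tau);
  frac := tau - k;
  Result := (1 - frac)*Buffer[i + k] + frac*Buffer[i + k + 1];
end;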