
Realtime pitch detection - How do the different algorithms compare

Started by PaulWiik May 23, 2005
After I sent my last post late last night (Norway time) I was lying in
bed and couldn't sleep, because I think I figured out why I had an
octave error in my pitch detection.

Given the way the signal and the lagged signal are compared, the "first
peak" occurs when tau is half the period of the detected frequency, not
the full period.

So, to look for the energy of a specific frequency, I should use:
Tau = 1/2*(samplerate/frequency);
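
For example, with the 44.1 kHz sample rate I use: looking for 440 Hz gives
Tau = 0.5*(44100/440), which is about 50 samples, and going the other way a
peak at Tau = 50 corresponds to 44100/(2*50) = 441 Hz.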

My application was comparing the period with the next period, so that's
why I would get peaks one octave below the main formant.

The normal approach of "sweeping the tau" to find the main formant
probably uses a formula like:
freq = samplerate/(2*tau) - to calculate the frequency.

I think this might help with my "Shhh" problem, because I think it will be
shifted one octave up (what I used to detect at around 700 Hz will now be
1400 Hz, and outside my range).

Now to my newly formulated question on Goertzel/Autocorrelation.

The way I see it (from a programming point of view), I divide the
AMDF/ASDF into two loops.
The outer loop: what I think of as sweeping the tau.
The inner loop: sweeping n to sum up the magnitude for the current tau.

I'm thinking:
In the outer loop, I can calculate the coefficient required for the
Goertzel (on the basis that we have a tau, which basically represents a
frequency).
In the inner loop, since we are already reading x(n), we might as well
feed it into the Goertzel algorithm.
Then, when the inner loop is done, we can calculate the absolute magnitude
of the Goertzel and add it to the equation, along with maybe a combined
AMDF/ASDF.
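
Very roughly, per tone I'm picturing something like this (just a sketch of
the structure, with made-up names and no attempt at proper scaling between
the two measures):

procedure CombinedToneMag(const Buffer: array of Double;
                          SampleRate, Frequency: Double;
                          out AsdfMag, GoertzelMag: Double);
var
  i, tau, len: Integer;
  coeff, val, diff, skn, skn1, skn2: Double;
begin
  len   := Length(Buffer);
  tau   := Round(0.5*SampleRate/Frequency);   // half-period lag for this tone
  coeff := 2*Cos(2*Pi*Frequency/SampleRate);  // Goertzel coefficient, once per tone
  AsdfMag := 0; skn1 := 0; skn2 := 0;
  for i := 0 to len-1 do                      // single pass over the samples
  begin
    val := Buffer[i];
    if i + tau < len then
    begin
      diff := Buffer[i+tau] - val;
      AsdfMag := AsdfMag + Sqr(diff);         // ASDF term
    end;
    skn  := coeff*skn1 - skn2 + val;          // Goertzel recursion fed with the same sample
    skn2 := skn1;
    skn1 := skn;
  end;
  AsdfMag     := AsdfMag/(len - tau);                            // normalise by number of terms
  GoertzelMag := Sqrt(Sqr(skn1) + Sqr(skn2) - coeff*skn1*skn2);  // Goertzel magnitude
end;

The outer loop would then just call this once per candidate tone.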

So again, back to a question I asked previously:
Will the Goertzel pick up other information than the ASDF/AMDF?

The Goertzel seems to detect the energy of a frequency in the current
period, while the ASDF/AMDF seems to detect the energy of a frequency in a
combination of the current period and the current period shifted by half a
period.

If this is the case, I think a combination of the two could help reduce
the error rate while not adding that much computation.

Best Regards,
Paul
		
> Current main concerns:
> I still struggle with "Shhh" sounds having lower harmonies that end up
> detected (a tendency to occur around 700hz), and when the female vocalists
> voice "break" (by intention). I'm not sure what the english term is, but
> it's the kind of "sounding sexy" on low tones thing they do.
The "breaking" voice sound (laid-back & sultry) amounts to approx. 20Hz amplitude modulation (gating) of the voice pitch. Effectively this adds sidebands to the signal, plus likely start & stop artefacts. This can confuse things, particularly if your (possibly short) sound buffer holds only the gaps or the start & end of bursts, but not the body of the tone. You may have to ensure that your buffer is long enough to bridge the gap & hold enough samples of the "on" voice burst for it to dominate the pitch detection result. However, a long buffer increases detection lag. This may be reduced by acquiring one buffer while doing your pitch-detecting on the previous buffer. Better still, if you have enough CPU horsepower, use short buffers. Append the most recently tested buffer(s) contents onto the end of your new incoming buffer & run your pitch detection over the lot. A given buffer will thus be used in the pitch detection process several times. This way you get short-buffer lag, but long-buffer gap bridging. Jim Adamthwaite
>The "breaking" voice sound (laid-back & sultry) amounts to approx. 20Hz >amplitude modulation (gating) of the voice pitch. Effectively this adds >sidebands to the signal, plus likely start & stop artefacts. This can >confuse things, particularly if your (possibly short) sound buffer holds >only the gaps or the start & end of bursts, but not the body of the
tone. Wow, thanks! What you say makes perfect sense, but it would probably have taken me weeks of analysis to find out by myself. Not to mention the time I would have spent on trying to fix it without digging into what happened to the signal in the first place.
>Better still, if you have enough CPU horsepower, use short buffers. Append
>the most recently tested buffer(s) contents onto the end of your new
>incoming buffer & run your pitch detection over the lot. A given buffer
>will thus be used in the pitch detection process several times. This way
>you get short-buffer lag, but long-buffer gap bridging.
Thanks for the idea. If I store the magnitudes for the different taus in
bins, I guess I could re-use these magnitudes from the previous buffer(s)
and weigh them into the consideration of the "current buffer".

If that works, it wouldn't require that much more CPU, since the inner
loop won't grow. I think it would add just one memory fetch and one
multiplication for each tau in the outer loop. That way, I can also
experiment with different weights for historical magnitudes.
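
Something like this is what I'm picturing (just a sketch; PrevMagnitude
would be a per-tone array, sized like ToneMagnitude, that I keep between
buffers, and the weight is something I'd have to tune):

// Blend the current buffer's magnitude for one tone with the value from
// the previous buffer. Prev is updated so it becomes the history for the
// next buffer. Costs one fetch and a couple of multiplies per tone.
function Smoothed(Current: Double; var Prev: Double;
                  HistoryWeight: Double): Double;
begin
  Result := (1 - HistoryWeight)*Current + HistoryWeight*Prev;
  Prev   := Result;
end;

and then at the end of the outer loop, for each tone:

  ToneMagnitude[toneIndex] :=
    Smoothed(ToneMagnitude[toneIndex], PrevMagnitude[toneIndex], 0.3);
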
Hi Paul!

"PaulWiik" <paul@wiik.net> schrieb im Newsbeitrag 
news:AIednYMGesdmUg_fRVn-hw@giganews.com...
>>Have you looked up phase vocoder techniques? These algorithms use complex
>>FFT results plus historical phase data to resolve single frequencies
>>between the FFT bins even when using short buffers compared to the tonal
>>period(s) of interest. They seem to provide decent resolution quickly,
>>even in the presence of some types of noise.
>
> Thanks for the tip. I will try looking it up, and see if I can understand
> how to implement it.
If you are interested in the phase vocoder techniques I would recommend
reading the tutorials on Stephan Bernsee's great page www.dspdimension.com!
He is really good at explaining things, so have a look! (The page is mainly
about pitch shifting, but most pitch shifting algorithms use the phase
vocoder!)

And of course there is the tutorial by Mark Dolson!
http://www.panix.com/~jens/pvoc-dolson.par

Anyway, you don't need to implement the phase vocoder, there is a lot of
source code freely available :-)

Good luck!
Karin
Karin wrote:
>If you are interested in the phase vocoder techniques I would recommend
>reading the tutorials on Stephan Bernsee's great page www.dspdimension.com!
>He is really good at explaining things, so have a look! (The page is mainly
>about pitch shifting, but most pitch shifting algorithms use the phase
>vocoder!)
>
>And of course there is the tutorial by Mark Dolson!
>http://www.panix.com/~jens/pvoc-dolson.par
>
>Anyway, you don't need to implement the phase vocoder, there is a lot of
>source code freely available :-)
Thank you for the links! I've briefly looked at them and they look
informative, so I will definitely read them.
Hi again,

I feel I have made some major progress now. I had a lot of errors in my
code prior to the last post. I believe I've been able to correct most of
them now (some of them actually gave quite good results though...)

In case someone is interested, I've posted some images showing how the
different functions compare (in a diagram). Could be interesting for
others considering the different functions and combinations, I guess.
NOTE: There is a typo in the images. The white bars (Combo) are really an
average between Goertzel, AMDF and ASDF.

Voice signals:
http://photos11.flickr.com/15818874_ad192fb4d1_o.jpg
http://photos14.flickr.com/15818868_3b02a21d49_o.jpg

Synthetic 880hz:
http://photos10.flickr.com/15822663_b9806d9f34_o.jpg
Synthetic 440hz:
http://photos14.flickr.com/15822677_1371e8438c_o.jpg

The waveform (after Blackman-Harris windowing) is displayed in silver
below the magnitudes.
The blue line marks the threshold.

I tried the AMDF/ASDF and autocorrelation without the Blackman-Harris
window and got massive phase issues, resulting in "waves flowing over my
result bins". This was very visible at frequencies where the phase only
shifted slightly from buffer to buffer.

So again: Robert B-J was completely right in what he said about the
windows for these algorithms.
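
For reference, the windowing step itself is only a few lines; a sketch
(assuming what I believe are the standard 4-term Blackman-Harris
coefficients, worth double-checking against a reference) looks like this:

// 4-term Blackman-Harris window, applied in place before the sweep.
procedure ApplyBlackmanHarris(var Buffer: array of Double);
const
  a0 = 0.35875; a1 = 0.48829; a2 = 0.14128; a3 = 0.01168;
var
  i, n: Integer;
  w: Double;
begin
  n := Length(Buffer);
  for i := 0 to n - 1 do
  begin
    w := a0 - a1*Cos(2*Pi*i/(n - 1))
            + a2*Cos(4*Pi*i/(n - 1))
            - a3*Cos(6*Pi*i/(n - 1));
    Buffer[i] := Buffer[i]*w;
  end;
end;

In practice the window values could be precomputed once per buffer length
rather than calling Cos three times per sample.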

Here is my master function calculating the magnitudes. It's in
Pascal/Delphi, and I would welcome any comments.
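(The power() calls need the Math unit in the uses clause.)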

sf := power(2,1/12);
len := BufLength;
for toneIndex := 1 to NumTones-1 do  // Iterate all the tones
begin
  // Calc frequency for this tone
  Frequency := round(110*power(sf,toneIndex)); 

  tau := Round(0.5*(Samplerate)/Frequency);
  ACmag := 0;  // AutoCorrelation
  dmag := 0;   // AMDF
  smag := 0;   // ASDF

  coeff := 2*cos(2*pi*frequency/samplerate);  //For Goertzel

  skn:=0;
  skn1:=0;
  skn2:=0;

  for i := 0 to len-1 do
  begin
    Val :=Buffer[i];
    if (i+tau<len) then  // Ooops, Hints on handling this are welcome :)
    begin
      ACmag := ACmag+(val*Buffer[i+tau]);  //Autocorrelation
      diff := Buffer[i+tau]-val;
      dmag := dmag+(abs(diff));            //AMDF
      smag := smag+power(diff,2);          //ASDF
    end;
    // Goertzel
    Skn := coeff*Skn1 - Skn2 + val;
    Skn2 := Skn1;
    Skn1 := Skn;
  end;


  NumIterations := len-tau;
  toneACmag[toneIndex] := -1* (ACmag /(NumIterations*2));
  toneDmag[toneIndex]  := dmag /(NumIterations*2);
  toneSmag[toneIndex] := smag /(NumIterations*2);
  GoertzelMag[toneIndex] := 0.003*
          sqrt(power(skn1,2)+power(skn2,2)-skn1*skn2*coeff);

  // Combo
  ToneMagnitude[toneIndex] := (toneDMag[toneIndex]
                               +toneSMag[toneIndex]
                               +GoertzelMag[toneIndex]
                               )/3;
end;

I haven't even considered further optimization yet, and it doesn't look
like I need to worry. Calculating this (44.1 kHz sample rate, buffer
length = 1024) for 40 tones, plus displaying the results (every 100 msec),
plus some other functionality, consumes 0.8% of one of my CPUs (my
computer is a dual AMD P1800+, but the application is not multithreaded).

So I think I can do rough pitch detection this way, and then start digging
into the details of finding the exact frequency once I know what area to
look in.
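
One standard refinement for that last step (not something from this
thread, just a common trick) is to fit a parabola through the winning bin
and its two neighbours and read off a fractional bin position:

// Fractional peak offset by fitting a parabola through three neighbouring
// bins. magL, magC, magR are the magnitudes at bins k-1, k, k+1; the
// result is an offset in bins (roughly -0.5..+0.5) to add to k.
function ParabolicOffset(magL, magC, magR: Double): Double;
var
  denom: Double;
begin
  denom := magL - 2*magC + magR;
  if Abs(denom) < 1e-12 then
    Result := 0   // flat top, nothing to refine
  else
    Result := 0.5*(magL - magR)/denom;
end;

With the semitone-spaced bins above, the bin k with the largest combined
magnitude plus that offset p would map back to roughly 110*power(sf, k+p).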

I would appreciate ideas on how I could combine the ASDF/AMDF with the
Goertzel to reduce the error rate.

		
Here is a screenshot of one of the problematic areas.
http://photos11.flickr.com/15832939_12ba6f6a0a_o.jpg

This is from a portion where I think the voice is what Jim Adamthwaite
referred to as "laid-back & sultry".
He said:
>The "breaking" voice sound (laid-back & sultry) amounts to approx. 20Hz >amplitude modulation (gating) of the voice pitch. >Effectively this adds sidebands to the signal, plus likely start & stop >
artefacts. This can confuse things, particularly if your (possibly
> short) sound buffer holds only the gaps or the start & end of > bursts, but not the body of the tone.
I've not yet implemented any correction like he suggested. Longer buffers
would most certainly fix the amplitude modulation, and then I just have to
hope that even after the magnitude-reduction effects of the modulation,
the main tone is strong enough to be detected through the sidebands and
start / stop artefacts.

Notice how the Goertzel and the ASDF differ between D and D# at the first
Goertzel peak in this image. I wish I had included higher frequencies,
because that would maybe have justified the erratic results of the ASDF to
the far right (maybe a peak on the next D there?). However, I don't want
pitches that high in the musical score (unless they could be used
indirectly, maybe along with the Goertzel, to justify that lower D).

From experience so far I think:
1. If the max Goertzel peak is <500 Hz it's almost always what I'm looking
for.
2. If there is a big number of peaks in the ASDF, more than 10 perhaps,
it's likely to be a Shhh-sound, and the results can be disregarded (see
the sketch below).

Problems in the ASDF could maybe be due to my less than ideal handling of
the end of the buffer, where (n+tau) > N. Currently I just stop. Although
this should be minimal because of the Blackman-Harris window, I guess, at
least for small taus. And for bigger taus I don't seem to have any
relevant problems.
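
The "too many peaks" test could be as simple as counting local maxima
above the threshold (a rough sketch; the 10 is just the guess above and
would need tuning):

// Count bins that stick out above the threshold as local maxima.
function CountPeaks(const Mags: array of Double; Threshold: Double): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(Mags) - 2 do
    if (Mags[i] > Threshold) and
       (Mags[i] > Mags[i-1]) and (Mags[i] >= Mags[i+1]) then
      Inc(Result);
end;

and then something like: if CountPeaks(toneSmag, Threshold) > 10 then
treat the buffer as a "Shhh" and skip it.
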
in article 6N-dnXYkfZuwpQvfRVn-qg@giganews.com, PaulWiik at paul@wiik.net
wrote on 05/26/2005 16:57:

> I feel I have made some major progress now. I had a lot of errors in my
> code prior to the last post. I believe I've been able to correct most of
> them now (some of them actually gave quite good results though...)
>
> In case someone is interested, I've posted some images showing how the
> different functions compare (in a diagram). Could be interesting for
> others considering the different functions and combinations, I guess.
> NOTE: There is a typo in the images. The white bars (Combo) are really an
> average between Goertzel, AMDF and ASDF.
>
> Voice signals:
> http://photos11.flickr.com/15818874_ad192fb4d1_o.jpg
> http://photos14.flickr.com/15818868_3b02a21d49_o.jpg
>
> Synthetic 880hz:
> http://photos10.flickr.com/15822663_b9806d9f34_o.jpg
> Synthetic 440hz:
> http://photos14.flickr.com/15822677_1371e8438c_o.jpg
one possibility to consider is that you can have a periodic function with
fundamental frequency of f0 (period is 1/f0), but have no energy at the
fundamental f0. i don't think Goertzel will be of much help then. Goertzel
is essentially a tuned circuit that is resonant at your trial frequency.

perhaps instead of a filter with a single resonant frequency, you might
try a comb filter instead and have the harmonics pitch in, but then you'll
find it equivalent to AMDF or ASDF (depending on how you measure the
output amplitude of the comb filter).
> I would appreciate ideas on how I could combine the ASDF/AMDF with the
> Goertzel to reduce the error rate.
comb filters with teeth spaced at multiples of f0 where 1/f0 is not an
integer number of samples require interpolation.

--
r b-j    rbj@audioimagination.com

"Imagination is more important than knowledge."
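
The simplest version of that interpolation is linear: read the lagged
sample as a blend of its two integer-sample neighbours. A sketch, with
made-up names (higher-order interpolators would be more accurate):

// Lagged sample x(i + tau) for non-integer tau, by linear interpolation
// between the two neighbouring samples. The caller must keep
// i + Trunc(tau) + 1 inside the buffer, the same end-of-buffer issue
// discussed earlier in the thread.
function LaggedSample(const Buffer: array of Double;
                      i: Integer; tau: Double): Double;
var
  k: Integer;
  frac: Double;
begin
  k    := Trunc(tau);
  frac := tau - k;
  Result := (1 - frac)*Buffer[i + k] + frac*Buffer[i + k + 1];
end;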