Reply by robert bristow-johnson May 26, 2005
in article 6N-dnXYkfZuwpQvfRVn-qg@giganews.com, PaulWiik at paul@wiik.net
wrote on 05/26/2005 16:57:

> I feel I have made some major progress now. I had a lot of errors in my
> code prior to the last post. I believe I've been able to correct most of
> this now (some of them actually have quite good results though...)
>
> In case someone is interested, I've posted some images showing how the
> different functions compare (in a diagram). Could be interesting for
> others considering the different functions, and combinations I guess.
> NOTE: There is a typo in the images. The white bars (Combo) are really an
> average between Goertzel, AMDF and ADSF
>
> Voice signals:
> http://photos11.flickr.com/15818874_ad192fb4d1_o.jpg
> http://photos14.flickr.com/15818868_3b02a21d49_o.jpg
>
> Synthetic 880hz:
> http://photos10.flickr.com/15822663_b9806d9f34_o.jpg
> Synthetic 440hz:
> http://photos14.flickr.com/15822677_1371e8438c_o.jpg
one possibility to consider is that you can have a periodic function with fundamental frequency of f0 (period is 1/f0), but have no energy at the fundamental f0. i don't think Goertzel will be of much help then. Goertzel is essentially a tuned circuit that is resonant at your trial frequency. perhaps instead of a filter with a single resonant frequency, you might try a comb filter instead and have the harmonics pitch in, but then you'll find it equivalent to AMDF or ASDF (depending on how you measure the output amplitude of the comb filter).
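to make the comb-filter picture concrete, here is a minimal Delphi-style sketch in the spirit of Paul's posted code (Buffer, len and an integer lag P in samples are assumed to exist; the variable names are only illustrative):

  combEnergy := 0;
  notchEnergy := 0;
  for i := P to len-1 do
  begin
    // feed-forward comb y[n] = x[n] + x[n-P]: resonant at f0 = Samplerate/P
    // and at all its harmonics, so the harmonics "pitch in"
    combEnergy := combEnergy + sqr(Buffer[i] + Buffer[i-P]);
    // inverse comb y[n] = x[n] - x[n-P]: nulls at f0 and its harmonics;
    // this running sum is just a scaled ASDF at lag P
    notchEnergy := notchEnergy + sqr(Buffer[i] - Buffer[i-P]);
  end;
  // a good period candidate P gives a large combEnergy and a small notchEnergy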
> I would appreciate ideas on how I could combine ADSF/AMDF with Goertzel to
> reduce error level.
comb filters with teeth spaced at multiples of f0 where 1/f0 is not an
integer number of samples require interpolation.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
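For the fractional-lag case, one crude but cheap option is linear interpolation between the two neighbouring samples when fetching the delayed value. A sketch with illustrative names (tauFrac is the desired lag in samples as a Double; Buffer, len and smag as in Paul's code):

  k := Trunc(tauFrac);       // whole-sample part of the lag
  frac := tauFrac - k;       // fractional part, 0 <= frac < 1
  smag := 0;
  for i := 0 to len-2-k do
  begin
    // linearly interpolated sample "x(i + tauFrac)"
    delayed := (1-frac)*Buffer[i+k] + frac*Buffer[i+k+1];
    smag := smag + sqr(delayed - Buffer[i]);   // ASDF term at the fractional lag
  end;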
Reply by PaulWiik May 26, 2005
Here is a screenshot of one of the problematic areas.
http://photos11.flickr.com/15832939_12ba6f6a0a_o.jpg

This is from a portion where I think the voice is what Jim Adamthwaite
referred to as "laid-back & sultry". 
He said:
> The "breaking" voice sound (laid-back & sultry) amounts to approx. 20Hz
> amplitude modulation (gating) of the voice pitch.
> Effectively this adds sidebands to the signal, plus likely start & stop
> artefacts. This can confuse things, particularly if your (possibly
> short) sound buffer holds only the gaps or the start & end of
> bursts, but not the body of the tone.
I've not yet implemented any correction like he suggested. Longer buffers
would most certainly fix the amplitude modulation, and then I just have to
hope that even after the magnitude reduction effects of amp modulation, the
main tone is strong enough to be detected through the sidebands and
start/stop artefacts.

Notice how the Goertzel and ASDF differ between D and D# in the first
Goertzel peak of this image. I wish I had included higher frequencies,
because that would maybe have justified the erratic results of ASDF to the
far right (maybe a peak on the next D there?). However, I don't want
pitches that high in the musical score (unless they could be used
indirectly, maybe along with the Goertzel to justify that lower D).

From experience so far I think:
1. If the max Goertzel peak is <500hz it's almost always what I'm looking
for.
2. If there is a big number of peaks in ADSF, more than 10 perhaps, it's
likely to be a Shhh-sound, and the results can be disregarded (a rough
sketch of this test follows at the end of this post).

Problems in ADSF could maybe be due to my less than ideal handling of
staying inside the buffer where (n+tau)>N. Currently I just stop. Although
this should be minimal because of the Blackman-Harris window, I guess, at
least for small taus. And for bigger taus I don't seem to have any relevant
problems.
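For what it's worth, heuristic 2 above is easy to test mechanically. A rough sketch (toneSmag and NumTones are from my magnitude-calculation code; Threshold is the blue-line value; LooksLikeShhh is just an illustrative name):

  numPeaks := 0;
  for toneIndex := 2 to NumTones-2 do
    if (toneSmag[toneIndex] > Threshold) and
       (toneSmag[toneIndex] > toneSmag[toneIndex-1]) and
       (toneSmag[toneIndex] > toneSmag[toneIndex+1]) then
      inc(numPeaks);                 // a local maximum above the threshold
  // heuristic 2: lots of peaks usually means an unvoiced "Shhh" frame
  LooksLikeShhh := (numPeaks > 10);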
Reply by PaulWiik May 26, 2005
Hi again,

I feel I have made some major progress now. I had a lot of errors in my
code prior to the last post. I believe I've been able to correct most of
this now (some of them actually have quite good results though...)

In case someone is interested, I've posted some images showing how the
different functions compare (in a diagram). Could be interesting for
others considering the different functions, and combinations I guess.
NOTE: There is a typo in the images. The white bars (Combo) are really an
average between Goertzel, AMDF and ADSF

Voice signals:
http://photos11.flickr.com/15818874_ad192fb4d1_o.jpg
http://photos14.flickr.com/15818868_3b02a21d49_o.jpg

Synthetic 880hz:
http://photos10.flickr.com/15822663_b9806d9f34_o.jpg
Synthetic 440hz:
http://photos14.flickr.com/15822677_1371e8438c_o.jpg

The waveform (after Blackman-Harris) is displayed in silver below the
magnitudes.
The blue line marks the threshold.

I tried the AMDF/ADSF and autocorrelation without Blackman-Harris, and got
massive phase issues, resulting in "waves flowing over my result bins".
This was very visible at frequencies where the phase only shifted slightly
from buffer to buffer.

So again: Robert B-J was completely right in what he said about the
windows for these algorithms.
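For reference, the windowing itself is just a pointwise multiply on the buffer before any of the lag/Goertzel sums. A minimal sketch using the standard 4-term Blackman-Harris coefficients (any Blackman-Harris variant works the same way; w is a temporary Double, Buffer and BufLength as below):

// w[i] = a0 - a1*cos(2*pi*i/(N-1)) + a2*cos(4*pi*i/(N-1)) - a3*cos(6*pi*i/(N-1))
for i := 0 to BufLength-1 do
begin
  w := 0.35875
       - 0.48829*cos(2*pi*i/(BufLength-1))
       + 0.14128*cos(4*pi*i/(BufLength-1))
       - 0.01168*cos(6*pi*i/(BufLength-1));
  Buffer[i] := Buffer[i]*w;   // window the raw samples in place
end;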

Here is my master function calculating the magnitudes. It's in
Pascal/Delphi, and I would welcome any comments.

sf := power(2,1/12);
len := BufLength;
for toneIndex := 1 to NumTones-1 do  // Iterate all the tones
begin
  // Calc frequency for this tone
  Frequency := round(110*power(sf,toneIndex)); 

  tau := Round(0.5*(Samplerate)/Frequency);
  ACmag := 0;  // AutoCorrelation
  dmag := 0;   // AMDF
  smag := 0;   // ADSF

  coeff := 2*cos(2*pi*frequency/samplerate);  //For Goertzel

  skn:=0;
  skn1:=0;
  skn2:=0;

  for i := 0 to len-1 do
  begin
    Val :=Buffer[i];
    if (i+tau<len) then  // Ooops, Hints on handling this are welcome :)
    begin
      ACmag := ACmag+(val*Buffer[i+tau]);  //Autocorrelation
      diff := Buffer[i+tau]-val;
      dmag := dmag+(abs(diff));            //AMDF
      smag := smag+power(diff,2);          //ADSF 
    end;
    // Goertzel
    Skn := coeff*Skn1 - Skn2 + val;
    Skn2 := Skn1;
    Skn1 := Skn;
  end;


  NumIterations := len-tau;
  toneACmag[toneIndex] := -1* (ACmag /(NumIterations*2));
  toneDmag[toneIndex]  := dmag /(NumIterations*2);
  toneSmag[toneIndex] := smag /(NumIterations*2);
  GoertzelMag[toneIndex] := 0.003*
          sqrt(power(skn1,2)+power(skn2,2)-skn1*skn2*coeff);

  // Combo
  ToneMagnitude[toneIndex] := (toneDMag[toneIndex]
                               +toneSMag[toneIndex]
                               +GoertzelMag[toneIndex]
                               )/3;
end;

I haven't even considered further optimizing yet, and it doesn't look like
I need to worry: calculating this (44.1khz samplerate, bufferlength=1024)
for 40 tones, plus displaying the results (every 100msec), plus some other
functionality, consumes 0.8% of one of my CPUs (my computer is a dual AMD
P1800+, but the application is not multithreaded).
So I think I can do rough Pitch Detection this way, and then start digging
into details for finding the exact frequency when I know in what area to
look.

I would appreciate ideas on how I could combine ADSF/AMDF with Goertzel to
reduce error level.

		
Reply by PaulWiik May 25, 2005
Karin wrote:
> If you are interested in the phase vocoder techniques I would recommend
> reading the tutorials on Stephan Bernsee's great page www.dspdimension.com !
> He is really good in explaining things, so have a look! (The page is mainly
> about pitch shifting, but most pitch shifting algorithms use the phase
> vocoder!)
>
> And of course there is the tutorial by Mark Dolson!
> http://www.panix.com/~jens/pvoc-dolson.par
>
> Anyways, you don't need to implement the phase vocoder, there is much
> source code freely available :-)
Thank you for the links! I've briefly looked at them and they look
informative, so I will definitely read them.
Reply by Karin Dressler May 25, 2005
Hi Paul!

"PaulWiik" <paul@wiik.net> schrieb im Newsbeitrag 
news:AIednYMGesdmUg_fRVn-hw@giganews.com...
> >Have you looked up phase vocoder techniques? These algorithms use complex
> >FFT results plus historical phase data to resolve single frequencies
> >between the FFT bins even when using short buffers compared to the tonal
> >period(s) of interest. They seem to provide decent resolution quickly,
> >even in the presence of some types of noise.
>
> Thanks for the tip. I will try looking it up, and see if I can understand
> how to implement it.
If you are interested in the phase vocoder techniques I would recommend
reading the tutorials on Stephan Bernsee's great page www.dspdimension.com !
He is really good in explaining things, so have a look! (The page is mainly
about pitch shifting, but most pitch shifting algorithms use the phase
vocoder!)

And of course there is the tutorial by Mark Dolson!
http://www.panix.com/~jens/pvoc-dolson.par

Anyways, you don't need to implement the phase vocoder, there is much
source code freely available :-)

Good luck!
Karin
Reply by PaulWiik May 25, 2005
> The "breaking" voice sound (laid-back & sultry) amounts to approx. 20Hz
> amplitude modulation (gating) of the voice pitch. Effectively this adds
> sidebands to the signal, plus likely start & stop artefacts. This can
> confuse things, particularly if your (possibly short) sound buffer holds
> only the gaps or the start & end of bursts, but not the body of the tone.

Wow, thanks! What you say makes perfect sense, but it would probably have
taken me weeks of analysis to find out by myself. Not to mention the time I
would have spent on trying to fix it without digging into what happened to
the signal in the first place.
> Better still, if you have enough CPU horsepower, use short buffers. Append
> the most recently tested buffer(s) contents onto the end of your new
> incoming buffer & run your pitch detection over the lot. A given buffer
> will thus be used in the pitch detection process several times. This way
> you get short-buffer lag, but long-buffer gap bridging.
Thanks for the idea. If I store the magnitudes for the different taus in
bins, I guess I could re-use these magnitudes from the previous buffer(s),
and weigh them into the consideration of the "current buffer". If that
works, it wouldn't require that much more CPU, since the inner loop won't
grow. I think it would add just one memory fetch and one multiplication for
each tau in the outer loop. That way, I can also experiment with different
weights for the historical magnitudes.
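A sketch of that weighting, done as a simple one-pole blend per tone at the end of the outer loop (ToneMagnitude as in my magnitude code; prevMagnitude and HistoryWeight are made-up names, with HistoryWeight between 0 and 1):

  // blend the current buffer's result with the stored one from earlier buffers:
  // one extra fetch plus one extra multiply-add per tone in the outer loop
  ToneMagnitude[toneIndex] := (1-HistoryWeight)*ToneMagnitude[toneIndex]
                              + HistoryWeight*prevMagnitude[toneIndex];
  prevMagnitude[toneIndex] := ToneMagnitude[toneIndex];  // remember for the next buffer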
Reply by Jim Adamthwaite May 25, 2005
> Current main concerns:
> I still struggle with "Shhh" sounds having lower harmonics that end up
> detected (a tendency to occur around 700hz), and when the female vocalist's
> voice "breaks" (by intention). I'm not sure what the English term is, but
> it's the kind of "sounding sexy" on low tones thing they do.
The "breaking" voice sound (laid-back & sultry) amounts to approx. 20Hz
amplitude modulation (gating) of the voice pitch. Effectively this adds
sidebands to the signal, plus likely start & stop artefacts. This can
confuse things, particularly if your (possibly short) sound buffer holds
only the gaps or the start & end of bursts, but not the body of the tone.

You may have to ensure that your buffer is long enough to bridge the gap &
hold enough samples of the "on" voice burst for it to dominate the pitch
detection result. However, a long buffer increases detection lag. This may
be reduced by acquiring one buffer while doing your pitch-detecting on the
previous buffer.

Better still, if you have enough CPU horsepower, use short buffers. Append
the most recently tested buffer(s) contents onto the end of your new
incoming buffer & run your pitch detection over the lot. A given buffer
will thus be used in the pitch detection process several times. This way
you get short-buffer lag, but long-buffer gap bridging.

Jim Adamthwaite
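A minimal sketch of the append-and-reanalyse idea above, assuming fixed-size capture buffers of Single samples (AnalysisBuf, NewBuffer and DetectPitch are placeholders, not anyone's actual code):

  // AnalysisBuf holds [previous capture | newest capture], 2*BufLen samples
  Move(NewBuffer[0], AnalysisBuf[BufLen], BufLen*SizeOf(Single)); // append the new data
  DetectPitch(AnalysisBuf, 2*BufLen);                             // run detection over the lot
  Move(NewBuffer[0], AnalysisBuf[0], BufLen*SizeOf(Single));      // becomes "previous" next time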
Reply by PaulWiik May 25, 2005
After I sent the last post yesterday night (Norway time) I was lying in
bed, and I couldn't sleep, because I think I figured out why I had an
octave error in my pitch detection.

The way the signal and the lagged signal are compared, the "first peak"
occurs when Tau is half the period of the detected frequency, not 1x the
period.

So, to look for the energy of a specific frequency, I should use:
Tau = 1/2*(samplerate/frequency);

My application was comparing the period with the next period, so that's
why I would get peaks on one octave below the main formant.

Probably the normal approach, by "sweeping the tau" to find the main
formant, uses some formula like:
freq = 2*(samplerate/tau) to calculate the frequency.

I think this might help with my "Shhh" problem, because I think it will be
shifted one octave up. (what I used to detect at around 700hz will now be
1400, and outside my range).

Now to my newly formulated question on Goertzel/Autocorrelation.

The way I see it (from a programming point of view), I divide the
AMDF/ASDF into two loops.
The outer-loop: what I think about as sweeping the tau.
The inner-loop: sweep n to sum up the magnitude for the current tau.

I'm thinking:
In the outer loop, I can calculate the coefficient required for the
Goertzel (on the basis that we have a tau, which basically represents a
frequency).
In the inner loop, since we are already reading x(n), we might stuff this
into the Goertzel algorithm.
Then, when the inner loop is done, we can calculate the absolute magnitude of
the Goertzel, and add this to the equation along with maybe a combined
AMDF/ASDF.

So again back to a question I asked previously:
Will the goertzel pick up other info than the ASDF/AMDF ?

Goertzel seems to detect energy of a frequency in "current period", while
ASDF/AMDF seems to detect energy of a frequency in a combination between
current period, and current period +0.5

If this is the case, I think a combination of the two could help reduce the
error rate, while not adding that much to computation.

Best Regards,
Paul
		
Reply by PaulWiik May 24, 2005
Robert bristow-johnson wrote:
> now you have a second opinion.
OK, I think I see one of the advantages of one of the first uses you describe:
> w(n-N) is a window symmetrical around sample number N which
> means you are trying to estimate pitch around sample N.
I think this is what I stumbled across during my testing tonight, but
instead of applying the window, I narrowed the scan of n so as to avoid
partial cancellation of higher frequencies with small frequency variations
in a big buffer.

I have a hard time understanding all these mathematical expressions, so if
you feel I haven't listened to your advice, it's because I don't
understand, rather than because I don't want to. I have a 5 year
Electronics/telecom education where I'm sure I was supposed to learn all
this maths, but it's more than 10 years ago, and I have never used it
since.

Jon Harris wrote:
> You present your past work and questions clearly and respond quickly to
> follow-up questions. Keep up the good work.
Thanks!

Here is what I did so far:
- I implemented a combination of AutoCorrelation and AMDF, something like:
combined = AC/(AMDF+1)
Come to think of it, I forgot to try replacing AutoCorr with ASDF.

I did this in a function where I also throw in the frequency I want to look
for (the same way as I used Goertzel). I then call this function for all
the frequencies in the musical scale I'm interested in. From a CPU load
point of view I very much like this approach, since I hope to detect
roughly the whereabouts of the fundamental by doing 40 scans (my testing
tone range, >3 octaves). It could definitely save a lot of CPU. The
function calculates tau on the basis of the sampling rate and the frequency
param.

What I discovered next was that I could easily reduce down to scanning n
from 0 to tau*3 (only if tau*3<(N/2), to stay in the buffer). This not only
saved a lot of computing, but also seemed to help detection! To me this
proves, probably in a lo-fi way, that what Robert Bristow-Johnson pointed
out about using windows even with AMDF/ASDF has got advantages.

My plan is that after I've detected the "rough" fundamental, I'll do a full
scan with tau from the period of the tone below, and up to the tone above,
to try to find the "exact pitch".

Current main concerns:
I still struggle with "Shhh" sounds having lower harmonics that end up
detected (a tendency to occur around 700hz), and when the female vocalist's
voice "breaks" (by intention). I'm not sure what the English term is, but
it's the kind of "sounding sexy" on low tones thing they do.

I'm displaying the magnitudes in a plot, and there is one thing I do not
understand. A lot of descriptions of pitch detection say you should pick
the first "peak" over a certain threshold. In my case, with this female
singer, I always get a powerful peak one octave below what definitely must
be the musical pitch. Does this mean I've messed up my calculation of tau?
I use Tau = samplerate/frequency. I believe I get the correct results by
using (samplerate*2)/frequency? But this creates a tau>(N/2) for 110hz
(N=1024, samplerate=44100).

I would appreciate any hints on the "Shhh sounds", and the yet very
undefined "sexy voice" problems. On Shhh: I guess it won't help filtering
out high frequencies, since the problem is in sub-harmonics (700hz etc)? Is
this an issue where it could help running a new function (correlation?) on
my set of magnitudes to try to detect the periods of the harmonics?
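Regarding the refinement scan mentioned above, a sketch of sweeping every integer tau between the periods of the neighbouring semitones and keeping the lag with the smallest ASDF (names are illustrative; sf is the 2^(1/12) semitone ratio; MaxDouble is from Delphi's Math unit; whether tau means a full or a half period is whatever the rough scan used):

// the rough scan picked toneIndex; refine between the neighbouring semitone periods
tauHi := Round(Samplerate/(110*power(sf, toneIndex-1)));  // longest lag = tone below
tauLo := Round(Samplerate/(110*power(sf, toneIndex+1)));  // shortest lag = tone above
bestTau := tauLo;
bestMag := MaxDouble;
for tau := tauLo to tauHi do
begin
  smag := 0;
  for i := 0 to len-1-tau do
    smag := smag + sqr(Buffer[i+tau] - Buffer[i]);  // ASDF at this lag
  smag := smag/(len-tau);
  if smag < bestMag then
  begin
    bestMag := smag;
    bestTau := tau;
  end;
end;
ExactFrequency := Samplerate/bestTau;  // halve this if tau is the half-period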
Reply by Jon Harris May 24, 2005
"PaulWiik" <paul@wiik.net> wrote in message
news:cr2dnT0II-34dQ_fRVn-1A@giganews.com...
> >Using a window would simply spoil the results of AMDF, ASDF or
> >autocorrelation.
>
> Thanks, that's what I thought, but I see it all more clear now.
>
> > These algorithms are looking for the lag which gives
> > the maximum similarity between two equal sized chunks of the signal.
>
> I find your simple description of the algorithms incredibly clear. I think
> I could actually have implemented such an algorithm, with no prior
> knowledge, on the basis of that one sentence.
>
> And, come to think of it, I now realize what Robert Bristow meant when he
> said they inherently include harmonics. Lesson Learnt: Think twice before
> posting stupid questions.
Actually PaulWiik, I have found your posts to be among the better ones in this group! You present your past work and questions clearly and respond quickly to follow-up questions. Keep up the good work.