DSPRelated.com Forums

Pitch Detection (various methods)

Started by zotty December 31, 2008
Hi All,

I've implemented a pitch detector using a pretty brute-force method:
PCM audio -> FFT(512) -> DCT(512), then do auto-correlation using the
current data and a history buffer. It's working great. The application
assumes singing on a potentially noisy channel (headset mic or open
mic configurations are possible, e.g. karaoke). I do some signal
normalization in the cepstral domain.

This seems like overkill, but time-domain stuff, a la zero crossing or
something, makes some huge assumptions regarding signal to noise, and
even DC offset. My assumption is that this technique will usually be more
robust against different signal-to-noise characteristics. I'd like to
try out some different algorithms, perhaps time domain. What else
would you try?

Any thoughts?

Regards,
MarkZ
Annosoft
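
For reference, the FFT -> DCT -> peak-pick pipeline described above is in the
family of real-cepstrum pitch estimates: transform the log-magnitude spectrum
back and look for a peak at the pitch period (quefrency). A minimal sketch of
that idea, not zotty's exact method; a naive O(N^2) DFT is used so the code
stays self-contained (a real implementation would use an FFT), and the lag
bounds are illustrative:

#include <math.h>

#define N 512

/* one common cepstral pitch estimate: the inverse transform of the
   log-magnitude spectrum peaks at the pitch period (quefrency).
   min_lag/max_lag bound the expected vocal range in samples. */
int cepstral_pitch_period(const float x[N], int min_lag, int max_lag)
{
    const double PI = 3.14159265358979323846;
    double logmag[N];

    /* log-magnitude spectrum; naive O(N^2) DFT for self-containment */
    for (int k = 0; k < N; k++) {
        double re = 0.0, im = 0.0;
        for (int n = 0; n < N; n++) {
            double w = 2.0 * PI * k * n / N;
            re += x[n] * cos(w);
            im -= x[n] * sin(w);
        }
        logmag[k] = log(sqrt(re * re + im * im) + 1e-12);
    }

    /* real cepstrum: logmag is real and even for real input, so the
       inverse transform reduces to a cosine sum; pick the peak */
    int best = min_lag;
    double best_val = -1e300;
    for (int q = min_lag; q <= max_lag; q++) {
        double c = 0.0;
        for (int k = 0; k < N; k++)
            c += logmag[k] * cos(2.0 * PI * k * q / N);
        if (c > best_val) { best_val = c; best = q; }
    }
    return best;    /* pitch in Hz = sample_rate / best */
}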
On Dec 31, 12:14 pm, zotty <mzart...@annosoft.com> wrote:
> ... time-domain stuff, a la zero crossing or something, makes some
> huge assumptions regarding signal to noise, and even DC offset. My
> assumption is that this technique will usually be more robust against
> different signal-to-noise characteristics. I'd like to try out some
> different algorithms, perhaps time domain. What else would you try?
>
> Any thoughts?
look up "Average Magnitude Difference Function" (AMDF) or the similar
average squared difference function (sometimes called "ASDF") or
autocorrelation. these methods make no assumptions regarding the signal
other than that there is some degree of periodicity. they are mostly
equivalent to each other in theory (the ASDF hits a minimum exactly where
the autocorrelation hits a max). i tried to make a simple and formal
description of ASDF in my old Wavetable Synthesis 101 paper (it's
somewhere at musicdsp.org). so even though there are no zero-crossing
issues, there are threshold issues that need to be worked out to avoid
the "octave problem".

even though there are no assumptions about the signal and noise, if you
are willing to make a few, you might consider pre-filtering the signal
going to the correlation operation. sometimes DC-blocking and LPFing can
be helpful.

r b-j
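
The "mostly equivalent in theory" point can be written out in one line
(stated here for reference; the approximation assumes the windowed energy
changes little over a lag of \tau samples):

    \mathrm{ASDF}(\tau) = \frac{1}{L} \sum_{n=0}^{L-1} \big( x[n] - x[n-\tau] \big)^2
                        \approx 2 \big( R(0) - R(\tau) \big),
    \qquad R(\tau) = \frac{1}{L} \sum_{n=0}^{L-1} x[n] \, x[n-\tau]

so a minimum of the ASDF over \tau is (approximately) a maximum of the
autocorrelation R(\tau).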

zotty wrote:
> Hi All,
>
> I've implemented a pitch detector
For what purpose?
> using a pretty brute-force method:
> PCM audio -> FFT(512) -> DCT(512), then do auto-correlation using the
> current data and a history buffer.
Why smoke and mirrors instead of the trivial method of the normalized autocorrelation?
> It's working great. The application
> assumes singing on a potentially noisy channel (headset mic or open
> mic configurations are possible, e.g. karaoke). I do some signal
> normalization in the cepstral domain.
>
> This seems like overkill, but time-domain stuff, a la zero crossing or
> something, makes some huge assumptions regarding signal to noise, and
> even DC offset. My assumption is that this technique will usually be
> more robust against different signal-to-noise characteristics. I'd like
> to try out some different algorithms, perhaps time domain. What else
> would you try?
There used to be Dmitry Terez here, who claimed that he invented the
ultimate absolute revolutionary top-secret pitch detection algorithm,
superior to anything else. I wonder what happened to him.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
On Dec 31, 1:53 pm, robert bristow-johnson <r...@audioimagination.com> wrote:
> look up "Average Magnitude Difference Function" (AMDF) or the similar
> average squared difference function (sometimes called "ASDF") or
> autocorrelation. [...]
>
> r b-j
Hey r,

Thank you for the tips and expertise.

I immediately turned to papers covering AMDF. I often have trouble
translating these papers into working logic (sigh); it might just be my
wetware limitations. I'd like to take a crack at understanding, and I'm
close:

    AMDF(t) = 1/L * SUM(i = 1 to L) of ABS(s(i) - s(i - t))

The AMDF of t is the sum of the magnitude differences of a
forward-looking buffer of size L against a delay line of size L. In code
terms, this looks like a single "for" loop, with a simple subtraction of
the current buffer against a delay line.

This looks wrong, though; if that were the case, s(i - t) would be
s(i - L) instead. I'm sorry for the clod-headedness here, but does "t"
introduce another loop?

-- snippet without boundary checks -- given an infinite "audio_buffer"
and an integer "current_sample", compute the AMDF at the current sample:

float temp = 0.0f;
for (i = 0; i < L; i++)
{
    temp += fabsf(audio_buffer[current_sample + i]
                - audio_buffer[current_sample - L + i]);
}
AMDF[current_sample] = temp / L;

Is that correct, or is there actually another loop for "t" where the
current sample is differenced against everything, another "tap" in there?

Thanks for your help.
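
As usually defined, the AMDF is indeed a function of the lag t, so a pitch
search does involve a second, outer loop over candidate lags, with the
period estimate taken at the minimum. A minimal sketch under that standard
definition (the window length L and the lag bounds are illustrative; x must
be valid for indices n0 - max_lag .. n0 + L - 1):

#include <math.h>

/* AMDF evaluated over a range of candidate lags; the lag with the
   smallest average absolute difference is the period estimate */
int amdf_period(const float *x, int n0, int L, int min_lag, int max_lag)
{
    int best_lag = min_lag;
    float best_val = 1e30f;
    for (int t = min_lag; t <= max_lag; t++) {   /* outer loop over lags */
        float acc = 0.0f;
        for (int i = 0; i < L; i++)              /* inner loop: one AMDF value */
            acc += fabsf(x[n0 + i] - x[n0 + i - t]);
        acc /= (float)L;
        if (acc < best_val) { best_val = acc; best_lag = t; }
    }
    return best_lag;    /* pitch in Hz = sample_rate / best_lag */
}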
On Dec 31, 4:21 pm, zotty <mzart...@annosoft.com> wrote:
(quoted text snipped)
Is the pitch acquisition time important to you? If it's a guitar
synthesizer application then this is usually a big issue.

Bob Adams
On Dec 31, 9:14 am, zotty <mzart...@annosoft.com> wrote:
> [...] I'd like to
> try out some different algorithms, perhaps time domain. What else
> would you try?
>
> Any thoughts?
Depends on your criteria. Latency? Frequency accuracy? Tracking FM or
vibrato rate? Pitch duration or transition time measurement?

I've got a list of frequency and pitch estimation methods that I've
looked at or played with here:
  http://www.nicholson.com/rhn/dsp.html

My current random opinion is that the human ear uses something which
produces results at least slightly similar to FFT interpolated peak
estimation for high-frequency pitches, Harmonic Product Spectrum (sort of
a poor man's Cepstrum) for middling-frequency pitches, and AMDF for very
low pitches.

-- rhn A.T nicholson d.0.t C-o-M
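
A quick illustration of the Harmonic Product Spectrum idea mentioned above:
multiply the magnitude spectrum by downsampled copies of itself so that the
harmonics reinforce at the fundamental's bin. A minimal sketch, assuming a
precomputed FFT magnitude array mag[]; the harmonic count is illustrative:

/* Harmonic Product Spectrum: for each candidate fundamental bin k,
   multiply the magnitudes at its integer harmonics k, 2k, 3k, ...;
   the product peaks near the true fundamental. mag[] is a precomputed
   FFT magnitude spectrum of length n. */
int hps_peak_bin(const float *mag, int n, int num_harmonics)
{
    int best_bin = 1;
    float best_val = 0.0f;
    for (int k = 1; k < n / num_harmonics; k++) {
        float prod = 1.0f;
        for (int h = 1; h <= num_harmonics; h++)
            prod *= mag[k * h];          /* sample the h-th downsampled copy */
        if (prod > best_val) { best_val = prod; best_bin = k; }
    }
    return best_bin;    /* f0 estimate = best_bin * sample_rate / fft_size */
}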
Hey Ron,
Thank you.

> Depends on your criteria. Latency? Frequency accuracy?
> Tracking FM or vibrato rate? Pitch duration or transition
> time measurement?
I've got two usages, one concerning speech, the other singing. We license
an SDK for doing automatic lip sync (mouth positions) given an audio
file/audio stream. Pitch information on voiced phonemes is a useful cue
for gesturing. In this case, I'm already doing a ton of processing (the
magnitude spectrum is available for free), so cepstrum work isn't very
costly and latency isn't an issue.

The second usage is pitch estimation from live vocals during game play
(like Guitar Hero or Rock Band). Semitone frequency accuracy is the goal.
I expect that I'll want to try to smooth out vibrato changes in the
estimation, but it's probably not a huge deal. Cepstrum may still be
available depending on whether the realtime phoneme extractor is also
used to aid in the score. I think 80-100 milliseconds of latency will be
acceptable.

The problem, when really broken down, is not arbitrary pitch detection,
but rather comparison of an audio signal with a given MIDI realization of
the vocal track. It's likely to be a very noisy environment, so it's
probably going to be more reliable to look for signal energy in bands
near the expected semitone and score based on that.

Mark Zartler
Annosoft
On Dec 31 2008, 7:59 pm, Robert Adams <robert.ad...@analog.com> wrote:
> Is the pitch acquisition time important to you? If it's a guitar
> synthesizer application then this is usually a big issue.
... On Jan 2, 12:09 pm, zotty <mzart...@annosoft.com> wrote:
> Hey Ron,
> Thank you.
>
> > Depends on your criteria. Latency? Frequency accuracy?
> > Tracking FM or vibrato rate? Pitch duration or transition
> > time measurement?
>
> I've got two usages, one concerning speech, the other singing. We
> license an SDK for doing automatic lip sync (mouth positions) given an
> audio file/audio stream. Pitch information on voiced phonemes is a
> useful cue for gesturing. In this case, I'm already doing a ton of
> processing (the magnitude spectrum is available for free), so cepstrum
> work isn't very costly and latency isn't an issue.
if delay is no problem, and you can afford to compute cepstrums, i think
the autocorrelation or squared-difference methods (which can be
expensive) as a starting point is your best bet. the creative part is
looking at the autocorrelation or ASDF result and inferring the correct
fundamental frequency out of that. there are lotsa subtle issues to worry
about. most of these issues i won't talk about, but one common issue is
the so-called "octave error" problem.

consider a perfectly periodic tone with fundamental at 440 Hz. you would
listen to it and say it sounds like A4 (or MIDI 69). but, mathematically,
that tone is also a 220 Hz waveform (that happens to have all of its odd
harmonics with zero amplitude). so whatever measure of periodicity that
says the note is very periodic at 440 Hz will also measure the
periodicity (of that very same note) at 220 Hz to be just as high. how do
you prefer one estimate over the other? if you say that it is the highest
possible fundamental frequency that results in a high degree of
periodicity, then you begin to introduce a threshold to determine which
candidate estimates are counted.

on top of that, you can fool it with the appearance of low-level
sub-harmonics. suppose your note is really an A440, but somehow it has a
teeny bit of A220 (with some odd harmonic energy), attenuated by 80 dB,
added to it. mathematically, it's a 220 Hz waveform (and you output MIDI
57) and not a 440 Hz waveform, but somehow it really sounds like 440 and
somehow your pitch detector has to make the same judgment. if it's a
simple threshold, then when some waveform approaches that threshold (and,
in real life these waveforms come at you at inconvenient times) you hear
the tracking pitch jump back and forth between what is likely the correct
pitch and an octave (either up or down) off.

have fun with it.
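
One common heuristic for the octave problem described above (an
illustrative sketch, not a prescription from this thread): normalize the
autocorrelation by its zero-lag value and take the smallest lag whose peak
clears a fixed threshold, favoring the highest plausible fundamental, and
fall back to the global peak if none qualifies:

/* pick the smallest lag whose normalized autocorrelation exceeds a
   threshold (e.g. 0.85); x must be valid for indices 0 .. L + max_lag - 1 */
int acorr_period(const float *x, int L, int min_lag, int max_lag, float thresh)
{
    float r0 = 1e-12f;                       /* zero-lag energy (guarded) */
    for (int i = 0; i < L; i++)
        r0 += x[i] * x[i];

    int best_lag = min_lag;
    float best_val = -1.0f;
    for (int t = min_lag; t <= max_lag; t++) {
        float r = 0.0f;
        for (int i = 0; i < L; i++)
            r += x[i] * x[i + t];
        r /= r0;                             /* roughly in [-1, 1] */
        if (r > thresh)
            return t;                        /* first qualifying = smallest lag */
        if (r > best_val) { best_val = r; best_lag = t; }
    }
    return best_lag;                         /* fall back to the global peak */
}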
> Second usage is for pitch estimation from live vocals during game play
then latency is an issue, no?
> (like Guitar Hero or Rock Band). Semitone frequency accuracy is the
> goal. I expect that I'll want to try to smooth out vibrato changes in
> the estimation, but it's probably not a huge deal.
well, you will find that nearly all human vocal pitch contours have lotsa variation in it and seldom lands right on (or really close to) the dead center of the semitone pitches. unless they're using a commercial pitch processor and switching it to the "Cher effect" (where the processed vocal has the pitch quantized tightly to the semitone pitches or some other preprogrammed list of notes) which is now really overused in pop music.
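
For scoring a pitch estimate against semitone targets, the standard
frequency-to-MIDI mapping (stated here for reference, not something from
the thread) is note = 69 + 12*log2(f/440); a one-liner:

#include <math.h>

/* nearest MIDI note number for a frequency estimate in Hz (69 = A440) */
int nearest_midi_note(float f_hz)
{
    return (int)lroundf(69.0f + 12.0f * log2f(f_hz / 440.0f));
}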
> Cepstrum may still be
> available depending on whether the realtime phoneme extractor is also
> used to aid in the score. I think 80-100 milliseconds of latency will
> be acceptable.
you can do a lot in 100 ms. be grateful that you have that much time to play with. my life would be easier if i had that much time.
> The problem, when really broken down, is not arbitrary pitch
> detection, but rather comparison of an audio signal with a given MIDI
> realization of the vocal track.
i dunno anything about MIDI files, but i know MIDI 1.0 pretty well. how do you represent precise pitches (between dead-center semitones) in a MIDI realization? i know it's just a protocol or file format issue (outside of MIDI 1.0), but i don't know how they define it.
> It's likely to be a very noisy
> environment, so it's probably going to be more reliable to look for
> signal energy in bands near the expected semitone and score based on
> that.
well, a filter that is tuned to the integer harmonics of a common
fundamental frequency that we'll call "the expected semitone" is a comb
filter. if you consider a pitch detector that has a bank of various comb
filters tuned to all of the candidate pitches, and you measure the output
power of each comb filter by squaring the output and LPFing it, that is
essentially the ASDF. if you absolute-valued the outputs of the comb
filters before LPFing, then it would be the more familiarly-titled AMDF.

r b-j
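
That equivalence can be written out directly: with a feedforward comb
y[n] = x[n] - x[n-\tau] tuned to candidate lag \tau, averaging the squared
output gives exactly the ASDF at that lag, and averaging the absolute
output gives the AMDF:

    \frac{1}{L} \sum_{n=0}^{L-1} y[n]^2 = \mathrm{ASDF}(\tau),
    \qquad \frac{1}{L} \sum_{n=0}^{L-1} |y[n]| = \mathrm{AMDF}(\tau)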
On Wed, 31 Dec 2008 22:53:34 -0800 (PST), "Ron N."
<rhnlogic@yahoo.com> wrote:

   (snipped by Lyons)
> [...]
> I've got a list of frequency and pitch estimation methods that I've
> looked at or played with here:
>   http://www.nicholson.com/rhn/dsp.html
> [...]
Hi Ron,

I took a look at your above web page. I noticed at the bottom of the
page, under the "Other Online DSP Resources" category, you had the
following line:

   a list of Online DSP books - from R. Lyons' article in IEEE Signal
   Processing.

The URL you have there requires a visitor to be a "paid" member of the
IEEE in order to see the "list of books". Ron, as it turns out I have
that same list of books, and a secondary list, at the following
DspRelated.com web site:

   http://www.dsprelated.com/blogs-1/nf/Rick_Lyons.php

I mention this to you because the two lists of books on the
DspRelated.com web site are totally free of charge, and available to
anyone.

Regards,
[-Rick-]