
Pitch detection

Started by altmeyermartin March 21, 2005
robert bristow-johnson wrote:
   ...

> whatever linear operation you do in the frequency domain can be constructed
> as an equivalent time-domain operation.
>
> i guess we need to talk a little about what "pitch detection" means.  we
> have a perceptual meaning that is hard to describe for sounds in general.
> if i recorded a fart into a sampling keyboard and then played on the keys a
> recognizable melody (say "Mary had a little lamb"), you might likely hear a
> sense of pitch for each note, but i would have trouble defining clearly how
> that fart gives you a sense of pitch for that note.
>
> but highly tonal instruments are different.  *then* we are pretty clear that
> the pitch of the note is directly related to the fundamental frequency, f0,
> of the quasi-periodic function that is the note's waveform (which is the
> reciprocal of the period).
The B-flat woodpecker woke me this morning by tapping out a Personals ad
on my furnace vent. With all that din, it was still B-flat.

   ...

Jerry
--
Engineering is the art of making what you want from things you can get.
In article <1112887101.613550.274560@z14g2000cwz.googlegroups.com>,
Andor <an2or@mailcircuit.com> wrote:

> I'm also no speech processing guy (perhaps you should go over to
> comp.speech or comp.speech.research for more qualified responses). I
> know that speech processing people like to estimate the frequency
> components of speech indirectly by computing LPC coefficients (that is
> what vocoders such as MELP, CELP etc. do). These coefficients do not
> give you frequency information directly (you have to factor the
> polynomial first), but it seems that certain vocal tract parameters can
> be deduced directly from these coefficients (without having to take it
> from the frequency domain). I think modeling the vocal tract should be
> more interesting than frequency estimation for speech recognition.
hmm, that's interesting -- i don't know what LPC is, nor what vocal tract parameters might be exactly -- but it's interesting that you're saying speech might be processed, right at the start, in a fundamentally different way than other sounds. as i said, i know there's much more to speech recognition, but i thought that "much more" would come after the initial processing of the sound -- i thought that whether you're processing music or speech or whatever, you'd probably (basically and fundamentally, that is -- not exactly; you'd make modifications) handle the sound initially in much the same way. so thanks for pointing out that that may not be the case. something to look into.

although, i like the way the human ear deals with music and speech and other things, and that is basically a frequency splitter-upper, isn't it? i know there's nothing dictating that you should copy nature and not take short cuts where you see them, but anyway... i'd also like not to restrict what i'm doing to just speech recognition -- not cut off other possibilities -- although speech recognition is the main goal at the moment. but if there's other information to be gleaned by using something other than frequency detection, then it should be used, for sure.

in fact, about what you're talking about -- LPC and vocal tract parameters and maybe other things: do you think that if i were to use only frequency analysis (splitting the sound up into frequencies etc.), i'd miss information that those methods would give? or are they a short cut, a more efficient way to get information you could still get by splitting sound into frequencies? probably something to ask on the speech recognition list, but i'd be interested in any opinions on that here. (speech recognition people are probably more likely to "sell" their own ways, so it's nice to get other opinions.) if the ear does basically just split sound into frequencies, then i reckon what you're talking about is a short cut, and not something that'd give extra info that couldn't be got from frequency splitting -- but i'm guessing.
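for reference: LPC (linear predictive coding) fits an all-pole filter to the signal so that each sample is predicted from a weighted sum of the previous few samples; the roots of the resulting polynomial tend to sit near the vocal tract resonances (the formants). below is a minimal sketch of the textbook autocorrelation/Levinson-Durbin recursion, in Python with NumPy -- a generic illustration, not code from anyone in this thread, and order=12 is just a typical choice for 8 kHz speech:

import numpy as np

def lpc(x, order=12):
    # autocorrelation method: r[0..order]
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Levinson-Durbin recursion: reflection coefficient for order i
        k = -np.dot(a[:i], r[i:0:-1]) / err
        a[:i + 1] += k * a[:i + 1][::-1]   # update predictor coefficients
        err *= 1.0 - k * k                 # remaining prediction error
    return a

# rough formant frequencies from the polynomial roots, e.g. at fs = 8000.0:
# roots = np.roots(lpc(frame))
# formants = np.angle(roots[np.imag(roots) > 0]) * fs / (2 * np.pi)

this is the sense in which LPC gives frequency information only indirectly: the coefficients come straight from the waveform, and you have to factor the polynomial to see the resonant frequencies.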
> If that is not the case, pure frequency estimation can be done via FFT
> --- consider windowing, averaging and overlapping to improve the raw
> FFT data.
yes, they're the kind of details i was skipping over when i said "without going into the details" (although oversampling isn't something i know about yet). as well as what you mention, i thought of different-length windows for different tones (short for high-pitched, long for bass-like) -- so, different fourier transforms specifically for particular (small) ranges of tones. that would generally allow more accurate time info to be extracted, i think (although it makes no difference for the lowest tone). and lots of overlapping, as you said -- both time-wise (shifting the windows along by a small amount each time) and frequency-wise. cool, thanks.

ben.
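to make the windowing / averaging / overlapping recipe concrete, here is a minimal sketch in Python with NumPy (the frame length and hop size are arbitrary illustrative choices, not values from this thread):

import numpy as np

def averaged_spectrum(x, fs, frame_len=2048, hop=512):
    # average windowed FFT magnitudes over overlapping frames
    # (Welch-style smoothing of the raw FFT data)
    assert len(x) >= frame_len
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    acc = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * window
        acc += np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return freqs, acc / n_frames

shortening frame_len buys time resolution at the cost of frequency resolution, which is essentially ben's different-length-windows-for-different-ranges idea.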
In article <BE7AC7B8.600A%rbj@audioimagination.com>, robert
bristow-johnson <rbj@audioimagination.com> wrote:

> in article 070420051129510384%x@x.x, ben at x@x.x wrote on 04/07/2005 06:30:
>
> > In article <BE78340D.5EE0%rbj@audioimagination.com>, robert
> > bristow-johnson <rbj@audioimagination.com> wrote:
> >
> >> i wouldn't do that.  you still need to deal with the possibility of missing
> >> or weak harmonics (inc. fundamental).  i agree with Dmitry Terez about not
> >> using FFT.
> >
> > ...[to detect pitch]
> >
> > is this correct? when you say not to use fft i take it you mean fourier
> > transforms in general not just the fast fourier transform specifically?
> > it's a little bit of a shock to me this -- i asked elsewhere about
> > pitch detection (which is the same thing as frequency or tone analysis
> > / extraction right?) and was told 'fourier transform' by numerous
> > people -- that seemed to be *the* one and only answer.
>
> whatever linear operation you do in the frequency domain can be constructed
> as an equivalent time-domain operation.
>
> i guess we need to talk a little about what "pitch detection" means.  we
> have a perceptual meaning that is hard to describe for sounds in general.
> if i recorded a fart into a sampling keyboard and then played on the keys a
> recognizable melody (say "Mary had a little lamb"), you might likely hear a
> sense of pitch for each note, but i would have trouble defining clearly how
> that fart gives you a sense of pitch for that note.
>
> but highly tonal instruments are different.  *then* we are pretty clear that
> the pitch of the note is directly related to the fundamental frequency, f0,
> of the quasi-periodic function that is the note's waveform (which is the
> reciprocal of the period).
i'm not quite sure i really see the difference -- we're talking about the size of gaps between each spike for both things (the fart sample and the musical instrument) -- so what's the difference? the frequency of spikes makes a pitch, right? lots of spikes per second -- high-pitched sound. sure, the fart sample is a bit rougher and maybe gappier, but it's still the same situation, isn't it? i'm definitely not seeing the difference between frequency and pitch, but i don't think i'm that fussed about the difference (although it is very interesting -- certainly got me thinking). maybe it is important, though; i'm not sure.

anyway, when i said i asked elsewhere about pitch detection, i actually asked elsewhere about frequency detection -- it's just since seeing this thread that the issue of the difference between pitch and frequency has arisen, and the two got merged together because i wasn't aware of the difference. i think it's frequency detection i'm interested in, although seeing as i don't understand the difference, i'm pretty unsure. (i want to pre-process sound -- split it into its various frequencies -- in order to go on and process it further for speech recognition.)
> now, in the spectrum, you will see spikes that are equally spaced and
> integer multiples of that fundamental frequency.
this is the raw data that you get from a sound, right? time going horizontally and amplitude vertically, as it's usually illustrated.
> each spike represents a
> harmonic and the height of it is the strength of that harmonic.  now, we
> could use a comb filter to isolate those spikes.  there are two basic kinds
> of comb filters, one that puts in a null every f1 Hz and one that puts in a
> peak every f1 Hz.  now if we use the first one and vary f1 until it happens
> upon f0 or a submultiple of f0, then the output of that comb filter will be
> minimum.  that is essentially what the AMDF or ASDF algorithm does in the
> time domain.  it's the same thing but in two different domains.  and
> autocorrelation is directly related to the ASDF.
i thought fourier transforms were what's used to transform those spikes (i think of them as fence posts, but i'm silly) into something more comprehensible / recognisable? how are AMDF and ASDF different from / related to a fourier transform? are they completely different? or are they based on, maybe even variants of, fourier transforms?

to continue with the description of sound and extracting frequencies, which i find *very* helpful: the width of the window necessary to be able to see a frequency is at least the width of two fence posts together (maybe more) for the particular frequency you're looking at, so you can't tell the frequency/tone from just one fence post (because it's the relation between fence posts that makes a frequency -- a single fence post isn't a frequency -- hence the multiple-toothed comb in your explanation). so the number of fence posts in a particular stretch gives the frequency/tone, and the height of the fence posts gives the volume. and one main problem, i can imagine, is that various styles of fences (frequencies) overlap in most sounds -- they occur within the same stretch of time. that's when you need more than two fence posts to be able to tell which fence posts belong to which fence -- to be able to see/determine the continuation. the more repetition over a longer stretch, the easier it is to be sure you're correctly ascertaining the frequency and not just going off a mixture of frequencies and getting incorrect frequency data.

yes, the literal explanation you give of how to go about extracting frequencies completely tallies with how i imagined, logically, you might get frequencies out of raw audio data -- but what gets me is: where on earth does the fourier transform come into this? (i don't have a nice simple logical understanding of how a fourier transform does what it does at all -- unless it happens to be what's just been described, maybe? the comb etc.?) i thought time/amplitude data >>> frequency data was the fourier transform's territory. the ft transforms back and forth between raw data (time by amplitude, without apparent frequency data) and spectrum data (not sure on that phrase -- frequency by amplitude, without apparent time data).

so should i drop reading and learning about fourier transforms (bearing in mind i want to extract the various frequencies that occur in sound, mainly but not entirely for speech recognition) and concentrate on AMDF and ASDF? or are they much the same / similar things anyway?
> well, splitting a sound into its frequencies certainly *is* a topic
> regarding the Fourier Transform (in one of its forms).
so i reckon AMDF and ASDF are versions of the fourier transform?

thanks very much for the reply,

ben.
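for the record, AMDF and ASDF are not versions of the fourier transform -- they live entirely in the time domain, as robert says above. a minimal sketch of both in Python with NumPy (function names and the search range are illustrative only):

import numpy as np

def amdf(x, max_lag):
    # Average Magnitude Difference Function: near zero at lags equal
    # to the period (and at integer multiples of the period)
    return np.array([np.mean(np.abs(x[lag:] - x[:-lag]))
                     for lag in range(1, max_lag + 1)])

def asdf(x, max_lag):
    # Average Squared Difference Function: same idea with squared
    # error; directly related to the autocorrelation of x
    return np.array([np.mean((x[lag:] - x[:-lag]) ** 2)
                     for lag in range(1, max_lag + 1)])

# crude period estimate: the lag of the deepest null, e.g. at fs = 8000.0:
# f0 = fs / (np.argmin(amdf(x, 400)) + 1)

note there is no transform anywhere in there: each function just compares the waveform against a delayed copy of itself, and dips toward zero when the lag hits the period.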
In article <KPidnYdVc-g908jfRVn-tg@rcn.net>, Jerry Avins <jya@ieee.org>
wrote:

> ben wrote:
>
>   ...
>
> > is this correct? when you say not to use fft i take it you mean fourier
> > transforms in general not just the fast fourier transform specifically?
>
> To touch on one point only. An FFT is just a fast way to compute a
> Fourier transform. If its result weren't identical to all the other ways
> of computing it, it wouldn't be a Fourier transform.
right, i see. i did think they were very similar -- i didn't know they gave exactly the same results. i was just being careful and making sure. thanks,

ben.
ben wrote:

   ...

> i'm not quite sure i really see the difference -- we're talking about
> the size of gaps between each spike for both things (the fart sample
> and the musical instrument) -- so what's the difference? the frequency
> of spikes makes a pitch, right? lots of spikes per second -- high-pitched
> sound. sure, the fart sample is a bit rougher and maybe gappier, but
> it's still the same situation, isn't it? i'm definitely not seeing the
> difference between frequency and pitch, but i don't think i'm that
> fussed about the difference (although it is very interesting --
> certainly got me thinking). maybe it is important, though; i'm not sure.
>
> anyway, when i said i asked elsewhere about pitch detection, i actually
> asked elsewhere about frequency detection -- it's just since seeing
> this thread that the issue of the difference between pitch and
> frequency has arisen, and the two got merged together because i wasn't
> aware of the difference. i think it's frequency detection i'm
> interested in, although seeing as i don't understand the difference,
> i'm pretty unsure.
Jon Harris wrote earlier:

<quote>
"robert bristow-johnson" <rbj@audioimagination.com> wrote:
...
>> i think your brain sorta fills in the missing fundamental if there is
>> a 2nd, 3rd, 4th, etc harmonic of a tone.  try it with MATLAB or the
>> code of your choice.

Or check out these examples on the web:
http://www.ee.calpoly.edu/~jbreiten/audio/missfund/
http://physics.mtsu.edu/~wmr/julianna.html
<endquote>

Did you listen to those examples, particularly the ones at the second
URL? Some clearly illustrate a *pitch* an octave lower than the lowest
*frequency* in the audio.

   ...

Jerry
--
Engineering is the art of making what you want from things you can get.
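the effect is also easy to synthesize yourself. a minimal sketch in Python with NumPy/SciPy (the note, duration, and harmonic count are arbitrary illustrative choices): it builds a tone from harmonics 2 through 5 of 110 Hz with nothing at all at 110 Hz, so the lowest *frequency* present is 220 Hz, yet most listeners hear a *pitch* an octave below that:

import numpy as np
from scipy.io import wavfile

fs = 44100
t = np.arange(2 * fs) / fs                     # two seconds
f0 = 110.0                                     # the absent fundamental
tone = sum(np.sin(2 * np.pi * k * f0 * t) / k  # harmonics 2..5 only
           for k in range(2, 6))
tone /= np.max(np.abs(tone))                   # normalize to avoid clipping
wavfile.write("missing_fundamental.wav", fs, (tone * 32767).astype(np.int16))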

Jerry Avins wrote:
> ben wrote:
>
> ...
>
>> is this correct? when you say not to use fft i take it you mean fourier
>> transforms in general not just the fast fourier transform specifically?
>
> To touch on one point only. An FFT is just a fast way to compute a
> Fourier transform.
To be more specific, a finite length, discrete time Fourier transform. :-)

Bob
--
"Things should be described as simply as possible, but no simpler."
                                                         A. Einstein
Bob Cain wrote:

> Jerry Avins wrote:
>
> > To touch on one point only. An FFT is just a fast way to compute a
> > Fourier transform.
>
> To be more specific, a finite length, discrete time Fourier
> transform. :-)
Implemented in a computationally efficient manner.

Otherwise, "finite length" etc. just says DFT, not necessarily FFT.

Ciao,

Peter K.
Bob Cain wrote:
>
> Jerry Avins wrote:
>
>> ben wrote:
>>
>> ...
>>
>>> is this correct? when you say not to use fft i take it you mean fourier
>>> transforms in general not just the fast fourier transform specifically?
>>
>> To touch on one point only. An FFT is just a fast way to compute a
>> Fourier transform.
>
> To be more specific, a finite length, discrete time Fourier transform. :-)
>
> Bob
True. To be even more specific, a finite length, discrete time Fourier
transform with quantized results and some round-off error. Is there
another kind of Fourier transform that can be performed digitally?

Jerry
--
Engineering is the art of making what you want from things you can get.
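that identity is easy to check numerically: a direct O(N^2) evaluation of the DFT definition and an FFT agree to round-off error. a minimal sketch in Python with NumPy:

import numpy as np

def naive_dft(x):
    # direct evaluation of X[k] = sum_n x[n] * exp(-2j*pi*n*k/N)
    n = len(x)
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) @ x

x = np.random.randn(256)
print(np.max(np.abs(naive_dft(x) - np.fft.fft(x))))  # ~1e-12: same transform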
In article <1112764006.059594.279510@z14g2000cwz.googlegroups.com>,
robert bristow-johnson <rbj@audioimagination.com> wrote:
> i don't get it.  why would getting the higher harmonics and then
> dividing the frequency down to an implied missing fundamental be better
> than determining the period of some quasi-periodic signal for measure
> of pitch?
If the spectral energy peaks are very closely but not exactly harmonically
related (which the physics of some real-world resonators can produce),
a sub-multiple of the lowest frequency might be what a human would call
the approximate pitch, but a sub-multiple of an even higher frequency
present might be what a musician would call the exact pitch relative to
other simultaneous musical notes present.

A good example might be a spinet piano, where a slightly flat low-A
(say 109.8 Hz) played through a telco quality circuit might only have
frequency content above 200 Hz, but would still be heard as a low-A,
two octaves below concert-A, in appropriate context, even with little
spectral energy in that range. But if the near 4th harmonic peaked at
440.8 Hz, and this waveform was played against a simultaneous exact 440
Hz concert-A flute tone, thus producing a noticeable beat, the low-A
piano note might be perceived as slightly sharp in pitch, not flat.

Humans may also be more sensitive to pitch errors in the middle of
the audio spectrum, versus in the lower or higher frequency ranges.
Thus the pitch in the above situation, to a piano tuner, might be best
considered as closer to 440.8/4 = 110.2 Hz, and neither, say, at 220 Hz,
where there might be the highest absolute spectral peak (according to
an FFT maximum), nor at the fundamental 109.8 Hz string resonance that
started off this overtone sequence (and which an AMDF or autocorrelation
algorithm might hunt and find).

IMHO. YMMV.
--
Ron Nicholson   rhn AT nicholson DOT com   http://www.nicholson.com/rhn/
#include <canonical.disclaimer>   // only my own opinions, etc.
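a quick way to audition Ron's scenario -- a sketch in Python with NumPy, with made-up amplitudes; the partials are a slightly sharpened overtone series over the 109.8 Hz string, nothing below 200 Hz, with the near-4th harmonic at the 440.8 Hz he quotes, beaten against an exact 440 Hz tone:

import numpy as np

fs = 44100
t = np.arange(3 * fs) / fs                     # three seconds
partials = [220.0, 330.3, 440.8, 551.5]        # sharpened overtones, nothing at 109.8 Hz
piano = sum(np.sin(2 * np.pi * f * t) / (i + 1)
            for i, f in enumerate(partials))
flute = 0.5 * np.sin(2 * np.pi * 440.0 * t)    # exact concert-A
mix = (piano + flute) / np.max(np.abs(piano + flute))

the roughly 0.8 Hz beat between the 440.8 Hz partial and the 440 Hz flute tone is clearly audible in the mix.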
in article d39gmo$vqr$1@blue.rahul.net, Ronald H. Nicholson Jr. at
rhn@mauve.rahul.net wrote on 04/09/2005 17:16:
 
> If the spectral energy peaks are very closely but not exactly harmonically
> related (which the physics of some real-world resonators can produce),
> a sub-multiple of the lowest frequency might be what a human would call
> the approximate pitch, but a sub-multiple of an even higher frequency
> present might be what a musician would call the exact pitch relative to
> other simultaneous musical notes present.
>
> A good example might be a spinet piano, where a slightly flat low-A
> (say 109.8 Hz)
what is 109.8 Hz? is it the frequency of the bottom overtone (often called the fundamental)? or is it the reciprocal of the period? especially in the situation you describe below, they are not exactly the same thing. the AMDF or ASDF measures the period.
> played through a telco quality circuit might only have
> frequency content above 200 Hz, but would still be heard as a low-A,
> two octaves below concert-A, in appropriate context, even with little
> spectral energy in that range.
yup. and the measured period will be about 1000/109.8 milliseconds. but possibly not exactly.
> But if the near 4th harmonic peaked at 440.8 Hz,
you mean there's a formant (or resonance) at around 440 Hz making the 4th harmonic particularly loud compared to others? that will increase its influence on the measured period.
> and this waveform was played against a simultaneous exact 440
> Hz concert-A flute tone, thus producing a noticeable beat, the low-A
> piano note might be perceived as slightly sharp in pitch, not flat.
that may be true, but i am not sure that the AMDF will see it any differently. especially if the 109.8 Hz component was killed by an HPF, then the fundamental *will* be determined as the greatest common divisor of the remaining harmonic frequencies (and the period as its reciprocal), and if they are sharper than their integer harmonic index times the 109.8 Hz component, the AMDF will arrive at a pitch that is higher than 109.8.
> Humans may also be more sensitive to pitch errors in the middle of
> the audio spectrum, versus in the lower or higher frequency ranges.
that may be, but it is still not the issue. just like for a VU meter, you could run the audio through something like an A-weighting filter to emphasize frequency components in the 2 to 5 kHz range and de-emphasize components in the highest and lowest octaves before the AMDF algorithm sees it.
> Thus the pitch in the above situation, to a piano tuner, might be best
> considered as closer to 440.8/4 = 110.2 Hz, and neither, say, at 220 Hz,
> where there might be the highest absolute spectral peak (according to
> an FFT maximum), nor at the fundamental 109.8 Hz string resonance that
> started off this overtone sequence (and which an AMDF or autocorrelation
> algorithm might hunt and find).
no. the AMDF or ASDF will find the best fit for the period, which is influenced by all of the harmonics, and the harmonics greater in amplitude will influence the measure more. the reciprocal of that would be called the fundamental frequency, but it might not be exactly the same frequency as the 1st harmonic. as in the case above, if there was zero amplitude at 109.8 Hz (i dunno what meaning that precise frequency would have) but a decent amount of energy at 220, 330.3, 440.8, 551.5, the AMDF will not measure a period of 1/109.8; the period will be shorter than 1/110 because of the other harmonics.

i know about sharpened harmonics in many fixed-string instruments with increasing harmonic number (due to stiffness at the string termination that effectively shortens the string, particularly for high-amplitude hits). i know that piano tuners may very well tune higher notes slightly sharp, in comparison to their mathematical value in an equally tempered scale, to line up octaves to power-of-2 harmonics from lower notes. for 12-note/octave equal temperament, we don't line up the other harmonics -- say, the 3rd to exactly 19 semitones up -- because 3 does not exactly equal 2^(19/12). i know about some tones possibly having a missing fundamental (and possibly other missing harmonics). it's also possible that the fundamental, even when it is there, does not exactly equal the reciprocal of the measured period, because of the aggregate influence of the other harmonics. that doesn't change anything. tonal musical notes are quasi-periodic and, for those kinds of notes, our most salient cue for pitch will be the reciprocal of the period, and the AMDF or ASDF is designed to best estimate that period.

now there are problems. there is the classic "octave problem" (it could happen with other harmonic intervals too, but most often, if there is an ambiguity, it's about an octave). this comes from the fact that a 110 Hz note that is added to a *very* quiet 55 Hz note (say, at -80 dB relative to the 110 Hz note) will look like a 55 Hz note mathematically, but will sound like a 110 Hz note. then there needs to be a little brains built into the AMDF analysis to reject the null at 1/55 sec just because it is ever so slightly lower than the null at 1/110 sec. so somehow you want to choose the first really good looking null, even if the null at twice the lag is very slightly better. that's the main problem with AMDF or ASDF.

i don't see the situation you described as being a problem. if you have a good (and short) sound file of a note, or even just a collection of amplitudes and frequencies that you think would fool this, i might want to try it with a MATLAB kludge to see if it does.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
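for what it's worth, a minimal sketch of that "little brains" in Python with NumPy (the 0.9 depth threshold is an illustrative guess, not a value from this thread): rather than taking the global minimum of the AMDF, accept the *first* null that is nearly as deep as the global minimum, so a barely-better null at twice the lag can't drag the estimate down an octave:

import numpy as np

def pitch_from_amdf(x, fs, min_lag, max_lag, tol=0.9):
    # AMDF over the candidate lag range (min_lag must be >= 1)
    lags = np.arange(min_lag, max_lag + 1)
    d = np.array([np.mean(np.abs(x[lag:] - x[:-lag])) for lag in lags])
    lo, hi = d.min(), d.max()
    # a null "counts" if it reaches at least tol of the way down
    # from the worst value toward the deepest null
    good = d <= hi - tol * (hi - lo)
    first_good_lag = lags[np.argmax(good)]   # argmax returns the first True
    return fs / first_good_lag

with tol = 1.0 this degenerates to the plain global minimum; lowering tol trades octave errors for more sensitivity to noise.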