Forums

pitch question

Started by HyeeWang June 15, 2009
Here are two questions about pitch retrieval methods that have puzzled me
for a long time.

1. Since pitch is a subjective, perceived quantity, how can it be
found by computation?

2. Let us assume pitch can be found that way.
   As we know, the pitch of an old man's voice can be as low as 55 Hz
or so. Some well-known speech codecs have a high-pass filter at the
beginning of processing: G.729 uses a second-order pole/zero filter
with a cut-off frequency of 140 Hz, and G.723.1 removes the DC
element from the input speech. Since the 55 Hz component has been
removed from the signal, how can it be retrieved again? It is a
non-existent element.

Cheers
HyeeWang
HyeeWang wrote:
> Here are two questions about pitch retrieval methods that have puzzled me
> for a long time.
>
> 1. Since pitch is a subjective, perceived quantity, how can it be
> found by computation?
Yes, it is subjective some of the time; but a lot of the time it is nothing more or less than the fundamental of a harmonic sound (i.e. harmonics in tune with the fundamental). So it can be found simply by finding the lowest peak. Not robust unless you already know you are analysing a single tone. Otherwise:
> 2. Let us assume pitch can be found that way.
>    As we know, the pitch of an old man's voice can be as low as 55 Hz
> or so. Some well-known speech codecs have a high-pass filter at the
> beginning of processing: G.729 uses a second-order pole/zero filter
> with a cut-off frequency of 140 Hz, and G.723.1 removes the DC
> element from the input speech. Since the 55 Hz component has been
> removed from the signal, how can it be retrieved again? It is a
> non-existent element.
You can exploit the way very cheap small transistor radios work(ed).
They had rubbish LF response, but kids could still listen to their
favourite tracks almost obliviously (and even classical music was often
quite listenable). The ear is able to reconstruct missing fundamentals
from the harmonics that ~are~ present.

At its simplest, the phenomenon is called the "resultant tone", generated
at the Hz difference between two tones - in effect, a beat with a
frequency in the audible range. I even demonstrate this to my flute
students by playing two high notes between us and hearing the low
resultant tone very clearly. The trick for us is to tune the two notes
such that the resultant is also in tune (Equal Temperament is your enemy
here). In this situation, the quasi-fundamental does not appear in a
spectrum analysis; it is a trick performed by the ear.

Consequently, a more sophisticated (frequency-domain) pitch detection
algorithm looks for the N strongest components (and the Hz differences
between them) and calculates the likely fundamental as the average
difference. This works reasonably well even in the absence of the
fundamental. In principle, you can take just harmonics 4 to 8 and still
work out the fundamental pretty accurately. You accept a margin of error
of at least 14 cents, maybe more. However, the human voice (especially
the speaking voice) is not exactly over-laden with partials, so the
calculated pitch may have a somewhat larger error margin.

Richard Dobson
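The average-difference scheme Richard describes can be sketched in a few lines of Python. This is a minimal editorial illustration, not code from the thread: the function name, the crude peak picking, and the test signal (a 110 Hz tone built only from harmonics 2 through 6, i.e. with its fundamental absent) are all invented for this sketch.

```python
import numpy as np

def f0_from_partial_spacing(x, fs, n_peaks=5):
    """Estimate f0 as the average Hz spacing between the N strongest
    spectral components (hypothetical helper for illustration)."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    # crude local-maximum peak picking
    peaks = [i for i in range(1, len(spec) - 1)
             if spec[i] > spec[i - 1] and spec[i] > spec[i + 1]]
    # keep the n_peaks largest maxima, in ascending frequency order
    strongest = sorted(sorted(peaks, key=lambda i: spec[i])[-n_peaks:])
    # the likely fundamental is the average difference between them
    return float(np.mean(np.diff(freqs[strongest])))

fs = 8000
t = np.arange(4096) / fs
# harmonics 2..6 of a 110 Hz tone -- no energy at the fundamental itself
x = sum(np.sin(2 * np.pi * 110 * k * t) for k in range(2, 7))
print(f0_from_partial_spacing(x, fs))  # close to 110 Hz
```

Note that the estimate still works because the spacing between adjacent harmonics equals the fundamental, whether or not the fundamental itself carries any energy.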

HyeeWang wrote:

> Here are two questions about pitch retrieval methods that have puzzled me
> for a long time.
> [snip]
On Jun 15, 10:28 am, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
wrote:
> HyeeWang wrote:
>> Here are two questions about pitch retrieval methods that have puzzled me
>> for a long time.
>> [snip]
I think I would rather read some sort of insult rather than just changing the subject line. There has to be something very strange going on over that side of the ocean. This behavior is just a little too weird. Rick
On Jun 15, 11:58 pm, rickman <gnu...@gmail.com> wrote:
> I think I would rather read some sort of insult rather than just
> changing the subject line. There has to be something very strange
> going on over that side of the ocean. This behavior is just a little
> too weird.
dunno which side yer on, Rick, you might find Vlad is, i think, on the
Oklahoma side of the ocean.

On Jun 15, 4:37 am, Richard Dobson <richarddob...@blueyonder.co.uk> wrote:
> HyeeWang wrote:
>> Here are two questions about pitch retrieval methods that have puzzled me
>> for a long time.
>>
>> 1. Since pitch is a subjective, perceived quantity, how can it be
>> found by computation?
>
> Yes, it is subjective some of the time; but a lot of the time it is
> nothing more or less than the fundamental of a harmonic sound (i.e.
> harmonics in tune with the fundamental).
probably, since "pitch" is quantified as a musical measure, you might
also want to take the log of the fundamental (base 2 if your unit is
octaves).
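That log step is a one-liner. A tiny sketch (the A440 reference and the function name are assumptions made for this example, not anything from the thread):

```python
import math

A4 = 440.0  # reference frequency; an assumption for this sketch

def hz_to_semitones(f, ref=A4):
    """Musical pitch as semitones relative to the reference:
    12 * log2(f / ref); divide by 12 for octaves instead."""
    return 12.0 * math.log2(f / ref)

print(hz_to_semitones(220.0))  # one octave below A440 -> -12.0
```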
> So it can be found simply by finding the lowest peak.
you can have a periodic (probably we're dealing with quasi-periodic functions for most musical notes or tones) function without a fundamental. or the fundamental has a poor S/N ratio. something that only assumes periodicity, like AMDF or some autocorrelation, is likely what the OP should use. the speech processing papers that also needed to use a good pitch detector used these methods as far back as the 70s, i think.
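A lag-domain detector of the kind r b-j means can be sketched as follows. This is a rough editorial illustration, not any codec's actual algorithm; the search range, sample rate, and test signal (a 100 Hz tone built only from harmonics 2 and 3) are chosen for the demo.

```python
import numpy as np

def f0_autocorr(x, fs, fmin=50.0, fmax=400.0):
    """Pick the lag with maximum autocorrelation inside the plausible
    pitch range. Assumes only quasi-periodicity -- it does NOT require
    any energy at the fundamental itself."""
    x = x - np.mean(x)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..N-1
    lo, hi = int(fs / fmax), int(fs / fmin)           # lag search window
    lag = lo + int(np.argmax(r[lo:hi]))
    return fs / lag

fs = 8000
t = np.arange(2048) / fs
# 100 Hz tone built only from harmonics 2 and 3 -- no fundamental energy
x = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 300 * t)
print(f0_autocorr(x, fs))  # near 100 Hz
```

AMDF would be the same skeleton, but with the correlation replaced by a mean absolute difference between the signal and its lagged copy, picking a minimum instead of a maximum.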
> Not robust unless you already know you are analysing a single tone.
i dunno if it's even good for that, Richard. :-\
> Otherwise:
>
>> 2. Let us assume pitch can be found that way.
>> [snip]
>
> You can exploit the way very cheap small transistor radios work(ed).
> They had rubbish LF response, but kids could still listen to their
> favourite tracks almost obliviously (and even classical music was often
> quite listenable). The ear is able to reconstruct missing
> fundamentals from the harmonics that ~are~ present. At its simplest, the
> phenomenon is called the "resultant tone", generated at the Hz difference
> between two tones - in effect, a beat with a frequency in the audible
> range. I even demonstrate this to my flute students by playing two high
> notes between us and hearing the low resultant tone very clearly. The
> trick for us is to tune the two notes such that the resultant is also in
> tune (Equal Temperament is your enemy here).
depends on the interval. 3/2 ain't so bad with equal temperament: about
2 cents off. major 3rds (what would be 5/4) don't do so good with equal
temp: about 14 cents off.
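Those deviations are easy to check numerically. A quick sketch (function name invented for this example): an interval's size in cents is 1200 * log2(ratio), and the deviation is the distance to the nearest multiple of 100 cents (the nearest equal-tempered interval).

```python
import math

def cents_off_et(ratio):
    """Distance of a just-intonation ratio from the nearest
    equal-tempered interval, in cents (1200 cents per octave)."""
    cents = 1200.0 * math.log2(ratio)
    nearest_et = round(cents / 100.0) * 100.0
    return cents - nearest_et

print(cents_off_et(3 / 2))  # just fifth: about +2 cents vs. ET
print(cents_off_et(5 / 4))  # just major third: about -14 cents vs. ET
```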
> In this situation, the > quasi-fundamental does not appear in spectrum analysis, it is a trick > performed by the ear.
if the note (with the missing fundamental) is by itself, or is quite
prominent with respect to the rest of the music, then running that
quasi-periodic signal through a static nonlinearity will create a
different (and distorted) quasi-periodic signal with the same
quasi-constant fundamental. but, because of the nonlinearity, this
periodic signal might have some energy at its fundamental from
cross-products of neighboring harmonics (like the 13th and 14th, but
more likely the 2nd and 3rd; all such pairs will contribute). our
hearing system may have evolved some similar nonlinearity to do that.

but it doesn't matter. since fundamental frequency (or period) is the
parameter of interest, you should use a method that makes the fewest
assumptions necessary. assuming that there is a period of sorts (which
defines the fundamental frequency) is a more general assumption than
also requiring energy at the fundamental. it only requires that the
harmonics are all integer multiples of some hypothetical candidate
fundamental. it doesn't need energy at the fundamental (which is what
you might require with a spectrum-analyzer method).
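The cross-product effect is easy to demonstrate. In this editorial sketch (signal, sample rate, and helper name all chosen for the demo), squaring a signal that contains only harmonics 2 and 3 of 100 Hz produces, via the product term 2*sin(a)*sin(b) = cos(a-b) - cos(a+b), a component at the 100 Hz difference frequency, i.e. at the previously absent fundamental.

```python
import numpy as np

fs, f0 = 8000, 100.0
t = np.arange(4096) / fs
# harmonics 2 and 3 only; no energy anywhere near 100 Hz
x = np.sin(2 * np.pi * 2 * f0 * t) + np.sin(2 * np.pi * 3 * f0 * t)

def magnitude_at(sig, f):
    """Windowed FFT magnitude at the bin nearest frequency f."""
    spec = np.abs(np.fft.rfft(sig * np.hanning(len(sig))))
    return spec[int(round(f * len(sig) / fs))]

y = x ** 2  # a simple static nonlinearity
# the cross-product of the 200 and 300 Hz components lands at 100 Hz
print(magnitude_at(x, f0), magnitude_at(y, f0))  # tiny vs. substantial
```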
> Consequently, a more sophisticated (frequency-domain) pitch detection
> algorithm looks for the N strongest components (and the Hz differences
> between them) and calculates the likely fundamental as the average
> difference. This works reasonably well even in the absence of the
> fundamental.
so will autocorrelation or AMDF. how will your method work if a note had the 3rd, 5th, 7th, 10th, and 13th harmonic? (no energy in the other harmonics.)
> In principle, you can take just harmonics 4 to 8 and still
> work out the fundamental pretty accurately. You accept a margin of
> error of at least 14 cents, maybe more.
how about 1 cent or less? you can measure the period to better precision than 1 sample unit.
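Sub-sample period measurement is usually done by interpolating around the integer-lag peak. A common trick (sketched here with invented names and an illustrative 217 Hz test tone, whose period of about 36.87 samples is deliberately not an integer) is to fit a parabola through the peak and its two neighbours and take the vertex:

```python
import numpy as np

def refine_lag(r, k):
    """Parabolic interpolation around integer lag k of autocorrelation r:
    fit a parabola through (k-1, k, k+1) and return the vertex lag."""
    a, b, c = r[k - 1], r[k], r[k + 1]
    return k + 0.5 * (a - c) / (a - 2.0 * b + c)

fs, f0 = 8000.0, 217.0            # true period is ~36.87 samples
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * f0 * t)
r = np.correlate(x, x, mode="full")[len(x) - 1:]
k = 20 + int(np.argmax(r[20:60]))  # integer-lag peak in the search range
print(fs / k, fs / refine_lag(r, k))  # the refined estimate is closer to 217
```

The refinement costs three array reads and a division per frame, so the precision r b-j mentions comes almost for free on top of an autocorrelation or AMDF search.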
> However, the human voice (especially
> the speaking voice) is not exactly over-laden with partials, so the
> calculated pitch may have a somewhat larger error margin.
better method is time domain. lots of computations, but DSP chips are s'posed to do lotsa computations. and there are ways of dividing the work. r b-j
robert bristow-johnson wrote:



> so will autocorrelation or AMDF. how will your method work if a note
> had the 3rd, 5th, 7th, 10th, and 13th harmonic? (no energy in the
> other harmonics.)
Hmm, well, what would you decide if you listened to it (you have not
specified the relative strengths of each component)? The first three
might be good enough, ~if~ strong enough and high enough, to generate a
recognisable resultant tone. The first four correspond to a sort-of
widely-spaced diminished triad (5th + 10th should = an octave, so a
prominent combination), with the top component being harmonically less
well-related (e.g. G, E', Bb-', E'', A-''), so I would guess most people
(especially given the 7th harmonic is a quarter-tone "out of tune"
relative to ET) would call it a sort of bell or metallic tone. So, in
the category perhaps of "ambiguous".

It is known that many bells in effect have sub-audio fundamentals (or
sub-harmonics, whatever!); and even a cymbal does if you listen to it
close-up, edge-on.
>> In principle, you can take just harmonics 4 to 8 and still
>> work out the fundamental pretty accurately. You accept a margin of
>> error of at least 14 cents, maybe more.
>
> how about 1 cent or less? you can measure the period to better
> precision than 1 sample unit.
I more meant the step of fitting to a musical pitch (the ET problem again).
>> However, the human voice (especially
>> the speaking voice) is not exactly over-laden with partials, so the
>> calculated pitch may have a somewhat larger error margin.
(hidden by wide vibrato in many cases!)
> better method is time domain. lots of computations, but DSP chips are
> s'posed to do lotsa computations. and there are ways of dividing the
> work.
I have never used time-domain methods, as in the area I am involved with
(E/A music, analysis and transformation) the interest is in finding all
the components and their evolution over time (partial tracking et al.).

A PhD student of a colleague at Bath Uni did a very interesting project
some years ago on guitar pitch tracking using Kalman filters (~don't~
ask me for any details! - I think there is at least a DAFx paper
somewhere); the issue being to get the pitch as soon as possible, even
during the relatively chaotic attack. I mainly remember it because she
was logged onto the same machine I was, but she was running lots of very
heavy Matlab simulations (the sort that take hours to complete), so
everything I was doing suddenly slowed to a crawl.

Richard Dobson
On Jun 16, 1:48 am, robert bristow-johnson <r...@audioimagination.com>
wrote:
> On Jun 15, 11:58 pm, rickman <gnu...@gmail.com> wrote:
>
>> I think I would rather read some sort of insult rather than just
>> changing the subject line. There has to be something very strange
>> going on over that side of the ocean. This behavior is just a little
>> too weird.
>
> dunno which side yer on, Rick, you might find Vlad is, i think, on the
> Oklahoma side of the ocean.
Well, that certainly explains a lot! Rick