Forums

pitch question

Started by HyeeWang June 15, 2009
Here are two questions about pitch retrieval methods that have puzzled me
for a long time.

1. Since pitch is a subjective, perceived quantity, how can it be
found by computation?

2. Let us assume pitch can be found that way.
   As we know, the pitch of an old man's voice can be as low as 55 Hz
or so. Some well-known speech codecs have a high-pass filter at the
beginning of processing: G.729 uses a second-order pole/zero filter
with a cut-off frequency of 140 Hz, and G.723.1 removes the DC
element from the input speech. Since the 55 Hz component has been
removed from the signal, how can it be retrieved again? It is a
non-existent element.

Cheers
HyeeWang
HyeeWang wrote:
> Here are two questions about pitch retrieval methods that have puzzled me
> for a long time.
>
> 1. Since pitch is a subjective, perceived quantity, how can it be
> found by computation?
Yes, it is subjective some of the time; but a lot of the time it is nothing more or less than the fundamental of a harmonic sound (i.e. harmonics in tune with the fundamental). So it can be found simply by finding the lowest peak. Not robust unless you already know you are analysing a single tone. Otherwise:
> 2. Let us assume pitch can be found that way.
>    As we know, the pitch of an old man's voice can be as low as 55 Hz
> or so. Some well-known speech codecs have a high-pass filter at the
> beginning of processing: G.729 uses a second-order pole/zero filter
> with a cut-off frequency of 140 Hz, and G.723.1 removes the DC
> element from the input speech. Since the 55 Hz component has been
> removed from the signal, how can it be retrieved again? It is a
> non-existent element.
You can exploit the way very cheap small transistor radios work(ed).
They had rubbish LF response, but kids could still listen to their
favourite tracks almost obliviously (and even classical music was often
quite listenable). The ear is able to reconstruct missing fundamentals
from the harmonics that ~are~ present.

At its simplest, the phenomenon is called the "resultant tone", generated
at the Hz difference between two tones - in effect, a beat with a
frequency in the audible range. I even demonstrate this to my flute
students by playing two high notes between us and hearing the low
resultant tone very clearly. The trick for us is to tune the two notes
such that the resultant is also in tune (Equal Temperament is your enemy
here). In this situation, the quasi-fundamental does not appear in a
spectrum analysis; it is a trick performed by the ear.

Consequently, a more sophisticated (frequency-domain) pitch detection
algorithm looks for the N strongest components (and the Hz differences
between them) and calculates the likely fundamental as the average
difference. This works reasonably well even in the absence of the
fundamental. In principle, you can take just harmonics 4 to 8 and still
work out the fundamental pretty accurately. You accept a margin of error
of at least 14 cents, maybe more. However, the human voice (especially
the speaking voice) is not exactly over-laden with partials, so the
calculated pitch may have a somewhat larger error margin.

Richard Dobson
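The average-difference scheme Richard describes can be sketched in a few lines of Python. This is a minimal editorial illustration, not code from the thread: the function name, the crude peak picking, and the test signal (a 110 Hz tone built only from harmonics 2 through 6, i.e. with its fundamental absent) are all invented for this sketch.

```python
import numpy as np

def f0_from_partial_spacing(x, fs, n_peaks=5):
    """Estimate f0 as the average Hz spacing between the N strongest
    spectral components (hypothetical helper for illustration)."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    # crude local-maximum peak picking
    peaks = [i for i in range(1, len(spec) - 1)
             if spec[i] > spec[i - 1] and spec[i] > spec[i + 1]]
    # keep the n_peaks largest maxima, in ascending frequency order
    strongest = sorted(sorted(peaks, key=lambda i: spec[i])[-n_peaks:])
    # the likely fundamental is the average difference between them
    return float(np.mean(np.diff(freqs[strongest])))

fs = 8000
t = np.arange(4096) / fs
# harmonics 2..6 of a 110 Hz tone -- no energy at the fundamental itself
x = sum(np.sin(2 * np.pi * 110 * k * t) for k in range(2, 7))
print(f0_from_partial_spacing(x, fs))  # close to 110 Hz
```

Note that the estimate still works because the spacing between adjacent harmonics equals the fundamental, whether or not the fundamental itself carries any energy.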

HyeeWang wrote:

> Here are two questions about pitch retrieval methods that have puzzled me
> for a long time.
> [snip]
On Jun 15, 10:28 am, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
wrote:
> HyeeWang wrote:
>> Here are two questions about pitch retrieval methods that have puzzled me
>> for a long time.
>> [snip]
I think I would rather read some sort of insult rather than just changing the subject line. There has to be something very strange going on over that side of the ocean. This behavior is just a little too weird. Rick
On Jun 15, 11:58 pm, rickman <gnu...@gmail.com> wrote:
> I think I would rather read some sort of insult rather than just
> changing the subject line. There has to be something very strange
> going on over that side of the ocean. This behavior is just a little
> too weird.
dunno which side yer on, Rick, you might find Vlad is, i think, on the
Oklahoma side of the ocean.

On Jun 15, 4:37 am, Richard Dobson <richarddob...@blueyonder.co.uk> wrote:
> HyeeWang wrote:
>> Here are two questions about pitch retrieval methods that have puzzled me
>> for a long time.
>>
>> 1. Since pitch is a subjective, perceived quantity, how can it be
>> found by computation?
>
> Yes, it is subjective some of the time; but a lot of the time it is
> nothing more or less than the fundamental of a harmonic sound (i.e.
> harmonics in tune with the fundamental).
probably, since "pitch" is quantified as a musical measure, you might
also want to take the log of the fundamental (base 2 if your unit is
octaves).
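That log step is a one-liner. A tiny sketch (the A440 reference and the function name are assumptions made for this example, not anything from the thread):

```python
import math

A4 = 440.0  # reference frequency; an assumption for this sketch

def hz_to_semitones(f, ref=A4):
    """Musical pitch as semitones relative to the reference:
    12 * log2(f / ref); divide by 12 for octaves instead."""
    return 12.0 * math.log2(f / ref)

print(hz_to_semitones(220.0))  # one octave below A440 -> -12.0
```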
> So it can be found simply by finding the lowest peak.
you can have a periodic (probably we're dealing with quasi-periodic functions for most musical notes or tones) function without a fundamental. or the fundamental has a poor S/N ratio. something that only assumes periodicity, like AMDF or some autocorrelation, is likely what the OP should use. the speech processing papers that also needed to use a good pitch detector used these methods as far back as the 70s, i think.
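A lag-domain detector of the kind r b-j means can be sketched as follows. This is a rough editorial illustration, not any codec's actual algorithm; the search range, sample rate, and test signal (a 100 Hz tone built only from harmonics 2 and 3) are chosen for the demo.

```python
import numpy as np

def f0_autocorr(x, fs, fmin=50.0, fmax=400.0):
    """Pick the lag with maximum autocorrelation inside the plausible
    pitch range. Assumes only quasi-periodicity -- it does NOT require
    any energy at the fundamental itself."""
    x = x - np.mean(x)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..N-1
    lo, hi = int(fs / fmax), int(fs / fmin)           # lag search window
    lag = lo + int(np.argmax(r[lo:hi]))
    return fs / lag

fs = 8000
t = np.arange(2048) / fs
# 100 Hz tone built only from harmonics 2 and 3 -- no fundamental energy
x = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 300 * t)
print(f0_autocorr(x, fs))  # near 100 Hz
```

AMDF would be the same skeleton, but with the correlation replaced by a mean absolute difference between the signal and its lagged copy, picking a minimum instead of a maximum.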
> Not robust unless you already know you are analysing a single tone.
i dunno if it's even good for that, Richard. :-\
> Otherwise:
>
>> 2. Let us assume pitch can be found that way.
>> [snip]
>
> You can exploit the way very cheap small transistor radios work(ed).
> They had rubbish LF response, but kids could still listen to their
> favourite tracks almost obliviously (and even classical music was often
> quite listenable). The ear is able to reconstruct missing
> fundamentals from the harmonics that ~are~ present. At its simplest, the
> phenomenon is called the "resultant tone", generated at the Hz difference
> between two tones - in effect, a beat with a frequency in the audible
> range. I even demonstrate this to my flute students by playing two high
> notes between us and hearing the low resultant tone very clearly. The
> trick for us is to tune the two notes such that the resultant is also in
> tune (Equal Temperament is your enemy here).
depends on the interval. 3/2 ain't so bad with equal temperament: about
2 cents off. major 3rds (what would be 5/4) don't do so good with equal
temp: about 14 cents off.
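Those deviations are easy to check numerically. A quick sketch (function name invented for this example): an interval's size in cents is 1200 * log2(ratio), and the deviation is the distance to the nearest multiple of 100 cents (the nearest equal-tempered interval).

```python
import math

def cents_off_et(ratio):
    """Distance of a just-intonation ratio from the nearest
    equal-tempered interval, in cents (1200 cents per octave)."""
    cents = 1200.0 * math.log2(ratio)
    nearest_et = round(cents / 100.0) * 100.0
    return cents - nearest_et

print(cents_off_et(3 / 2))  # just fifth: about +2 cents vs. ET
print(cents_off_et(5 / 4))  # just major third: about -14 cents vs. ET
```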
> In this situation, the > quasi-fundamental does not appear in spectrum analysis, it is a trick > performed by the ear.
if the note (with the missing fundamental) is by itself, or is quite
prominent with respect to the rest of the music, then running that
quasi-periodic signal through a static nonlinearity will create a
different (and distorted) quasi-periodic signal with the same
quasi-constant fundamental. but, because of the nonlinearity, this
periodic signal might have some energy at its fundamental from
cross-products of neighboring harmonics (like the 13th and 14th, but
more likely the 2nd and 3rd; all such pairs will contribute). our
hearing system may have evolved some similar nonlinearity to do that.

but it doesn't matter. since fundamental frequency (or period) is the
parameter of interest, you should use a method that makes the fewest
assumptions necessary. assuming that there is a period of sorts (which
defines the fundamental frequency) is a more general assumption than
also requiring energy at the fundamental. it only requires that the
harmonics are all integer multiples of some hypothetical candidate
fundamental. it doesn't need energy at the fundamental (which is what
you might require with a spectrum-analyzer method).
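The cross-product effect is easy to demonstrate. In this editorial sketch (signal, sample rate, and helper name all chosen for the demo), squaring a signal that contains only harmonics 2 and 3 of 100 Hz produces, via the product term 2*sin(a)*sin(b) = cos(a-b) - cos(a+b), a component at the 100 Hz difference frequency, i.e. at the previously absent fundamental.

```python
import numpy as np

fs, f0 = 8000, 100.0
t = np.arange(4096) / fs
# harmonics 2 and 3 only; no energy anywhere near 100 Hz
x = np.sin(2 * np.pi * 2 * f0 * t) + np.sin(2 * np.pi * 3 * f0 * t)

def magnitude_at(sig, f):
    """Windowed FFT magnitude at the bin nearest frequency f."""
    spec = np.abs(np.fft.rfft(sig * np.hanning(len(sig))))
    return spec[int(round(f * len(sig) / fs))]

y = x ** 2  # a simple static nonlinearity
# the cross-product of the 200 and 300 Hz components lands at 100 Hz
print(magnitude_at(x, f0), magnitude_at(y, f0))  # tiny vs. substantial
```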
> Consequently, a more sophisticated (frequency-domain) pitch detection
> algorithm looks for the N strongest components (and the Hz differences
> between them) and calculates the likely fundamental as the average
> difference. This works reasonably well even in the absence of the
> fundamental.
so will autocorrelation or AMDF. how will your method work if a note had the 3rd, 5th, 7th, 10th, and 13th harmonic? (no energy in the other harmonics.)
> In principle, you can take just harmonics 4 to 8 and still
> work out the fundamental pretty accurately. You accept a margin of
> error of at least 14 cents, maybe more.
how about 1 cent or less? you can measure the period to better precision than 1 sample unit.
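Sub-sample period measurement is usually done by interpolating around the integer-lag peak. A common trick (sketched here with invented names and an illustrative 217 Hz test tone, whose period of about 36.87 samples is deliberately not an integer) is to fit a parabola through the peak and its two neighbours and take the vertex:

```python
import numpy as np

def refine_lag(r, k):
    """Parabolic interpolation around integer lag k of autocorrelation r:
    fit a parabola through (k-1, k, k+1) and return the vertex lag."""
    a, b, c = r[k - 1], r[k], r[k + 1]
    return k + 0.5 * (a - c) / (a - 2.0 * b + c)

fs, f0 = 8000.0, 217.0            # true period is ~36.87 samples
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * f0 * t)
r = np.correlate(x, x, mode="full")[len(x) - 1:]
k = 20 + int(np.argmax(r[20:60]))  # integer-lag peak in the search range
print(fs / k, fs / refine_lag(r, k))  # the refined estimate is closer to 217
```

The refinement costs three array reads and a division per frame, so the precision r b-j mentions comes almost for free on top of an autocorrelation or AMDF search.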
> However, the human voice (especially
> the speaking voice) is not exactly over-laden with partials, so the
> calculated pitch may have a somewhat larger error margin.
better method is time domain. lots of computations, but DSP chips are s'posed to do lotsa computations. and there are ways of dividing the work. r b-j
robert bristow-johnson wrote:



> so will autocorrelation or AMDF. how will your method work if a note
> had the 3rd, 5th, 7th, 10th, and 13th harmonic? (no energy in the
> other harmonics.)
Hmm, well, what would you decide if you listened to it (you have not
specified the relative strengths of each component)? The first three
might be good enough, ~if~ strong enough and high enough, to generate a
recognisable resultant tone. The first four correspond to a sort-of
widely-spaced diminished triad (5th + 10th should = an octave, so a
prominent combination), with the top component being harmonically less
well-related (e.g. G, E', Bb-', E'', A-''), so I would guess most people
(especially given the 7th harmonic is a quarter-tone "out of tune"
relative to ET) would call it a sort of bell or metallic tone. So, in
the category perhaps of "ambiguous".

It is known that many bells in effect have sub-audio fundamentals (or
sub-harmonics, whatever!); and even a cymbal does if you listen to it
close-up, edge-on.
>> In principle, you can take just harmonics 4 to 8 and still
>> work out the fundamental pretty accurately. You accept a margin of
>> error of at least 14 cents, maybe more.
>
> how about 1 cent or less? you can measure the period to better
> precision than 1 sample unit.
I more meant the step of fitting to a musical pitch (the ET problem again).
>> However, the human voice (especially
>> the speaking voice) is not exactly over-laden with partials, so the
>> calculated pitch may have a somewhat larger error margin.
(hidden by wide vibrato in many cases!)
> better method is time domain. lots of computations, but DSP chips are
> s'posed to do lotsa computations. and there are ways of dividing the
> work.
I have never used time-domain methods, as in the area I am involved with
(E/A music, analysis and transformation) the interest is in finding all
the components and their evolution over time (partial tracking et al.).

A PhD student of a colleague at Bath Uni did a very interesting project
some years ago on guitar pitch tracking using Kalman filters (~don't~
ask me for any details! - I think there is at least a DAFx paper
somewhere); the issue being to get the pitch as soon as possible, even
during the relatively chaotic attack. I mainly remember it because she
was logged onto the same machine I was, but she was running lots of very
heavy Matlab simulations (the sort that take hours to complete), so
everything I was doing suddenly slowed to a crawl.

Richard Dobson
On Jun 16, 1:48 am, robert bristow-johnson <r...@audioimagination.com>
wrote:
> On Jun 15, 11:58 pm, rickman <gnu...@gmail.com> wrote:
>
>> I think I would rather read some sort of insult rather than just
>> changing the subject line. There has to be something very strange
>> going on over that side of the ocean. This behavior is just a little
>> too weird.
>
> dunno which side yer on, Rick, you might find Vlad is, i think, on the
> Oklahoma side of the ocean.
Well, that certainly explains a lot! Rick