DSPRelated.com
Forums

finding fundamental frequency (pitch)

Started by cyberaishu November 3, 2007
On Nov 4, 5:55 pm, robert bristow-johnson <r...@audioimagination.com>
wrote:

> the problem is, we don't know precisely what the psychoacoustic model
> is. we do not know precisely how humans judge the pitch of a sound
> (if they judge it even *has* pitch). normally there is a pretty high
> correlation of perceived pitch to the (highest possible) fundamental
> frequency if the note is a quasi-periodic function of time. but even
> though mathematically it may be a 220 Hz tone, i'll bet that if you
> add a 220 Hz tone, with amplitude reduced by 70 dB, to a 440 Hz tone,
> everyone (but some pitch detectors) will say the note is clearly
> A440. but mathematically it is a 220 Hz tone. so at what threshold
> do we say that attenuated odd harmonics don't count?
You're leaving out the time domain. My guess is that if you take
a nearly ambiguous spectrum, and paste on the opposite transient,
then the human would pick the pitch matching the transient, not
the pitch appropriate to the actual harmonic content. E.g., the
human would ignore the low 220 Hz tone added to the sustain
portion of a 440 Hz A piano note, and might not notice if the
fundamental and all the odd harmonics were gradually filtered
out of a piano note at 220 Hz A.

IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M
Hi Robert,

robert bristow-johnson wrote:
> On Nov 4, 3:08 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
> wrote:
>> robert bristow-johnson wrote:
>>
> ...
>>> i think his paper (ICASSP or similar conference) is on his site. if
>>> not, maybe i can find my copy of it laying around and send it to you.
>> I couldn't find the original paper. Can you please send it to me.
>>
>
> yup, it looks like http://www.soundmathtech.com/ is outa business. so
> i can't find it on the web anywhere either. here is the thread where
> he first announced (as far as i can tell):
>
> http://groups.google.com/group/comp.speech.research/browse_frm/thread/47c863a624aac888/1897d8af3742992a#1897d8af3742992a
>
> and you can see my initial response.
>
> but Vlad, i can't find a copy here. i have a copy on my computer at
> my home which is 4 hours away from where i am now (i work in the
> Boston area, but my family is in Vermont). so it will take a while
> (and i hope i don't forget).
>
>>>> If the goal is the best perceived
>>>> quality (speech coding, speed/pitch change, etc.) then the best solution
>>>> is the closed loop search near the possible candidates. And the
>>>> canditates can be sorted out by either method; there is not much of a
>>>> difference.
>>> but the candidate picking is the big deal. that's still like alchemy,
>>> very AI-ish. that's still where the patents and trade-secrets lie.
>>> that's where some pitch-detection algs sound better than other pitch-
>>> detection algs.
>> I would start with the quantitative definition of what does it mean
>> "better", i.e. what is the goal. The error in the time or frequency
>> domain can be weighted against a psychoacoustic model; the best pitch
>> value is the one which minimizes the error.
>
> the problem is, we don't know precisely what the psychoacoustic model
> is. we do not know precisely how humans judge the pitch of a sound
> (if they judge it even *has* pitch). normally there is a pretty high
> correlation of perceived pitch to the (highest possible) fundamental
> frequency if the note is a quasi-periodic function of time. but even
> though mathematically it may be a 220 Hz tone, i'll bet that if you
> add a 220 Hz tone, with amplitude reduced by 70 dB, to a 440 Hz tone,
> everyone (but some pitch detectors) will say the note is clearly
> A440. but mathematically it is a 220 Hz tone. so at what threshold
> do we say that attenuated odd harmonics don't count?
>
>>> what do you do when no candidate looks very good
>>> (during transients or other times the input is not sufficiently quasi-
>>> periodic)?
>> Probably the model of the signal is oversimplified, so it doesn't fit
>> the reality. It is a known phenomena that if the model doesn't match,
>> then the most likelihood solution is unstable, since it jumps on the
>> random features.
>>
>>> or the "octave problem" (lotsa different candidates all look about
>>> equally good)?
>> Pick the candidate which makes for the least weighted error.
>
> what if that candidate is the wrong octave (as people perceive the
> pitch)?
>
>>> for the case when a 440 Hz tone has a very small
>>> amplitude (like down by 70 dB) 220 Hz tone (a some other sub-harmonic)
>>> added to it: is it A440 (or midi note 69) or A220 (midi note 57)? how
>>> would we hear such a pitch? at what threshold do you stop ignoring
>>> the sub-harmonic?
>> I would start from the most likely candidate and its nearest neighbors.
>> It is very unlikely that the far harmonics or subharmonics will produce
>> the minimum weighted error.
>
> if you add a synchronous 220 Hz tone (of very low amplitude) to a 440
> Hz tone, *any* mathematical measure of the candidates will show the
> 1/220 period to be better than the candidate at 440.
Surely you mean any common mathematical measure. Most metrics weight similarity above anything else. However, there is no fundamental reason why a metric should not be sensitive to the relative levels of the pitches, and weight a weak fundamental as having a low likelihood.

I presume our hearing is implementing some relatively simple maths. We just haven't figured out what that is. We know it ignores a weak fundamental, yet will fill in a missing 1/3rd frequency when it detects a 1, 3, 5, 7, ... descending amplitude sequence, where some trigger tells the brain there is a bass pitch missing. Maybe nature pre-adapted us to crappy speakers. :-) We don't know an awful lot beyond that.

I found it interesting you picked 440Hz and 220Hz for your example. The brain tends to favour 1, 3, 5, 7 harmonic sequences as the more likely. It is much more interesting to compare how 146.67Hz and 440Hz are handled. Even though the common pitch detectors are too dumb/too brute force for that to make any difference to them, it is clearly of great importance in any comparison to human perception.

Steve
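To put numbers on the 220/440 example, here is a minimal Python sketch, not anyone's actual detector. It assumes a 44 kHz sample rate (picked only so that both periods are a whole number of samples), a normalized squared-difference measure, and an arbitrary tolerance; the function names are made up for illustration. The plain minimum lands on the 1/220 s period, while a rule that takes the shortest period that is "good enough" lands on 1/440 s.

```python
import numpy as np

FS = 44000  # assumed sample rate: 1/440 s = 100 samples, 1/220 s = 200 samples


def make_tone(seconds=0.2, sub_db=-70.0):
    """440 Hz sine plus a synchronous 220 Hz sine at sub_db relative level."""
    t = np.arange(int(FS * seconds)) / FS
    sub_gain = 10.0 ** (sub_db / 20.0)
    return np.sin(2 * np.pi * 440.0 * t) + sub_gain * np.sin(2 * np.pi * 220.0 * t)


def norm_sq_diff(x, lag, window=4000):
    """Squared difference between x and x shifted by lag samples,
    normalized by the energy of both segments (0 means a perfect match)."""
    a = x[:window]
    b = x[lag:lag + window]
    return np.sum((a - b) ** 2) / np.sum(a ** 2 + b ** 2)


def pick_period(x, lags, tol):
    """Return (scores, lag with the smallest score, shortest lag with score <= tol).
    The tolerance rule is an assumption for illustration, not a perceptual law."""
    scores = {lag: norm_sq_diff(x, lag) for lag in lags}
    best = min(scores, key=scores.get)
    good = [lag for lag in sorted(lags) if scores[lag] <= tol]
    return scores, best, (good[0] if good else best)


if __name__ == "__main__":
    scores, best, tolerant = pick_period(make_tone(), lags=[100, 200], tol=1e-3)
    print(scores, FS / best, FS / tolerant)
```

Where to set the tolerance is exactly the open question in this thread; the value used here is a placeholder, not a measured threshold.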
On Nov 4, 9:59 pm, "Ron N." <rhnlo...@yahoo.com> wrote:
> On Nov 4, 5:55 pm, robert bristow-johnson <r...@audioimagination.com>
> wrote:
>
> > the problem is, we don't know precisely what the psychoacoustic model
> > is. we do not know precisely how humans judge the pitch of a sound
> > (if they judge it even *has* pitch). normally there is a pretty high
> > correlation of perceived pitch to the (highest possible) fundamental
> > frequency if the note is a quasi-periodic function of time. but even
> > though mathematically it may be a 220 Hz tone, i'll bet that if you
> > add a 220 Hz tone, with amplitude reduced by 70 dB, to a 440 Hz tone,
> > everyone (but some pitch detectors) will say the note is clearly
> > A440. but mathematically it is a 220 Hz tone. so at what threshold
> > do we say that attenuated odd harmonics don't count?
>
> Your leaving out the time domain. My guess is that if you take
> a near ambiguous spectrum, and paste on the opposite transient,
not sure exactly what you mean, Ron.
> then the human would pick the pitch matching the transient,
do transients necessarily have a pitch? i was thinking that "pitch" here had to do with the apparent short-term fundamental frequency which implies a little periodicity. transients are less periodic than quasi-periodic tones. clicks or blaps or thuds or twacks or thumps often don't have pitches.
> not
> the pitch appropriate to the actual harmonic content. e.g.
> the human would ignore the low 220 Hz tone added to the sustain
> portion of a 440 Hz A piano note,
if the amplitude of the 220 Hz tone is low enough relative to the 440 Hz tone.
> and might not notice if the
> fundamental and all the odd harmonics were gradually filtered
> out of a piano note at 220 Hz A.
they would still have A220 on the brain if it started out as so? even if the only harmonics left were integer multiples of 440 Hz? i might agree to some extent, but this is where perceptual research should be done (and if it had been done, i would like to know what the results were).

what would be interesting is to take an A220 piano note held down so that it lasts a few seconds. about 500 ms after the note begins, some signal processing ramps in a comb filter that kills 220 Hz, 660 Hz, 1100 Hz, and so on. will people eventually wake up and say "hey! that's not A220 anymore! that's A440!" ?

i dunno.

r b-j
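For anyone who wants to try the experiment, here is a rough Python sketch of the signal processing, under loudly stated assumptions: the "piano note" is just a sum of ten harmonics of 220 Hz with 1/k amplitudes (not a real piano), the sample rate is 44 kHz so that the comb delay is exactly 100 samples, and the ramp length is arbitrary. The comb is the feedforward form 0.5*(x[n] + x[n-D]), which has nulls at odd multiples of 220 Hz and unity gain at multiples of 440 Hz. All function names are hypothetical.

```python
import numpy as np

FS = 44000                       # assumed sample rate
F0 = 220.0                       # A220
D = int(round(FS / (2 * F0)))    # 100 samples: half a period of 220 Hz


def synth_note(seconds=3.0, n_harm=10):
    """Stand-in for a held note: harmonics of F0 with 1/k amplitudes."""
    t = np.arange(int(FS * seconds)) / FS
    return sum(np.sin(2 * np.pi * k * F0 * t) / k for k in range(1, n_harm + 1))


def ramped_comb(x, start=0.5, ramp=1.0):
    """Crossfade from x to 0.5*(x[n] + x[n-D]), starting at `start` seconds
    and finishing `ramp` seconds later."""
    t = np.arange(len(x)) / FS
    w = np.clip((t - start) / ramp, 0.0, 1.0)        # 0 = dry, 1 = fully combed
    delayed = np.concatenate([np.zeros(D), x[:-D]])  # x[n-D]
    return (1.0 - 0.5 * w) * x + 0.5 * w * delayed


def level_db(x, freq):
    """Level of one sinusoidal component, by projecting onto exp(-j*2*pi*f*n/FS)."""
    n = np.arange(len(x))
    amp = 2.0 * abs(np.sum(x * np.exp(-2j * np.pi * freq * n / FS))) / len(x)
    return 20.0 * np.log10(amp + 1e-300)


if __name__ == "__main__":
    y = ramped_comb(synth_note())
    head, tail = y[:FS // 2], y[-FS:]   # before the ramp, and the last second
    for f in (220.0, 440.0, 660.0, 880.0):
        print(f, round(level_db(head, f), 1), round(level_db(tail, f), 1))
```

Whether listeners then report A220 or A440 is the perceptual question raised above; the code only produces the stimulus.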
On Nov 4, 10:15 pm, robert bristow-johnson <r...@audioimagination.com>
wrote:
> On Nov 4, 9:59 pm, "Ron N." <rhnlo...@yahoo.com> wrote:
>
> > On Nov 4, 5:55 pm, robert bristow-johnson <r...@audioimagination.com>
> > wrote:
>
> > > the problem is, we don't know precisely what the psychoacoustic model
> > > is. we do not know precisely how humans judge the pitch of a sound
> > > (if they judge it even *has* pitch). normally there is a pretty high
> > > correlation of perceived pitch to the (highest possible) fundamental
> > > frequency if the note is a quasi-periodic function of time. but even
> > > though mathematically it may be a 220 Hz tone, i'll bet that if you
> > > add a 220 Hz tone, with amplitude reduced by 70 dB, to a 440 Hz tone,
> > > everyone (but some pitch detectors) will say the note is clearly
> > > A440. but mathematically it is a 220 Hz tone. so at what threshold
> > > do we say that attenuated odd harmonics don't count?
>
> > Your leaving out the time domain. My guess is that if you take
> > a near ambiguous spectrum, and paste on the opposite transient,
>
> not sure exactly what you mean, Ron.
>
> > then the human would pick the pitch matching the transient,
>
> do transients necessarily have a pitch? i was thinking that "pitch"
> here had to do with the apparent short-term fundamental frequency
> which implies a little periodicity. transients are less periodic than
> quasi-periodic tones. clicks or blaps or thuds or twacks or thumps
> often don't have pitches.
The thuds are like consonants. They don't have to be voiced or pitched for humans to recognize them. The thud from the beginning of different registers of piano notes can sound different, and people use these thuds to help decide what the rest of the harmonic content should sound like, thus biasing their octave decisions... or even what instrument they hear.

I read about one experiment where the transient attack of one instrument was pasted onto the longer harmonic sustain of another. People usually heard a note played by the first instrument, even though most of the composite waveform came from the second.

So it's quite possible that what "note" a person hears may, in some circumstances, have less to do with the exact harmonic content at a particular point in time than in what transient thud/blap/click immediately preceded it. Time domain history. Something to be considered for pitch recognition.

IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M
>> i'm still of the opinion that the old AMDF (Average Magnitude
>> Difference Function) or a variant (like ASDF with a window and
>> perhaps a filter on the difference signal) is the method that
>> makes the fewest assumptions.
..

Yes, the program on my web page is brain dead. It's meant only as a starting point for the grunt work: how to load .wavs, simple example of FFT etc.

Cheers

Markus

PS:
>> Matlab is for stupidents; real men do their 2+2=4 without it.
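
Since the AMDF comes up in the quote above, here is a bare-bones Python sketch of it, assuming a single frame, integer lags only, and a caller-supplied search range. It leaves out the window and the filter on the difference signal mentioned in the quote, and the function names are illustrative only.

```python
import numpy as np


def amdf(frame, max_lag):
    """Average magnitude difference for lags 1..max_lag (index 0 holds lag 1)."""
    n = len(frame) - max_lag  # same number of terms for every lag
    return np.array([np.mean(np.abs(frame[:n] - frame[lag:lag + n]))
                     for lag in range(1, max_lag + 1)])


def amdf_pitch(frame, fs, fmin, fmax):
    """Frequency whose lag gives the deepest AMDF minimum within [fmin, fmax]."""
    lo = int(fs / fmax)            # shortest lag searched
    hi = int(fs / fmin)            # longest lag searched
    d = amdf(frame, hi)
    lag = lo + int(np.argmin(d[lo - 1:]))
    return fs / lag


if __name__ == "__main__":
    fs = 8000
    t = np.arange(1024) / fs
    frame = (np.sin(2 * np.pi * 200 * t)
             + 0.5 * np.sin(2 * np.pi * 400 * t)
             + 0.25 * np.sin(2 * np.pi * 600 * t))
    # fmin is kept above 100 Hz so only one period (40 samples) is in range;
    # widening the range lets two-period lags compete, i.e. the octave problem.
    print(amdf_pitch(frame, fs, fmin=120.0, fmax=1000.0))
```

The search range does a lot of the work here: all the hard decisions about candidate picking discussed in this thread are hidden in the choice of `fmin` and `fmax`.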
"and another thing I won't discuss is reeeee-ligion it always causes a fight" - Ted the Mechanic
Along with "missing fundamental" and "octave problem" I want to add
one more. This is the problem of simultaneous sounds, when two or more
strings or two or more instruments play different notes. Can Dmitry
Terez or someone else recommend any solution? Is the situation really
hopeless?


Vladimir Malakhov wrote:

> Along with "missing fundamental" and "octave problem" I want to add
> one more. This is a problem of simultaneous sounds, when two and more
> strings or two and more instruments play different notes. Can Dmitry
> Terez or someone else recommend any solution? Is situation really
> hopeless?
The problem is *not* in the pitch detection. First, we have to decide what exactly we are looking for. The real problem consists of the following:

1. Defining an adequate parametric model of the audio signal.

2. Defining the cost function to evaluate the parameters of the model.

Without (1) and (2), the notion of "pitch" does not make any sense.

The above-mentioned problems are caused by poorly defined (1) and (2).

Furthermore, there can't be an ultimate universal pitch detector, because it all depends.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
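
As one concrete reading of (1) and (2), here is a small Python sketch. It is only an assumption about what such a model and cost function could look like, not the method proposed in the post: the model is a sum of `n_harm` harmonics of a candidate f0, and the cost is the residual energy left after a least-squares fit. The function names are hypothetical.

```python
import numpy as np


def harmonic_cost(x, fs, f0, n_harm):
    """Fraction of the signal energy NOT explained by n_harm harmonics of f0."""
    n = np.arange(len(x))
    cols = []
    for k in range(1, n_harm + 1):
        phase = 2 * np.pi * k * f0 * n / fs
        cols += [np.cos(phase), np.sin(phase)]   # free amplitude and phase per harmonic
    basis = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
    resid = x - basis @ coef
    return float(np.sum(resid ** 2) / np.sum(x ** 2))


def best_f0(x, fs, candidates, n_harm=4):
    """Candidate with the smallest cost, plus the cost of every candidate."""
    costs = [harmonic_cost(x, fs, f0, n_harm) for f0 in candidates]
    return candidates[int(np.argmin(costs))], costs


if __name__ == "__main__":
    fs = 8000
    t = np.arange(800) / fs
    x = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 880 * t + 0.5)
    print(best_f0(x, fs, [330.0, 440.0, 587.0]))
    # A 220 Hz candidate fits just as well, because 440 and 880 Hz are
    # among its harmonics: the octave problem, restated as a modelling choice.
    print(harmonic_cost(x, fs, 220.0, 4))
```

With this particular model and cost, 220 Hz and 440 Hz are indistinguishable for the test signal, so any preference between them has to come from an extra term in the cost function or a different model.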
> 1. Defining the adequate parametric model of the audio signal.
>
> 2. Defining the cost function to evaluate the parameters of the model.
>
> Without (1) and (2), the notion of "pitch" does not make any sense.
>
> The above mentioned problems are caused by the poorly defined (1) and (2).
>
> Furthermore, there can't be the ultimate universal pitch detector,
> because it all depends.
>
> Vladimir Vassilevsky
> DSP and Mixed Signal Design Consultant
> http://www.abvolt.com
Yes, first we should take a spherical horse in vacuum -:)

Vladimir Malakhov wrote:

>>1. Defining the adequate parametric model of the audio signal.
>>
>>2. Defining the cost function to evaluate the parameters of the model.
>>
>>Without (1) and (2), the notion of "pitch" does not make any sense.
>>
>>The above mentioned problems are caused by the poorly defined (1) and (2).
>>
>>Furthermore, there can't be the ultimate universal pitch detector,
>>because it all depends.
>>
>
> Yes, first we should take a spherical horse in vacuum -:)
Sure. Why can't we hit the Moon with bow and arrows?

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
On Nov 5, 9:54 am, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
wrote:
> Vladimir Malakhov wrote:
> >>1. Defining the adequate parametric model of the audio signal.
>
> >>2. Defining the cost function to evaluate the parameters of the model.
You seem to be assuming the existence of a closed-form parametric model for whatever it is that the OP wants to measure.
> >>Without (1) and (2), the notion of "pitch" does not make any sense.
'course it does. I've seen little kids transcribe simple melodies, and complain when some nearby players are sufficiently sharp or flat. Are you defining "pitch" differently?
> >>The above mentioned problems are caused by the poorly defined (1) and (2).
>
> >>Furthermore, there can't be the ultimate universal pitch detector,
> >>because it all depends.
>
> > Yes, first we should take a spherical horse in vacuum -:)
>
> Sure. Why can't we hit the Moon with bow and arrows?
(1) Because the gold standard for the location of the moon isn't defined by some impressions formed inside the minds of humans (say, trained musicians from a given subculture).

(2) One probably could from the doorway of an Apollo lunar lander. :)

IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M