DSPRelated.com
Forums

finding fundamental frequency (pitch)

Started by cyberaishu November 3, 2007
Hi,

We're working on a project dealing with South Indian
music signals.
We're currently stuck on finding the fundamental
frequency of the signal.
What we've done so far:
1) Segmented the signal and stored the segments in an array.
2) Found the DFT of each segment and stored it.
3) Found the cross-correlation between adjacent elements
of the DFT values and stored it.
4) Took the maximum value of the cross-correlated
values.

How do we get the fundamental frequency from this
value?
We're total newbies to DSP, so please help us!
Thanks,
Aishwarya

I have an example program on my webpage that might get you started.
However, be aware that audio pitch detection is a science of its own. 

http://www.elisanet.fi/mnentwig/webroot/FFT_peaksearch_audio_example/index.html

To give one example, the ear may be tricked into hearing a fundamental
that isn't there: If I filter the fundamental away from a piano note,
chances are that my ear will still hear it as the original note. 
There are many older threads on this topic.

Cheers

Markus
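
A minimal sketch of the missing-fundamental effect described above, in Python (standard library only). The sample rate, the 200 Hz note and the harmonic amplitudes are assumptions chosen so that every component falls on an exact DFT bin. Autocorrelation is used here as one possible periodicity measure; it is not claimed to be what the linked example program does.

```python
import cmath
import math

FS = 8000          # sample rate in Hz (assumed)
N = 400            # 20 full periods of a 200 Hz note
F0 = 200.0         # the note; its fundamental is deliberately left out

# Harmonics 2, 3 and 4 only (400, 600, 800 Hz); amplitudes are arbitrary.
amps = {2: 1.0, 3: 0.5, 4: 0.3}
x = [sum(a * math.sin(2 * math.pi * k * F0 * n / FS) for k, a in amps.items())
     for n in range(N)]

def dft_magnitude(signal, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    n_samples = len(signal)
    return abs(sum(s * cmath.exp(-2j * math.pi * k * n / n_samples)
                   for n, s in enumerate(signal)))

# Strongest spectral peak: bin k corresponds to k * FS / N Hz.
peak_bin = max(range(1, N // 2), key=lambda k: dft_magnitude(x, k))
peak_hz = peak_bin * FS / N

def autocorr(signal, lag, window):
    """Sum of products of the signal with a copy delayed by `lag` samples."""
    return sum(signal[n] * signal[n + lag] for n in range(window))

# Strongest periodicity: the lag (in samples) with the largest autocorrelation.
WINDOW = 320       # 8 full periods, so every lag is summed over the same span
best_lag = max(range(10, 61), key=lambda lag: autocorr(x, lag, WINDOW))
pitch_hz = FS / best_lag

print(peak_hz, pitch_hz)
```

The spectral peak lands on the strongest partial (400 Hz), while the waveform still repeats every 1/200 s, so a period-based measure reports 200 Hz even though no energy is present there.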
On Nov 3, 11:53 am, "mnentwig" <mnent...@elisanet.fi> wrote:
> I have an example program on my webpage that might get you started.
> However, be aware that audio pitch detection is a science of its own.
>
> http://www.elisanet.fi/mnentwig/webroot/FFT_peaksearch_audio_example/...
>
> To give one example, the ear may be tricked into hearing a fundamental
> that isn't there: If I filter the fundamental away from a piano note,
> chances are that my ear will still hear it as the original note.
> There are many older threads on this topic.

i'm still of the opinion that the old AMDF (Average Magnitude Difference Function) or a variant (like ASDF, the Average Squared Difference Function, with a window and perhaps a filter on the difference signal) is the method that makes the fewest assumptions. it only assumes some notion of periodicity and looks for the best period, given some error cost weighting applied to the difference signal (absolute value and squared are but two possible choices). you look for minima in that and try to wisely choose (and stick with) the right minimum. that takes a little "expert systems" or AI-ish thinking in the alg.

r b-j
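
A minimal sketch of the difference-function idea described above, in Python (standard library only). The function names, the test tone and the lag range are assumptions for illustration. Picking the single global minimum is the naive choice; the "wisely choose the right minimum" part is deliberately left out.

```python
import math

def difference_function(x, min_lag, max_lag, cost=abs):
    """Average cost of x[n] - x[n + lag] for each trial lag.

    cost=abs gives the AMDF; cost=lambda d: d * d gives the squared
    variant (ASDF). Every lag is averaged over the same number of
    samples so the values are directly comparable.
    """
    window = len(x) - max_lag
    if window <= 0:
        raise ValueError("signal too short for max_lag")
    return [sum(cost(x[n] - x[n + lag]) for n in range(window)) / window
            for lag in range(min_lag, max_lag + 1)]

def naive_period(x, min_lag, max_lag, cost=abs):
    """Lag (in samples) of the global minimum of the difference function."""
    d = difference_function(x, min_lag, max_lag, cost)
    return min_lag + d.index(min(d))

FS = 8000  # assumed sample rate in Hz
# Assumed test signal: 200 Hz tone plus a phase-shifted second harmonic.
x = [math.sin(2 * math.pi * 200 * n / FS)
     + 0.5 * math.sin(2 * math.pi * 400 * n / FS + 0.3)
     for n in range(400)]

period_amdf = naive_period(x, 20, 60)
period_asdf = naive_period(x, 20, 60, cost=lambda d: d * d)
print(FS / period_amdf, FS / period_asdf)
```

The lag range here is narrow enough to contain only one period of the test tone. With a wider range, the minima at integer multiples of the period are equally deep, which is where the choice among candidates starts to matter.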

cyberaishu wrote:

> Hi,
>
> We're working on a project dealing with South Indian
> music signals.
> We're right now stuck with finding out the fundamental
> frequency of the signal.

1. What is your definition of "fundamental frequency"? What exactly are you looking for?

2. Are you sure there is such a fundamental frequency in your signal? There very well could be none.

3. Dmitry Teres claims that he invented the ultimate pitch detector. Search the archives of this newsgroup.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
On Nov 3, 5:19 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
wrote:
> cyberaishu wrote:
> > Hi,
> >
> > We're working on a project dealing with South Indian
> > music signals.
> > We're right now stuck with finding out the fundamental
> > frequency of the signal.
>
> 1. What is your definition of "fundamental frequency" ? What exactly are
> you looking for?
>
> 2. Are you sure there is such fundamental frequency in your signal?
> There very well could be none.
>
> 3. Dmitry Teres claims that he invented the ultimate pitch detector.
> Search the archives of this newsgroup.

i have read Dmitry's paper when he first announced it to this group. not counting his alternative method that made some use of SVD (singular value decomposition), which he did not describe in sufficient detail for me to understand, his published method is a sorta weird twist on AMDF, with some non-invertible non-linear operation (a step function) applied to intermediate data and a histogram used as an additional method of summing errors (or goodness of fit) for the different trial periods. it's still a method that compares (subtracts) the assumed quasi-periodic signal to a delayed copy of itself for some number of samples, with the delays corresponding to various trial periods, and then decides which trial period is the best pick (the one that results in the smallest difference function).

i couldn't get his MATLAB program to work for me (and am still willing to try, if i can get it to work on Octave; i don't have a current implementation of MATLAB), so i dunno how well it works and will not repeat what i've heard about that (i want to judge for myself). but, from what i read in his paper, it's another form of the AMDF (with a significantly souped-up means of adding up the score), even though Dmitry did not agree with me about that assessment of the algorithm.

r b-j

robert bristow-johnson wrote:

>>3. Dmitry Teres claims that he invented the ultimate pitch detector.
>>Search the archives of this newsgroup.
>
> i have read Dmitry's paper when he first announced it to this group,
> and, not counting his alternative method that made some use of SVD
> (singular value decomposition) which he did not describe in sufficient
> detail for me to understand, his published method is a sorta weird
> twist of AMDF, with some non-invertible non-linear operation (a step
> function) applied to intermediate data and using a histogram as an
> additional method of summing errors (or goodness of fit) for the
> different trial periods. it's still a method that compares
> (subtracts) the assumed quasi-periodic signal to a delayed copy of
> itself for some number of samples. these delays corresponding to
> various trial periods and then deciding on which trial period is the
> best pick (that results in the smallest difference function).

I read through his patent and got the same impression. What I didn't understand is if, how, why and when his method is supposed to be superior to the well-known approaches, and how big the advantage is.

IMO the optimal way of doing pitch detection depends on the application; there can't be a universal approach. If the goal is the best perceived quality (speech coding, speed/pitch change, etc.), then the best solution is a closed-loop search near the possible candidates. And the candidates can be sorted out by either method; there is not much of a difference.

> i couldn't get his MATLAB program to work for me (and am still willing
> to, if i can get it to work on Octave, i don't have a current
> implementation of MATLAB),

Fie. Matlab is for stupidents; real men do their 2+2=4 without it.

> so i dunno how well it works and will not
> repeat what i've heard about that (i want to judge for myself). but,
> from what i read in his paper, it's another form of the AMDF (with a
> significantly souped-up means of adding up the score), even though
> Dmitry had not agreed with me about that assessment of the algorithm.

I think your assessment is right. Fortunately, Dmitry didn't fall into fractal wavelet fuzzy genetic neural crap pseudoscience...

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
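
One possible reading of the "closed-loop search near the possible candidates" mentioned in this post is the long-term-predictor search used in speech coders: for each lag near a candidate, predict the current block from the delayed signal with the best gain and keep the lag with the smallest residual. The sketch below assumes a plain unweighted squared error rather than a perceptually weighted one; the function name, the search radius and the test signal are hypothetical.

```python
import math

def closed_loop_search(x, start, length, candidates, radius=2):
    """Return (lag, gain, error) with the smallest prediction error.

    The block x[start:start + length] is predicted from the same signal
    delayed by `lag` samples, for every lag within `radius` of a candidate.
    """
    target = x[start:start + length]
    best = None
    for cand in candidates:
        for lag in range(max(1, cand - radius), cand + radius + 1):
            if lag > start:
                continue  # not enough history for this lag
            past = x[start - lag:start - lag + length]
            energy = sum(p * p for p in past)
            if energy == 0.0:
                continue
            # Least-squares gain for this lag, then the remaining error.
            gain = sum(t * p for t, p in zip(target, past)) / energy
            error = sum((t - gain * p) ** 2 for t, p in zip(target, past))
            if best is None or error < best[2]:
                best = (lag, gain, error)
    return best

FS = 8000  # assumed sample rate in Hz
x = [math.sin(2 * math.pi * 200 * n / FS) + 0.5 * math.sin(2 * math.pi * 400 * n / FS)
     for n in range(400)]

# Rough candidates, e.g. from a coarse open-loop stage (values made up here).
lag, gain, error = closed_loop_search(x, start=100, length=80, candidates=[20, 41])
print(lag, gain, error)
```

The coarse candidate at 41 samples is refined to the true period of 40 samples, and the half-period candidate at 20 is rejected because no gain makes it fit.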
On Nov 3, 7:35 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
wrote:
> robert bristow-johnson wrote:
> >>3. Dmitry Teres claims that he invented the ultimate pitch detector.
> >>Search the archives of this newsgroup.
>
> > i have read Dmitry's paper when he first announced it to this group,
> > and, not counting his alternative method that made some use of SVD
> > (singular value decomposition) which he did not describe in sufficient
> > detail for me to understand, his published method is a sorta weird
> > twist of AMDF, with some non-invertible non-linear operation (a step
> > function) applied to intermediate data and using a histogram as an
> > additional method of summing errors (or goodness of fit) for the
> > different trial periods. it's still a method that compares
> > (subtracts) the assumed quasi-periodic signal to a delayed copy of
> > itself for some number of samples. these delays corresponding to
> > various trial periods and then deciding on which trial period is the
> > best pick (that results in the smallest difference function).
>
> I read through his patent and got the same impression.

i think his paper (ICASSP or similar conference) is on his site. if not, maybe i can find my copy of it lying around and send it to you.

> What I didn't
> understand is if, how, why and when his method is supposed to be
> superior to the well known approaches and how big is the advantage.

Dmitry really kicked into "salesman mode" with all the confidence and bluster associated with it, which made it hard for me to take his alg as seriously as he does.

> IMO the optimal way of the pitch detection depends on the application;
> there can't be the universal approach.

in the case where we're detecting the audible pitch of harmonic or quasi-periodic notes, which is the case for monophonic sounds coming from a very large class of pitched musical instruments, i think there *can* be a universal approach, that can get better and better, as we work out most of these known issues (most notably, transient problems and the "octave problem"), and if we can tolerate some latency in the pitch detection.

in the case where this quasi-periodic nature is not a given (some bells, transients from note attacks, and percussive sounds that can be described as short and sorta staccato bursts of filtered noise), it's likely that a completely different algorithm (not based on the delay-difference signal), perhaps some kinda peak-picking in the windowed spectra, may have to be used. in that sense i agree with you.

> If the goal is the best perceived
> quality (speech coding, speed/pitch change, etc.) then the best solution
> is the closed loop search near the possible candidates. And the
> candidates can be sorted out by either method; there is not much of a
> difference.

well, not much, if the candidates all come from examining the difference signal. maybe a little.

but the candidate picking is the big deal. that's still like alchemy, very AI-ish. that's still where the patents and trade secrets lie. that's where some pitch-detection algs sound better than other pitch-detection algs. what do you do when no candidate looks very good (during transients or other times the input is not sufficiently quasi-periodic)?

or the "octave problem" (lotsa different candidates all look about equally good)? for the case when a 440 Hz tone has a very small amplitude (like down by 70 dB) 220 Hz tone (or some other sub-harmonic) added to it: is it A440 (or midi note 69) or A220 (midi note 57)? how would we hear such a pitch? at what threshold do you stop ignoring the sub-harmonic?

if you hear a synthesizer oscillator hooked to a pitch detector having such a problem, the pitch of the oscillator will jump up and down from one possible harmonic to another, during a single note, and will sound like dog excrement. Tuvan throat singers can really kill a pitch-detector with the octave problem, but other singers, singing some note but starting out with their mouth cavity tuned to the 2nd or 3rd harmonic and backing off from that, can also cause it.

how do you get this mindless pitch-detection algorithm that looks at what initially appears to be a 440 Hz waveform, but ends up as a 220 Hz waveform (without glissando from one note to the other), to say at the outset that it was 220 Hz? and if you bias the threshold to choose 1/220 second as the period over the nearly equally good 1/440 second period candidate, what are you gonna do if this super small 220 Hz subharmonic gets weaker (fades to silence)? your pitch-detector will say it's A220 when the person listening to the note thinks it's A440. *that* is the octave problem.

> > i couldn't get his MATLAB program to work for me (and am still willing
> > to, if i can get it to work on Octave, i don't have a current
> > implementation of MATLAB),
>
> Fie. Matlab is for stupidents; real men do their 2+2=4 without it.

naw, MATLAB (or Octave) can be useful. do you actually design filters or look at FFT data or such with your own C code? i used to do that in the early nineties (i even wrote a few papers with graphics generated with my own C code), but eventually got a little lazy using MATLAB.

i recognized its usefulness to the extent that i was very unhappy that The Math Works (and the inventor of MATLAB, Cleve Moler) could see no benefit to extending the language (in a backward compatible manner) so that we could define the base or origin of the indices of every dimension in an array. they are very stubborn about it, and i think foolishly so. their resistance comes from arrogance, obstinacy, and lack of vision (and NIH, the "not-invented-here" syndrome), not from a defensible technical reason.

personally i wish that i knew C++ a little better (or a decent OOP language, which i've been told Smalltalk is s'posed to be). a real nice set of classes for representing matrices (and arrays), complex numbers, matrices with complex elements, and such (along with methods performing the operations that we find handy in MATLAB, including display functions) would be better and more portable to implementations. that is, code you write for concept development and testing could slip right into a build of a real application or embedded target.

> > so i dunno how well it works and will not
> > repeat what i've heard about that (i want to judge for myself). but,
> > from what i read in his paper, it's another form of the AMDF (with a
> > significantly souped-up means of adding up the score), even though
> > Dmitry had not agreed with me about that assessment of the algorithm.
>
> I think your assessment is right. Fortunately, Dmitry didn't fall into
> fractal wavelet fuzzy genetic neural crap pseudoscience...

as far as i could tell, his histograms would hit peaks at the period (and integer multiples of the period, so Dmitry's alg still has the "octave problem") of a periodic or quasi-periodic function.

r b-j
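
A small sketch of the threshold question raised in this post, in Python (standard library only). It assumes a picker that takes the shortest lag whose dip is within some tolerance of the deepest one; that rule, the function names, the sample rate and the synthetic difference curve are all assumptions for illustration, not anyone's published method.

```python
import math

def pick_period(diff, min_lag, tolerance):
    """Pick a period (in samples) from a difference function.

    diff[i] is the difference value at lag min_lag + i. The shortest lag
    that is a local minimum and lies within `tolerance` (a fraction of
    the mean of diff) of the global minimum wins.
    """
    limit = min(diff) + tolerance * (sum(diff) / len(diff))
    for i in range(1, len(diff) - 1):
        if diff[i] <= limit and diff[i] <= diff[i - 1] and diff[i] <= diff[i + 1]:
            return min_lag + i
    return min_lag + diff.index(min(diff))

def midi_note(freq_hz):
    """Nearest MIDI note number, with A440 defined as note 69."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))

FS = 44000  # assumed sample rate: 440 Hz is 100 samples, 220 Hz is 200
MIN_LAG, MAX_LAG = 50, 250
# Synthetic stand-in for a measured difference function: deep dips at
# multiples of 100 samples, with the dip at 100 very slightly shallower
# than the one at 200 (as a faint 220 Hz component would cause).
diff = [abs(math.sin(math.pi * lag / 100)) + 0.001 * abs(math.sin(math.pi * lag / 200))
        for lag in range(MIN_LAG, MAX_LAG + 1)]

strict = pick_period(diff, MIN_LAG, tolerance=0.0)    # deepest dip only
biased = pick_period(diff, MIN_LAG, tolerance=0.01)   # shortest "good enough" dip
print(midi_note(FS / strict), midi_note(FS / biased))
```

With zero tolerance the picker reports the 200-sample period (A220, note 57); with a small tolerance it reports 100 samples (A440, note 69). The tolerance value is exactly the kind of arbitrary threshold questioned above, and nothing in the sketch says what the right value is.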
On Nov 3, 1:37 pm, robert bristow-johnson <r...@audioimagination.com>
wrote:
> On Nov 3, 11:53 am, "mnentwig" <mnent...@elisanet.fi> wrote:
> > To give one example, the ear may be tricked into hearing a fundamental
> > that isn't there: If I filter the fundamental away from a piano note,
> > chances are that my ear will still hear it as the original note.
> > There are many older threads on this topic.
>
> i'm still of the opinion that the old AMDF (Average Magnitude
> Difference Function) or a variant (like ASDF with a window and perhaps
> a filter on the difference signal) is the method that makes the fewest
> assumptions.

Pitch transcription is usually measured against what a trained human musician would decide. Is how a human ear processes music more like an AMDF search, or more like overlapping filter banks fed into some sort of pattern matching process?

As for the octave problem, my guess is that the human ear/brain doesn't really solve it. It may guess based on the transient preceding the sustained periodicity, and go with that decision, even if slightly wrong.

IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M

robert bristow-johnson wrote:

>>>>3. Dmitry Teres claims that he invented the ultimate pitch detector.
>>
>>>i have read Dmitry's paper when he first announced it to this group,
>>>and, not counting his alternative method that made some use of SVD
>>>(singular value decomposition) which he did not describe in sufficient
>>>detail for me to understand, his published method is a sorta weird
>>>twist of AMDF

>>I read through his patent and got the same impression.
>
> i think his paper (ICASSP or similar conference) is on his site. if
> not, maybe i can find my copy of it laying around and send it to you.

I couldn't find the original paper. Can you please send it to me?

>>If the goal is the best perceived
>>quality (speech coding, speed/pitch change, etc.) then the best solution
>>is the closed loop search near the possible candidates. And the
>>candidates can be sorted out by either method; there is not much of a
>>difference.
>
> but the candidate picking is the big deal. that's still like alchemy,
> very AI-ish. that's still where the patents and trade-secrets lie.
> that's where some pitch-detection algs sound better than other
> pitch-detection algs.

I would start with a quantitative definition of what "better" means, i.e. what the goal is. The error in the time or frequency domain can be weighted against a psychoacoustic model; the best pitch value is the one which minimizes the error.

> what do you do when no candidate looks very good
> (during transients or other times the input is not sufficiently
> quasi-periodic)?

Probably the model of the signal is oversimplified, so it doesn't fit the reality. It is a known phenomenon that if the model doesn't match, then the maximum likelihood solution is unstable, since it jumps on the random features.

> or the "octave problem" (lotsa different candidates all look about
> equally good)?

Pick the candidate which makes for the least weighted error.

> for the case when a 440 Hz tone has a very small
> amplitude (like down by 70 dB) 220 Hz tone (or some other sub-harmonic)
> added to it: is it A440 (or midi note 69) or A220 (midi note 57)? how
> would we hear such a pitch? at what threshold do you stop ignoring
> the sub-harmonic?

I would start from the most likely candidate and its nearest neighbors. It is very unlikely that the far harmonics or subharmonics will produce the minimum weighted error.

> if you hear a synthesizer oscillator hooked to a pitch detector having
> such a problem, the pitch of the oscillator will jump up and down from
> one possible harmonic to another, during a single note, and will sound
> like dog excrement. Tuvan throat singers can really kill a
> pitch-detector with the octave problem, but other singers, singing some note
> but starting out with their mouth cavity tuned to the 2nd or 3rd
> harmonic and backing off from that can also cause it.

The model of the signal should include a hidden Markov chain to predict the variations.

> how do you get this mindless pitch-detection algorithm that looks at,
> what initially appears to be a 440 Hz waveform, but ends up as a 220
> Hz waveform (without glissando from one note to the other), to say at
> the outset, that it was 220 Hz?

A mindless algorithm just supplies a set of candidates, i.e. defines the area for the deep search.
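
The last two replies (a set of candidates per frame, plus a Markov model over time) can be sketched as a Viterbi-style dynamic-programming search. This is a simplified stand-in for a real hidden Markov model: the per-frame costs, the octave-jump penalty and the function name are all hypothetical, and no probabilities are estimated from data.

```python
import math

def track_pitch(frames, jump_weight=0.5):
    """Choose one candidate per frame so that the total cost is minimal.

    frames is a list of frames; each frame is a list of (freq_hz, cost)
    candidates. Moving between candidates in consecutive frames costs
    jump_weight per octave of pitch change.
    """
    best = [cost for _, cost in frames[0]]   # best total cost ending at each candidate
    back = []                                # back-pointers, one list per later frame
    for prev, cur in zip(frames, frames[1:]):
        new_best, pointers = [], []
        for f_cur, c_cur in cur:
            totals = [best[i] + jump_weight * abs(math.log2(f_cur / f_prev))
                      for i, (f_prev, _) in enumerate(prev)]
            i_min = min(range(len(totals)), key=totals.__getitem__)
            new_best.append(totals[i_min] + c_cur)
            pointers.append(i_min)
        best = new_best
        back.append(pointers)
    # Walk the back-pointers from the cheapest final candidate.
    idx = min(range(len(best)), key=best.__getitem__)
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [frames[t][i][0] for t, i in enumerate(path)]

# Made-up candidates: 440 and 220 Hz look almost equally good in each frame.
frames = [
    [(440, 0.10), (220, 0.12)],
    [(440, 0.11), (220, 0.10)],
    [(440, 0.10), (220, 0.12)],
]
framewise = [min(frame, key=lambda c: c[1])[0] for frame in frames]
tracked = track_pitch(frames)
print(framewise, tracked)
```

Picking the cheapest candidate frame by frame jumps an octave in the middle frame, while the tracked path stays on one octave because the jump penalty outweighs the small per-frame difference.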
>>Fie. Matlab is for stupidents; real men do their 2+2=4 without it.
> naw, MATLAB (or Octave) can be useful. do you actually design filters
Yes, indeed. Actually, only doing things by your own hand provides an understanding of the subject and develops the skill. Look at this newsgroup: the dummies are not asking how to do the 2+2=4, they are asking how to get everything already available in Matlab.

> or look at FFT data or such with your own C code?

I do the plots by importing the data to Excel.

> i used to do that
> in the early nineties (i even wrote a few papers with graphics
> generated with my own C code), but eventually got a little lazy using
> MATLAB. the usefulness i recognized to the extent that i was very
> unhappy that The Math Works (and the inventor of MATLAB, Cleve Moler)
> could see no benefit to extending the language (in a backward
> compatible manner) so that we could define the base or origin to the
> indices of every dimension in an array. they are very stubborn about
> it, and i think foolishly so. their resistance comes from arrogance,
> obstinacy, and lack of vision (and NIH, the "not-invented-here"
> syndrome), not because of a defensible technical reason.

MatLab is too heavy to change anything. Any modification will inevitably introduce some incompatibility, and the crowds of helpless idiots will be running around and screaming. Mr. Moler doesn't want it to happen.

> personally i wish that i knew C++ a little better (or a decent OOP
> from which i've been told that Smalltalk is s'posed to be) and a real
> nice set of classes for representing matrices (and arrays), complex
> numbers, matrices with complex elements, and such (along with methods
> performing the operations that we find handy in MATLAB including
> display functions), would be better and more portable to
> implementations. that is code you write for concept development and
> testing could slip right into a build of a real application or
> embedded target.

Matlab advertised the ability to generate C code for the TMS320x from Matlab source. I don't know how well it works in reality, although I have some doubts about it.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
On Nov 4, 3:08 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
wrote:
> robert bristow-johnson wrote:
>
...
>
> > i think his paper (ICASSP or similar conference) is on his site. if
> > not, maybe i can find my copy of it laying around and send it to you.
>
> I couldn't find the original paper. Can you please send it to me.

yup, it looks like http://www.soundmathtech.com/ is outa business. so i can't find it on the web anywhere either. here is the thread where he first announced it (as far as i can tell):

http://groups.google.com/group/comp.speech.research/browse_frm/thread/47c863a624aac888/1897d8af3742992a#1897d8af3742992a

and you can see my initial response.

but Vlad, i can't find a copy here. i have a copy on my computer at my home, which is 4 hours away from where i am now (i work in the Boston area, but my family is in Vermont). so it will take a while (and i hope i don't forget).

> >>If the goal is the best perceived
> >>quality (speech coding, speed/pitch change, etc.) then the best solution
> >>is the closed loop search near the possible candidates. And the
> >>candidates can be sorted out by either method; there is not much of a
> >>difference.
>
> > but the candidate picking is the big deal. that's still like alchemy,
> > very AI-ish. that's still where the patents and trade-secrets lie.
> > that's where some pitch-detection algs sound better than other
> > pitch-detection algs.
>
> I would start with the quantitative definition of what does it mean
> "better", i.e. what is the goal. The error in the time or frequency
> domain can be weighted against a psychoacoustic model; the best pitch
> value is the one which minimizes the error.

the problem is, we don't know precisely what the psychoacoustic model is. we do not know precisely how humans judge the pitch of a sound (if they judge it even *has* pitch). normally there is a pretty high correlation of perceived pitch to the (highest possible) fundamental frequency if the note is a quasi-periodic function of time. but even though mathematically it may be a 220 Hz tone, i'll bet that if you add a 220 Hz tone, with amplitude reduced by 70 dB, to a 440 Hz tone, everyone (but some pitch detectors) will say the note is clearly A440. but mathematically it is a 220 Hz tone. so at what threshold do we say that attenuated odd harmonics don't count?

> > what do you do when no candidate looks very good
> > (during transients or other times the input is not sufficiently
> > quasi-periodic)?
>
> Probably the model of the signal is oversimplified, so it doesn't fit
> the reality. It is a known phenomena that if the model doesn't match,
> then the most likelihood solution is unstable, since it jumps on the
> random features.
>
> > or the "octave problem" (lotsa different candidates all look about
> > equally good)?
>
> Pick the candidate which makes for the least weighted error.

what if that candidate is the wrong octave (as people perceive the pitch)?

> > for the case when a 440 Hz tone has a very small
> > amplitude (like down by 70 dB) 220 Hz tone (or some other sub-harmonic)
> > added to it: is it A440 (or midi note 69) or A220 (midi note 57)? how
> > would we hear such a pitch? at what threshold do you stop ignoring
> > the sub-harmonic?
>
> I would start from the most likely candidate and its nearest neighbors.
> It is very unlikely that the far harmonics or subharmonics will produce
> the minimum weighted error.

if you add a synchronous 220 Hz tone (of very low amplitude) to a 440 Hz tone, *any* mathematical measure of the candidates will show the 1/220 period to be better than the candidate at 440.
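
A quick numerical check of that claim for one particular measure, the average squared difference, in Python (standard library only). The sample rate and window length are assumptions chosen so that both trial periods are whole numbers of samples.

```python
import math

FS = 44000                  # assumed: periods of 440 and 220 Hz are whole samples
A_SUB = 10 ** (-70 / 20)    # 220 Hz component, 70 dB below the 440 Hz tone
WINDOW = 400                # two full periods of 220 Hz

x = [math.sin(2 * math.pi * 440 * n / FS) + A_SUB * math.sin(2 * math.pi * 220 * n / FS)
     for n in range(WINDOW + 200)]

def asdf(signal, lag, window):
    """Average squared difference between the signal and a delayed copy."""
    return sum((signal[n] - signal[n + lag]) ** 2 for n in range(window)) / window

d_440 = asdf(x, FS // 440, WINDOW)   # trial period 1/440 s = 100 samples
d_220 = asdf(x, FS // 220, WINDOW)   # trial period 1/220 s = 200 samples
power = sum(v * v for v in x[:WINDOW]) / WINDOW
print(d_440, d_220, power)
```

The 1/220 s candidate does win, since its difference is zero up to rounding, but the 1/440 s candidate loses by only about 2 * A_SUB**2, which is less than a millionth of the signal power. That gap is what any threshold for ignoring the sub-harmonic has to be set against.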
> > the usefulness i recognized to the extent that i was very
> > unhappy that The Math Works (and the inventor of MATLAB, Cleve Moler)
> > could see no benefit to extending the language (in a backward
> > compatible manner) so that we could define the base or origin to the
> > indices of every dimension in an array. they are very stubborn about
> > it, and i think foolishly so. their resistance comes from arrogance,
> > obstinacy, and lack of vision (and NIH, the "not-invented-here"
> > syndrome), not because of a defensible technical reason.
>
> MatLab is too heavy to change anything. Any modification will inevitably
> introduce some incompatibility,
no. not true. you can make mods that are guaranteed to be backward compatible. now if a user changes their default array (that is always 1-origin when it is first created) to an array with some other base or origin, then that code could not have existed in the days preceding the mod. there is no problem with creating another structure inside a MATLAB variable that defines the origin for every dimension (usually it's two dimensions).
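
The idea of storing an origin per dimension alongside the data can be sketched as follows. This is Python rather than MATLAB, and the class name and interface are hypothetical; it only illustrates why a default origin of 1 leaves existing code unaffected.

```python
class OriginArray:
    """2-D array whose index origin can be set per dimension.

    Hypothetical illustration: the origin defaults to 1 in every
    dimension, so code that never changes it behaves as before.
    """

    def __init__(self, rows, origin=(1, 1)):
        self._rows = [list(row) for row in rows]
        self.origin = tuple(origin)

    def _position(self, index):
        # Translate a user-facing index into a zero-based storage position.
        row = index[0] - self.origin[0]
        col = index[1] - self.origin[1]
        if not (0 <= row < len(self._rows) and 0 <= col < len(self._rows[row])):
            raise IndexError(f"index {index} outside array with origin {self.origin}")
        return row, col

    def __getitem__(self, index):
        row, col = self._position(index)
        return self._rows[row][col]

    def __setitem__(self, index, value):
        row, col = self._position(index)
        self._rows[row][col] = value

# A symmetric 3-tap filter stored as one row.
h = OriginArray([[0.25, 0.5, 0.25]])
first_tap_default = h[1, 1]    # 1-origin indexing until the origin is changed
h.origin = (1, -1)             # columns now run -1, 0, +1
center_tap = h[1, 0]
print(first_tap_default, center_tap)
```

Only code that explicitly sets a different origin sees different indexing, which is the backward-compatibility argument made above.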
> and the crowds of helpless idiots will
> be running around and screaming. Mr. Moler doesn't want it to happen.
some of us have been screaming here.

r b-j