Reply by acid...@inwind March 26, 2006
Hi!

Finding the local maxima and minima in an array really means peak detection.

Where can I find an algorithm to perform peak detection?

Thanks in advance.
Reply by acid...@inwind March 26, 2006
Hi!

>Do you know how to find the local maxima and minima in an array?
I've solved my problems by reading "The Scientist and Engineer's Guide to Digital Signal Processing". Now I have another problem: how do I find the local maxima and minima in an array? After computing the DFT, I extracted the spectrum of the signal. Now, how do I find the first high peak (the fundamental) and the other high peaks (the harmonics) in the output array of the DFT?

Thanks in advance.
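A minimal sketch of the array scan being asked about, written in Python for brevity; the same loop carries over to a C array of magnitudes computed from FFTW's output. The function names, the threshold of 30% of the largest value, and the sample numbers are all assumptions made for illustration. For real-valued input only the first half of the DFT output needs to be scanned, since the second half mirrors it.

```python
def local_maxima(values, min_height=float("-inf")):
    """Indices of samples strictly greater than both neighbours and >= min_height."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1] and values[i] >= min_height
    ]


def local_minima(values):
    """Indices of samples strictly smaller than both neighbours."""
    return local_maxima([-v for v in values])


# Hypothetical magnitudes for bins 0..9 (made-up numbers, not measured data).
mags = [0.1, 0.2, 5.0, 0.3, 0.1, 2.5, 0.2, 0.1, 1.2, 0.1]

# Keep only peaks at least 30% as tall as the tallest one (arbitrary threshold).
peaks = local_maxima(mags, min_height=0.3 * max(mags))
valleys = local_minima(mags)
print(peaks)    # [2, 5]
print(valleys)  # [4, 7]
```

The strict comparisons mean that flat plateaus and the two end points of the array are never reported. Note also that the tallest peak found this way is not guaranteed to be the fundamental.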
Reply by Richard Owlett March 23, 2006
Richard Dobson wrote:
> Richard Owlett wrote:
>
>>> In any case, the database of vowel formant frequencies is independent
>>> of the spoken/sung pitch.
>>
>> Can you suggest a page that discusses that?
>
> The best I can find on a quick Google is:
>
> http://www2.sfu.ca/sonic-studio/handbook/Formant.html
>
> You need to look for publications by Johan Sundberg, he did the original
> research on vocal formants, some time ago now. There is relatively
> little of his material directly on the net, most is in books, journals.
> If you Google on "formant" + "Sundberg", you should find most of
> whatever is available.
>
> Richard Dobson
I added "pitch" to your suggested search and hit paydirt. None of them told me everything, but with several starting points and their links I got a satisfying picture. I hadn't overtly/consciously/??? realized that the vocal cords were vibrating an octave below the lowest formant. Logical in retrospect ;}

Thanks
Reply by Ron N. March 23, 2006
acid_burn@inwind wrote:
> Hi! I must build a vowel recognizer using the library FFTW:
> analyzing a .wav file, I must retrieve the fundamental and the harmonics,
> then compare these with the fundamental and harmonics of other .wav files
> previously archived to choose the "closest" vowel.
>
> I followed these steps:
> - first, I load the samples from the .wav file into an array of
> fftw_complex, using 0.0 as imaginary parts;
> - then, perform a c2c DFT using FFTW_ESTIMATE as flag; the length of the
> DFT is the number of samples (say NS) in the .wav file (in general, this
> number ISN'T a power of 2);
> - last, I've got an array of fftw_complex; the length of the array is NS.
>
> Now, I must retrieve the fundamental and the harmonics from this array.
> How can I interpret the values of the array? I've read the manual of
> FFTW, but the problem is still unresolved.
Sounds like you may want a quick and dirty approximation rather than the better techniques previously mentioned in this subject thread.

Do you know how to convert complex data into magnitudes?
Do you know how to find the local maxima and minima in an array?
Do you know what frequency the second bin in your vector represents? (you may not want to answer that one :)
Do you know any polynomial interpolation formulas?
Do you know that the frequency with the most energy doesn't necessarily represent the fundamental pitch?

-- rhn A.T nicholson d.0.t C-o-M
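Three of the questions above can be sketched in a few lines, shown here in Python rather than C. The function names are invented for this illustration, and the spectrum is assumed to be a list of (real, imaginary) pairs laid out like an fftw_complex array. The refinement step uses three-point parabolic interpolation, which is one common polynomial formula and not necessarily the one the poster had in mind.

```python
import math


def magnitudes(spectrum):
    """Convert (re, im) pairs into magnitudes sqrt(re^2 + im^2)."""
    return [math.hypot(re, im) for re, im in spectrum]


def bin_frequency(k, sample_rate, n):
    """Centre frequency in Hz of bin k for an n-point DFT."""
    return k * sample_rate / n


def refine_peak(mags, k):
    """Fit a parabola through bins k-1, k, k+1 and return its vertex
    as a fractional bin index. Expects 1 <= k <= len(mags) - 2."""
    a, b, c = mags[k - 1], mags[k], mags[k + 1]
    denom = a - 2.0 * b + c
    if denom == 0.0:  # the three points lie on a straight line
        return float(k)
    return k + 0.5 * (a - c) / denom


# The second bin (index 1) of a 1024-point DFT at an assumed 8000 Hz rate:
print(bin_frequency(1, 8000, 1024))  # 7.8125

print(magnitudes([(3.0, 4.0), (0.0, -2.0)]))  # [5.0, 2.0]
```

Passing the fractional bin from refine_peak to bin_frequency gives a frequency estimate finer than the bin spacing sample_rate / n. For real input, only bins 0 through n/2 carry distinct information.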
Reply by Richard Dobson March 23, 2006
Richard Owlett wrote:

>> In any case, the database of vowel formant frequencies is independent
>> of the spoken/sung pitch.
>
> Can you suggest a page that discusses that?
The best I can find on a quick Google is:

http://www2.sfu.ca/sonic-studio/handbook/Formant.html

You need to look for publications by Johan Sundberg, who did the original research on vocal formants some time ago now. Relatively little of his material is directly on the net; most is in books and journals. If you Google on "formant" + "Sundberg", you should find most of whatever is available.

Richard Dobson
Reply by Richard Owlett March 23, 2006
Richard Dobson wrote:

> In any case, the database of vowel formant
> frequencies is independent of the spoken/sung pitch.
Can you suggest a page that discusses that? I explored http://hyperphysics.phy-astr.gsu.edu/hbase/music/vowel.html . That site is geared towards pseudo-random exploration. I'm looking for something more akin to a "guided tour".
Reply by Richard Dobson March 22, 2006
acid_burn@inwind wrote:
..
> OK, but how do I perform this with FFTW? Can you post some simple
> pseudo-code to do that?
Doing the FFT is just the first stage. Posting pseudo-code that would be of any use is more than I can take on right now. I suggest you look at the CLAM sources:

http://www.iua.upf.es/mtg/clam/

This has loads of C++ code (using FFTW, but possibly still v2) for extracting spectral envelopes, finding peaks, pitch extraction, etc. You may find CLAM of interest anyway; it is a widely used library of classes for sound analysis and processing, with some very cool GUI tools as well.

Richard Dobson
Reply by acid...@inwind March 22, 2006
Hi! 

>Extracting a spectral envelope is in effect a low-pass filtering process
>on a frame of FFT amplitudes (I am used to thinking in terms of the
>phase vocoder, so these are the amplitudes calculated with "hypot()"
>from the raw complex output of the FFT), to find the overall shape of
>the spectrum, and indeed to ignore small-scale deviations representing
>individual partials.
>
>Many vowels are diphthongs, and (for speech especially) are
>characterised by pitch rises or falls, so one does need to extract the
>pitch trajectory from the sound as well to identify these. Finding the
>fundamental is sufficient; but one may prefer to derive this from
>detected harmonics as FFT resolution is typically better "up there".
>This in turn implies that one needs to detect the actual (or relative)
>pitch of a vowel combination, and not to normalise everything to a
>single reference pitch. In any case, the database of vowel formant
>frequencies is independent of the spoken/sung pitch.
OK, but how do I perform this with FFTW? Can you post some simple pseudo-code to do that?

Thanks in advance,
gianluca
Reply by acid...@inwind March 22, 2006
Hi!

>It seems you could benefit from reading a text on DSP. Try
That is right, I understand my gaps in DSP theory, but I have no time to read an entire book... I simply need to understand how to interpret FFTW's output array and how to extract the fundamental and the harmonics from it.

I've seen that when I transform all the samples in one step, the output is a wave that has all its frequencies near the 0 frequency. Maybe I must extract a small subset of pitches from the recorded wave?

Thanks in advance,
gianluca
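One way to act on the "small subset" idea is to cut the recording into short overlapping frames and transform each frame separately, instead of transforming the whole file at once. The sketch below is in Python and only prepares the frames; each one would then be handed to the DFT. The Hann window, the 256-sample frame, the hop of 128 samples and the 8000 Hz rate are assumptions chosen for illustration, and frames of a few tens of milliseconds are a common choice in speech analysis.

```python
import math


def hann(n):
    """Hann window of length n (tapers both ends of a frame to zero)."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def split_into_frames(samples, frame_len, hop):
    """Return windowed frames of frame_len samples, starting every hop samples.
    A trailing partial frame is dropped."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append([samples[start + i] * window[i] for i in range(frame_len)])
    return frames


sample_rate = 8000   # Hz, assumed for this example
frame_len = 256      # samples per frame
print(1000.0 * frame_len / sample_rate)  # frame duration in ms: 32.0
print(sample_rate / frame_len)           # bin spacing in Hz: 31.25

# Stand-in signal: one second of a 200 Hz sine, not real speech.
signal = [math.sin(2.0 * math.pi * 200.0 * t / sample_rate) for t in range(sample_rate)]
frames = split_into_frames(signal, frame_len, hop=128)
print(len(frames))  # 61
```

A shorter frame gives coarser bin spacing but follows changes in the sound over time, which matters when the vowel or its pitch is not steady across the recording.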
Reply by Richard Dobson March 22, 2006
Rune Allnor wrote:
> acid_burn@inwind wrote:
>> Hi! I must build a vowel recognizer using the library FFTW:
>> analyzing a .wav file, I must retrieve the fundamental and the harmonics,
>> then compare these with the fundamental and harmonics of other .wav files
>> previously archived to choose the "closest" vowel.
>..
> First, the problem consists of two parts: pitch and vowel. It is not
> reasonable to expect that a given vowel at a high pitch should compare
> well with the same vowel at a lower pitch, so the first task would be to
> normalize the signal spectrum. One could, for instance, use some sort of
> AM scheme to modulate the detected pitch to some normalized reference
> pitch.
This sounds unnecessary. The primary task in vowel recognition is to extract and identify the vocal formants which in turn make up the spectral envelope, all of which is independent of pitch. For an illustration see e.g.:

http://hyperphysics.phy-astr.gsu.edu/hbase/music/vowel.html

Extracting a spectral envelope is in effect a low-pass filtering process on a frame of FFT amplitudes (I am used to thinking in terms of the phase vocoder, so these are the amplitudes calculated with "hypot()" from the raw complex output of the FFT), to find the overall shape of the spectrum, and indeed to ignore small-scale deviations representing individual partials.

Many vowels are diphthongs, and (for speech especially) are characterised by pitch rises or falls, so one does need to extract the pitch trajectory from the sound as well to identify these. Finding the fundamental is sufficient; but one may prefer to derive this from detected harmonics as FFT resolution is typically better "up there". This in turn implies that one needs to detect the actual (or relative) pitch of a vowel combination, and not to normalise everything to a single reference pitch. In any case, the database of vowel formant frequencies is independent of the spoken/sung pitch.

Richard Dobson.
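The low-pass filtering of a frame of FFT amplitudes described above can be approximated very crudely with a moving average across neighbouring bins. The Python sketch below does only that; the function name and the half_width parameter are invented for this illustration, and a moving average is a swapped-in stand-in here, since cepstral smoothing and linear prediction are more usual ways to obtain a spectral envelope.

```python
def smooth_spectrum(mags, half_width):
    """Moving average of mags over 2 * half_width + 1 bins.
    Near the ends of the array the average uses only the bins that exist."""
    n = len(mags)
    envelope = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        envelope.append(sum(mags[lo:hi]) / (hi - lo))
    return envelope


# A single isolated partial is spread into a broader, lower bump.
print(smooth_spectrum([0.0, 0.0, 3.0, 0.0, 0.0], 1))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

For the envelope to ignore individual partials, the averaging span has to be wider than the spacing between harmonics, so a suitable half_width depends on both the pitch and the bin spacing. Broad maxima of the smoothed curve are then candidates for formants.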