Hi!

>Do you know how to find the local maxima and minima in an array?

I've solved my earlier problems by reading "The Scientist and Engineer's
Guide to Digital Signal Processing".

Now I have another problem: how do I find the local maxima and minima in an
array?

After computing the DFT, I extracted the spectrum of the signal. Now, how do
I find the first high peak (the fundamental) and the other high peaks (the
harmonics) in the output array of the DFT?

Thanks in advance.

Richard Dobson wrote:
> Richard Owlett wrote:
> 
>>
>>> In any case, the database of vowel formant frequencies is independent 
>>> of the spoken/sung pitch.
>>
>>
>> Can you suggest a page that discusses that?
>>
> 
> The best I can find on a quick Google is:
> 
> http://www2.sfu.ca/sonic-studio/handbook/Formant.html
> 
> You need to look for publications by Johan Sundberg, who did the original 
> research on vocal formants, some time ago now. There is relatively 
> little of his material directly on the net; most is in books and journals.
> If you Google on "formant" + "Sundberg", you should find most of 
> whatever is available.
> 
> Richard Dobson
> 
> 

I added "pitch" to your suggested search and hit paydirt. None of them 
told me everything but with several starting points and their links I 
got a satisfying picture. I hadn't overtly/consciously/??? realized that 
the vocal cords were vibrating an octave below lowest formant. Logical 
in retrospect ;}

Thanks

acid_burn@inwind wrote:
> Hi! I must build a vowel recognizer using the FFTW library:
> analyzing a .wav file, I must retrieve the fundamental and the harmonics,
> then compare these with the fundamental and harmonics of other .wav files
> previously archived, to choose the closest vowel.
>
> I followed these steps:
> - first, I load the samples from the .wav file into an array of
> fftw_complex, using 0.0 as the imaginary parts;
> - then, I perform a c2c DFT using FFTW_ESTIMATE as the flag; the length of
> the DFT is the number of samples (say NS) in the .wav file (in general,
> this number ISN'T a power of 2);
> - last, I get an array of fftw_complex; the length of the array is NS.
>
> Now I must retrieve the fundamental and the harmonics from this array.
> How can I interpret the values of the array? I've read the FFTW
> manual, but the problem is still unresolved.

Sounds like you may want a quick and dirty approximation rather
than the better techniques previously mentioned in this subject thread.

Do you know how to convert complex data into magnitudes?
Do you know how to find the local maxima and minima in an array?
Do you know what frequency the second bin in your vector
represents?  (you may not want to answer that one :)
Do you know any polynomial interpolation formulas?
Do you know that the frequency with the most energy doesn't
necessarily represent the fundamental pitch?
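
For what it's worth, here is a minimal Python sketch of the first four
questions, using only the standard library. Everything in it is
illustrative and not FFTW code: naive_dft() is a slow O(N^2) stand-in
for the transform, and all of the function names, the threshold
parameter and the three-point parabolic interpolation are my own
assumptions, one simple choice among many.

```python
import cmath
import math

def naive_dft(x):
    """Slow O(N^2) DFT, standing in for the FFT library (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def magnitudes(spectrum):
    """Magnitude of each complex bin: hypot(real, imag)."""
    return [math.hypot(z.real, z.imag) for z in spectrum]

def bin_frequency(k, sample_rate, n):
    """Frequency in Hz of (possibly fractional) bin k of an n-point DFT."""
    return k * sample_rate / n

def local_maxima(mag, threshold=0.0):
    """Interior indices k that exceed the left neighbour, are not below the
    right neighbour, and lie above the threshold."""
    return [k for k in range(1, len(mag) - 1)
            if mag[k - 1] < mag[k] >= mag[k + 1] and mag[k] > threshold]

def parabolic_offset(mag, k):
    """Fractional-bin offset of the vertex of the parabola through the
    three magnitudes around bin k."""
    a, b, c = mag[k - 1], mag[k], mag[k + 1]
    denom = a - 2.0 * b + c
    return 0.0 if denom == 0.0 else 0.5 * (a - c) / denom

def find_peaks(samples, sample_rate, threshold):
    """Return (frequency_hz, magnitude) for each spectral peak, lowest first."""
    n = len(samples)
    # For real input the upper half of the array mirrors the lower half,
    # so only bins 0 .. n/2 need to be searched.
    mag = magnitudes(naive_dft(samples))[: n // 2 + 1]
    peaks = []
    for k in local_maxima(mag, threshold):
        freq = bin_frequency(k + parabolic_offset(mag, k), sample_rate, n)
        peaks.append((freq, mag[k]))
    return peaks
```

With N samples at sample rate fs, the second bin (index 1) sits at fs/N
Hz and bin k at k*fs/N Hz. The list that find_peaks() returns is sorted
by frequency, not by strength, and the last question above still
applies: the strongest peak need not be the fundamental.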

-- 
rhn A.T nicholson d.0.t C-o-M

Richard Owlett wrote:

> 
>> In any case, the database of vowel formant frequencies is independent 
>> of the spoken/sung pitch.
> 
> Can you suggest a page that discusses that?
> 

The best I can find on a quick Google is:

http://www2.sfu.ca/sonic-studio/handbook/Formant.html

You need to look for publications by Johan Sundberg, who did the original 
research on vocal formants, some time ago now. There is relatively 
little of his material directly on the net; most is in books and journals.
If you Google on "formant" + "Sundberg", you should find most of 
whatever is available.

Richard Dobson

Richard Dobson wrote:

> In any case, the database of vowel formant 
> frequencies is independent of the spoken/sung pitch.

Can you suggest a page that discusses that?

I explored
http://hyperphysics.phy-astr.gsu.edu/hbase/music/vowel.html .

That site is geared towards pseudo-random exploration.
I'm looking for something more akin to a "guided tour".

acid_burn@inwind wrote:
..
> 
> OK, but how do I perform this with FFTW? Can you post some simple
> pseudo-code to do that?
> 

Doing the FFT is just the first stage. Posting pseudo-code that would be 
of any use is more than I can take on right now. I suggest you look at 
the CLAM sources:

http://www.iua.upf.es/mtg/clam/

This has loads of C++ code (using FFTW, but possibly still v2) for 
extracting spectral envelopes, finding peaks, pitch extraction, etc. You 
may find CLAM of interest anyway, it is a widely used library of classes 
for sound analysis and processing, with some very cool GUI tools as well.

Richard Dobson

Hi! 

>Extracting a spectral envelope is in effect a low-pass filtering process
>on a frame of FFT amplitudes (I am used to thinking in terms of the 
>phase vocoder, so these are the amplitudes calculated with "hypot()" 
>from the raw complex output of the FFT), to find the overall shape of 
>the spectrum, and indeed to ignore small-scale deviations representing 
>individual partials.
>
>Many vowels are diphthongs, and (for speech especially) are 
>characterised by pitch rises or falls, so one does need to extract the 
>pitch trajectory from the sound as well to identify these. Finding the 
>fundamental is sufficient; but one may prefer to derive this from 
>detected harmonics as FFT resolution is typically better "up there". 
>This in turn implies that one needs to detect the actual (or relative) 
>pitch of a vowel combination, and not to normalise everything to a 
>single reference pitch. In any case, the database of vowel formant 
>frequencies is independent of the spoken/sung pitch.

OK, but how do I perform this with FFTW? Can you post some simple
pseudo-code to do that?

Thanks in advance

gianluca

Hi!

>It seems you could benefit from reading a text on DSP. Try

That's right, I know I have gaps in DSP theory, but I have no time to
read an entire book...

I simply need to understand how to interpret FFTW's output array and
how to extract the fundamental and the harmonics from it.

I've seen that when I transform all the samples in one step, the output is
a wave that has all its frequencies near the 0 frequency. Maybe I must
extract a small subset of pitches from the recorded wave?

thanks in advance

gianluca

Rune Allnor wrote:
> acid_burn@inwind wrote:
>> Hi! I must build a vowel recognizer using the FFTW library:
>> analyzing a .wav file, I must retrieve the fundamental and the harmonics,
>> then compare these with the fundamental and harmonics of other .wav files
>> previously archived, to choose the closest vowel.
>..
> First, the problem consists of two parts: pitch and vowel. It is not
> reasonable to expect that a given vowel at a high pitch should compare
> well with the same vowel at a lower pitch, so the first task would be to
> normalize the signal spectrum. One could, for instance, use some sort of
> AM scheme to modulate the detected pitch to some normalized reference
> pitch.

This sounds unnecessary. The primary task in vowel recognition is to 
extract and identify the vocal formants which in turn make up the 
spectral envelope, all of which is independent of pitch. For an 
illustration see e.g.:

http://hyperphysics.phy-astr.gsu.edu/hbase/music/vowel.html

Extracting a spectral envelope is in effect a low-pass filtering process 
on a frame of FFT amplitudes (I am used to thinking in terms of the 
phase vocoder, so these are the amplitudes calculated with "hypot()" 
from the raw complex output of the FFT), to find the overall shape of 
the spectrum, and indeed to ignore small-scale deviations representing 
individual partials.
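
As a rough illustration of that smoothing step, here is a small Python
sketch. The centred moving average is only my assumption of a simple
low-pass filter across frequency bins; the function name and the
half_width parameter are hypothetical, and cepstral smoothing or LPC
are common alternatives for the same job.

```python
def spectral_envelope(mag, half_width):
    """Smooth a frame of FFT amplitudes with a centred moving average.

    mag        -- amplitudes of the bins in one frame
    half_width -- number of bins averaged on each side of the centre bin
    """
    n = len(mag)
    env = []
    for k in range(n):
        lo = max(0, k - half_width)        # window is clipped at the edges
        hi = min(n, k + half_width + 1)
        env.append(sum(mag[lo:hi]) / (hi - lo))
    return env
```

The window has to span at least one harmonic spacing (in bins),
otherwise the individual partials show through and the result is not
yet an envelope.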

Many vowels are diphthongs, and (for speech especially) are 
characterised by pitch rises or falls, so one does need to extract the 
pitch trajectory from the sound as well to identify these. Finding the 
fundamental is sufficient; but one may prefer to derive this from 
detected harmonics as FFT resolution is typically better "up there". 
This in turn implies that one needs to detect the actual (or relative) 
pitch of a vowel combination, and not to normalise everything to a 
single reference pitch. In any case, the database of vowel formant 
frequencies is independent of the spoken/sung pitch.
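
To make the "derive it from the harmonics" idea concrete, here is one
possible Python sketch. It assumes a rough pitch estimate and a list of
detected peak frequencies are already available, assigns each peak to
its nearest harmonic number k, and fits f_k = k * f0 by least squares.
The function name, the tolerance parameter and the least-squares fit
are all my own assumptions, not a standard recipe.

```python
def refine_fundamental(peak_freqs, rough_f0, tolerance=0.2):
    """Refine a rough fundamental estimate from detected harmonic peaks.

    peak_freqs -- peak frequencies in Hz
    rough_f0   -- coarse estimate of the fundamental in Hz
    tolerance  -- maximum distance from an integer harmonic number,
                  measured in units of rough_f0
    """
    num = 0.0
    den = 0.0
    for f in peak_freqs:
        ratio = f / rough_f0
        k = round(ratio)                   # nearest harmonic number
        if k >= 1 and abs(ratio - k) <= tolerance:
            num += k * f                   # least squares for f = k * f0
            den += k * k
    if den == 0.0:
        raise ValueError("no peak lies near a harmonic of rough_f0")
    return num / den
```

Because each peak is weighted by its harmonic number, the upper
harmonics dominate the estimate, which is the point of looking "up
there". A peak that is not really a harmonic but happens to fall
within the tolerance will bias the result.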

Richard Dobson.