Hi!
Local maxima and minima in an array really means peak detection.
Where can I find an algorithm to perform peak detection?
Thanks in advance
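For reference, the simplest form of peak detection compares each sample with its two neighbours. The sketch below assumes a plain list of magnitudes. The function name `local_extrema` and the `min_height` cut-off are illustrative choices, not from any library. End points and flat plateaus are deliberately ignored.

```python
def local_extrema(x, min_height=float("-inf")):
    """Return (maxima, minima): indices of strict local peaks and valleys.

    A sample is a local maximum if it is strictly greater than both
    neighbours (and at least min_height), and a local minimum if it is
    strictly smaller than both. End points and plateaus are skipped.
    """
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] > x[i + 1]:
            if x[i] >= min_height:  # ignore small peaks (e.g. noise floor)
                maxima.append(i)
        elif x[i] < x[i - 1] and x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


print(local_extrema([0, 3, 1, 4, 4, 2, 5, 0]))  # → ([1, 6], [2, 5])
```

On a real spectrum, `min_height` (or a threshold relative to the largest peak) keeps the many tiny noise peaks out of the result.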
Reply by acid...@inwind ● March 26, 2006
Hi!
>Do you know how to find the local maxima and minima in an array?
I've solved my problems reading "The Scientist and Engineer's
Guide to Digital Signal Processing".
Now I have another problem: how do I find the local maxima and minima in an
array?
After computing the DFT, I've extracted the spectrum of the signal. Now, how
do I find the first high peak (the fundamental) and the other high peaks (the
harmonics) in the DFT output array?
Thanks in advance
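One rough way to go from the magnitude array to a fundamental and its harmonics is sketched below, under stated assumptions. It takes the lowest-frequency local maximum that reaches some fraction of the largest peak as the fundamental, then searches a few bins around each integer multiple for the harmonics. This "lowest strong peak" rule is only a heuristic (as Ron N. points out elsewhere in the thread, the strongest peak is not necessarily the fundamental). `find_harmonics`, `rel_threshold` and `tol_bins` are hypothetical names chosen for illustration.

```python
def find_harmonics(mag, fs, n_fft, rel_threshold=0.1, n_harmonics=5, tol_bins=2):
    """Heuristic fundamental/harmonic picker on a DFT magnitude array.

    mag: magnitudes |X[k]| of an n_fft-point DFT of a real signal.
    Returns (f0_hz, [(h, freq_hz), ...]) or (None, []) if no peak is found.
    """
    half = mag[: n_fft // 2 + 1]   # real input: bins above N/2 mirror those below
    level = rel_threshold * max(half[1:])  # skip the DC bin (k = 0)
    k0 = None
    for k in range(1, len(half) - 1):
        # lowest-frequency local maximum that is tall enough
        if half[k] >= level and half[k] > half[k - 1] and half[k] >= half[k + 1]:
            k0 = k
            break
    if k0 is None:
        return None, []
    harmonics = []
    for h in range(2, n_harmonics + 1):
        centre = h * k0
        if centre - tol_bins >= len(half):
            break                  # expected harmonic lies above fs/2
        lo = max(centre - tol_bins, 1)
        hi = min(centre + tol_bins, len(half) - 1)
        k = max(range(lo, hi + 1), key=lambda j: half[j])  # strongest nearby bin
        if half[k] >= level:
            harmonics.append((h, k * fs / n_fft))
    return k0 * fs / n_fft, harmonics
```

With `fs = 6400` and `n_fft = 64` each bin is 100 Hz wide, so a fundamental at bin 5 is reported as 500.0 Hz.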
Reply by Richard Owlett ● March 23, 2006
Richard Dobson wrote:
> Richard Owlett wrote:
>
>>
>>> In any case, the database of vowel formant frequencies is independent
>>> of the spoken/sung pitch.
>>
>>
>> Can you suggest a page that discusses that?
>>
>
> The best I can find on a quick Google is:
>
> http://www2.sfu.ca/sonic-studio/handbook/Formant.html
>
> You need to look for publications by Johan Sundberg; he did the original
> research on vocal formants some time ago now. There is relatively
> little of his material directly on the net; most is in books and journals.
> If you Google on "formant" + "Sundberg", you should find most of
> whatever is available.
>
> Richard Dobson
>
>
I added "pitch" to your suggested search and hit paydirt. None of them
told me everything, but with several starting points and their links I
got a satisfying picture. I hadn't overtly/consciously/??? realized that
the vocal cords were vibrating an octave below the lowest formant. Logical
in retrospect ;}
Thanks
Reply by Ron N. ● March 23, 2006
acid_burn@inwind wrote:
> Hi! I must build a vowel recognizer using the FFTW library:
> analyzing a .wav file, I must retrieve the fundamental and the harmonics,
> then compare these with the fundamental and harmonics of other, previously
> archived .wav files to choose the closest vowel.
>
> I followed these steps:
> - first, I load the samples from the .wav file into an array of
> fftw_complex, using 0.0 as the imaginary parts;
> - then, I perform a c2c DFT using FFTW_ESTIMATE as the flag; the length of
> the DFT is the number of samples (say NS) in the .wav file (in general,
> this number isn't a power of 2);
> - last, I get an array of fftw_complex whose length is NS.
>
> Now I must retrieve the fundamental and the harmonics from this array.
> How should I interpret the values in the array? I've read the FFTW
> manual, but the problem is still unresolved.
Sounds like you may want a quick and dirty approximation rather
than the better techniques previously mentioned in this subject thread.
Do you know how to convert complex data into magnitudes?
Do you know how to find the local maxima and minima in an array?
Do you know what frequency the second bin in your vector
represents? (you may not want to answer that one :)
Do you know any polynomial interpolation formulas?
Do you know that the frequency with the most energy doesn't
necessarily represent the fundamental pitch?
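The first few questions above can be answered with a few lines each. In FFTW 3 an `fftw_complex` is by default a `double[2]` holding the real part in `[0]` and the imaginary part in `[1]`, so in C the magnitude of bin k is `hypot(out[k][0], out[k][1])`. For a real input, bins k and N-k are complex conjugates, so only bins 0..N/2 carry information, and bin k sits at k*fs/N Hz (the second bin, k = 1, is fs/N). The sketch below assumes the output has already been copied into a Python list of complex numbers. The function names are hypothetical, and the three-point parabolic fit is just one common choice of "polynomial interpolation" for refining a peak between bins.

```python
def magnitudes(spectrum):
    """|X[k]| for each complex bin; same as hypot(re, im) in C."""
    return [abs(c) for c in spectrum]


def bin_to_hz(k, fs, n_fft):
    """Frequency of (possibly fractional) bin k for an n_fft-point DFT at rate fs."""
    return k * fs / n_fft


def parabolic_peak(mag, k):
    """Refine a local maximum at bin k by fitting a parabola through
    bins k-1, k, k+1; returns the fractional bin index of its vertex."""
    a, b, c = mag[k - 1], mag[k], mag[k + 1]
    denom = a - 2 * b + c
    if denom == 0:
        return float(k)           # flat top: no refinement possible
    return k + 0.5 * (a - c) / denom


print(bin_to_hz(1, 8000, 1024))   # → 7.8125
```

A typical use is `bin_to_hz(parabolic_peak(mag, k), fs, n_fft)` for each peak index k found by a local-maximum search; fitting the parabola to log magnitudes (dB) instead of linear ones is a frequent variant.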
--
rhn A.T nicholson d.0.t C-o-M
Reply by Richard Dobson ● March 23, 2006
Richard Owlett wrote:
>
>> In any case, the database of vowel formant frequencies is independent
>> of the spoken/sung pitch.
>
> Can you suggest a page that discusses that?
>
The best I can find on a quick Google is:
http://www2.sfu.ca/sonic-studio/handbook/Formant.html
You need to look for publications by Johan Sundberg; he did the original
research on vocal formants some time ago now. There is relatively
little of his material directly on the net; most is in books and journals.
If you Google on "formant" + "Sundberg", you should find most of
whatever is available.
Richard Dobson
Reply by Richard Owlett ● March 23, 2006
Richard Dobson wrote:
> In any case, the database of vowel formant
> frequencies is independent of the spoken/sung pitch.
>
> OK, but how do I do this with FFTW? Can you post some simple pseudo-code
> to do that?
>
Doing the FFT is just the first stage. Posting pseudo-code that would be
of any use is more than I can take on right now. I suggest you look at
the CLAM sources:
http://www.iua.upf.es/mtg/clam/
This has loads of C++ code (using FFTW, but possibly still v2) for
extracting spectral envelopes, finding peaks, pitch extraction, etc. You
may find CLAM of interest anyway, it is a widely used library of classes
for sound analysis and processing, with some very cool GUI tools as well.
Richard Dobson
Reply by acid...@inwind ● March 22, 2006
Hi!
>Extracting a spectral envelope is in effect a low-pass filtering process
>on a frame of FFT amplitudes (I am used to thinking in terms of the
>phase vocoder, so these are the amplitudes calculated with "hypot()"
>from the raw complex output of the FFT), to find the overall shape of
>the spectrum, and indeed to ignore small-scale deviations representing
>individual partials.
>
>Many vowels are diphthongs, and (for speech especially) are
>characterised by pitch rises or falls, so one does need to extract the
>pitch trajectory from the sound as well to identify these. Finding the
>fundamental is sufficient; but one may prefer to derive this from
>detected harmonics as FFT resolution is typically better "up there".
>This in turn implies that one needs to detect the actual (or relative)
>pitch of a vowel combination, and not to normalise everything to a
>single reference pitch. In any case, the database of vowel formant
>frequencies is independent of the spoken/sung pitch.
OK, but how do I do this with FFTW? Can you post some simple pseudo-code to
do that?
Thanks in advance,
gianluca
Reply by acid...@inwind ● March 22, 2006
Hi!
>It seems you could benefit from reading a text on DSP. Try
That's right, I know my DSP theory is lacking, but I have no time to read
an entire book...
I simply need to understand how to interpret FFTW's output array and how
to extract the fundamental and the harmonics from it.
I've seen that when I transform all the samples in one step, the output
looks like a wave with all its frequencies near frequency 0. Maybe I need
to extract a small piece of the recorded wave?
Thanks in advance
gianluca
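Two remarks on the question above. With a single DFT over the whole file, bin k corresponds to k*fs/NS Hz, so the voice lands in a tiny fraction of the bins: for example, a 200 Hz fundamental in a 2 s file at 44100 Hz (NS = 88200) sits at bin 400 of 88200, which looks like "everything near 0" when the whole array is plotted. Also, pitch changes over time, so speech is usually analysed in short overlapping frames, each multiplied by a window before its own DFT. The sketch below assumes frame and hop lengths in samples; 20 to 40 ms frames are a common choice for speech, but treat that (and the hypothetical names `hann` and `frames`) as assumptions. Each yielded frame would then be transformed with FFTW using N = `frame_len`.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def frames(samples, frame_len, hop):
    """Yield successive Hann-windowed frames of frame_len samples,
    starting every hop samples; a trailing partial frame is dropped."""
    w = hann(frame_len)
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        yield [s * wi for s, wi in zip(chunk, w)]
```

At 8000 Hz, `frames(samples, 256, 128)` gives 32 ms frames with 50 % overlap and a bin spacing of 8000/256 = 31.25 Hz per frame.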
Reply by Richard Dobson ● March 22, 2006
Rune Allnor wrote:
> acid_burn@inwind wrote:
>> Hi! I must build a vowel recognizer using the FFTW library:
>> analyzing a .wav file, I must retrieve the fundamental and the harmonics,
>> then compare these with the fundamental and harmonics of other, previously
>> archived .wav files to choose the closest vowel.
>..
> First, the problem consists of two parts: pitch and vowel. It is not
> reasonable to expect that a given vowel at a high pitch should compare
> well with the same vowel at a lower pitch, so the first task would be to
> normalize the signal spectrum. One could, for instance, use some sort of
> AM scheme to modulate the detected pitch to some normalized reference pitch.
This sounds unnecessary. The primary task in vowel recognition is to
extract and identify the vocal formants which in turn make up the
spectral envelope, all of which is independent of pitch. For an
illustration see e.g.:
http://hyperphysics.phy-astr.gsu.edu/hbase/music/vowel.html
Extracting a spectral envelope is in effect a low-pass filtering process
on a frame of FFT amplitudes (I am used to thinking in terms of the
phase vocoder, so these are the amplitudes calculated with "hypot()"
from the raw complex output of the FFT), to find the overall shape of
the spectrum, and indeed to ignore small-scale deviations representing
individual partials.
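As a rough illustration of "low-pass filtering a frame of FFT amplitudes", the sketch below computes amplitudes with `hypot()` and then smooths them across frequency with a centred moving average. The moving average is simply the crudest low-pass filter; it is an assumption here, not necessarily what a phase vocoder or CLAM does (cepstral smoothing and LPC are common alternatives). The names `amplitudes` and `spectral_envelope` are hypothetical. For the smoothing to hide individual partials, the window (2*width+1 bins) should span at least the harmonic spacing in bins.

```python
import math


def amplitudes(re, im):
    """Per-bin amplitude from the real and imaginary parts of an FFT frame."""
    return [math.hypot(r, i) for r, i in zip(re, im)]


def spectral_envelope(amps, width):
    """Smooth amplitudes with a centred moving average over 2*width+1 bins
    (shrunk at the edges), i.e. a crude low-pass filter across frequency."""
    n = len(amps)
    env = []
    for k in range(n):
        lo, hi = max(0, k - width), min(n, k + width + 1)
        env.append(sum(amps[lo:hi]) / (hi - lo))
    return env


print(spectral_envelope([0, 0, 3, 0, 0], 1))  # → [0.0, 1.0, 1.0, 1.0, 0.0]
```

The local maxima of the smoothed envelope, rather than of the raw amplitudes, are then the formant candidates.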
Many vowels are diphthongs, and (for speech especially) are
characterised by pitch rises or falls, so one does need to extract the
pitch trajectory from the sound as well to identify these. Finding the
fundamental is sufficient; but one may prefer to derive this from
detected harmonics as FFT resolution is typically better "up there".
This in turn implies that one needs to detect the actual (or relative)
pitch of a vowel combination, and not to normalise everything to a
single reference pitch. In any case, the database of vowel formant
frequencies is independent of the spoken/sung pitch.
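One simple way to derive the fundamental from detected harmonics, assuming each harmonic's frequency estimate carries roughly the same error in Hz, is a least-squares fit of f_h = h * f0, which gives f0 = sum(h * f_h) / sum(h^2). Higher harmonics get more weight, matching the point that resolution is effectively better "up there". This fit, the rounding used to assign harmonic numbers, and the names `f0_from_harmonics` and `refine_f0` are illustrative assumptions, not the method described in the post.

```python
def f0_from_harmonics(harmonics):
    """harmonics: list of (h, freq_hz) with h the harmonic number (1 = fundamental).
    Least-squares estimate of f0 for the model freq = h * f0."""
    num = sum(h * f for h, f in harmonics)
    den = sum(h * h for h, _ in harmonics)
    return num / den


def refine_f0(rough_f0, peak_freqs, max_harmonic=10):
    """Assign each peak the nearest harmonic number of rough_f0, then refit."""
    pairs = []
    for f in peak_freqs:
        h = round(f / rough_f0)
        if 1 <= h <= max_harmonic:
            pairs.append((h, f))
    return f0_from_harmonics(pairs) if pairs else rough_f0


print(f0_from_harmonics([(1, 100.0), (2, 200.0), (3, 300.0)]))  # → 100.0
```

For example, a rough 98 Hz estimate with peaks at 201 Hz and 299 Hz is refined to about 99.92 Hz.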
Richard Dobson.