Reply by uppe...@rice.edu December 1, 2005
Hi All -

I am trying to write a simple speaker-independent speech recognition program in C, and I am wondering if using MFCC to recognize the words is the way to go.

I plan to provide the program with a codebook of approximately 20 words, all spoken by the same speaker. I would then input a sample of a different speaker saying one of these words, and the program would identify which codebook word was spoken. Can I just compute the 12 mel-cepstral coefficients of a speech sample containing an entire recorded word, or should I break it into small segments first? (If so, what segment lengths are good?) Would comparing the mel-cepstral coefficients of the input word to those of each codebook word, and choosing the one with the smallest squared difference, for example, be a viable method for speech recognition? I tried this earlier with LPC coefficients, but that seems to work only for speaker-dependent speech recognition.
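
To make the comparison step concrete, here is a minimal sketch of the least-squared-difference lookup I have in mind. It assumes each word has already been reduced to a single vector of 12 mel-cepstral coefficients, which is exactly the part I am unsure about, and that the codebook is stored as one flat row-major array. The names `NUM_COEFFS`, `squared_distance`, and `nearest_word` are made up for illustration and do not come from any library.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 12 mel-cepstral coefficients describe one whole word. */
#define NUM_COEFFS 12

/* Sum of squared differences between two coefficient vectors of length n. */
double squared_distance(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/*
 * Return the index of the codebook word whose coefficients are closest
 * to `input`. `codebook` holds n_words * NUM_COEFFS values, one word
 * after another (hypothetical layout chosen for this sketch).
 */
size_t nearest_word(const double *codebook, size_t n_words,
                    const double *input)
{
    assert(n_words > 0);

    size_t best = 0;
    double best_dist = squared_distance(codebook, input, NUM_COEFFS);

    for (size_t w = 1; w < n_words; w++) {
        double dist = squared_distance(codebook + w * NUM_COEFFS,
                                       input, NUM_COEFFS);
        if (dist < best_dist) {
            best_dist = dist;
            best = w;
        }
    }
    return best;
}
```

With a 20-word codebook, the call would be `nearest_word(codebook, 20, input)`, and the returned index identifies the matched word. If the word is instead split into segments, each word would yield a sequence of such vectors rather than one, and this single-vector comparison would no longer apply directly.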

Finally, does anyone know of a simple program in C to calculate the mel-cepstral coefficients, given a speech sample?

I am a beginner to speech processing, so any help would be greatly appreciated.

Thank you very much,
Gina Upperman