Reply by aali...@gmail.com July 23, 2009
Hey!

You haven't mentioned which approach your speech recognizer takes: DTW, NN, or HMM.

If it's HMM, you can model the noisy portions as a state (in your word lattice if you have one, or just stick it in appropriately). I'm sure you can incorporate the same idea in the other approaches as well.
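
For illustration, a minimal sketch of that idea, assuming Python with hmmlearn (my assumption, not something stated in the thread): a left-to-right word model whose first and last states are silence/noise, so decoding can absorb non-speech frames at either end. The state count and probabilities are made up for the example.

import numpy as np
from hmmlearn import hmm

# 5-state left-to-right model: state 0 = leading silence/noise,
# states 1-3 = the word, state 4 = trailing silence/noise.
startprob = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
transmat = np.array([
    [0.7, 0.3, 0.0, 0.0, 0.0],   # sil/noise loops or enters the word
    [0.0, 0.7, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.3],   # word exits into trailing sil/noise
    [0.0, 0.0, 0.0, 0.0, 1.0],   # absorbing trailing state
])

model = hmm.GaussianHMM(n_components=5, covariance_type="diag",
                        n_iter=20, init_params="mc", params="mc")
model.startprob_ = startprob     # keep the topology fixed;
model.transmat_ = transmat       # fit() re-estimates only the Gaussians

# X: (n_frames, n_features) feature matrix, lengths: utterance lengths
# model.fit(X, lengths)
# states = model.predict(X)      # frames labelled 0 or 4 are non-speech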

However, silence/noise model estimation is unreliable (the noise environment might keep changing: babble noise, cocktail-party environments), so you would want to use a noise canceller as a preprocessor.
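
One common choice of preprocessor is plain spectral subtraction. A minimal sketch, assuming SciPy and assuming the first ~250 ms of the recording are speech-free (neither is stated in the thread):

import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(x, fs, noise_secs=0.25, nperseg=512):
    # Estimate the noise magnitude spectrum from the (assumed
    # speech-free) start of the signal, subtract it from every frame,
    # and keep a small spectral floor to limit musical noise.
    f, t, X = stft(x, fs=fs, nperseg=nperseg)
    hop = nperseg // 2                        # scipy's default overlap
    n_noise = max(1, int(noise_secs * fs / hop))
    noise_mag = np.abs(X[:, :n_noise]).mean(axis=1, keepdims=True)
    mag = np.maximum(np.abs(X) - noise_mag, 0.05 * np.abs(X))
    _, y = istft(mag * np.exp(1j * np.angle(X)), fs=fs, nperseg=nperseg)
    return y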

Again, if you are working in a clean speech environment, forget about all this and just introduce a silence model :)

Regards,
Abhilash.
Reply by unique0attitude April 1, 2009
I'm doing a project on speech recognition.
1) I've tried to determine the end-points of connected words and extract the separate words, but the problem is that it does not remove the silence portions completely, and in some cases it even treats a noise portion as a separate word (see the first sketch below).
2) What if we don't separate the words and take the MFCCs of both words together as one matrix? Will that be OK? (See the second sketch below.)
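
On 1): a minimal sketch of the classic short-time-energy endpoint detector, assuming Python with NumPy. The threshold factor and minimum-duration check below are illustrative guesses; the duration check is what keeps short noise bursts from being returned as words.

import numpy as np

def endpoints(x, fs, frame_ms=25, hop_ms=10, k=3.0, min_word_ms=100):
    # Frame-wise short-time energy; frames above a threshold relative
    # to the estimated noise floor count as speech.
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    energy = np.array([np.sum(x[i:i + frame] ** 2)
                       for i in range(0, len(x) - frame, hop)])
    active = energy > k * np.median(energy)    # median ~ noise floor
    # Collect contiguous active runs; drop runs shorter than
    # min_word_ms so noise bursts are not kept as separate words.
    words, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            if (i - start) * hop_ms >= min_word_ms:
                words.append((start * hop, i * hop + frame))
            start = None
    if start is not None and (len(active) - start) * hop_ms >= min_word_ms:
        words.append((start * hop, len(x)))
    return words                               # (start, end) sample indices
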
Can anyone help?
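
On 2): taking MFCCs of the whole utterance as one matrix is workable for an HMM-style recognizer that models the inter-word silence explicitly (as in the reply above), but a per-word template matcher such as DTW would still need the segment boundaries. A minimal extraction sketch, assuming librosa and a hypothetical file name:

import librosa

# "two_words.wav" is a made-up file name for this example.
x, sr = librosa.load("two_words.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=x, sr=sr, n_mfcc=13)
print(mfcc.shape)   # (13, n_frames): one matrix for the whole utterance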