Technical discussions about the implementation and research of speech recognition algorithms.
Hi all. I'm stuck on the following while implementing a speaker-independent ASR system. Say I have 6 utterances (from 6 speakers) of each word to be recognized. I wish to compute the distance between an incoming speech sample and these 12-coefficient template vectors. How do I go about computing the distances, given that speech lasts more than one frame (in fact several frames) and that these utterances are likely to be of unequal duration? I'm afraid I have not been able to get this information from the available documentation, as everybody talks as if you only need a single feature vector (that is, for one frame only) for the necessary computations and the decision. Kindly help me resolve this confusion. Thanks.
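To make the problem concrete: one common way to compare two utterances of unequal length is dynamic time warping (DTW), which is not mentioned in the documentation referred to above and is only one possible approach. Below is a minimal sketch, assuming each utterance is stored as a list of frames and each frame is a list of 12 coefficients. The names `frame_distance`, `dtw_distance`, and `recognize` are hypothetical, and the Euclidean frame distance and the division by the combined length are assumptions chosen for illustration.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Length-normalized DTW distance between two sequences of frames.

    seq_a and seq_b may contain different numbers of frames.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences need at least one frame")

    inf = float("inf")
    # cost[i][j] is the smallest accumulated cost of aligning the first
    # i frames of seq_a with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Allowed steps: advance in seq_a only, in seq_b only, or in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Dividing by n + m is one simple (assumed) way to keep long and
    # short utterances comparable.
    return cost[n][m] / (n + m)


def recognize(sample, templates):
    """Return (word, distance) for the template closest to the sample.

    templates maps each word to a list of template sequences,
    for example 6 per word, one per speaker.
    """
    best_word, best_dist = None, float("inf")
    for word, sequences in templates.items():
        for template in sequences:
            d = dtw_distance(sample, template)
            if d < best_dist:
                best_word, best_dist = word, d
    return best_word, best_dist
```

With this arrangement the decision is made once per utterance, on the accumulated distance over all frames, rather than on any single frame. Usage would look like `recognize(sample_frames, {"yes": [...], "no": [...]})`, where every entry is a list of 12-coefficient frames.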