DSPRelated.com
Forums

Hidden Markov models implementation

Started by ahmad June 9, 2005
Sir, I have a problem using HMMs for speech recognition.
My problem is that I want to identify the sequence of Arabic
phonemes present in a speech file. I have calculated certain
features such as MFCCs, wavelet coefficients, and a spectrogram for the
speech file. These features are numerical values (not probabilities).
There are 28 Arabic phonemes. I have studied some literature on HMMs,
which require an emission matrix, a transition matrix, states, and
observations (emissions/outcomes).

Sir, how can I model my problem using an HMM? How can I calculate the
transition matrix? Phonemes can occur in any order in a speech
file, and there is no particular rule that tells me the
probability of a given phoneme being the next one. Similarly, how can I
use a feature vector in the emission matrix, when the emission matrix
requires that the probabilities in every row sum to 1?

Which item represents my states (feature vector or phoneme set)?
Which item represents my observations/emissions (feature vector
or phoneme set)?

If you have some idea of how to use an HMM for my problem, kindly send
me your suggestions.

ahmad wrote:

> Sir, I have a problem using HMMs for speech recognition.
> My problem is that I want to identify the sequence of Arabic
> phonemes present in a speech file. I have calculated certain
> features such as MFCCs, wavelet coefficients, and a spectrogram for the
> speech file. These features are numerical values (not probabilities).
> There are 28 Arabic phonemes. I have studied some literature on HMMs,
> which require an emission matrix, a transition matrix, states, and
> observations (emissions/outcomes).
OK, so the number of states is 28 (or 29 if you want to include silences).
> Sir, how can I model my problem using an HMM? How can I calculate the
> transition matrix? Phonemes can occur in any order in a speech
> file, and there is no particular rule that tells me the
> probability of a given phoneme being the next one. Similarly, how can I
> use a feature vector in the emission matrix, when the emission matrix
> requires that the probabilities in every row sum to 1?
Search for "Baum-Welch". An HMM is usually described by the triple (A, B, pi). A contains your state transition probabilities (29 x 29). B contains the probability that state n generates observation m. This form of B assumes a finite set of observation symbols, so continuous features such as MFCCs would first have to be quantized into discrete symbols (or B replaced by a continuous density per state). If your phonemes are ordered so that similar ones are adjacent states, then you could model B as a 29 x 29 block-diagonal matrix (the off-diagonal entries within each block giving the probability that a phoneme is mis-recognised as a neighbouring one). pi is the initial state distribution (set it to 1 for the "silence" state and zero for all 28 phoneme states). The Baum-Welch algorithm then allows you to "learn" (A, B, pi) given sufficient data.
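To make the (A, B, pi) description concrete, here is a minimal NumPy sketch of one Baum-Welch re-estimation step for a discrete-observation HMM. It is only an illustration under stated assumptions: a single training sequence, observations already reduced to integer symbols, and state 0 taken to be "silence". The function names, the symbol count M = 29, and the random placeholder data are all hypothetical and not from the original posts.

```python
import numpy as np

def forward_backward(A, B, pi, obs):
    """Scaled forward-backward pass for a discrete HMM."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    scale = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    return alpha, beta, scale

def baum_welch_step(A, B, pi, obs):
    """One EM re-estimation of (A, B, pi) from a single observation sequence."""
    obs = np.asarray(obs)
    alpha, beta, scale = forward_backward(A, B, pi, obs)
    gamma = alpha * beta  # gamma[t, i] = P(state i at time t | obs)
    # xi[t, i, j] = P(state i at time t, state j at time t+1 | obs)
    xi = (alpha[:-1, :, None] * A[None]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / scale[1:, None, None]
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for m in range(B.shape[1]):
        new_B[:, m] = gamma[obs == m].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    # Log-likelihood of obs under the parameters passed in (before the update)
    return new_A, new_B, new_pi, np.log(scale).sum()

rng = np.random.default_rng(0)
N = M = 29                              # silence + 28 phonemes; M is an assumed symbol count
A = rng.dirichlet(np.ones(N), size=N)   # random row-stochastic starting guess
B = rng.dirichlet(np.ones(M), size=N)
pi = np.zeros(N)
pi[0] = 1.0                             # assumed: state 0 is "silence"
obs = rng.integers(0, M, size=300)      # placeholder symbols, not real speech data

log_liks = []
for _ in range(5):
    A, B, pi, ll = baum_welch_step(A, B, pi, obs)
    log_liks.append(ll)
```

Each row of A and B stays a probability distribution after every step, which is how the "rows must sum to 1" requirement is met without hand-picking the numbers. A practical system would train on many sequences and would need to guard against states that are never visited.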
> Which item represents my states (feature vector or phoneme set)?
Your phonemes are the unknowns (they cannot be measured directly). These are your states.
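Since the phonemes are hidden states, recovering the phoneme sequence from the observations is a decoding problem, usually handled with the Viterbi algorithm. The following is a small sketch in log space for a discrete-observation HMM. The 3-state toy model (state 0 standing in for silence) and its probabilities are made up for illustration and are not from the original posts.

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Most likely hidden-state path for a discrete HMM, computed in log space."""
    with np.errstate(divide="ignore"):   # log(0) = -inf is intended for impossible events
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    T, N = len(obs), A.shape[0]
    delta = np.zeros((T, N))             # best log-score of any path ending in each state
    back = np.zeros((T, N), dtype=int)   # best predecessor state for each (t, state)
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA   # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):       # trace the best path backwards
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy model: states tend to persist, and each state mostly emits its own symbol.
A = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
B = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])
pi = np.array([1.0, 0.0, 0.0])           # assumed: always start in state 0
obs = [0, 0, 1, 1, 1, 2, 2, 0]
path = viterbi(A, B, pi, obs)
```

With an emission matrix this close to the identity, the decoded path simply follows the observed symbols. With a noisier B, the transition matrix starts to smooth over isolated mis-recognised frames.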
> Which item represents my observations/emissions (feature vector
> or phoneme set)?
The things you can measure are your feature set, so these are "observable" and are your observations.
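One common way to turn numerical feature vectors into the discrete observations that an emission matrix needs is vector quantization: each frame is replaced by the index of its nearest codeword. The sketch below assumes 13-dimensional MFCC frames and a 29-entry codebook drawn at random from the data. Both choices, the `quantize` helper, and the random placeholder features are illustrative assumptions; in practice the codebook would more likely come from a clustering method such as k-means.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codeword."""
    # Squared Euclidean distance from every frame to every codeword
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 13))    # placeholder for 200 frames of 13 MFCCs
# Assumed codebook: 29 frames picked at random from the data
codebook = features[rng.choice(200, size=29, replace=False)]
symbols = quantize(features, codebook)   # integer observation sequence for the HMM
```

The resulting `symbols` array is the kind of integer observation sequence that a discrete HMM with a row-stochastic emission matrix expects.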
> If you have some idea of how to use an HMM for my problem, kindly send
> me your suggestions.
I hope this isn't a homework problem. :-)

Ciao,
Peter K.