Sir, I have a problem using HMMs for speech recognition. I want to identify the sequence of Arabic phonemes present in a speech file. I have calculated certain features (MFCC, wavelet, spectrogram, etc.) for the speech file. These features are numerical values, not probabilities. There are 28 Arabic phonemes. I have studied some literature on HMMs, which require an emission matrix, a transition matrix, states, and observations/emissions/outcomes. How can I model my problem using an HMM? How can I calculate the transition matrix, given that phonemes can occur in any order in the speech file? There is no particular rule that tells me the probability of a given phoneme being the next one. Similarly, how can I use the feature vectors in the emission matrix, since the emission matrix requires that the probabilities in every row sum to 1? Which item represents my states (feature vector or phoneme set)? Which item represents my observations/emissions (feature vector or phoneme set)? If you have some idea of how to use an HMM for my problem, kindly send me your suggestions.
Hidden Markov models implementation
Started by ●June 9, 2005
Reply by ●June 9, 2005
ahmad wrote:
> Sir i have problem using HMM for speech recognition....
> My problem statement is that i want to identify the sequence of arabic
> phonemes present in the speech file... I have calculated certain
> features like MFCC, wavelet, spectrogram etc for the speech file. These
> features are in numerical format ( not probabilities ). There are 28
> arabic phonemes. I have studied some literature related to HMM, which
> require some data as emission matrix, transition matrix, states,
> observations/emissions/outcomes....

OK, so the number of states is 28 (or 29 if you want to include silences).

> sir how can i model my problem using HMM. How can i calculate
> transition matrix ??? as phoneme can occur in either order in speech
> file. There is no particular rule which can tell me about the
> probability of a phoneme to be the next one. Similarly how can i use
> feature vector in emission matrix as emission marix requires that every
> row's probability sum should be 1.

Google on "Baum-Welch". Any HMM is usually described by the triple (A, B, pi). A contains your state transition probabilities (29 x 29). B contains the probability that state n generates observation m; if your phonemes are ordered so that like ones are adjacent states, then you could model B as a 29 x 29 block-diagonal matrix (the off-diagonal probabilities indicating the probability that an adjacent phoneme is mis-recognised). pi is the initial state probability (set this to 1 in the "silence" state and zero for all 28 phoneme states). Then the Baum-Welch algorithm allows you to "learn" (A, B, pi) given sufficient data.

> which item is representing my states. ( feature vector or phoneme set )

Your phonemes are the unknowns (and unmeasurables). These are your states.

> which item is representing my observations/emissions. ( feature vector
> or phoneme set )

The things you can measure are your feature set, so these are "observable" and are your observations.

> If u have some idea to use HMM for my problem, kindly send me your
> suggestion in this regard....

I hope this isn't a homework problem. :-)

Ciao,
Peter K.
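
As a rough illustration of the (A, B, pi) triple and the Baum-Welch re-estimation mentioned above, here is a minimal NumPy sketch for a discrete-observation HMM. It is a toy under stated assumptions, not a speech recogniser: it assumes each feature vector has already been quantised to one of M integer symbols (for example by a codebook, which is not shown), and it uses 3 states and 4 symbols instead of 29 states so that it runs instantly. The function names `forward_backward` and `baum_welch_step`, the random initialisation, and the random observation sequence are all illustrative choices. State 0 plays the role of "silence", so pi puts all its mass there, as suggested in the reply.

```python
import numpy as np

def forward_backward(A, B, pi, obs):
    """Scaled forward-backward pass; returns alpha, beta and the scale factors."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    scale = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    return alpha, beta, scale

def baum_welch_step(A, B, pi, obs):
    """One EM update of (A, B, pi); also returns log P(obs) under the input model."""
    obs = np.asarray(obs)
    alpha, beta, scale = forward_backward(A, B, pi, obs)
    # gamma[t, i] = P(state i at time t | obs)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    # xi[t, i, j] = P(state i at t and state j at t+1 | obs)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / scale[1:, None, None]
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for m in range(B.shape[1]):
        new_B[:, m] = gamma[obs == m].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi, np.log(scale).sum()

# Toy setup (assumed sizes): N hidden states, M quantised observation symbols.
N, M = 3, 4
rng = np.random.default_rng(0)
A = rng.dirichlet(np.ones(N), size=N)   # each row sums to 1
B = rng.dirichlet(np.ones(M), size=N)   # each row sums to 1
pi = np.zeros(N)
pi[0] = 1.0                             # start in the "silence" state
obs = rng.integers(0, M, size=200)      # stand-in for quantised feature frames

log_liks = []
for _ in range(20):
    A, B, pi, ll = baum_welch_step(A, B, pi, obs)
    log_liks.append(ll)
```

Each row of A and B stays a valid probability distribution after every update, which is how the "every row must sum to 1" requirement is met even though the raw features are not probabilities. Practical speech systems usually model the emissions with continuous densities (for example Gaussian mixtures over the MFCC vectors) rather than a quantised symbol table; the discrete version is used here only because it matches the matrix form of B described in the reply.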