The most popular approach to speech recognition at present is the hidden Markov model (HMM). For a simple task such as isolated-word, small-vocabulary recognition, dynamic time warping (DTW) can also be used. However, the accuracy of DTW is usually much lower than that of an HMM.

A good reference book is Fundamentals of Speech Recognition by Rabiner and Juang, or the paper:

L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286, 1989.

Best,
Naveen

--- In speech-recognition@y..., "Renato D'Antonio" <rd@d...> wrote:
> Hi all,
> What is the answer to his question, "What is the simple speech recognition
> method for isolated word, small vocabulary speech (not speaker)
> recognition system?". Is there a simple answer to this question or a
> referral to some easy reading paper or book?
> Thanks,
> Renato

_____________________________________
Note: If you do a simple "reply" with your email client, only the author of this message will receive your answer. You need to do a "reply all" if you want your answer to be distributed to the entire group.
_____________________________________
About this discussion group:
To Join: speech-recognition-subscribe@y...
To Post: speech-recognition@y...
To Leave: speech-recognition-unsubscribe@y...
Archives: http://www.yahoogroups.com/group/speech-recognition
Other DSP-Related Groups: http://www.dsprelated.com
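
For readers who want to see what the DTW approach mentioned in the reply above looks like for isolated words, here is a minimal sketch. It is an illustration only, not code from the referenced book or paper: the function names `dtw_distance` and `recognize`, the toy one-dimensional "features", and the word labels are all assumptions made for the example. A real system would compare per-frame feature vectors (for instance cepstral coefficients) extracted from the recordings.

```python
import math


def dtw_distance(a, b):
    """Cumulative DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local Euclidean distance
            # Allowed steps: advance in a, advance in b, or advance in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical toy templates: one reference "recording" per vocabulary word.
templates = {
    "yes": [(0.0,), (1.0,), (2.0,), (1.0,), (0.0,)],
    "no": [(2.0,), (2.0,), (0.0,), (0.0,), (2.0,)],
}

# A slower (time-stretched) rendition of the "yes" contour.
test = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,), (2.0,), (1.0,), (0.0,)]

print(recognize(test, templates))
```

The point of the warping is that the stretched test utterance still aligns with the shorter "yes" template, which a frame-by-frame comparison could not do because the two sequences have different lengths.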

From: Dr. Renato D'Antonio, President, DSP Global, 33 Plan Way, Bldg.4, Warwick, RI 02886, 401-737-9900, Toll Free: 800-437-3282, Fax: 401-739-4197, Web: <www.dspglobal.com>

Hi all,
What is the answer to his question, "What is the simple speech recognition method for isolated word, small vocabulary speech (not speaker) recognition system?". Is there a simple answer to this question, or a referral to some easy-to-read paper or book?
Thanks,
Renato

----- Original Message -----
From: simha j
To: Sanjay Patil
Sent: Thursday, January 17, 2002 11:23 PM
Subject: Re: [speech-recognition] few queries!

--- Sanjay Patil wrote:
> 1. When a raw PCM data file is considered, what is
> the usual sampling frequency?

If the data file contains telephone speech, fs = 8 kHz. If it is a general audio file, fs is typically 44.1 kHz. So fs depends on the bandwidth of the input signal.

> 2. How many bits should be used to sample and digitise
> the speech signals, 8 bits or 16 bits? The purpose is to
> have speech recognition done on the samples.

To reconstruct the speech at the receiving (Rx) end, the speech samples must be coded with at least 14 bits (or 16 bits with zero padding) at the transmitting (Tx) end. A first level of compression can then be applied to the 16-bit data through A-law or mu-law companding, which reduces each sample from 16 bits to 8 bits. Remember that the 8-bit data must be expanded back to 16 bits before it is played in any player.

You can do a lot of experiments on speech signals with the GoldWave software, which is available on the net.

> 3. What is the simple speech recognition method for an
> isolated word, small vocabulary speech (not speaker)
> recognition system?

Refer to 'Digital Processing of Speech Signals' by Rabiner and Schafer.

> Please help and mail at the earliest!
> yashorath
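
The 16-bit to 8-bit compression and expansion described in the answer to question 2 can be illustrated with a small sketch. It uses the continuous mu-law formula with mu = 255, which is an assumption made for clarity; it is not the bit-exact segment encoding used by telephone codecs, and the function names `mulaw_encode` and `mulaw_decode` are hypothetical.

```python
import math

MU = 255  # companding constant commonly used with 8-bit mu-law


def mulaw_encode(sample):
    """Compress one 16-bit signed PCM sample to an 8-bit code (0..255)."""
    x = max(-1.0, min(1.0, sample / 32768.0))  # normalise to [-1, 1]
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255))  # map [-1, 1] onto 0..255


def mulaw_decode(code):
    """Expand an 8-bit code back to an approximate 16-bit signed sample."""
    y = code / 255.0 * 2.0 - 1.0  # map 0..255 back onto [-1, 1]
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return int(round(x * 32767))


samples = [0, 100, -100, 1000, -8000, 32767]
codes = [mulaw_encode(s) for s in samples]
restored = [mulaw_decode(c) for c in codes]

for s, c, r in zip(samples, codes, restored):
    print(f"original={s:6d}  8-bit code={c:3d}  restored={r:6d}")
```

The restored values are close to the originals but not identical, because companding is lossy: quiet samples are kept with fine resolution and loud samples with coarser resolution.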