Hi, I am now working on a speech pause detection project. I want to replace the function of a Motorola IC (MC34118) with a microprocessor. To do that, I need to compare the amplitudes of two speech signals and decide which one is dominant. When both are just noise (i.e. non-speech), the processor should switch to the "idle" state. However, it is difficult to distinguish speech from non-speech in real time with the computing power of an 8-bit microprocessor. Could anyone help me? Many thanks.
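A minimal sketch of the level comparison described above. It assumes two sampled channels (called A and B here, for example the microphone and the line input) and uses integer-only arithmetic to suit an 8-bit part. It is loosely modeled on the idea of comparing two signal levels against their background noise. It is not a reproduction of the MC34118's circuitry, and every constant (the shift amounts, `MIN_LEVEL`, the noise-rise rate) is a hypothetical value in ADC counts that would need tuning.

```python
IDLE, A_DOMINANT, B_DOMINANT = "IDLE", "A", "B"

# All constants below are hypothetical and would need tuning on real signals.
ATTACK_SHIFT = 2        # envelope rises by 1/4 of the gap toward a louder sample
DECAY_SHIFT = 6         # envelope falls by 1/64 of the gap toward a quieter sample
NOISE_RISE_EVERY = 64   # noise floor may rise by 1 count only every 64 samples
MIN_LEVEL = 32          # absolute activity floor, in ADC counts


class Channel:
    """Envelope follower plus a slow-rising, fast-falling noise-floor tracker."""

    def __init__(self):
        self.env = 0
        self.noise = 0
        self.count = 0

    def update(self, sample):
        x = abs(sample)  # full-wave rectify
        if x > self.env:
            self.env += max(1, (x - self.env) >> ATTACK_SHIFT)
        elif x < self.env:
            self.env -= max(1, (self.env - x) >> DECAY_SHIFT)
        if self.env < self.noise:
            self.noise = self.env  # drop immediately to a quieter level
        else:
            self.count += 1
            if self.count >= NOISE_RISE_EVERY:  # otherwise creep up slowly
                self.count = 0
                self.noise += 1

    def active(self):
        # "Active" means the envelope exceeds twice the noise floor plus MIN_LEVEL.
        return self.env > (self.noise << 1) + MIN_LEVEL


def decide(a, b):
    a_on, b_on = a.active(), b.active()
    if not a_on and not b_on:
        return IDLE
    if a_on != b_on:
        return A_DOMINANT if a_on else B_DOMINANT
    return A_DOMINANT if a.env >= b.env else B_DOMINANT  # both active: louder wins


def run(a_samples, b_samples):
    """Return the per-sample state for two equal-length sample streams."""
    a, b = Channel(), Channel()
    states = []
    for sa, sb in zip(a_samples, b_samples):
        a.update(sa)
        b.update(sb)
        states.append(decide(a, b))
    return states
```

Per sample this is only a few compares, subtractions and shifts per channel, which is the kind of load an 8-bit processor can usually sustain at speech sample rates; the Python loop is just for illustration.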
speech/non-speech detection
Started by ●October 24, 2005
Reply by ●October 24, 2005
Ha, you're kidding! Separating speech from speech is really hard unless you define the desired speech as coming from within a spatial zone and calculate the time differences of arrival into that zone (the desired speech being within the zone, and noise or undesired speech being outside it). There is a huge literature on this, and the best ideas use multiple microphones. Naebad
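The zone idea can be sketched as time-difference-of-arrival (TDOA) estimation between two microphones: find the lag that maximizes their cross-correlation, and accept the signal only if that lag falls inside a window corresponding to the desired zone. This is a minimal sketch using plain time-domain cross-correlation; much of the literature uses weighted variants such as GCC-PHAT. The function names and the `zone` lag window are hypothetical.

```python
def cross_correlation_lag(x, y, max_lag):
    """Lag (in samples) in [-max_lag, max_lag] maximizing sum_n x[n] * y[n + lag].

    A positive result means y is a delayed copy of x.
    """
    best_lag, best_score = 0, None
    for lag in range(-max_lag, max_lag + 1):
        score = 0
        for n, xn in enumerate(x):
            m = n + lag
            if 0 <= m < len(y):  # only sum where the shifted signals overlap
                score += xn * y[m]
        if best_score is None or score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def in_zone(x, y, max_lag, zone):
    """True if the estimated TDOA falls inside the (lo, hi) lag window."""
    lo, hi = zone
    return lo <= cross_correlation_lag(x, y, max_lag) <= hi
```

For microphones spaced d apart, physically possible lags are bounded by about d·fs/c. With d = 10 cm, c ≈ 343 m/s and fs = 8 kHz that is only about 2.3 samples, so practical arrays tend to use wider spacing, higher sample rates, or interpolation around the correlation peak. The cost is also O(N × lags) multiply-accumulates per block, which is heavy for an 8-bit part.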
Reply by ●October 24, 2005
"jia" <jia.qinghua@gmail.com> writes:
> Hi,here
>
> I am now with a project of speech pause detection.
>
> I want to replace the function of an MOTOROLA IC(MC34118) with a
> microprocessor. Thus I should compare the amplitude of two speech
> signal, then decide wich is dominating. When both are noise(i.e.
> non-speech),then processor turn the status of "idle".
>
> However,it's difficult to distinguish the status of speech and
> non-speech timingly with the computation of an 8-bits microprocessor.
>
> Could anyone help me.
> many thanks
You might get some leads if you try searching on the keywords "voice activity detector," or the corresponding acronym "VAD."
--
%  Randy Yates                  % "The dreamer, the unwoken fool -
%% Fuquay-Varina, NC            %  in dreams, no pain will kiss the brow..."
%%% 919-577-9882                %
%%%% <yates@ieee.org>           % 'Eldorado Overture', *Eldorado*, ELO
http://home.earthlink.net/~yatescr
Reply by ●October 24, 2005
Randy Yates: Thank you for your reply. Someone else also recommended trying a VAD algorithm, but I feel it would be "wasting the algorithm's talent on a petty job." I will give it a try anyway.
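The simplest member of the VAD family is not much heavier than the amplitude comparison the original poster had in mind: compare each frame's energy with an adaptively tracked noise floor. This is a minimal sketch under stated assumptions: the first frame is taken to be noise, and `snr_factor`, `adapt_shift` and the frame length are hypothetical values. Standardized detectors, such as the one in ITU-T G.729 Annex B, use more features and are considerably more elaborate.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) // frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def energy_vad(energies, snr_factor=4, adapt_shift=4):
    """Flag a frame as speech when its energy exceeds snr_factor x noise floor.

    Assumes the first frame is non-speech. The noise floor moves by roughly
    1/2**adapt_shift of the difference, and only on non-speech frames.
    """
    noise = energies[0]
    flags = []
    for e in energies:
        speech = e > snr_factor * noise + 1  # +1 avoids a zero threshold
        flags.append(speech)
        if not speech:
            noise += (e - noise) >> adapt_shift
    return flags
```

Because the floor adapts only on non-speech frames, a sudden, permanent rise in background noise would be flagged as speech indefinitely; practical detectors add a slow forced update or minimum tracking to guard against that.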
Reply by ●October 25, 2005
jia wrote:
[...]
> signal, then decide wich is dominating. When both are noise(i.e.
> non-speech),then processor turn the status of "idle".
Noise is an inevitable part of speech, so this strategy may not work well.
Thomas
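One way to make this point concrete: unvoiced consonants such as /s/ or /f/ are themselves noise-like, so an amplitude test alone may treat them as background. A classic, cheap complement is the zero-crossing rate (ZCR), which is high for noise-like frames and low for voiced ones. This minimal sketch labels frames with the voiced/unvoiced/gap categories that come up later in the thread. The thresholds are hypothetical, and ZCR alone still cannot tell unvoiced speech from stationary hiss.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def label_frame(frame, energy_thresh=1000.0, zcr_thresh=0.3):
    """Label one frame as "gap", "unvoiced" or "voiced" (thresholds hypothetical)."""
    energy = sum(s * s for s in frame) / len(frame)
    if energy < energy_thresh:
        return "gap"
    return "unvoiced" if zero_crossing_rate(frame) > zcr_thresh else "voiced"
```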
Reply by ●October 25, 2005
Dr. Thomas Radtke wrote:
> jia wrote:
>
> [...]
>
> > signal, then decide wich is dominating. When both are noise(i.e.
> > non-speech),then processor turn the status of "idle".
>
> Noise is an inevitable part of speech so this strategy may not work well.
Just curious here, but if the problem is defined as "separation of speech-like signals from noises that aren't speech at all", then speech (in English at least) has a pattern of alternating voiced consonants, unvoiced consonants, and gaps. Although I'm not an expert in VAD or other specialist algorithms, I would have thought that, after some sort of segmentation of the original signal, a Markov model should be able to distinguish speech-like signals from non-speech.
Cheers,
Ross-c
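The Markov-model suggestion can be sketched as a two-state hidden Markov model (speech, non-speech) whose observations are per-frame labels such as voiced, unvoiced or gap, decoded with the Viterbi algorithm. All probabilities below are hypothetical placeholders; in practice they would be estimated from labeled recordings.

```python
import math

STATES = ("speech", "nonspeech")
START = {"speech": 0.5, "nonspeech": 0.5}
TRANS = {  # hypothetical "sticky" transitions: states tend to persist
    "speech": {"speech": 0.95, "nonspeech": 0.05},
    "nonspeech": {"speech": 0.05, "nonspeech": 0.95},
}
EMIT = {  # hypothetical probability of each frame label in each state
    "speech": {"voiced": 0.6, "unvoiced": 0.25, "gap": 0.15},
    "nonspeech": {"voiced": 0.05, "unvoiced": 0.45, "gap": 0.5},
}


def viterbi(obs):
    """Most likely state sequence for a non-empty list of frame labels."""
    logp = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []
    for o in obs[1:]:
        ptr, new = {}, {}
        for s in STATES:
            # Best predecessor of state s, scored in the log domain.
            prev = max(STATES, key=lambda p: logp[p] + math.log(TRANS[p][s]))
            ptr[s] = prev
            new[s] = logp[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
        back.append(ptr)
        logp = new
    state = max(STATES, key=lambda s: logp[s])
    path = [state]
    for ptr in reversed(back):  # trace the back-pointers from the end
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With these sticky transitions, a single gap frame inside a run of voiced frames stays labeled as speech instead of toggling the state, which is the smoothing an isolated frame-by-frame threshold lacks.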
Reply by ●October 25, 2005
clemenr@wmin.ac.uk wrote:
> Dr. Thomas Radtke wrote:
>
>>jia wrote:
>>
>>[...]
>>
>>>signal, then decide wich is dominating. When both are noise(i.e.
>>>non-speech),then processor turn the status of "idle".
>>
>>Noise is an inevitable part of speech so this strategy may not work well.
>
> Just curious here, but if the problem is defined as "separation of
> speech-like signals from noises that aren't speech at all", then speech
> (in English at least) has a pattern of alternating voiced consonants,
> unvoiced consonants, and gaps. Without being an expert in VAD or any
> other specialist algorithms, I would have thought that after some sort
> of segmentation of the original signal a Markov model should be able to
> recognise speech-like signals from non-speech.
I would think so too. In *certain* cases, you would probably not even need a sophisticated segmentation: just check for a small enough SNR for about 0.5 s or so. Other cases might require an ANN or similar.
I was just commenting on jia's 'noise (i.e. non-speech)'. Was there a touch of offense in your reply, or am I too sensitive? I'm sure you knew the answer already, didn't you?
Thomas
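The simpler rule, declaring non-speech only after the SNR has stayed low for about 0.5 s, is essentially a hangover timer. This is a minimal sketch assuming per-frame SNR estimates in dB are already available (for example from a noise-floor tracker). The 6 dB threshold and 10 ms frame length are hypothetical values.

```python
def hold_decisions(snr_db, threshold_db=6.0, frame_ms=10, hold_ms=500):
    """Per-frame speech flags: speech until SNR stays low for hold_ms.

    Starts in the non-speech (idle) state.
    """
    hold_frames = hold_ms // frame_ms
    quiet_run = hold_frames  # count of consecutive low-SNR frames
    out = []
    for snr in snr_db:
        if snr >= threshold_db:
            quiet_run = 0
        else:
            quiet_run += 1
        out.append(quiet_run < hold_frames)
    return out
```

The hold also bridges the short gaps between words, so the output does not chatter between speech and idle within a sentence.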
Reply by ●October 25, 2005
Hi. No offense was meant in my answer, and none was taken. I thought that a Markov model would work for this application with reasonable accuracy, but wondered whether there were more sophisticated approaches in use that would do better than an MM. So my uncertainty wasn't whether an MM would be a reasonably accurate way of identifying voice-like versus non-voice-like signals, but whether better approaches exist that would be preferable. At the time I posted, I didn't know what the VAD algorithm was, and I was thinking that it classified signals as voice or non-voice, rather than just detecting the presence or absence of an audio signal.
As it's difficult to read tone into an email, please read the following as friendly academic debate :-)
Assuming that by ANN you mean an "artificial neural network", I must say that I'm not a great fan of NNs. Personally, I believe that if you want a somewhat ad hoc classifier, it's better to use a decision-tree induction algorithm, because the resulting tree is simple and readable. If you need a classifier with a sounder theoretical basis, a support vector machine provides one. I admit that the research I've seen does not consistently show SVMs outperforming ANNs on the same problems, but I generally prefer DT and SVM algorithms to ANNs. Comments?
Cheers,
Ross-c
Reply by ●October 25, 2005
clemenr@wmin.ac.uk wrote:
>
> As it's difficult to read tone into an email, can you please read the
> following as friendly academic debate :-)
*lol*, sorry for misreading you; that was most probably just the language barrier.
> However assuming that by ANN you mean an "artificial neural network" I
> must say that I'm not a great fan of NNs. Personally I believe that if
> you want a classifier that is semi "ad-hoc" that it's better to use a
> Decision Tree induction algorithm as the decision tree created is
> simple and readable. If a classifier with a more sound theoretical
> basis is required, then a support vector machine has that. I do admit
> that the research I've seen does not always suggest that SVMs
> outperform ANNs on the same problems, but I generally prefer DT and SVM
> algorithms to ANNs.
I mentioned ANNs just as an alternative. They currently work quite well for me, but I'm far from claiming that they are superior for speech detection.
Anyway, methinks Randy already pointed the OP in the right direction.
Thomas






