
speech/non-speech detection

Started by jia October 24, 2005
Hi all,

I am working on a speech pause detection project.

I want to replace the function of a Motorola IC (MC34118) with a
microprocessor. Thus I need to compare the amplitude of two speech
signals, then decide which is dominating. When both are noise (i.e.
non-speech), the processor switches to the "idle" state.

However, it's difficult to distinguish speech from non-speech in a
timely manner with the computing power of an 8-bit microprocessor.

Could anyone help me?
Many thanks
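
A minimal sketch of the level comparison described in the post above, for illustration only. It is written in Python for readability, but uses only integer adds, shifts and compares so that it could be ported to an 8-bit part. The class name `LevelTracker`, the function `decide`, the state names `TX`/`RX`/`IDLE`, and every constant (`rise_period`, `offset`, the shift amounts) are assumptions made here, not values taken from the MC34118.

```python
class LevelTracker:
    """Fast envelope plus slow noise-floor estimate for one channel.

    Integer-only: adds, shifts and compares, no multiply or divide.
    """

    def __init__(self, rise_period=256):
        self.env = 0                    # fast envelope of |sample|
        self.floor = 0                  # slow estimate of background level
        self.rise_period = rise_period  # assumed: samples per +1 floor step
        self.tick = 0

    def update(self, sample):
        mag = -sample if sample < 0 else sample
        if mag > self.env:
            # Fast attack: close half of the gap (at least 1).
            self.env += (mag - self.env + 1) >> 1
        elif mag < self.env:
            # Slow decay: close 1/16 of the gap (at least 1).
            self.env -= ((self.env - mag) >> 4) + 1
        if self.env < self.floor:
            # The floor follows the envelope down immediately...
            self.floor = self.env
        self.tick += 1
        if self.tick >= self.rise_period:
            # ...and creeps up one step every rise_period samples.
            self.tick = 0
            if self.env > self.floor:
                self.floor += 1
        return self.env


def decide(tx, rx, offset=8):
    """Return "TX", "RX" or "IDLE" from two LevelTracker objects.

    A channel counts as active when its envelope exceeds twice its own
    noise floor plus a fixed offset (both thresholds are assumptions).
    """
    tx_on = tx.env > (tx.floor << 1) + offset
    rx_on = rx.env > (rx.floor << 1) + offset
    if not tx_on and not rx_on:
        return "IDLE"
    if tx_on and rx_on:
        return "TX" if tx.env >= rx.env else "RX"
    return "TX" if tx_on else "RX"
```

Call `update()` on each tracker once per sample and `decide()` as often as the state needs refreshing. Note that the floor keeps creeping upward during long stretches of speech; a more careful design would probably freeze it while a channel is active.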

Ha - you're kidding! Separating speech from speech is really hard unless
you define the desired speech to be within a zone and calculate the
time differences of arrival into that zone (the desired speech being
within the zone, and noise or undesired speech being outside). There is
a huge literature on this, and the best ideas use multiple microphones.

Naebad
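
One way to read the time-difference-of-arrival idea above, as a rough sketch: with two microphones, find the lag at which the two recordings line up best, and accept the source only if that lag falls in an expected range. The function names `estimate_delay` and `in_zone` and the lag limits are hypothetical, and a brute-force cross-correlation like this is far too slow for a small microcontroller; it only illustrates the principle.

```python
def estimate_delay(a, b, max_lag):
    """Return the lag (in samples) at which b best matches a.

    A positive result means b is a delayed copy of a, i.e. the sound
    reached microphone a first. Brute-force cross-correlation.
    """
    n = min(len(a), len(b))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += a[i] * b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def in_zone(lag, min_lag, max_lag):
    """Treat a source as inside the zone when its lag is in range.

    min_lag and max_lag would come from the microphone geometry;
    here they are just assumed inputs.
    """
    return min_lag <= lag <= max_lag
```

For example, a short burst that appears three samples later in `b` than in `a` gives `estimate_delay(a, b, 6) == 3`, which `in_zone(3, 2, 4)` accepts.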

Hi, Naebad

What do you mean by "zone": amplitude, spectral range, or something
else?

"jia" <jia.qinghua@gmail.com> writes:

> Hi all,
>
> I am working on a speech pause detection project.
>
> I want to replace the function of a Motorola IC (MC34118) with a
> microprocessor. Thus I need to compare the amplitude of two speech
> signals, then decide which is dominating. When both are noise (i.e.
> non-speech), the processor switches to the "idle" state.
>
> However, it's difficult to distinguish speech from non-speech in a
> timely manner with the computing power of an 8-bit microprocessor.
>
> Could anyone help me?
> Many thanks

You might get some leads if you try searching on the keywords "voice
activity detector," or the corresponding acronym "VAD."

--
%  Randy Yates                  % "The dreamer, the unwoken fool -
%% Fuquay-Varina, NC            %  in dreams, no pain will kiss the brow..."
%%% 919-577-9882                %
%%%% <yates@ieee.org>           % 'Eldorado Overture', *Eldorado*, ELO
http://home.earthlink.net/~yatescr
Randy Yates

Thank you for your reply. Someone else also recommended trying a VAD
algorithm. I feel it would be "wasting the algorithm's talent on a
petty job", but I will try it.

jia wrote:

[...]

> signals, then decide which is dominating. When both are noise (i.e.
> non-speech), the processor switches to the "idle" state.

Noise is an inevitable part of speech, so this strategy may not work well.

Thomas

Dr. Thomas Radtke wrote:

> jia wrote:
>
> [...]
>
> > signals, then decide which is dominating. When both are noise (i.e.
> > non-speech), the processor switches to the "idle" state.
>
> Noise is an inevitable part of speech, so this strategy may not work well.

Just curious here, but if the problem is defined as "separation of
speech-like signals from noises that aren't speech at all", then speech
(in English at least) has a pattern of alternating voiced consonants,
unvoiced consonants, and gaps. Without being an expert in VAD or any
other specialist algorithms, I would have thought that after some sort
of segmentation of the original signal a Markov model should be able to
distinguish speech-like signals from non-speech.

Cheers,

Ross-c
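
To make the segmentation-plus-Markov-model idea above concrete, here is a rough sketch under heavy assumptions. Each frame is labelled voiced (`V`), unvoiced (`U`) or gap (`G`) from its energy and zero-crossing rate, and the label sequence is then scored under two first-order Markov chains. The thresholds and both transition tables are made-up placeholder numbers chosen only so the example runs; real values would have to be estimated from labelled recordings. All names are hypothetical.

```python
import math


def label_frame(frame, gap_energy=100, zcr_split=0.3):
    """Label one frame as gap 'G', unvoiced 'U' or voiced 'V'.

    Low energy -> gap; otherwise a high zero-crossing rate suggests
    unvoiced sound and a low one suggests voiced sound.
    """
    energy = sum(s * s for s in frame) / len(frame)
    if energy < gap_energy:
        return "G"
    crossings = sum(1 for x, y in zip(frame, frame[1:]) if (x < 0) != (y < 0))
    zcr = crossings / (len(frame) - 1)
    return "U" if zcr > zcr_split else "V"


# Placeholder transition probabilities (NOT measured data).
# "Speech" moves between classes fairly often...
SPEECH = {
    "V": {"V": 0.70, "U": 0.15, "G": 0.15},
    "U": {"V": 0.40, "U": 0.50, "G": 0.10},
    "G": {"V": 0.30, "U": 0.20, "G": 0.50},
}
# ...while "steady noise" almost always stays in the same class.
NOISE = {
    "V": {"V": 0.96, "U": 0.02, "G": 0.02},
    "U": {"V": 0.02, "U": 0.96, "G": 0.02},
    "G": {"V": 0.02, "U": 0.02, "G": 0.96},
}


def log_likelihood(labels, trans):
    """Sum of log transition probabilities along the label sequence."""
    return sum(math.log(trans[a][b]) for a, b in zip(labels, labels[1:]))


def looks_like_speech(labels):
    """True when the speech chain explains the labels better."""
    return log_likelihood(labels, SPEECH) > log_likelihood(labels, NOISE)
```

With these placeholder tables, a sequence that keeps alternating between classes, such as `list("VVUUGGVVUUGG")`, scores as speech-like, while a long run of a single label does not. Note this is a plain Markov chain over observed labels rather than a hidden Markov model.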
clemenr@wmin.ac.uk wrote:

> Dr. Thomas Radtke wrote:
>
>> jia wrote:
>>
>> [...]
>>
>>> signals, then decide which is dominating. When both are noise (i.e.
>>> non-speech), the processor switches to the "idle" state.
>>
>> Noise is an inevitable part of speech, so this strategy may not work well.
>
> Just curious here, but if the problem is defined as "separation of
> speech-like signals from noises that aren't speech at all", then speech
> (in English at least) has a pattern of alternating voiced consonants,
> unvoiced consonants, and gaps. Without being an expert in VAD or any
> other specialist algorithms, I would have thought that after some sort
> of segmentation of the original signal a Markov model should be able to
> distinguish speech-like signals from non-speech.

I would think so too. In *certain* cases, you would probably not even
need a sophisticated segmentation; just check for a small enough SNR
for about 0.5 s or so. Other cases might require an ANN or similar. I
was just commenting on jia's 'noise (i.e. non-speech)'. Was there a
touch of offense in your reply, or am I too sensitive? I'm sure you
knew the answer already, didn't you?

Thomas
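
The "small enough SNR for about 0.5 s" check mentioned above might look something like this sketch. It assumes frame energies and a noise-energy estimate are already available from somewhere else, and the class name `PauseDetector`, the 10 ms frame length and the energy ratio of 4 (roughly 6 dB) are illustrative guesses rather than recommended values.

```python
class PauseDetector:
    """Report a pause once the SNR has stayed low for hold_ms."""

    def __init__(self, frame_ms=10, hold_ms=500, ratio=4):
        self.hold_frames = hold_ms // frame_ms  # frames the SNR must stay low
        self.ratio = ratio                      # energy ratio threshold
        self.low_count = 0                      # consecutive low-SNR frames

    def update(self, frame_energy, noise_energy):
        """Feed one frame; return True while a pause is being reported."""
        if frame_energy < self.ratio * noise_energy:
            self.low_count += 1
        else:
            # Any frame above the threshold restarts the hold timer.
            self.low_count = 0
        return self.low_count >= self.hold_frames
```

With the defaults, 50 consecutive low-SNR frames are needed before `update()` returns `True`, and a single louder frame resets the count.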
Hi. There was no offense meant in my answer or felt by me. I thought
that a Markov Model would work for this application to a reasonable
level of accuracy, but wondered if there were more sophisticated
approaches in use that would be better than a MM. So my uncertainty
wasn't whether a MM would be a reasonably accurate method of
identifying voice-like versus non-voicelike signals, but whether there
are better approaches that would be preferable. At the time I posted, I
didn't know what the VAD algorithm was and was thinking that it
identified signals as voice or non-voice, rather than just identifying
the presence or absence of an audio signal.

As it's difficult to read tone into an email, can you please read the
following as friendly academic debate :-)

However, assuming that by ANN you mean an "artificial neural network",
I must say that I'm not a great fan of NNs. Personally, I believe that
if you want a classifier that is semi "ad-hoc", it's better to use a
decision tree induction algorithm, as the decision tree it creates is
simple and readable. If a classifier with a sounder theoretical basis
is required, then a support vector machine has that. I do admit that
the research I've seen does not always suggest that SVMs outperform
ANNs on the same problems, but I generally prefer DT and SVM algorithms
to ANNs.

Comments?

Cheers,

Ross-c

clemenr@wmin.ac.uk wrote:

> As it's difficult to read tone into an email, can you please read the
> following as friendly academic debate :-)

*lol*, sorry for getting you wrong, that was most probably just the
language barrier.

> However, assuming that by ANN you mean an "artificial neural network",
> I must say that I'm not a great fan of NNs. Personally, I believe that
> if you want a classifier that is semi "ad-hoc", it's better to use a
> decision tree induction algorithm, as the decision tree it creates is
> simple and readable. If a classifier with a sounder theoretical basis
> is required, then a support vector machine has that. I do admit that
> the research I've seen does not always suggest that SVMs outperform
> ANNs on the same problems, but I generally prefer DT and SVM algorithms
> to ANNs.

I mentioned ANNs just as an alternative. They currently work quite well
for me, but I'm far from claiming that they are superior for speech
detection. Anyway, methinks Randy already pointed the OP in the right
direction.

Thomas