Sign in

username or email:

password:



Not a member?
Forgot your password?

Search speech-recognition



Search tips

Subscribe to speech-recognition



Ads

Discussion Groups

See Also

Embedded SystemsFPGA

Discussion Groups | Speech Recognition | Issue implementing Energy threshold algorithm for Voice Activity Detection

Hi allI am trying to implement the energy threshold algorithm for voice activity detection and not getting meaningful values for energy for frames of size wL.wL = 1784 // about 40 ms (const double decay_constant = 0...

  

Post a new Thread



Is this thread worth a thumbs up?

0

Issue implementing Energy threshold algorithm for Voice Activity Detection - kash...@gmail.com - May 31 6:29:03 2011

Hi all

I am trying to implement the energy threshold algorithm for voice activity
detection and not getting meaningful values for energy for frames of size wL.

wL = 1784 // about 40 ms (
const double decay_constant = 0.90 // some optimal value between 0 and 1
double prevrms = 1.0  // avoid DivideByZero
double threshold = some optimal value after some experimentation

for (int i = 0; i < noSamples ; i += wL)
{
   for (int j = 0; j < wL; j++)
   {
     // Exponential decay
     total = total * decay_constant;
     total += (audioSample[j] * audioSample[j]); // sum of squares
   }
                
   double mean = total / wL;
   double rms = Math.Round(Math.Sqrt(mean),2); // root mean sqare
   double prevrms = 1.0;

   if(rms/prevrms > threshold)
   {
      // voice detected
   }

   prevrms = rms;
   rms = 0.0;
}

Please advise what is wrong with the above implementation as rms computed for
every frame is calculated as 0.19.

The other issue is speed as it took about 30 minutes to execute the above.
Currently implemented as O(n2). Working with offline data so not a big deal as
achieving a accuracy is the main objective ut any suggestions to improve
efficiency would be highly appreciated.

Also, would you recommend using other factors like auto-correlation,
zero-crossing rate or energy alone be sufficient.

Following is the summary of the WAV file (only considering clean conversational
speech) i am using:

// WAV file information
Sampling Frequency: 44100     Bits Per Sample:  16 
Channels: 2    nBlockAlign: 4   wavdata size: 557941248 bytes
Duration: 3162.932 sec    Samples: 139485312    Time between samples: 0.0227 ms
Byte position at start of samples: 44 bytes  (0x2C)

Chosen first sample to display:  1   (0.000 ms)
Chosen end  sample to display:  1784   (40.431 ms)

16 bit max possible value is:  32767  (0x7FFF)
16 bit min possible value is: -32768  (0x8000)

Regards,

Vineet

______________________________
New Code Sharing Section now Live on DSPRelated.com. Learn about the Reward Program for Contributors here.



(You need to be a member of speech-recognition -- send a blank email to speech-recognition-subscribe@yahoogroups.com )