Hi Jeff/Brant

Thanks a lot for your suggestions. I will revise the algorithm and post my
findings.

I also found another major flaw in my code: the second for loop, *for
(int j = 0; j < wL; j++)*, was not set up correctly. It went over the same
wL sample values on every pass, which is why I was getting the same value
for every frame.

Basically, the second for loop should start at the current frame offset and
cover exactly wL samples, something like this:

*for (j = i; j < i + wL; j++)*
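As a minimal sketch of the corrected indexing (written in Python rather than the C# above; `frame_rms` is a hypothetical helper, and `audio_sample` is assumed to be a plain list of floats), the inner loop starts at the frame offset `i` and stops before `i + wl`, so each frame reads its own samples. It uses a plain sum of squares without the decay term:

```python
import math

def frame_rms(audio_sample, wl):
    """Return one RMS value per non-overlapping frame of wl samples."""
    n = len(audio_sample)
    rms_values = []
    for i in range(0, n, wl):
        end = min(i + wl, n)  # the last frame may be shorter than wl
        total = 0.0           # reset the accumulator for every frame
        for j in range(i, end):
            total += audio_sample[j] * audio_sample[j]  # sum of squares
        rms_values.append(math.sqrt(total / (end - i)))
    return rms_values

# Two frames of four samples: silence, then a constant level of 3.0.
print(frame_rms([0.0] * 4 + [3.0] * 4, 4))  # [0.0, 3.0]
```

Resetting `total` per frame is a choice made for this sketch. If the decay term is kept, the accumulator would instead carry over between frames.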

*Machine Information*
System Type: x64-based PC
Processor: Pentium(R) Dual-Core CPU T4400 @ 2.20GHz, 2200 MHz, 2 Core(s), 2 Logical Processor(s)
Installed Physical Memory (RAM): 4.00 GB

I have never used OpenMP, but I have some experience with MPI. I will work on
this and update the post.

Regards

Vineet

On Wed, Jun 1, 2011 at 9:34 AM, Jeff Brower wrote:

> Vineet-

>

> > I am trying to implement the energy threshold algorithm for
> > voice activity detection and am not getting meaningful energy
> > values for frames of size wL.

> >

> > wL = 1784 // about 40 ms
> > const double decay_constant = 0.90 // some optimal value between 0 and 1
> > double prevrms = 1.0 // avoid DivideByZero
> > double threshold = some optimal value after some experimentation

> >

> > for (int i = 0; i < noSamples ; i += wL)
> > {
> >     for (int j = 0; j < wL; j++)
> >     {
> >         // Exponential decay
> >         total = total * decay_constant;
> >         total += (audioSample[j] * audioSample[j]); // sum of squares
> >     }
> >
> >     double mean = total / wL;
> >     double rms = Math.Round(Math.Sqrt(mean),2); // root mean square
> >     double prevrms = 1.0;
> >
> >     if(rms/prevrms > threshold)
> >     {
> >         // voice detected
> >     }
> >
> >     prevrms = rms;
> >     rms = 0.0;
> > }

> >

> > Please advise what is wrong with the above implementation,
> > as the rms computed for every frame comes out as 0.19.

>

> I don't know which specific algorithm you're trying to implement, but my
> guess is that it may be this:

>

> y[n] = a*x[n] + b*y[n-1]

>

> where a + b = 1. That will give you an exponential decay. In your case,
> you may want x[n] to be abs(x[n]) or sqr(x[n]), and try a = 0.1 and b = 0.9.

>

> Your code looks similar, but you have no coefficient for your input term,
> which leads me to guess that "total" in your code will not decay, or at
> least not properly. Unless a + b = 1, I believe you have an unstable
> situation.
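A minimal sketch of the recursion y[n] = a*x[n] + b*y[n-1] described above, in Python, assuming squared samples as the input and a = 1 - b (`smooth_energy` is a hypothetical name, not something from the original code):

```python
def smooth_energy(samples, b=0.9):
    """One-pole smoother: y[n] = a * x[n]^2 + b * y[n-1], with a = 1 - b."""
    a = 1.0 - b
    y = 0.0
    out = []
    for x in samples:
        y = a * (x * x) + b * y  # the new input is weighted by a, the history by b
        out.append(y)
    return out

# A constant input of 2.0 has energy 4.0; the output climbs toward that value.
energy = smooth_energy([2.0] * 200)
print(round(energy[-1], 6))  # 4.0
```

With a = 1 - b the steady-state gain is 1, so the output settles near the mean of the squared input. With a = 1 and b = 0.9, as in the posted loop, the steady-state gain is 1/(1 - b) = 10, so the running total settles about ten times higher.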

>

> > The other issue is speed, as it took about 30 minutes to
> > execute the above. It is currently implemented as O(n^2). I am working
> > with offline data, so this is not a big deal, as achieving
> > accuracy is the main objective, but any suggestions to improve
> > efficiency would be highly appreciated.

>

> 30 minutes for approx 248 bil multiplies... well, could be. What type of
> machine are you using? Your loop can be parallelized -- did you try OpenMP?
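As a quick check of the multiply count, using the sample count and frame length quoted in this thread: the 248 billion figure matches noSamples * wL, which is the cost of evaluating a full wL-sample window at every sample position. Stepping by whole frames, as in the posted loop with i += wL, needs only about noSamples multiplies. A small Python sketch of the arithmetic:

```python
no_samples = 139_485_312  # "Samples" from the WAV summary
wl = 1784                 # frame length, about 40 ms at 44100 Hz

# One full window per sample position (outer loop stepping by 1).
per_sample_window = no_samples * wl

# One window per frame (outer loop stepping by wl): each sample is squared once.
per_frame = no_samples

print(per_sample_window)  # 248841796608
print(per_frame)          # 139485312
```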

>

> -Jeff

>

> > Also, would you recommend using other factors like
> > auto-correlation or zero-crossing rate, or would energy alone be
> > sufficient?

> >

> > Following is the summary of the WAV file (only considering
> > clean conversational speech) I am using:

> >

> > // WAV file information
> > Sampling Frequency: 44100 Hz
> > Bits Per Sample: 16
> > Channels: 2
> > nBlockAlign: 4
> > wavdata size: 557941248 bytes
> > Duration: 3162.932 sec
> > Samples: 139485312
> > Time between samples: 0.0227 ms
> > Byte position at start of samples: 44 bytes (0x2C)
> >
> > Chosen first sample to display: 1 (0.000 ms)
> > Chosen end sample to display: 1784 (40.431 ms)
> >
> > 16 bit max possible value is: 32767 (0x7FFF)
> > 16 bit min possible value is: -32768 (0x8000)

> >

> > Regards,

> >

> > Vineet