Hi Jeff -

Thanks for your input. Yes, I worked around it by using overlapping FFT
frames with a hop size of 180 samples, i.e. a new spectrum is generated
at 245 Hz (44100/180).
Also, dB-normalizing and summing the magnitude spectrum across the bins
can, for analysis purposes, be viewed as a (non-time-domain) loudness
sample; collected over time, it looks like a pretty faithful
'volume/loudness' curve.
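
A minimal sketch of that workaround, under some assumptions of mine: a
512-point frame, a Hann window, and "dB-normalizing and summing" read as
"convert each bin magnitude to dB, clamp to a floor, then add the bins up".
The function names, the window and the -120 dB floor are illustrative only.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def loudness_curve(samples, frame_len=512, hop=180, floor_db=-120.0):
    """Return one summed-dB value per hop, so the output rate is fs / hop."""
    # Hann window (an assumption; the post does not name a window).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    curve = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        spectrum = fft(frame)[: frame_len // 2 + 1]  # non-negative frequencies
        total = 0.0
        for c in spectrum:
            mag = abs(c)
            db = 20 * math.log10(mag) if mag > 0 else floor_db
            total += max(db, floor_db)  # clamp so silent bins stay finite
        curve.append(total)
    return curve
```

With fs = 44100 and hop = 180, successive values of the curve are 1/245 s
apart, while each value still describes a 512-sample frame.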

Thanks, all, for your suggestions! They helped me view the problem from
various angles.
On Tue, Mar 15, 2011 at 11:31 AM, Jeff Brower wrote:

> Vamsi-
>
> > That doesn't somehow seem right to me. Ok, so here is the process so far.
> > An audio signal at Fs = 44.1 kHz is run through an FFT every 512 samples.
> > All the frequency components of the FFT frame are collated into one signal
> > sample. Such samples are collected over time, and for our purposes, can
> > be called the amplitude envelope of the incoming audio.
>
> The FFT averaging operations you're describing will produce power
> spectrum/density results, not an "amplitude envelope", which typically
> refers to time domain data.
>
> > Thus a new envelope
> > sample is generated at a rate of (44100/512=) 86.13 Hz.
> > Now the paper that I am referring to (which if you are interested is
> > 'Sound Onset Detection by Applying Psychoacoustic knowledge' by Anssi
> > Klapuri) asks to full-wave rectify and decimate by 180
>
> "Full wave rectify" implies a time-domain operation -- it would make no
> sense for frequency domain magnitude data. So I assume that "decimate by
> 180" means to reduce the effective sampling rate of the amplitude envelope
> to 245 Hz.
>
> > in order to remove much of
> > the instability in the envelope signal. This is where I needed the
> > clarification. I understand decimating by small factors such as 2, 4 etc.
> > in order to perform the low-pass filtering. But I was wondering if 180
> > was possibly specified with some phase metric in mind.
> > As you suggested, reducing the rate to 0.47 Hz might be the way to
> > approach it, but it doesn't necessarily feel right to me, as this would
> > mean losing a lot of data as well as timing resolution/accuracy.
>
> My guess at this point is that your time and frequency domain operations
> are supposed to produce separate results which are then looked at in some
> decision making process, i.e. something like this:
>                 44.1 kHz
>                audio data
>                     |
>            _________|__________
>           |                    |
>           |                    |
>   amplitude envelope   FFT mag averaging
>  LPF, decimate by 180  (power spectrum)
>           |                    |
>           |____________________|
>                     |
>                     |
>               combine data,
>              make estimation
>                or decision
>
> -Jeff
--
Vamsi B. M.
+1 903 326 3404

Vamsi-

> That doesn't somehow seem right to me. Ok, so here is the process so far.
> An audio signal at Fs = 44.1 kHz is run through an FFT every 512 samples. All
> the frequency components of the FFT frame are collated into one signal
> sample. Such samples are collected over time, and for our purposes, can be
> called the amplitude envelope of the incoming audio.

The FFT averaging operations you're describing will produce power spectrum/density results, not an "amplitude
envelope", which typically refers to time domain data.

> Thus a new envelope
> sample is generated at a rate of (44100/512=) 86.13Hz.
> Now the paper that I am referring to (which if you are interested is 'Sound
> Onset Detection by Applying Psychoacoustic knowledge' by Anssi Klapuri) asks
> to full-wave rectify and decimate by 180

"Full wave rectify" implies a time-domain operation -- it would make no sense for frequency domain magnitude data. So
I assume that "decimate by 180" applies to the 44.1 kHz time-domain signal and means to reduce the effective sampling
rate of the amplitude envelope to 245 Hz (44100/180).

> in order to remove much of
> the instability in the envelope signal. This is where I needed the
> clarification. I understand decimating by small factors such as 2,4 etc in
> order to perform the low-pass filtering. But I was wondering if 180 was
> possibly specified with some phase metric in mind.
> As you suggested, reducing the rate to 0.47Hz, might be the way to approach
> it, but it doesn't necessarily feel right to me, as this would mean losing a
> lot of data as well as timing resolution/accuracy.

My guess at this point is that your time and frequency domain operations are supposed to produce separate results
which are then looked at in some decision making process, i.e. something like this:
                44.1 kHz
               audio data
                    |
           _________|__________
          |                    |
          |                    |
  amplitude envelope   FFT mag averaging
 LPF, decimate by 180  (power spectrum)
          |                    |
          |____________________|
                    |
                    |
              combine data,
             make estimation
               or decision
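
A minimal sketch of the left-hand (time-domain) branch of the diagram,
assuming a plain 180-sample block average stands in for the LPF. That is a
crude anti-alias filter chosen only for brevity, not something taken from
the paper, and the function name is hypothetical.

```python
def rectify_and_decimate(samples, factor=180):
    """Full-wave rectify, then average each block of `factor` samples.

    Averaging a block and keeping one value per block is a boxcar
    low-pass filter followed by decimation by `factor`.
    """
    rectified = [abs(s) for s in samples]  # full-wave rectification
    envelope = []
    for start in range(0, len(rectified) - factor + 1, factor):
        block = rectified[start:start + factor]
        envelope.append(sum(block) / factor)
    return envelope
```

One second of 44.1 kHz audio yields 245 envelope samples, i.e. a 245 Hz
envelope rate. A longer or sharper low-pass filter would reject more
aliasing than this block average does.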

-Jeff


Do you mean that the successive samples of individual frequency bins
from the FFTs are decimated by 180? This would mean filtering out the
modulation of an individual frequency bin to less than 0.25 Hz. Is
this something to do with a VOX algorithm or noise reduction system?
If memory serves, speech modulates individual frequency bins between
0.5 and a few Hz. By filtering to below 0.25 Hz any speech component
should be removed, leaving an estimate of the background noise level of
that frequency bin. There is a patented, and astonishingly effective,
noise reduction system for voice based on this idea (but I guess it
costs several MIPS to implement). If I get the chance I'll post
references to this system (which is commercially available) in the
next few days.
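
To illustrate the idea only (this is not the patented system, whose details
are not given here): a rough sketch that smooths the frame-to-frame
magnitude of a single bin with a one-pole low-pass set near 0.25 Hz. The
function name, the one-pole filter and the 86.13 Hz frame rate are
assumptions for the example.

```python
import math

def smooth_bin_track(magnitudes, frame_rate_hz=86.13, cutoff_hz=0.25):
    """Slowly track one bin's magnitude so faster modulation is averaged out."""
    # Standard one-pole smoothing coefficient for the given cutoff.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / frame_rate_hz)
    state = magnitudes[0]
    smoothed = []
    for m in magnitudes:
        state += alpha * (m - state)  # move a small step toward the new value
        smoothed.append(state)
    return smoothed
```

A one-pole filter rolls off gently, so modulation at a few Hz is reduced
rather than removed; a real system would likely need a steeper filter.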

Hope that helps,
John Pote

On 28 Feb 2011, at 07:42, vamsi Bharadwaj wrote:

> That doesn't somehow seem right to me. Ok, so here is the process so
> far.
>
> An audio signal at Fs = 44.1 kHz is run through an FFT every 512
> samples. All the frequency components of the FFT frame are collated
> into one signal sample. Such samples are collected over time, and
> for our purposes, can be called the amplitude envelope of the
> incoming audio. Thus a new envelope sample is generated at a rate of
> (44100/512=) 86.13Hz.
> Now the paper that I am referring to (which if you are interested is
> 'Sound Onset Detection by Applying Psychoacoustic knowledge' by Anssi
> Klapuri) asks to full-wave rectify and decimate by 180 in order to
> remove much of the instability in the envelope signal. This is where
> I needed the clarification. I understand decimating by small factors
> such as 2,4 etc in order to perform the low-pass filtering. But I
> was wondering if 180 was possibly specified with some phase metric
> in mind.
> As you suggested, reducing the rate to 0.47Hz, might be the way to
> approach it, but it doesn't necessarily feel right to me, as this
> would mean losing a lot of data as well as timing resolution/accuracy.

Somehow that doesn't seem right to me. OK, so here is the process so far.
An audio signal at Fs = 44.1 kHz is run through an FFT every 512 samples. All
the frequency components of the FFT frame are collated into one signal
sample. Such samples are collected over time and, for our purposes, can be
called the amplitude envelope of the incoming audio. Thus a new envelope
sample is generated at a rate of 44100/512 = 86.13 Hz.
Now the paper that I am referring to (which, if you are interested, is 'Sound
Onset Detection by Applying Psychoacoustic Knowledge' by Anssi Klapuri) asks
to full-wave rectify and decimate by 180 in order to remove much of
the instability in the envelope signal. This is where I needed the
clarification. I understand decimating by small factors such as 2, 4, etc. in
order to perform the low-pass filtering. But I was wondering if 180 was
possibly specified with some phase metric in mind.
As you suggested, reducing the rate to 0.47 Hz might be the way to approach
it, but it doesn't necessarily feel right to me, as this would mean losing a
lot of data as well as timing resolution/accuracy.
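
For reference, the rates under discussion follow from simple division. A
small sketch, with constant and key names that are purely illustrative:

```python
FS = 44100    # audio sample rate in Hz
FRAME = 512   # samples consumed per FFT frame
FACTOR = 180  # decimation factor quoted from the paper

def rates(fs=FS, frame=FRAME, factor=FACTOR):
    """Return the sample rates implied by each reading of 'decimate by 180'."""
    envelope_rate = fs / frame  # one envelope sample per FFT frame
    return {
        "envelope_rate_hz": envelope_rate,                # 86.13 Hz
        "envelope_decimated_hz": envelope_rate / factor,  # about 0.478 Hz
        "audio_decimated_hz": fs / factor,                # 245 Hz
    }
```

Decimating the 86.13 Hz envelope by 180 leaves roughly one sample every two
seconds, whereas decimating the 44.1 kHz signal by 180 leaves 245 samples
per second.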

On Sun, Feb 27, 2011 at 10:45 PM, Jeff Brower wrote:

> Vamsi-
>
> > The signal I need to operate on is an amplitude envelope of incoming
> > audio. Its sample rate is 86 hertz. I now need to decimate this by
> > factor 180.
>
> Well, it seems like you really want to decimate by 1/180, with a resulting
> sampling rate of 0.478 Hz. If that doesn't sound right to you, then give
> more explanation about your problem and what is your actual objective.
>
> One question: 86 Hz is a very low sampling rate for any type of audio
> signal -- the highest content would be around 40 Hz, barely qualifying as
> "audio". Are you sure your envelope signal isn't already decimated, for
> example starting with 16 kHz audio?
>
> -Jeff
--
Vamsi B. M.
+1 903 326 3404

The signal I need to operate on is an amplitude envelope of incoming audio.
Its sample rate is 86 Hz. I now need to decimate this by a factor of 180.

On Sun, Feb 27, 2011 at 8:47 PM, Jeff Brower wrote:

> MV-
>
> > Could anyone please translate this for me:
> >
> > a signal is "decimated by factor 180"..?
> >
> > Does this mean that the signal is down-sampled such
> > that only its alternate samples are picked?
>
> What is the sampling rate and what type of signal?
>
> -Jeff
--
Vamsi B. M.
+1 903 326 3404

Vamsi-

> The signal I need to operate on is an amplitude envelope of incoming audio.
> Its sample rate is 86 hertz. I now need to decimate this by factor 180.

Well, it seems like you really want to decimate by a factor of 180 (keep 1 sample in 180), with a resulting sampling
rate of about 0.478 Hz (86/180). If that doesn't sound right to you, then give more explanation about your problem and
what your actual objective is.

One question: 86 Hz is a very low sampling rate for any type of audio signal -- the highest content would be around
40 Hz (half the sampling rate), barely qualifying as "audio". Are you sure your envelope signal isn't already
decimated, for example starting with 16 kHz audio?

-Jeff
