comp.dsp | Logarithmically spaced bins

Hi everybody,
I got stuck with a simple DSP problem, and I'm not able to find a any
resource that might help me. I was reading a well known article (Content
Fingerprinting Using Wavelets, M.Covell, S.Baluja), in which there is a
small description of audio processing: 
"We start our processing by converting the audio input (sampled at 5512 Hz)
into a spectrogram. We create our spectrograms using the following
settings: from the initial song, we use slices that are 371 ms long, taken
every 11.6 ms, reduced to 32 logarithmically spaced frequency bins between
318 Hz and 2 kHz.". 
As far as I understand the procedure is as follows:
a) Cut 371 ms long snippets (equal to 2048 samples on 5512 Hz frequency
rate).
b) Apply window function (Hanning window is used in their studies), with
the same length = 2048.
c) Apply Fast Fourier Transform  on the signal (FFT with N = 2048).
d) As I understand, after FFT we get 1027 linearly spaced frequency bins
(corresponding to 0 - 2756 Hz).
At this point I do not understand how can you reduce 1027 bins to 32
logarithmically spaced bins between 318 - 2000 Hz. What kind of procedure
is used? Do they sum the frequency bins using a custom logarithm base? And
what kind of transformation they use in order to obtain the spectrogram
(Magnitude, Squared Magnitude, 20log10 rule)? 
Any help will be greatly appreciated!

Reply by dbd ●December 25, 20102010-12-25

On Dec 25, 6:14&#4294967295;am, "December16" <december16@n_o_s_p_a_m.mail.ru>
wrote:
> ...

> At this point I do not understand how can you reduce 1027 bins to 32
> logarithmically spaced bins between 318 - 2000 Hz. What kind of procedure
> is used? Do they sum the frequency bins using a custom logarithm base? And
> what kind of transformation they use in order to obtain the spectrogram
> (Magnitude, Squared Magnitude, 20log10 rule)?
> Any help will be greatly appreciated!

Look for constant Q transform for one approach. A classic paper:

http://www.wellesley.edu/Physics/brown/pubs/effalgV92P2698-P2701.pdf

Dale B. Dalrymple

Reply by HardySpicer ●December 25, 20102010-12-25

On Dec 26, 3:14&#4294967295;am, "December16" <december16@n_o_s_p_a_m.mail.ru>
wrote:
I find that our refuse men don't really understand logarithmically
space bins.

Reply by December16 ●December 28, 20102010-12-28

>Look for constant Q transform for one approach. A classic paper:
>
>http://www.wellesley.edu/Physics/brown/pubs/effalgV92P2698-P2701.pdf
>
>Dale B. Dalrymple
>
Thank you Dale for your help,
I've considered the algorithm written in the article, but there is one
minor detail that I would like to clarify. As the algorithm suggests us, we
shall use the following formulas in order to obtain transformation
parameters:
  
Q=1/(2^(1/bins)-1); (for 12 bins Q =~ 17)
K=ceil(bins*log2(maxFreq/minFreq)); (K=32,according to the max, min freq)
fftLen=2^nextpow2(ceil(Q*fs/minFreq)); (fftLen = 512)

The authors of the article suggest us to use snippets of length 2048. The
algorithm results in 512 FFT length. What is a preferred way of dealing
with this inequality between snippet length and FFT length. Use FFT with
N=512, and therefore truncate initial 2048 signal, or hard code the length
of FFT N=2048 in the algorithm?

Reply by dbd ●December 28, 20102010-12-28

On Dec 28, 12:04&#4294967295;am, "December16" <december16@n_o_s_p_a_m.mail.ru>
wrote:
> ...

> K=ceil(bins*log2(maxFreq/minFreq)); (K=32,according to the max, min freq)
> fftLen=2^nextpow2(ceil(Q*fs/minFreq)); (fftLen = 512)

These don't seem to come from the content of either article. How can
anyone comment on them?

>
> The authors of the article suggest us to use snippets of length 2048. The
> algorithm results in 512 FFT length. What is a preferred way of dealing
> with this inequality between snippet length and FFT length. Use FFT with
> N=512, and therefore truncate initial 2048 signal, or hard code the length
> of FFT N=2048 in the algorithm?

If you are applying Brown's coefficient pruning scheme, a larger
transform and smaller MINVAL would allow more accurate calculation at
the cost of greater processing load. Take your pick.

Dale B. Dalrymple

Reply by December16 ●December 29, 20102010-12-29

Thanks Dale for your response,
>> ...
>These don't seem to come from the content of either article. How can
>anyone comment on them?

Sorry for using a non quoted example. The algorithm's code comes from the
implementation of the efficient algorithm for the constant Q transform
[Brown and Puckette 92] coded by Benjamin Blankertztaken (can be found
here:
http://wwwmath.uni-muenster.de/logik/Personen/blankertz/constQ/constQ.html
)

>If you are applying Brown's coefficient pruning scheme, a larger
>transform and smaller MINVAL would allow more accurate calculation at
>the cost of greater processing load.

Thanks for clarification. Indeed larger number of FFT points, will result
in a better frequency resolution (at a corresponding processing cost), so
it seems to be just a trade off.

Your comments helped a lot.
Regards

Reply by Chris Bore ●December 29, 20102010-12-29

On Dec 25, 2:14&#4294967295;pm, "December16" <december16@n_o_s_p_a_m.mail.ru>
wrote:
> Hi everybody,
> I got stuck with a simple DSP problem, and I'm not able to find a any
> resource that might help me. I was reading a well known article (Content
> Fingerprinting Using Wavelets, M.Covell, S.Baluja), in which there is a
> small description of audio processing:
> "We start our processing by converting the audio input (sampled at 5512 Hz)
> into a spectrogram. We create our spectrograms using the following
> settings: from the initial song, we use slices that are 371 ms long, taken
> every 11.6 ms, reduced to 32 logarithmically spaced frequency bins between
> 318 Hz and 2 kHz.".
> As far as I understand the procedure is as follows:
> a) Cut 371 ms long snippets (equal to 2048 samples on 5512 Hz frequency
> rate).
> b) Apply window function (Hanning window is used in their studies), with
> the same length = 2048.
> c) Apply Fast Fourier Transform &#4294967295;on the signal (FFT with N = 2048).
> d) As I understand, after FFT we get 1027 linearly spaced frequency bins
> (corresponding to 0 - 2756 Hz).
> At this point I do not understand how can you reduce 1027 bins to 32
> logarithmically spaced bins between 318 - 2000 Hz. What kind of procedure
> is used? Do they sum the frequency bins using a custom logarithm base? And
> what kind of transformation they use in order to obtain the spectrogram
> (Magnitude, Squared Magnitude, 20log10 rule)?
> Any help will be greatly appreciated!

You don't have to use an FFT. It is perfectly acceptable to simply do
the Fourier Transform directly, for the frequencies you need. That is,
you calculate the frequency using the FT rather than the FFT.

If this was a Q-Transform (which also yields logarithmically spaced
frequency bins, but with the proviso that each bin's width is
proportional to its centre frequency, then you would also need to
adapt the window for each bin. However, the article you cite does not
seem to do this, instead using a fixed window length, so is just an
FFT-based approach to re-sampling the frequency output of an FFT into
logarithmic bins - also perfectly reasonable.

Chris
===========================
Chris Bore
BORES Signal Processing
www.bores.com

Reply by dbd ●December 29, 20102010-12-29

On Dec 29, 12:31&#4294967295;pm, Chris Bore <chris.b...@gmail.com> wrote:
> ...

> If this was a Q-Transform (which also yields logarithmically spaced
> frequency bins, but with the proviso that each bin's width is
> proportional to its centre frequency, then you would also need to
> adapt the window for each bin. However, the article you cite does not
> seem to do this, instead using a fixed window length, so is just an
> FFT-based approach to re-sampling the frequency output of an FFT into
> logarithmic bins - also perfectly reasonable.
>
> Chris

It is not clear to which of the articles cited in this thread you are
referring.

The OP's originally referenced Covell and Baluja paper does not
provide enough detail to verify whether if the filters implemented are
constant-Q. The Brown and Puckette JASA paper I referenced illustrates
the scaled kernels used to achieve constant-Q in Figure 1. These
kernels are transformed to generate the frequency domain coefficients
used to generate the constant-Q outputs. The OP's second reference is
to a url where the window is properly scaled for constant-Q as well.

Dale B. Dalrymple

Reply by Chris Bore ●December 30, 20102010-12-30

On Dec 29, 9:31&#4294967295;pm, dbd <d...@ieee.org> wrote:
> On Dec 29, 12:31&#4294967295;pm, Chris Bore <chris.b...@gmail.com> wrote:
>
> > ...
> > If this was a Q-Transform (which also yields logarithmically spaced
> > frequency bins, but with the proviso that each bin's width is
> > proportional to its centre frequency, then you would also need to
> > adapt the window for each bin. However, the article you cite does not
> > seem to do this, instead using a fixed window length, so is just an
> > FFT-based approach to re-sampling the frequency output of an FFT into
> > logarithmic bins - also perfectly reasonable.
>
> > Chris
>
> It is not clear to which of the articles cited in this thread you are
> referring.
>
> The OP's originally referenced Covell and Baluja paper does not
> provide enough detail to verify whether if the filters implemented are
> constant-Q. The Brown and Puckette JASA paper I referenced illustrates
> the scaled kernels used to achieve constant-Q in Figure 1. These
> kernels are transformed to generate the frequency domain coefficients
> used to generate the constant-Q outputs. The OP's second reference is
> to a url where the window is properly scaled for constant-Q as well.
>
> Dale B. Dalrymple

Thanks for clarifying. I meant the OP's reference to Covell and
Baluja. I took the fact that it did not specify scaled kernels as
raising a suspicion they did not use them. For the OP I suppose it is
important to determine what they actually do mean, and as you say that
is not clear.

Chris
===================
Chris Bore
BORES Signal Processing
www.bores.com

Reply by December16 ●December 30, 20102010-12-30

>Thanks for clarifying. I meant the OP's reference to Covell and
>Baluja. I took the fact that it did not specify scaled kernels as
>raising a suspicion they did not use them. For the OP I suppose it is
>important to determine what they actually do mean, and as you say that
>is not clear.
>
>Chris

Thanks a lot for your comments,
The initial article (Covell, Baluja) does not specify anything about scaled
kernels. The only justification related to logarithm is as follows:
"The use of logarithmical spacing in frequency was selected based for
simplicity, since the detailed band-edge locations are unlikely to have a
strong effect under such coarse sampling (only 32 samples across
frequency)."
After Dale suggested a constant Q transform, I assumed that Google's
researchers indeed use it (I have no idea whether it is true or not, at
least there is no reference in their work to constant Q transformation).

>However, the article you cite does not
>seem to do this, instead using a fixed window length, so is just an
>FFT-based approach to re-sampling the frequency output of an FFT into
>logarithmic bins - also perfectly reasonable.

Would you be so kind in giving me a reference how re-sampling the frequency
of an FFT into logarithmic bins is done. It would be of a great help. What
kind of FFT-based approach you mean?
Thanks in advance,
Kind regards, Sergiu

Previous12 Next

Logarithmically spaced bins

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About DSPRelated.com

Social Networks

The Related Media Group