>Hi Dale,
>As I've understood you, in order to have non-coherent implementation, it
is
>enough to create a log-spaced array of frequencies (32 elements between
>[318 ; 2000], which yields to 32 items: 
>
>
318.000000000000	
337.433738089353	
358.055118241995	
379.936720095667	
403.155558802797	
427.793356095377	
453.936827915262	
481.677989622234	
511.114479854074	
542.349904178478	
..                
2000.00000000000
>
>After I have these frequency values, I can sum up frequencies between
each
>two items from the log spaced array:
>
>spectrum = FFT(signal);
>for (int i = 0; i < logBins /*32 log spaced bins*/; i++)
>{
>   int lowBound = _log[i]; /*start index*/
>   int hiBound = _log[i + 1]; /*end index*/
>   for (int k = lowBound; k < hiBound; k++)
>   {
>      sumFreq[i] += spectrum[k].SquaredMagnitude;  
>   }
>   sumFreq[i] = sumFreq[i]/(hiBound - lowBound); /*Normalize*/           

>}
>
>This seems to be a reasonable implementation, also a fast one. The only
>question is whether to sum up SquaredMagnitude of FFT bin value or just
the
>Magnitude.
>
>Thank you for your clarifying posts,
>Kind regards, Sergiu C.
>

So does anybody know what's the preferred way of summing the FFT bins?
Absolute value or Absolute value squared?

> ...
>You already have two reasonable implementations of coherent fft
>implemented constant Q logarithmically spaced filters. If you don't
>need a coherent implementation for your audio fingerprinting, you can
>simply sum the power of closest fft bins into each log spaced center
>frequency and normalize each such incoherent filter by the number of
>bins summed into it. This would produce a set of incoherent fft
>implemented roughly constant Q logarithmically spaced filters and
>require less processing.
>
>Dale B. Dalrymple
>

Hi Dale,
As I've understood you, in order to have non-coherent implementation, it is
enough to create a log-spaced array of frequencies (32 elements between
[318 ; 2000], which yields to 32 items: 

318.000000000000	337.433738089353	358.055118241995	379.936720095667	403.155558802797	427.793356095377	453.936827915262	481.677989622234	511.114479854074	542.349904178478	...
                    2000.00000000000

After I have these frequency values, I can sum up frequencies between each
two items from the log spaced array:

spectrum = FFT(signal);
for (int i = 0; i < logBins /*32 log spaced bins*/; i++)
{
   int lowBound = _log[i]; /*start index*/
   int hiBound = _log[i + 1]; /*end index*/
   for (int k = lowBound; k < hiBound; k++)
   {
      sumFreq[i] += spectrum[k].SquaredMagnitude;  
   }
   sumFreq[i] = sumFreq[i]/(hiBound - lowBound); /*Normalize*/            
}

This seems to be a reasonable implementation, also a fast one. The only
question is whether to sum up SquaredMagnitude of FFT bin value or just the
Magnitude.

Thank you for your clarifying posts,
Kind regards, Sergiu C.

On Dec 30, 5:16&#4294967295;am, "December16" <december16@n_o_s_p_a_m.mail.ru>
wrote:
> ...

>
> Would you be so kind in giving me a reference how re-sampling the frequency
> of an FFT into logarithmic bins is done. It would be of a great help. What
> kind of FFT-based approach you mean?
> Thanks in advance,
> Kind regards, Sergiu

You already have two reasonable implementations of coherent fft
implemented constant Q logarithmically spaced filters. If you don't
need a coherent implementation for your audio fingerprinting, you can
simply sum the power of closest fft bins into each log spaced center
frequency and normalize each such incoherent filter by the number of
bins summed into it. This would produce a set of incoherent fft
implemented roughly constant Q logarithmically spaced filters and
require less processing.

Dale B. Dalrymple

On Dec 30, 5:16&#4294967295;am, "December16" <december16@n_o_s_p_a_m.mail.ru>
wrote:
> Would you be so kind in giving me a reference how re-sampling the frequency
> of an FFT into logarithmic bins is done. It would be of a great help. What
> kind of FFT-based approach you mean?

You can interpolate FFT results using an interpolation
kernel consisting of the transform of the window used
before the FFT.  In the case of a rectangular window,
this would be the common Sinc interpolation method,
similar to that use in resampling audio.

If the resulting bin spacing is much wider than the
transform of the window, you can get other interesting
or useless results by fattening the interpolation
kernel to simulate the application of an narrower time
domain window, of the width which would have produced
a filter closer to constant Q over frequency.

IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M
 http://www.nicholson.com/rhn/dsp.html

>Thanks for clarifying. I meant the OP's reference to Covell and
>Baluja. I took the fact that it did not specify scaled kernels as
>raising a suspicion they did not use them. For the OP I suppose it is
>important to determine what they actually do mean, and as you say that
>is not clear.
>
>Chris

Thanks a lot for your comments,
The initial article (Covell, Baluja) does not specify anything about scaled
kernels. The only justification related to logarithm is as follows:
"The use of logarithmical spacing in frequency was selected based for
simplicity, since the detailed band-edge locations are unlikely to have a
strong effect under such coarse sampling (only 32 samples across
frequency)."
After Dale suggested a constant Q transform, I assumed that Google's
researchers indeed use it (I have no idea whether it is true or not, at
least there is no reference in their work to constant Q transformation).

>However, the article you cite does not
>seem to do this, instead using a fixed window length, so is just an
>FFT-based approach to re-sampling the frequency output of an FFT into
>logarithmic bins - also perfectly reasonable.

Would you be so kind in giving me a reference how re-sampling the frequency
of an FFT into logarithmic bins is done. It would be of a great help. What
kind of FFT-based approach you mean?
Thanks in advance,
Kind regards, Sergiu

On Dec 29, 9:31&#4294967295;pm, dbd <d...@ieee.org> wrote:
> On Dec 29, 12:31&#4294967295;pm, Chris Bore <chris.b...@gmail.com> wrote:
>
> > ...
> > If this was a Q-Transform (which also yields logarithmically spaced
> > frequency bins, but with the proviso that each bin's width is
> > proportional to its centre frequency, then you would also need to
> > adapt the window for each bin. However, the article you cite does not
> > seem to do this, instead using a fixed window length, so is just an
> > FFT-based approach to re-sampling the frequency output of an FFT into
> > logarithmic bins - also perfectly reasonable.
>
> > Chris
>
> It is not clear to which of the articles cited in this thread you are
> referring.
>
> The OP's originally referenced Covell and Baluja paper does not
> provide enough detail to verify whether if the filters implemented are
> constant-Q. The Brown and Puckette JASA paper I referenced illustrates
> the scaled kernels used to achieve constant-Q in Figure 1. These
> kernels are transformed to generate the frequency domain coefficients
> used to generate the constant-Q outputs. The OP's second reference is
> to a url where the window is properly scaled for constant-Q as well.
>
> Dale B. Dalrymple

Thanks for clarifying. I meant the OP's reference to Covell and
Baluja. I took the fact that it did not specify scaled kernels as
raising a suspicion they did not use them. For the OP I suppose it is
important to determine what they actually do mean, and as you say that
is not clear.

Chris
===================
Chris Bore
BORES Signal Processing
www.bores.com

On Dec 29, 12:31&#4294967295;pm, Chris Bore <chris.b...@gmail.com> wrote:
> ...

> If this was a Q-Transform (which also yields logarithmically spaced
> frequency bins, but with the proviso that each bin's width is
> proportional to its centre frequency, then you would also need to
> adapt the window for each bin. However, the article you cite does not
> seem to do this, instead using a fixed window length, so is just an
> FFT-based approach to re-sampling the frequency output of an FFT into
> logarithmic bins - also perfectly reasonable.
>
> Chris

It is not clear to which of the articles cited in this thread you are
referring.

The OP's originally referenced Covell and Baluja paper does not
provide enough detail to verify whether if the filters implemented are
constant-Q. The Brown and Puckette JASA paper I referenced illustrates
the scaled kernels used to achieve constant-Q in Figure 1. These
kernels are transformed to generate the frequency domain coefficients
used to generate the constant-Q outputs. The OP's second reference is
to a url where the window is properly scaled for constant-Q as well.

Dale B. Dalrymple

On Dec 25, 2:14&#4294967295;pm, "December16" <december16@n_o_s_p_a_m.mail.ru>
wrote:
> Hi everybody,
> I got stuck with a simple DSP problem, and I'm not able to find a any
> resource that might help me. I was reading a well known article (Content
> Fingerprinting Using Wavelets, M.Covell, S.Baluja), in which there is a
> small description of audio processing:
> "We start our processing by converting the audio input (sampled at 5512 Hz)
> into a spectrogram. We create our spectrograms using the following
> settings: from the initial song, we use slices that are 371 ms long, taken
> every 11.6 ms, reduced to 32 logarithmically spaced frequency bins between
> 318 Hz and 2 kHz.".
> As far as I understand the procedure is as follows:
> a) Cut 371 ms long snippets (equal to 2048 samples on 5512 Hz frequency
> rate).
> b) Apply window function (Hanning window is used in their studies), with
> the same length = 2048.
> c) Apply Fast Fourier Transform &#4294967295;on the signal (FFT with N = 2048).
> d) As I understand, after FFT we get 1027 linearly spaced frequency bins
> (corresponding to 0 - 2756 Hz).
> At this point I do not understand how can you reduce 1027 bins to 32
> logarithmically spaced bins between 318 - 2000 Hz. What kind of procedure
> is used? Do they sum the frequency bins using a custom logarithm base? And
> what kind of transformation they use in order to obtain the spectrogram
> (Magnitude, Squared Magnitude, 20log10 rule)?
> Any help will be greatly appreciated!

You don't have to use an FFT. It is perfectly acceptable to simply do
the Fourier Transform directly, for the frequencies you need. That is,
you calculate the frequency using the FT rather than the FFT.

If this was a Q-Transform (which also yields logarithmically spaced
frequency bins, but with the proviso that each bin's width is
proportional to its centre frequency, then you would also need to
adapt the window for each bin. However, the article you cite does not
seem to do this, instead using a fixed window length, so is just an
FFT-based approach to re-sampling the frequency output of an FFT into
logarithmic bins - also perfectly reasonable.

Chris
===========================
Chris Bore
BORES Signal Processing
www.bores.com

Thanks Dale for your response,
>> ...
>These don't seem to come from the content of either article. How can
>anyone comment on them?

Sorry for using a non quoted example. The algorithm's code comes from the
implementation of the efficient algorithm for the constant Q transform
[Brown and Puckette 92] coded by Benjamin Blankertztaken (can be found
here:
http://wwwmath.uni-muenster.de/logik/Personen/blankertz/constQ/constQ.html
)

>If you are applying Brown's coefficient pruning scheme, a larger
>transform and smaller MINVAL would allow more accurate calculation at
>the cost of greater processing load.

Thanks for clarification. Indeed larger number of FFT points, will result
in a better frequency resolution (at a corresponding processing cost), so
it seems to be just a trade off.

Your comments helped a lot.
Regards

On Dec 28, 12:04&#4294967295;am, "December16" <december16@n_o_s_p_a_m.mail.ru>
wrote:
> ...

> K=ceil(bins*log2(maxFreq/minFreq)); (K=32,according to the max, min freq)
> fftLen=2^nextpow2(ceil(Q*fs/minFreq)); (fftLen = 512)

These don't seem to come from the content of either article. How can
anyone comment on them?

>
> The authors of the article suggest us to use snippets of length 2048. The
> algorithm results in 512 FFT length. What is a preferred way of dealing
> with this inequality between snippet length and FFT length. Use FFT with
> N=512, and therefore truncate initial 2048 signal, or hard code the length
> of FFT N=2048 in the algorithm?

If you are applying Brown's coefficient pruning scheme, a larger
transform and smaller MINVAL would allow more accurate calculation at
the cost of greater processing load. Take your pick.

Dale B. Dalrymple