Reply by December16 January 6, 20112011-01-06
>Hi Dale, >As I've understood you, in order to have non-coherent implementation, it
is
>enough to create a log-spaced array of frequencies (32 elements between >[318 ; 2000], which yields to 32 items: > >
318.000000000000 337.433738089353 358.055118241995 379.936720095667 403.155558802797 427.793356095377 453.936827915262 481.677989622234 511.114479854074 542.349904178478 .. 2000.00000000000
> >After I have these frequency values, I can sum up frequencies between
each
>two items from the log spaced array: > >spectrum = FFT(signal); >for (int i = 0; i < logBins /*32 log spaced bins*/; i++) >{ > int lowBound = _log[i]; /*start index*/ > int hiBound = _log[i + 1]; /*end index*/ > for (int k = lowBound; k < hiBound; k++) > { > sumFreq[i] += spectrum[k].SquaredMagnitude; > } > sumFreq[i] = sumFreq[i]/(hiBound - lowBound); /*Normalize*/
>} > >This seems to be a reasonable implementation, also a fast one. The only >question is whether to sum up SquaredMagnitude of FFT bin value or just
the
>Magnitude. > >Thank you for your clarifying posts, >Kind regards, Sergiu C. >
So does anybody know what's the preferred way of summing the FFT bins? Absolute value or Absolute value squared?
Reply by December16 January 3, 20112011-01-03
> ... >You already have two reasonable implementations of coherent fft >implemented constant Q logarithmically spaced filters. If you don't >need a coherent implementation for your audio fingerprinting, you can >simply sum the power of closest fft bins into each log spaced center >frequency and normalize each such incoherent filter by the number of >bins summed into it. This would produce a set of incoherent fft >implemented roughly constant Q logarithmically spaced filters and >require less processing. > >Dale B. Dalrymple >
Hi Dale, As I've understood you, in order to have non-coherent implementation, it is enough to create a log-spaced array of frequencies (32 elements between [318 ; 2000], which yields to 32 items: 318.000000000000 337.433738089353 358.055118241995 379.936720095667 403.155558802797 427.793356095377 453.936827915262 481.677989622234 511.114479854074 542.349904178478 ... 2000.00000000000 After I have these frequency values, I can sum up frequencies between each two items from the log spaced array: spectrum = FFT(signal); for (int i = 0; i < logBins /*32 log spaced bins*/; i++) { int lowBound = _log[i]; /*start index*/ int hiBound = _log[i + 1]; /*end index*/ for (int k = lowBound; k < hiBound; k++) { sumFreq[i] += spectrum[k].SquaredMagnitude; } sumFreq[i] = sumFreq[i]/(hiBound - lowBound); /*Normalize*/ } This seems to be a reasonable implementation, also a fast one. The only question is whether to sum up SquaredMagnitude of FFT bin value or just the Magnitude. Thank you for your clarifying posts, Kind regards, Sergiu C.
Reply by dbd December 30, 20102010-12-30
On Dec 30, 5:16&#4294967295;am, "December16" <december16@n_o_s_p_a_m.mail.ru>
wrote:
> ...
> > Would you be so kind in giving me a reference how re-sampling the frequency > of an FFT into logarithmic bins is done. It would be of a great help. What > kind of FFT-based approach you mean? > Thanks in advance, > Kind regards, Sergiu
You already have two reasonable implementations of coherent fft implemented constant Q logarithmically spaced filters. If you don't need a coherent implementation for your audio fingerprinting, you can simply sum the power of closest fft bins into each log spaced center frequency and normalize each such incoherent filter by the number of bins summed into it. This would produce a set of incoherent fft implemented roughly constant Q logarithmically spaced filters and require less processing. Dale B. Dalrymple
Reply by Ron N. December 30, 20102010-12-30
On Dec 30, 5:16&#4294967295;am, "December16" <december16@n_o_s_p_a_m.mail.ru>
wrote:
> Would you be so kind in giving me a reference how re-sampling the frequency > of an FFT into logarithmic bins is done. It would be of a great help. What > kind of FFT-based approach you mean?
You can interpolate FFT results using an interpolation kernel consisting of the transform of the window used before the FFT. In the case of a rectangular window, this would be the common Sinc interpolation method, similar to that use in resampling audio. If the resulting bin spacing is much wider than the transform of the window, you can get other interesting or useless results by fattening the interpolation kernel to simulate the application of an narrower time domain window, of the width which would have produced a filter closer to constant Q over frequency. IMHO. YMMV. -- rhn A.T nicholson d.0.t C-o-M http://www.nicholson.com/rhn/dsp.html
Reply by December16 December 30, 20102010-12-30
>Thanks for clarifying. I meant the OP's reference to Covell and >Baluja. I took the fact that it did not specify scaled kernels as >raising a suspicion they did not use them. For the OP I suppose it is >important to determine what they actually do mean, and as you say that >is not clear. > >Chris
Thanks a lot for your comments, The initial article (Covell, Baluja) does not specify anything about scaled kernels. The only justification related to logarithm is as follows: "The use of logarithmical spacing in frequency was selected based for simplicity, since the detailed band-edge locations are unlikely to have a strong effect under such coarse sampling (only 32 samples across frequency)." After Dale suggested a constant Q transform, I assumed that Google's researchers indeed use it (I have no idea whether it is true or not, at least there is no reference in their work to constant Q transformation).
>However, the article you cite does not >seem to do this, instead using a fixed window length, so is just an >FFT-based approach to re-sampling the frequency output of an FFT into >logarithmic bins - also perfectly reasonable.
Would you be so kind in giving me a reference how re-sampling the frequency of an FFT into logarithmic bins is done. It would be of a great help. What kind of FFT-based approach you mean? Thanks in advance, Kind regards, Sergiu
Reply by Chris Bore December 30, 20102010-12-30
On Dec 29, 9:31&#4294967295;pm, dbd <d...@ieee.org> wrote:
> On Dec 29, 12:31&#4294967295;pm, Chris Bore <chris.b...@gmail.com> wrote: > > > ... > > If this was a Q-Transform (which also yields logarithmically spaced > > frequency bins, but with the proviso that each bin's width is > > proportional to its centre frequency, then you would also need to > > adapt the window for each bin. However, the article you cite does not > > seem to do this, instead using a fixed window length, so is just an > > FFT-based approach to re-sampling the frequency output of an FFT into > > logarithmic bins - also perfectly reasonable. > > > Chris > > It is not clear to which of the articles cited in this thread you are > referring. > > The OP's originally referenced Covell and Baluja paper does not > provide enough detail to verify whether if the filters implemented are > constant-Q. The Brown and Puckette JASA paper I referenced illustrates > the scaled kernels used to achieve constant-Q in Figure 1. These > kernels are transformed to generate the frequency domain coefficients > used to generate the constant-Q outputs. The OP's second reference is > to a url where the window is properly scaled for constant-Q as well. > > Dale B. Dalrymple
Thanks for clarifying. I meant the OP's reference to Covell and Baluja. I took the fact that it did not specify scaled kernels as raising a suspicion they did not use them. For the OP I suppose it is important to determine what they actually do mean, and as you say that is not clear. Chris =================== Chris Bore BORES Signal Processing www.bores.com
Reply by dbd December 29, 20102010-12-29
On Dec 29, 12:31&#4294967295;pm, Chris Bore <chris.b...@gmail.com> wrote:
> ...
> If this was a Q-Transform (which also yields logarithmically spaced > frequency bins, but with the proviso that each bin's width is > proportional to its centre frequency, then you would also need to > adapt the window for each bin. However, the article you cite does not > seem to do this, instead using a fixed window length, so is just an > FFT-based approach to re-sampling the frequency output of an FFT into > logarithmic bins - also perfectly reasonable. > > Chris
It is not clear to which of the articles cited in this thread you are referring. The OP's originally referenced Covell and Baluja paper does not provide enough detail to verify whether if the filters implemented are constant-Q. The Brown and Puckette JASA paper I referenced illustrates the scaled kernels used to achieve constant-Q in Figure 1. These kernels are transformed to generate the frequency domain coefficients used to generate the constant-Q outputs. The OP's second reference is to a url where the window is properly scaled for constant-Q as well. Dale B. Dalrymple
Reply by Chris Bore December 29, 20102010-12-29
On Dec 25, 2:14&#4294967295;pm, "December16" <december16@n_o_s_p_a_m.mail.ru>
wrote:
> Hi everybody, > I got stuck with a simple DSP problem, and I'm not able to find a any > resource that might help me. I was reading a well known article (Content > Fingerprinting Using Wavelets, M.Covell, S.Baluja), in which there is a > small description of audio processing: > "We start our processing by converting the audio input (sampled at 5512 Hz) > into a spectrogram. We create our spectrograms using the following > settings: from the initial song, we use slices that are 371 ms long, taken > every 11.6 ms, reduced to 32 logarithmically spaced frequency bins between > 318 Hz and 2 kHz.". > As far as I understand the procedure is as follows: > a) Cut 371 ms long snippets (equal to 2048 samples on 5512 Hz frequency > rate). > b) Apply window function (Hanning window is used in their studies), with > the same length = 2048. > c) Apply Fast Fourier Transform &#4294967295;on the signal (FFT with N = 2048). > d) As I understand, after FFT we get 1027 linearly spaced frequency bins > (corresponding to 0 - 2756 Hz). > At this point I do not understand how can you reduce 1027 bins to 32 > logarithmically spaced bins between 318 - 2000 Hz. What kind of procedure > is used? Do they sum the frequency bins using a custom logarithm base? And > what kind of transformation they use in order to obtain the spectrogram > (Magnitude, Squared Magnitude, 20log10 rule)? > Any help will be greatly appreciated!
You don't have to use an FFT. It is perfectly acceptable to simply do the Fourier Transform directly, for the frequencies you need. That is, you calculate the frequency using the FT rather than the FFT. If this was a Q-Transform (which also yields logarithmically spaced frequency bins, but with the proviso that each bin's width is proportional to its centre frequency, then you would also need to adapt the window for each bin. However, the article you cite does not seem to do this, instead using a fixed window length, so is just an FFT-based approach to re-sampling the frequency output of an FFT into logarithmic bins - also perfectly reasonable. Chris =========================== Chris Bore BORES Signal Processing www.bores.com
Reply by December16 December 29, 20102010-12-29
Thanks Dale for your response,
>> ... >These don't seem to come from the content of either article. How can >anyone comment on them?
Sorry for using a non quoted example. The algorithm's code comes from the implementation of the efficient algorithm for the constant Q transform [Brown and Puckette 92] coded by Benjamin Blankertztaken (can be found here: http://wwwmath.uni-muenster.de/logik/Personen/blankertz/constQ/constQ.html )
>If you are applying Brown's coefficient pruning scheme, a larger >transform and smaller MINVAL would allow more accurate calculation at >the cost of greater processing load.
Thanks for clarification. Indeed larger number of FFT points, will result in a better frequency resolution (at a corresponding processing cost), so it seems to be just a trade off. Your comments helped a lot. Regards
Reply by dbd December 28, 20102010-12-28
On Dec 28, 12:04&#4294967295;am, "December16" <december16@n_o_s_p_a_m.mail.ru>
wrote:
> ...
> K=ceil(bins*log2(maxFreq/minFreq)); (K=32,according to the max, min freq) > fftLen=2^nextpow2(ceil(Q*fs/minFreq)); (fftLen = 512)
These don't seem to come from the content of either article. How can anyone comment on them?
> > The authors of the article suggest us to use snippets of length 2048. The > algorithm results in 512 FFT length. What is a preferred way of dealing > with this inequality between snippet length and FFT length. Use FFT with > N=512, and therefore truncate initial 2048 signal, or hard code the length > of FFT N=2048 in the algorithm?
If you are applying Brown's coefficient pruning scheme, a larger transform and smaller MINVAL would allow more accurate calculation at the cost of greater processing load. Take your pick. Dale B. Dalrymple