Forums

Logarithmically spaced bins

Started by December16 December 25, 2010
Hi everybody,
I got stuck with a simple DSP problem, and I'm not able to find a any
resource that might help me. I was reading a well known article (Content
Fingerprinting Using Wavelets, M.Covell, S.Baluja), in which there is a
small description of audio processing: 
"We start our processing by converting the audio input (sampled at 5512 Hz)
into a spectrogram. We create our spectrograms using the following
settings: from the initial song, we use slices that are 371 ms long, taken
every 11.6 ms, reduced to 32 logarithmically spaced frequency bins between
318 Hz and 2 kHz.". 
As far as I understand the procedure is as follows:
a) Cut 371 ms long snippets (equal to 2048 samples on 5512 Hz frequency
rate).
b) Apply window function (Hanning window is used in their studies), with
the same length = 2048.
c) Apply Fast Fourier Transform  on the signal (FFT with N = 2048).
d) As I understand, after FFT we get 1027 linearly spaced frequency bins
(corresponding to 0 - 2756 Hz).
At this point I do not understand how can you reduce 1027 bins to 32
logarithmically spaced bins between 318 - 2000 Hz. What kind of procedure
is used? Do they sum the frequency bins using a custom logarithm base? And
what kind of transformation they use in order to obtain the spectrogram
(Magnitude, Squared Magnitude, 20log10 rule)? 
Any help will be greatly appreciated!



On Dec 25, 6:14&#2013266080;am, "December16" <december16@n_o_s_p_a_m.mail.ru>
wrote:
> ...
> At this point I do not understand how can you reduce 1027 bins to 32 > logarithmically spaced bins between 318 - 2000 Hz. What kind of procedure > is used? Do they sum the frequency bins using a custom logarithm base? And > what kind of transformation they use in order to obtain the spectrogram > (Magnitude, Squared Magnitude, 20log10 rule)? > Any help will be greatly appreciated!
Look for constant Q transform for one approach. A classic paper: http://www.wellesley.edu/Physics/brown/pubs/effalgV92P2698-P2701.pdf Dale B. Dalrymple
On Dec 26, 3:14&#2013266080;am, "December16" <december16@n_o_s_p_a_m.mail.ru>
wrote:
I find that our refuse men don't really understand logarithmically
space bins.
>Look for constant Q transform for one approach. A classic paper: > >http://www.wellesley.edu/Physics/brown/pubs/effalgV92P2698-P2701.pdf > >Dale B. Dalrymple >
Thank you Dale for your help, I've considered the algorithm written in the article, but there is one minor detail that I would like to clarify. As the algorithm suggests us, we shall use the following formulas in order to obtain transformation parameters: Q=1/(2^(1/bins)-1); (for 12 bins Q =~ 17) K=ceil(bins*log2(maxFreq/minFreq)); (K=32,according to the max, min freq) fftLen=2^nextpow2(ceil(Q*fs/minFreq)); (fftLen = 512) The authors of the article suggest us to use snippets of length 2048. The algorithm results in 512 FFT length. What is a preferred way of dealing with this inequality between snippet length and FFT length. Use FFT with N=512, and therefore truncate initial 2048 signal, or hard code the length of FFT N=2048 in the algorithm?
On Dec 28, 12:04&#2013266080;am, "December16" <december16@n_o_s_p_a_m.mail.ru>
wrote:
> ...
> K=ceil(bins*log2(maxFreq/minFreq)); (K=32,according to the max, min freq) > fftLen=2^nextpow2(ceil(Q*fs/minFreq)); (fftLen = 512)
These don't seem to come from the content of either article. How can anyone comment on them?
> > The authors of the article suggest us to use snippets of length 2048. The > algorithm results in 512 FFT length. What is a preferred way of dealing > with this inequality between snippet length and FFT length. Use FFT with > N=512, and therefore truncate initial 2048 signal, or hard code the length > of FFT N=2048 in the algorithm?
If you are applying Brown's coefficient pruning scheme, a larger transform and smaller MINVAL would allow more accurate calculation at the cost of greater processing load. Take your pick. Dale B. Dalrymple
Thanks Dale for your response,
>> ... >These don't seem to come from the content of either article. How can >anyone comment on them?
Sorry for using a non quoted example. The algorithm's code comes from the implementation of the efficient algorithm for the constant Q transform [Brown and Puckette 92] coded by Benjamin Blankertztaken (can be found here: http://wwwmath.uni-muenster.de/logik/Personen/blankertz/constQ/constQ.html )
>If you are applying Brown's coefficient pruning scheme, a larger >transform and smaller MINVAL would allow more accurate calculation at >the cost of greater processing load.
Thanks for clarification. Indeed larger number of FFT points, will result in a better frequency resolution (at a corresponding processing cost), so it seems to be just a trade off. Your comments helped a lot. Regards
On Dec 25, 2:14&#2013266080;pm, "December16" <december16@n_o_s_p_a_m.mail.ru>
wrote:
> Hi everybody, > I got stuck with a simple DSP problem, and I'm not able to find a any > resource that might help me. I was reading a well known article (Content > Fingerprinting Using Wavelets, M.Covell, S.Baluja), in which there is a > small description of audio processing: > "We start our processing by converting the audio input (sampled at 5512 Hz) > into a spectrogram. We create our spectrograms using the following > settings: from the initial song, we use slices that are 371 ms long, taken > every 11.6 ms, reduced to 32 logarithmically spaced frequency bins between > 318 Hz and 2 kHz.". > As far as I understand the procedure is as follows: > a) Cut 371 ms long snippets (equal to 2048 samples on 5512 Hz frequency > rate). > b) Apply window function (Hanning window is used in their studies), with > the same length = 2048. > c) Apply Fast Fourier Transform &#2013266080;on the signal (FFT with N = 2048). > d) As I understand, after FFT we get 1027 linearly spaced frequency bins > (corresponding to 0 - 2756 Hz). > At this point I do not understand how can you reduce 1027 bins to 32 > logarithmically spaced bins between 318 - 2000 Hz. What kind of procedure > is used? Do they sum the frequency bins using a custom logarithm base? And > what kind of transformation they use in order to obtain the spectrogram > (Magnitude, Squared Magnitude, 20log10 rule)? > Any help will be greatly appreciated!
You don't have to use an FFT. It is perfectly acceptable to simply do the Fourier Transform directly, for the frequencies you need. That is, you calculate the frequency using the FT rather than the FFT. If this was a Q-Transform (which also yields logarithmically spaced frequency bins, but with the proviso that each bin's width is proportional to its centre frequency, then you would also need to adapt the window for each bin. However, the article you cite does not seem to do this, instead using a fixed window length, so is just an FFT-based approach to re-sampling the frequency output of an FFT into logarithmic bins - also perfectly reasonable. Chris =========================== Chris Bore BORES Signal Processing www.bores.com
On Dec 29, 12:31&#2013266080;pm, Chris Bore <chris.b...@gmail.com> wrote:
> ...
> If this was a Q-Transform (which also yields logarithmically spaced > frequency bins, but with the proviso that each bin's width is > proportional to its centre frequency, then you would also need to > adapt the window for each bin. However, the article you cite does not > seem to do this, instead using a fixed window length, so is just an > FFT-based approach to re-sampling the frequency output of an FFT into > logarithmic bins - also perfectly reasonable. > > Chris
It is not clear to which of the articles cited in this thread you are referring. The OP's originally referenced Covell and Baluja paper does not provide enough detail to verify whether if the filters implemented are constant-Q. The Brown and Puckette JASA paper I referenced illustrates the scaled kernels used to achieve constant-Q in Figure 1. These kernels are transformed to generate the frequency domain coefficients used to generate the constant-Q outputs. The OP's second reference is to a url where the window is properly scaled for constant-Q as well. Dale B. Dalrymple
On Dec 29, 9:31&#2013266080;pm, dbd <d...@ieee.org> wrote:
> On Dec 29, 12:31&#2013266080;pm, Chris Bore <chris.b...@gmail.com> wrote: > > > ... > > If this was a Q-Transform (which also yields logarithmically spaced > > frequency bins, but with the proviso that each bin's width is > > proportional to its centre frequency, then you would also need to > > adapt the window for each bin. However, the article you cite does not > > seem to do this, instead using a fixed window length, so is just an > > FFT-based approach to re-sampling the frequency output of an FFT into > > logarithmic bins - also perfectly reasonable. > > > Chris > > It is not clear to which of the articles cited in this thread you are > referring. > > The OP's originally referenced Covell and Baluja paper does not > provide enough detail to verify whether if the filters implemented are > constant-Q. The Brown and Puckette JASA paper I referenced illustrates > the scaled kernels used to achieve constant-Q in Figure 1. These > kernels are transformed to generate the frequency domain coefficients > used to generate the constant-Q outputs. The OP's second reference is > to a url where the window is properly scaled for constant-Q as well. > > Dale B. Dalrymple
Thanks for clarifying. I meant the OP's reference to Covell and Baluja. I took the fact that it did not specify scaled kernels as raising a suspicion they did not use them. For the OP I suppose it is important to determine what they actually do mean, and as you say that is not clear. Chris =================== Chris Bore BORES Signal Processing www.bores.com
>Thanks for clarifying. I meant the OP's reference to Covell and >Baluja. I took the fact that it did not specify scaled kernels as >raising a suspicion they did not use them. For the OP I suppose it is >important to determine what they actually do mean, and as you say that >is not clear. > >Chris
Thanks a lot for your comments, The initial article (Covell, Baluja) does not specify anything about scaled kernels. The only justification related to logarithm is as follows: "The use of logarithmical spacing in frequency was selected based for simplicity, since the detailed band-edge locations are unlikely to have a strong effect under such coarse sampling (only 32 samples across frequency)." After Dale suggested a constant Q transform, I assumed that Google's researchers indeed use it (I have no idea whether it is true or not, at least there is no reference in their work to constant Q transformation).
>However, the article you cite does not >seem to do this, instead using a fixed window length, so is just an >FFT-based approach to re-sampling the frequency output of an FFT into >logarithmic bins - also perfectly reasonable.
Would you be so kind in giving me a reference how re-sampling the frequency of an FFT into logarithmic bins is done. It would be of a great help. What kind of FFT-based approach you mean? Thanks in advance, Kind regards, Sergiu