Hi everybody, I got stuck with a simple DSP problem, and I'm not able to find a any resource that might help me. I was reading a well known article (Content Fingerprinting Using Wavelets, M.Covell, S.Baluja), in which there is a small description of audio processing: "We start our processing by converting the audio input (sampled at 5512 Hz) into a spectrogram. We create our spectrograms using the following settings: from the initial song, we use slices that are 371 ms long, taken every 11.6 ms, reduced to 32 logarithmically spaced frequency bins between 318 Hz and 2 kHz.". As far as I understand the procedure is as follows: a) Cut 371 ms long snippets (equal to 2048 samples on 5512 Hz frequency rate). b) Apply window function (Hanning window is used in their studies), with the same length = 2048. c) Apply Fast Fourier Transform on the signal (FFT with N = 2048). d) As I understand, after FFT we get 1027 linearly spaced frequency bins (corresponding to 0 - 2756 Hz). At this point I do not understand how can you reduce 1027 bins to 32 logarithmically spaced bins between 318 - 2000 Hz. What kind of procedure is used? Do they sum the frequency bins using a custom logarithm base? And what kind of transformation they use in order to obtain the spectrogram (Magnitude, Squared Magnitude, 20log10 rule)? Any help will be greatly appreciated!
Logarithmically spaced bins
Started by ●December 25, 2010
Reply by ●December 25, 20102010-12-25
On Dec 25, 6:14�am, "December16" <december16@n_o_s_p_a_m.mail.ru> wrote:> ...> At this point I do not understand how can you reduce 1027 bins to 32 > logarithmically spaced bins between 318 - 2000 Hz. What kind of procedure > is used? Do they sum the frequency bins using a custom logarithm base? And > what kind of transformation they use in order to obtain the spectrogram > (Magnitude, Squared Magnitude, 20log10 rule)? > Any help will be greatly appreciated!Look for constant Q transform for one approach. A classic paper: http://www.wellesley.edu/Physics/brown/pubs/effalgV92P2698-P2701.pdf Dale B. Dalrymple
Reply by ●December 25, 20102010-12-25
On Dec 26, 3:14�am, "December16" <december16@n_o_s_p_a_m.mail.ru> wrote: I find that our refuse men don't really understand logarithmically space bins.
Reply by ●December 28, 20102010-12-28
>Look for constant Q transform for one approach. A classic paper: > >http://www.wellesley.edu/Physics/brown/pubs/effalgV92P2698-P2701.pdf > >Dale B. Dalrymple >Thank you Dale for your help, I've considered the algorithm written in the article, but there is one minor detail that I would like to clarify. As the algorithm suggests us, we shall use the following formulas in order to obtain transformation parameters: Q=1/(2^(1/bins)-1); (for 12 bins Q =~ 17) K=ceil(bins*log2(maxFreq/minFreq)); (K=32,according to the max, min freq) fftLen=2^nextpow2(ceil(Q*fs/minFreq)); (fftLen = 512) The authors of the article suggest us to use snippets of length 2048. The algorithm results in 512 FFT length. What is a preferred way of dealing with this inequality between snippet length and FFT length. Use FFT with N=512, and therefore truncate initial 2048 signal, or hard code the length of FFT N=2048 in the algorithm?
Reply by ●December 28, 20102010-12-28
On Dec 28, 12:04�am, "December16" <december16@n_o_s_p_a_m.mail.ru> wrote:> ...> K=ceil(bins*log2(maxFreq/minFreq)); (K=32,according to the max, min freq) > fftLen=2^nextpow2(ceil(Q*fs/minFreq)); (fftLen = 512)These don't seem to come from the content of either article. How can anyone comment on them?> > The authors of the article suggest us to use snippets of length 2048. The > algorithm results in 512 FFT length. What is a preferred way of dealing > with this inequality between snippet length and FFT length. Use FFT with > N=512, and therefore truncate initial 2048 signal, or hard code the length > of FFT N=2048 in the algorithm?If you are applying Brown's coefficient pruning scheme, a larger transform and smaller MINVAL would allow more accurate calculation at the cost of greater processing load. Take your pick. Dale B. Dalrymple
Reply by ●December 29, 20102010-12-29
Thanks Dale for your response,>> ... >These don't seem to come from the content of either article. How can >anyone comment on them?Sorry for using a non quoted example. The algorithm's code comes from the implementation of the efficient algorithm for the constant Q transform [Brown and Puckette 92] coded by Benjamin Blankertztaken (can be found here: http://wwwmath.uni-muenster.de/logik/Personen/blankertz/constQ/constQ.html )>If you are applying Brown's coefficient pruning scheme, a larger >transform and smaller MINVAL would allow more accurate calculation at >the cost of greater processing load.Thanks for clarification. Indeed larger number of FFT points, will result in a better frequency resolution (at a corresponding processing cost), so it seems to be just a trade off. Your comments helped a lot. Regards
Reply by ●December 29, 20102010-12-29
On Dec 25, 2:14�pm, "December16" <december16@n_o_s_p_a_m.mail.ru> wrote:> Hi everybody, > I got stuck with a simple DSP problem, and I'm not able to find a any > resource that might help me. I was reading a well known article (Content > Fingerprinting Using Wavelets, M.Covell, S.Baluja), in which there is a > small description of audio processing: > "We start our processing by converting the audio input (sampled at 5512 Hz) > into a spectrogram. We create our spectrograms using the following > settings: from the initial song, we use slices that are 371 ms long, taken > every 11.6 ms, reduced to 32 logarithmically spaced frequency bins between > 318 Hz and 2 kHz.". > As far as I understand the procedure is as follows: > a) Cut 371 ms long snippets (equal to 2048 samples on 5512 Hz frequency > rate). > b) Apply window function (Hanning window is used in their studies), with > the same length = 2048. > c) Apply Fast Fourier Transform �on the signal (FFT with N = 2048). > d) As I understand, after FFT we get 1027 linearly spaced frequency bins > (corresponding to 0 - 2756 Hz). > At this point I do not understand how can you reduce 1027 bins to 32 > logarithmically spaced bins between 318 - 2000 Hz. What kind of procedure > is used? Do they sum the frequency bins using a custom logarithm base? And > what kind of transformation they use in order to obtain the spectrogram > (Magnitude, Squared Magnitude, 20log10 rule)? > Any help will be greatly appreciated!You don't have to use an FFT. It is perfectly acceptable to simply do the Fourier Transform directly, for the frequencies you need. That is, you calculate the frequency using the FT rather than the FFT. If this was a Q-Transform (which also yields logarithmically spaced frequency bins, but with the proviso that each bin's width is proportional to its centre frequency, then you would also need to adapt the window for each bin. However, the article you cite does not seem to do this, instead using a fixed window length, so is just an FFT-based approach to re-sampling the frequency output of an FFT into logarithmic bins - also perfectly reasonable. Chris =========================== Chris Bore BORES Signal Processing www.bores.com
Reply by ●December 29, 20102010-12-29
On Dec 29, 12:31�pm, Chris Bore <chris.b...@gmail.com> wrote:> ...> If this was a Q-Transform (which also yields logarithmically spaced > frequency bins, but with the proviso that each bin's width is > proportional to its centre frequency, then you would also need to > adapt the window for each bin. However, the article you cite does not > seem to do this, instead using a fixed window length, so is just an > FFT-based approach to re-sampling the frequency output of an FFT into > logarithmic bins - also perfectly reasonable. > > ChrisIt is not clear to which of the articles cited in this thread you are referring. The OP's originally referenced Covell and Baluja paper does not provide enough detail to verify whether if the filters implemented are constant-Q. The Brown and Puckette JASA paper I referenced illustrates the scaled kernels used to achieve constant-Q in Figure 1. These kernels are transformed to generate the frequency domain coefficients used to generate the constant-Q outputs. The OP's second reference is to a url where the window is properly scaled for constant-Q as well. Dale B. Dalrymple
Reply by ●December 30, 20102010-12-30
On Dec 29, 9:31�pm, dbd <d...@ieee.org> wrote:> On Dec 29, 12:31�pm, Chris Bore <chris.b...@gmail.com> wrote: > > > ... > > If this was a Q-Transform (which also yields logarithmically spaced > > frequency bins, but with the proviso that each bin's width is > > proportional to its centre frequency, then you would also need to > > adapt the window for each bin. However, the article you cite does not > > seem to do this, instead using a fixed window length, so is just an > > FFT-based approach to re-sampling the frequency output of an FFT into > > logarithmic bins - also perfectly reasonable. > > > Chris > > It is not clear to which of the articles cited in this thread you are > referring. > > The OP's originally referenced Covell and Baluja paper does not > provide enough detail to verify whether if the filters implemented are > constant-Q. The Brown and Puckette JASA paper I referenced illustrates > the scaled kernels used to achieve constant-Q in Figure 1. These > kernels are transformed to generate the frequency domain coefficients > used to generate the constant-Q outputs. The OP's second reference is > to a url where the window is properly scaled for constant-Q as well. > > Dale B. DalrympleThanks for clarifying. I meant the OP's reference to Covell and Baluja. I took the fact that it did not specify scaled kernels as raising a suspicion they did not use them. For the OP I suppose it is important to determine what they actually do mean, and as you say that is not clear. Chris =================== Chris Bore BORES Signal Processing www.bores.com
Reply by ●December 30, 20102010-12-30
>Thanks for clarifying. I meant the OP's reference to Covell and >Baluja. I took the fact that it did not specify scaled kernels as >raising a suspicion they did not use them. For the OP I suppose it is >important to determine what they actually do mean, and as you say that >is not clear. > >ChrisThanks a lot for your comments, The initial article (Covell, Baluja) does not specify anything about scaled kernels. The only justification related to logarithm is as follows: "The use of logarithmical spacing in frequency was selected based for simplicity, since the detailed band-edge locations are unlikely to have a strong effect under such coarse sampling (only 32 samples across frequency)." After Dale suggested a constant Q transform, I assumed that Google's researchers indeed use it (I have no idea whether it is true or not, at least there is no reference in their work to constant Q transformation).>However, the article you cite does not >seem to do this, instead using a fixed window length, so is just an >FFT-based approach to re-sampling the frequency output of an FFT into >logarithmic bins - also perfectly reasonable.Would you be so kind in giving me a reference how re-sampling the frequency of an FFT into logarithmic bins is done. It would be of a great help. What kind of FFT-based approach you mean? Thanks in advance, Kind regards, Sergiu