DSPRelated.com
Forums

FFT re-scaling or interpolation

Started by Evan Olcott June 13, 2005
Hello everyone.

Have an interesting problem. I have an FFT of an audio signal and I 
want to interpolate the bins into another frequency set... instead of a 
linear set of frequencies, I'd like the frequencies I get (in the end) 
to be based on an exponential/logarithmic formula, based on the twelfth 
root of 2...

What I'd like to do is take the FFT results and re-interpolate (or 
re-scale, I'm not sure what the proper term is for this) to my new 
(more useful) scale and use those results. At the low end of the 
spectrum, my resulting bins may be only fractions of FFT bins, whereas 
at the top of the spectrum they are combinations of many FFT bins (as 
well as fractions of neighboring bins)..

for frequency bins f, for sampling rate f0:

FFT scale: f(n) = f0/N * n  :  N = total bins (4096 in my case), n = 
whole number (< N/2)
My scale: f(n) = 20 < 440 * pow((12th root of 2), n) < f0/2  :  n = 
&#4294967295;whole number

The FFT scale ends up with 2048 bins, and my scale should be 120 bins.

Is this best calculated via a set of tables, or is there a calculation 
(in C, hopefully) that would solve this better? Do I need to concern 
myself with the "bandwidths" of each bin in order to calculate the 
interpolation properly? Either way, it's over my head right off the bat 
- I'm not sure how to approach this!

Thanks for any help/advice!

Ev
Technical Knowledge Officer
Head Programmer/Designer
Audiofile Engineering

http://www.audiofile-engineering.com/

"Evan Olcott" <eolcott@triplo.com> wrote in message 
news:2005061310050716807%eolcott@triplocom...
> Hello everyone. > > Have an interesting problem. I have an FFT of an audio > signal and I want to interpolate the bins into another > frequency set... instead of a linear set of frequencies, > I'd like the frequencies I get (in the end) to be based on > an exponential/logarithmic formula, based on the twelfth > root of 2... > > What I'd like to do is take the FFT results and > re-interpolate (or re-scale, I'm not sure what the proper > term is for this) to my new (more useful) scale and use > those results.
[snip]
> - I'm not sure how to approach this! >
Look into the Goertzel Algorithm. If this doesn't do exactly what you want, there may be alternatives. Here is one web reference. I don't know how useful it might be to you. http://citeseer.ist.psu.edu/651574.html Here is another reference, but it's probably useless since I very much doubt that the book is available today: Gold, Bernard and Rader, Charles M., Digital Processing of Signals, (c) 1969, McGraw Hill, New York, Page 171 and following.
"Evan Olcott" <eolcott@triplo.com> wrote in message 
news:2005061310050716807%eolcott@triplocom...
> Hello everyone. > > Have an interesting problem. I have an FFT of an audio signal and I want > to interpolate the bins into another frequency set... instead of a linear > set of frequencies, I'd like the frequencies I get (in the end) to be > based on an exponential/logarithmic formula, based on the twelfth root of > 2...
Since you're asking for semitones, I'm guessing that you want to track musical notes. I'm going to skip ahead of the part where everyone asks what kind of window you're using, and tells you about the trade-offs between time and frequency resolution and why you can't effectively do what you think you want to do. Instead, if you actually do want to track musical notes, then I suggest you look at the paper called "A high resolution fundamental frequency determination based on phase changes in the Fourier transform", which describes what I believe is the standard way to accomplish this. You can get it from Judith Brown's home page here: (there's also a link to the djvu plugin which you will have to install to see it). http://www.wellesley.edu/Physics/brown/jbrown.html -- Matt
Evan Olcott wrote:

> Hello everyone. > > Have an interesting problem. I have an FFT of an audio signal and I > want to interpolate the bins into another frequency set... instead of a > linear set of frequencies, I'd like the frequencies I get (in the end) > to be based on an exponential/logarithmic formula, based on the twelfth > root of 2... > > What I'd like to do is take the FFT results and re-interpolate (or > re-scale, I'm not sure what the proper term is for this) to my new > (more useful) scale and use those results. At the low end of the > spectrum, my resulting bins may be only fractions of FFT bins, whereas > at the top of the spectrum they are combinations of many FFT bins (as > well as fractions of neighboring bins)..
You may have a look into this function. It converts bins scaled in linear frequency into bins scaled in logarithmic frequency. The bins are summed up to represent the energy, also the maximum peak is stored. It can handle all cases (less than one input bin = 1 output bin, one inoput bin = 1 output bin, many input bins = 1 output bin). I reduced the code to its bare minimum, so this version has not yet been tested. Also this is not DSP code, just plain C++. The original code is covered by the GPL, but since I'm the author and since this is just standard stuff you may take this version for free: data: the input bins insize: number of input bins fMax: highed frequency of input bins avout: array to store the summed up bins pkout: array to store the peak value bins outsize: number of bins in avout and pkout f1: start of desired frequency range f2: end of desired frequency range void processLogXScale(const double* data,unsigned int insize, double fMax, double* avout, double* pkout, unsigned int outsize double f1, double f2) { const double fin_inc = fMax/(double)(insize - 1); double fout = f1; const double fout_inc = pow(f2/f1, 1.0 / (outsize - 1.0)); const double flim_inc = pow(f2/f1, 0.5 / (outsize - 1.0)); for (unsigned int out = 0; out < outsize; ++out, fout *= fout_inc) { // determine frequency span of this output frequency: double fs1 = fout / flim_inc; double fs2 = fout * flim_inc; if (fs1 < 0) fs1 = 0; if (fs2 > fMax) fs2 = fMax; // determine index span (float) of this output frequency: double fi1 = fs1 / fin_inc; double fi2 = fs2 / fin_inc; // determine first and last index: int bi1 = (int)(fi1 + 0.5); int bi2 = (int)(fi2 + 0.5); if (bi1 < 0) bi1 = 0; if (bi2 >= (int)insize) bi1 = insize - 1; // determine proportional parts of first and last index: double pi1; // proportional part of bi1 double pi2; // proportional part of bi2 if (bi1 == bi2) { pi1 = 0.5 * fabs(fi2 - fi1); pi2 = pi1; } else { pi1 = (double)bi1 + 0.5 - fi1; pi2 = fabs(fi2 - (double)bi2 + 0.5); } // sum up over all indices: double max = data[bi1] > data[bi2] ? data[bi1] : data[bi2]; double sum = pi1 * data[bi1] + pi2 * data[bi2]; double cnt = pi1 + pi2; for (int i = bi1+1; i < bi2; ++i) { sum += data[i]; if (data[i] > max) max = data[i]; ++cnt; } // I do not really understand why we must divide the sum // by sqrt(cnt), but if we don't, then a pink noise spectrum // rises by 10 dB per octave. if (avout) avout[out] = sum / sqrt(cnt); if (pkout) pkout[out] = max; } } // end of processLogXScale() Bye Andreas -- Andreas H&#4294967295;nnebeck | email: ah@despammed.com ----- privat ---- | www : http://www.huennebeck-online.de Fax/Anrufbeantworter: 0721/151-284301 GPG-Key: http://www.huennebeck-online.de/public_keys/andreas.asc
In article <igsre.5859$yU.287575@news20.bellglobal.com>,
Matt Timmermans <mt0000@sympatico.nospam-remove.ca> wrote:
> >"Evan Olcott" <eolcott@triplo.com> wrote in message >news:2005061310050716807%eolcott@triplocom... >> Hello everyone. >> >> Have an interesting problem. I have an FFT of an audio signal and I want >> to interpolate the bins into another frequency set... instead of a linear >> set of frequencies, I'd like the frequencies I get (in the end) to be >> based on an exponential/logarithmic formula, based on the twelfth root of >> 2... > >Since you're asking for semitones, I'm guessing that you want to track >musical notes. I'm going to skip ahead of the part where everyone asks what >kind of window you're using, and tells you about the trade-offs between time >and frequency resolution and why you can't effectively do what you think you >want to do. > >Instead, if you actually do want to track musical notes, then I suggest you >look at the paper called "A high resolution fundamental frequency >determination based on phase changes in the Fourier transform", which >describes what I believe is the standard way to accomplish this. > >You can get it from Judith Brown's home page here: (there's also a link to >the djvu plugin which you will have to install to see it). > >http://www.wellesley.edu/Physics/brown/jbrown.html
Never heard of djvu, and it only seems to be supported on a tiny number of types of browsers and OS platforms. Is the above phase change frequency determination method similar to the analysis front-end of a phase vocoder using successive FFT frames? In my experience, phase vocoder techniques seem to work well even in the presence of some noise, as long as that noise doesn't include tones closely spaced to the fundamental of interest. Note that the strongest frequency present in an FFT (or vocoder analysis) may not be the musical note being played if the tone consists of a diminished or even missing fundamental frequency content compared to the harmonic energy. IMHO. YMMV. -- Ron Nicholson rhn AT nicholson DOT com http://www.nicholson.com/rhn/ #include <canonical.disclaimer> // only my own opinions, etc.
Ronald H. Nicholson Jr. wrote:

> In article <igsre.5859$yU.287575@news20.bellglobal.com>, > Matt Timmermans <mt0000@sympatico.nospam-remove.ca> wrote: >> >>You can get it from Judith Brown's home page here: (there's also a link >>to the djvu plugin which you will have to install to see it). >> >>http://www.wellesley.edu/Physics/brown/jbrown.html > > Never heard of djvu,
Neither did I.
> and it only seems to be supported on a tiny number > of types of browsers and OS platforms.
I don't think so. Binaries exist for Windows and MacOS, and source code is available for UNIX style OS. The binary plugin for RedHat 7.3 works fine with Netscape, Mozilla and Konqueror. bye Andreas -- Andreas H&#4294967295;nnebeck | email: ah@despammed.com ----- privat ---- | www : http://www.huennebeck-online.de Fax/Anrufbeantworter: 0721/151-284301 GPG-Key: http://www.huennebeck-online.de/public_keys/andreas.asc
Matt Timmermans wrote:
...
> Instead, if you actually do want to track musical notes, then I suggest you > look at the paper called "A high resolution fundamental frequency > determination based on phase changes in the Fourier transform", which > describes what I believe is the standard way to accomplish this. > > You can get it from Judith Brown's home page here: (there's also a link to > the djvu plugin which you will have to install to see it). > > http://www.wellesley.edu/Physics/brown/jbrown.html
Matt, thanks for the link! I wasn't aware of Judith Brown's paper "Calculation of a constant Q spectral transform". In it she describes a transform similar to the DFT, albeit at logarithmically spaced frequencies. This transform consists of taking the inner product of the signal vector with frequency vectors, but instead as in the DFT where each frequency vector is equal in length (and the k-th vector contains k cycles of the frequency it represents), the logarithmically spaced vectors are adjusted in length to contain a constant multiple of cycles of the frequency they represent. This is also the reason why it is not, in contrast to the DFT, invertible as not all input values are used to compute an output value, making it unsuited for frequency domain processing. I'm wondering, is this equivalent to applying Goertzel filters with the k's logarithmically spaced, where each filter's N is actually a function of k, N(k+1) = c^k N(k) (and c < 1)?. My second question is: in this paper she mentions the Mellin transform, another constant Q transform, which has an inverse and therefore is suited for frequency domain processing. Without having to dig out the reference, can anybody comment on the essence of this transform? Regards, Andor
Hi Andor,

"Andor" <an2or@mailcircuit.com> wrote in message 
news:1118904008.586422.316930@z14g2000cwz.googlegroups.com...
> [...] "Calculation of a constant Q spectral transform". [...] > I'm wondering, is this equivalent to applying Goertzel filters with the > k's logarithmically spaced, where each filter's N is actually a > function of k, N(k+1) = c^k N(k) (and c < 1)?.
You really want to use a good window. All the magic is in the windows and varying window size. It's easier to understand if you think of the windows as low-pass filters, which are modulated by the center frequency to make Hilbert-transform pairs of bandpass filters at the frequency of interest. Changing the window size changes the bandwidth of the low-pass prototype. It is this that makes the filter bank constant-Q.
> My second question is: in this paper she mentions the Mellin transform, > another constant Q transform, which has an inverse and therefore is > suited for frequency domain processing. Without having to dig out the > reference, can anybody comment on the essence of this transform?
The Mellin transform is like a constant Q transform with infinite support. It gives you logarithmic frequency spacing, but doesn't use a window. -- Matt
Djvu is worth having. It goves much better access to the online version
of The New Century* Dictionary than does HTML. The NCD is next-best to
the OED, and sometimes better, Illustrated.

Jerry
_______________________________________
* That's 1900, but then my copy of the OED is prety old too.

"Ronald H. Nicholson Jr." <rhn@mauve.rahul.net> wrote in message 
news:d8od70$6rb$1@blue.rahul.net...
> Is the above phase change frequency determination method similar to > the analysis front-end of a phase vocoder using successive FFT frames?
The first part of their process is to calculate a constant Q transform with semitone or quartertone spacing, which they cleverly accellerate using an FFT. Then they do phase analysis for finer resolution. This part isn't completely dissimilar to the method you mention, but the way they do it is better.
> Note that the strongest frequency present in an FFT (or vocoder analysis) > may not be the musical note being played if the tone consists of a > diminished or even missing fundamental frequency content compared to > the harmonic energy.
They add a good deal of cleverness here, too. Since they use a constant Q transform, they get log-frequency output. In log-frequency, a sum of harmonic frequencies (f, 2f, 3f, 4f, etc.) shows up as the same pattern, modulo a shift, regardless of f. They cross-correlate the log-frequency spectrum with an "ideal" harmonic pattern to pick the fundamental. -- Matt