DSPRelated.com
Forums

Help with ideas for alternatives to the Fourier transform

Started by inhahe June 1, 2009
Hi, I am looking for an alternative solution to FFT to get a real-time
spectral analysis of a signal. If I understand it correctly, there are some
problems incurred by using FFT because it can only be done on blocks of data
at a time.

1) if I wanted to analyze at least down to 20hz, my block sizes would have
to be at least 1/20 seconds long, which means there would be a
50-millisecond+ delay between the input signal and the frequency data
output, which is way beyond acceptable response time for a delay being
imperceptible by humans. I cannot just buffer the audio output to sync it
up with the FFT output because I'll also want to do this to real-time audio
input from a mic, midi, etc.

2) at that rate, frequency data would only be updated 20 times a second,
which means that the higher-resolution frequencies get much coarser
time-granularity than could be possible. For example, at 22050 hz, I could
theoretically get an amplitude 22050 times a second, but instead I must
settle for 20 times a second. And if I increase the time-granularity then I
raise the minimum frequency I can analyze.

3) If I were to do a manipulation the signal in the frequency domain and
then convert back to the time domain, and I have to do that in blocks, I'm
not sure but it seems likely that would create pops in between each block of
converted sound that would then have to be eliminated?

So I have some ideas for doing a more real-time frequency analysis, but I
don't know the maths needed to work these out.

One idea is to do a physical simulation on an array of somewhere between 20
and 22030 oscillators each at a different frequency. For each input
sample, I treat that as a push on each oscillator of a force equal to the
sample's value. According to http://en.wikipedia.org/wiki/Damping the
damping equation is mx''+cx'+kx=0, so I guess I would be adding the sample
value to x' (since the first derivative is speed, and the speed would
increase/decrease in proportion to the force applied, and this would be done
in inverse proportion to m of course) and then solving for the x and x' to
be used in the next cycle, etc. That's the next x and x' after 1/y seconds
where y is my sampling frequency. The problem here is I don't know calculus
and I can't solve for that. Also, I'd need to be able to derive my
amplitude based on x and x' for each cycle so that I can get anything useful
out of it.

Also I don't know how I need to set my damping coefficient, spring constant,
and mass to properly emulate the oscillator's frequency, the input range,
the sampling frequency, and the oscillator's range of sensitivity so that it
doesn't overlap or underlap its neighboring frequencies. I figure too much
damping and it won't retain enough vibration from one sample to the next,
too little damping and given the wrong input the amplitude could keep
increasing indefinitely, and maybe it would be too slow to respond to
changing input also.

Also I worry about how slow it would be to respond to a change in phase of
the same frequency, or I don't even know if if an ideal oscillator can
accomodate a changing phase at all, and if not, what adjustments have to be
made to allow for that.
My other idea is to perform convolution on the input signal and a single
wave (or half-wave?) for each frequency I'm trying to get the amplitude of,
because I've read that convolving a signal with something else determines
how well the signal matches that something else. So to get an amplitude at
1000 hz, if my sampling frequency is 44100 hz, I'd continually convolve the
last 44.1 samples with a 44.1-sample sine wave that does one cycle.

That's a lot of work, though. That's ~44 calculations just for one sample,
and that's only to extract one single frequency. And at 20 hz that would
require 2205 calculations per sample. So I was thinking, can I reduce the
number of calculations needed, by down-sampling the audio input and the sine
wave before convolving? But then the question is, exactly how much can I
down-sample without losing essential information for any given frequency?
(I'm assuming this would depend on the frequency I'm convolving by). What's
the formula for this?

Another question, which applies to both of those two methods, is exactly how
many frequency bands I would need to extract in order to do manipulation on
the frequency domain, convert back to the time domain, and not lose any
sound quality. A fourier transform, in contrast, would give me 2205 bands
for a 44100-hz input doing a minimum frequency of 20 hz (therefore
2205-sample blocks), but A) in that case you're only updating, e.g., your
22050-hz band 20 times a second, as opposed to something much more
instantaneous as in my above methods, and B) I think I've read that mp3
compression reduces everything to a mere 20 bands, and it works fine. How
do I know how many bands I need to emulate? Do I convolve by a full wave or
a half wave? And for some of the above formulae, is it too idealistic to
expect that total bytes of spectrum data output should equal total bytes of
audio input when everything is done The Right Way?

I hope my text is coherent.. I'm a novice at this DSP stuff. If not I could
elaborate on some things.

Thanks for any help.