Technical discussions related to Audio Signal Processing (digital effects, acoustics, noise reduction, musical signal processing, etc).
Hi, I am looking for an alternative solution to FFT to get a real-time spectral analysis of a signal. If I understand it correctly, there are some problems incurred by using FFT because it can only be done on blocks of data at a time. 1) if I wanted to analyze at least down to 20hz, my block sizes would have to be at least 1/20 seconds long, which means there would be a 50-millisecond+ delay between the input signal and the frequency data output, which is way beyond acceptable response time for a delay being imperceptible by humans. I cannot just buffer the audio output to sync it up with the FFT output because I'll also want to do this to real-time audio input from a mic, midi, etc. 2) at that rate, frequency data would only be updated 20 times a second, which means that the higher-resolution frequencies get much coarser time-granularity than could be possible. For example, at 22050 hz, I could theoretically get an amplitude 22050 times a second, but instead I must settle for 20 times a second. And if I increase the time-granularity then I raise the minimum frequency I can analyze. 3) If I were to do a manipulation the signal in the frequency domain and then convert back to the time domain, and I have to do that in blocks, I'm not sure but it seems likely that would create pops in between each block of converted sound that would then have to be eliminated? So I have some ideas for doing a more real-time frequency analysis, but I don't know the maths needed to work these out. One idea is to do a physical simulation on an array of somewhere between 20 and 22030 oscillators each at a different frequency. For each input sample, I treat that as a push on each oscillator of a force equal to the sample's value. According to http://en.wikipedia.org/wiki/Damping the damping equation is mx''+cx'+kx=0, so I guess I would be adding the sample value to x' (since the first derivative is speed, and the speed would increase/decrease in proportion to the force applied, and this would be done in inverse proportion to m of course) and then solving for the x and x' to be used in the next cycle, etc. That's the next x and x' after 1/y seconds where y is my sampling frequency. The problem here is I don't know calculus and I can't solve for that. Also, I'd need to be able to derive my amplitude based on x and x' for each cycle so that I can get anything useful out of it. Also I don't know how I need to set my damping coefficient, spring constant, and mass to properly emulate the oscillator's frequency, the input range, the sampling frequency, and the oscillator's range of sensitivity so that it doesn't overlap or underlap its neighboring frequencies. I figure too much damping and it won't retain enough vibration from one sample to the next, too little damping and given the wrong input the amplitude could keep increasing indefinitely, and maybe it would be too slow to respond to changing input also. Also I worry about how slow it would be to respond to a change in phase of the same frequency, or I don't even know if if an ideal oscillator can accomodate a changing phase at all, and if not, what adjustments have to be made to allow for that. My other idea is to perform convolution on the input signal and a single wave (or half-wave?) for each frequency I'm trying to get the amplitude of, because I've read that convolving a signal with something else determines how well the signal matches that something else. So to get an amplitude at 1000 hz, if my sampling frequency is 44100 hz, I'd continually convolve the last 44.1 samples with a 44.1-sample sine wave that does one cycle. That's a lot of work, though. That's ~44 calculations just for one sample, and that's only to extract one single frequency. And at 20 hz that would require 2205 calculations per sample. So I was thinking, can I reduce the number of calculations needed, by down-sampling the audio input and the sine wave before convolving? But then the question is, exactly how much can I down-sample without losing essential information for any given frequency? (I'm assuming this would depend on the frequency I'm convolving by). What's the formula for this? Another question, which applies to both of those two methods, is exactly how many frequency bands I would need to extract in order to do manipulation on the frequency domain, convert back to the time domain, and not lose any sound quality. A fourier transform, in contrast, would give me 2205 bands for a 44100-hz input doing a minimum frequency of 20 hz (therefore 2205-sample blocks), but A) in that case you're only updating, e.g., your 22050-hz band 20 times a second, as opposed to something much more instantaneous as in my above methods, and B) I think I've read that mp3 compression reduces everything to a mere 20 bands, and it works fine. How do I know how many bands I need to emulate? Do I convolve by a full wave or a half wave? And for some of the above formulae, is it too idealistic to expect that total bytes of spectrum data output should equal total bytes of audio input when everything is done The Right Way? I hope my text is coherent.. I'm a novice at this DSP stuff. If not I could elaborate on some things. Thanks for any help.______________________________