DSPRelated.com
Forums

Processing incoming buffers using DFT

Started by jungledmnc June 13, 2008
Hi there,

I'm working with an audio signal, so let's say I'm handling buffers
of size 4096, one by one. For each buffer I perform an RDFT, modify some
coefficients in the frequency domain, and do an IRDFT. So far so good. But the
problem is that any change in the frequency domain also changes the first and
last samples (obviously :-)). The next processed buffer might then start with a
very different sample than the one the previous buffer ended with.
So this leads to discontinuities between buffers, which in an audio signal
means clicks and pops... Too bad..

I solve it by moving the processing window by fftsize/2 and cross-fading the
result: process buffer B0, send the first half to the output, remember the
second half, process buffer B1 (with the source data moved by fftsize/2),
cross-fade the first part of B1 with the pending second half of B0, and so
on... (a rough code sketch of this scheme follows below).

Surprisingly this works, but it seems quite awkward to me. I'm also
worried about phase cancellation problems. Could someone please tell me
how this totally basic problem is usually solved?
Thanks! :-)
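A minimal sketch of the half-overlap cross-fade scheme described above, assuming numpy; process_spectrum is a hypothetical placeholder for whatever frequency-domain modification is being tried, and the first half-block fades in from silence:

import numpy as np

N = 4096
hop = N // 2
fade_out = np.linspace(1.0, 0.0, hop)   # linear cross-fade ramps
fade_in = 1.0 - fade_out

def process_spectrum(X):
    # placeholder for the frequency-domain experiment
    return X

def process_block(x):
    # RDFT -> modify -> IRDFT on one block of N samples
    X = np.fft.rfft(x)
    return np.fft.irfft(process_spectrum(X), n=N)

def stream(x):
    # hop by N/2, cross-fading each new first half with the
    # remembered second half of the previous block
    out = np.zeros(len(x))
    pending = np.zeros(hop)             # second half of previous block
    for start in range(0, len(x) - N + 1, hop):
        y = process_block(x[start:start + N])
        out[start:start + hop] = pending * fade_out + y[:hop] * fade_in
        pending = y[hop:]
    return out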



On Jun 13, 8:42 am, "jungledmnc" <jungled...@gmail.com> wrote:
The computer music community has a large volume of work on a variety of
approaches, which vary depending on what characteristics you are modifying,
i.e. amplitude, frequency, line width. A search on overlap-add synthesis
will get you a broad menu to choose from.

Simple amplitude changes are performed by choosing windows for the
overlapped time domain data segments that you are adding.

Dale B. Dalrymple
http://dbdimages.com
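A sketch of the windowed overlap-add idea dbd describes, assuming numpy: with a "periodic" Hann window and a hop of half the window length, the overlapped windows sum to exactly one, so per-segment amplitude changes cross-fade smoothly:

import numpy as np

N = 4096
hop = N // 2
n = np.arange(N)
w = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)    # "periodic" Hann window

# 50%-overlapped copies of this window sum to exactly 1, so windowed
# segments overlap-added without modification reconstruct the input:
s = np.zeros(N + hop)
s[:N] += w
s[hop:] += w
print(np.allclose(s[hop:N], 1.0))            # True in the overlapped region

def ola_gain(x, gains):
    # apply a different gain to each segment; the window cross-fades
    # the amplitude change across segment boundaries
    out = np.zeros(len(x))
    for k, start in enumerate(range(0, len(x) - N + 1, hop)):
        out[start:start + N] += gains[k % len(gains)] * w * x[start:start + N]
    return out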
> The computer music community has a large volume of work on a variety
> of approaches, which vary depending on what characteristics you are
> modifying, i.e. amplitude, frequency, line width. A search on
> overlap-add synthesis will get you a broad menu to choose from.
So there is no general way? I thought there was a general algorithm for processing the buffers so that you can modify the frequency content as you wish.
> Simple amplitude changes are performed by choosing windows for the
> overlapped time domain data segments that you are adding.

I probably don't get this. Can you please describe it a little bit more? Thanks.
On Jun 13, 11:42 am, "jungledmnc" <jungled...@gmail.com> wrote:
By modifying the frequency-domain content, do you mean you're multiplying
it by a mask? If so, that's the same as linear filtering, and what you're
describing sounds like the overlap-add (OLA) technique, a common fast
convolution approach. It does work, and is in fact equivalent to doing the
filtering in the time domain.

If you're doing some other type of modification that doesn't map to a
linear filtering operation, then it's less obvious why/if it would work.
Like dbd pointed out, though, there are lots of tricks in the audio
processing domain that might be useful.

Jason
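A sketch of overlap-add fast convolution as Jason describes it, assuming numpy: each block is zero-padded before the FFT so the circular convolution equals the linear one, and the tails are added into the next block:

import numpy as np

def ola_filter(x, h, block=4096):
    # overlap-add fast convolution: zero-pad each block so the
    # circular convolution equals the linear one, then add the tails
    M = len(h)
    nfft = block + M - 1
    H = np.fft.rfft(h, n=nfft)
    out = np.zeros(len(x) + M - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        y = np.fft.irfft(np.fft.rfft(seg, n=nfft) * H, n=nfft)
        out[start:start + len(seg) + M - 1] += y[:len(seg) + M - 1]
    return out

x = np.random.randn(10000)
h = np.random.randn(129)
print(np.allclose(ola_filter(x, h), np.convolve(x, h)))   # True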
On Jun 13, 12:28 pm, "jungledmnc" <jungled...@gmail.com> wrote:
Like many things in DSP, the right thing to do in a process depends on the
nature of the signals. Are your signals
1) broadband noise
2) narrowband noise
3) constant frequency, constant amplitude tones
4) varying frequency, constant amplitude tones
5) constant frequency, varying amplitude tones
6) varying frequency, varying amplitude tones
7) impulsive
8) transient?
Are the variations linear or non-linear? Different answers are better for
different cases. Models are available in the literature for dealing with
all of the above.

If you don't care about accuracy, assume all energy is in constant
frequency, constant amplitude tones. You'll generally be wrong, but it
will be 'general'.

Dale B. Dalrymple
I think this is what you are looking for:

http://www.dspguide.com/ch18.htm

Steve
Thanks. The overlap-add method really is useful, but I don't think it is
good enough for this case. The problem is that I want to experiment with
the frequency domain :-), hence no simple multiplication or anything like
that.

When using the FFT for convolution, you can assume that the output signal
reaches zero M points past the end of the window (where M is the length of
the convolution kernel). Or something like that :).

But imagine for example an FFT-based synthesizer - you do not generate the
signal in the time domain, but in the frequency domain. Somehow. I don't
know if it is worth the work, but I'm too curious :-). Anyway this is just
an example, which leads to a theoretically infinite signal with possible
discontinuities after each block.

Any ideas on that one?
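One way such a synthesizer can join its blocks cleanly is explicit phase bookkeeping per partial. A minimal sketch, assuming numpy and bin-centred partials only; the bins, amplitudes and block size below are arbitrary examples:

import numpy as np

N = 4096
partials = [(8, 1.0), (21, 0.5)]         # (FFT bin, amplitude) pairs
phase = {k: 0.0 for k, _ in partials}    # running phase of each partial

def synth_block():
    X = np.zeros(N // 2 + 1, dtype=complex)
    for k, amp in partials:
        # with numpy's rfft scaling, X[k] = 0.5*amp*N*e^(j*phase)
        # comes back from irfft as amp*cos(2*pi*k*n/N + phase)
        X[k] = 0.5 * amp * N * np.exp(1j * phase[k])
        # advance the phase by one block; for bin-centred partials this
        # is a whole number of cycles, which is exactly why consecutive
        # blocks join without a discontinuity
        phase[k] = (phase[k] + 2 * np.pi * k) % (2 * np.pi)
    return np.fft.irfft(X, n=N)

out = np.concatenate([synth_block() for _ in range(4)])
# off-bin frequencies can't be written into a single bin like this;
# handling them is what the phase vocoder discussed below is about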
jungledmnc wrote:

> But imagine for example an FFT-based synthesizer - you do not generate
> the signal in the time domain, but in the frequency domain. Somehow. I
> don't know if it is worth the work, but I'm too curious :-).

Look up the "FFT-1" algorithm from IRCAM. They considered it worth the
work, because they patented it. I have never tried it (buying IRCAM
software is not for the faint-hearted), and have heard mixed opinions on
just how fast it is.

> Anyway this is just an example, which leads to a theoretically infinite
> signal with possible discontinuities after each block.
>
> Any ideas on that one?
Sounds like you are after "musical" processes and transformations. In
which case, the standard recommendation is the phase vocoder. There is a
real-time form of it implemented in Csound, plus lots of opcodes for
messing with amplitudes and frequencies ad lib.

In the name is the clue - the phase vocoder keeps track (internally,
usually) of the phases of all bins from one frame to the next, so
everything joins up cleanly. In some formulations, resynthesis of phase
vocoder analysis data is by oscillator bank rather than by FFT. There is
good documentation and discussion of the phase vocoder on the dspdimension
website, and source code there and in many other places.

Richard Dobson
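A minimal sketch of the phase bookkeeping Richard describes, assuming numpy: the classic phase-vocoder analysis step of estimating each bin's true instantaneous frequency from the phase advance between two hop-spaced frames (the frame size, hop and sample rate below are arbitrary choices):

import numpy as np

N, hop = 4096, 1024
fs = 44100.0
w = np.hanning(N)

def princarg(p):
    # wrap phase into [-pi, pi)
    return np.mod(p + np.pi, 2 * np.pi) - np.pi

def pv_bin_freqs(frame0, frame1):
    # estimate each bin's instantaneous frequency (Hz) from the phase
    # advance between two analysis frames spaced 'hop' samples apart
    X0 = np.fft.rfft(w * frame0)
    X1 = np.fft.rfft(w * frame1)
    k = np.arange(N // 2 + 1)
    expected = 2 * np.pi * k * hop / N    # phase advance of the bin centres
    dev = princarg(np.angle(X1) - np.angle(X0) - expected)
    return (expected + dev) * fs / (2 * np.pi * hop)

t = np.arange(N + hop) / fs
x = np.sin(2 * np.pi * 440.0 * t)
f = pv_bin_freqs(x[:N], x[hop:hop + N])
print(f[round(440.0 * N / fs)])           # ~440.0, not the bin centre ~441.4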
> ... But imagine for example an FFT-based synthesizer - you do not
> generate the signal in the time domain, but in the frequency domain.
> Somehow. I don't know if it is worth the work, but I'm too curious :-).
> Anyway this is just an example, which leads to a theoretically infinite
> signal with possible discontinuities after each block.
Here is one thought. Design the frequency domain as you want to, and then
convert your design into a frequency spectrum that will fit within one
period of the time domain. If you don't like the converted frequency
spectrum, modify the design and repeat the process.

For instance, consider an infinitely long signal, x[n]. To extract a
section of it, we can multiply it by a window, such as a rectangle,
triangle, or Hamming. In the frequency domain this corresponds to
convolving the frequency spectrum of the signal with the Fourier Transform
of the window.

Say you want to use a triangle window, so that adjacent segments can be
overlapped by 50% to form a smooth transition. Design the frequency
spectrum as you like. Then convolve your frequency spectrum with sinc^2
(the FT of the triangle window). See if you like it, and if not, try
again. This can be thought of as a "this is what I want" -- "this is what
I can have" iteration.

Regards,
Steve
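A sketch of this iteration, assuming numpy. Rather than convolving spectra directly, the equivalent time-domain route is used: inverse-transform the design, apply the triangle window, and transform back (achievable() and the brick-wall design are illustrative names and values):

import numpy as np

N = 4096
tri = np.bartlett(N)                     # triangle window

def achievable(design):
    # 'design' is the length-(N//2+1) spectrum you want; windowing its
    # one-period time signal is equivalent to convolving the designed
    # spectrum with the window's transform (sinc^2 for the triangle)
    x = np.fft.irfft(design, n=N)
    return np.fft.rfft(tri * x)

want = np.zeros(N // 2 + 1, dtype=complex)
want[100:200] = 1.0                      # e.g. a brick-wall band of bins
have = achievable(want)
# compare abs(want) and abs(have); tweak 'want' and repeat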
Richard Dobson wrote:

> Sounds like you are after "musical" processes and transformations. In
> which case, the standard recommendation is the phase vocoder. ... There
> is good documentation and discussion of the phase vocoder on the
> dspdimension website, and source code there and in many other places.
Heh, thanks. I'll definitely look into it. Unfortunately I don't know much about vocoders. Do you know of some kind of ebook? Printable, if possible.