DSPRelated.com
Forums

Frequency domain synthesis

Started by Ross Clement, January 6, 2006
Hi everyone.

I was greatly interested in a thread a short while ago on performing
music-orientated audio synthesis in the frequency domain.

Initially I was confused about the low frequency accuracy of the
resynthesis. E.g. to keep the latency down to 1/50th of a second you'd
need a window size of 1/100th of a second. For the purpose of argument,
assume that I'm going to synthesise a single sinewave of 325 Hz. If I
follow the advice of one of the papers, I'd create the spectrum for the
current window by finding the bin with a centre frequency closest to
325 Hz and assigning it a complex value corresponding to the correct
magnitude and phase. But when I perform an inverse fft on this
spectrum, I get a sinewave at the bin's centre frequency. Without
calculating the true frequency, assume that this is 320 Hz or similar.
However, the output I produce is not the result of a single fft, but
overlapped windows. The next window will also contain a sine wave at
320 Hz, but out of phase. On paper, if I use a triangular window
during overlapping, I see that crossfading between the two waves of the
same frequency but a different phase will raise the frequency slightly
as the output sinewave starts with the phase of the sinewave of the
first window being overlapped, but catches up to the phase of the
sinewave output from the next window. Have I understood this properly?

Assuming that is correct, is there a window that can be used so that
the shape of the sinewave is not distorted? I would have assumed that
the shape of the window would be crucial in avoiding distortion, but
haven't yet seen a definitive statement on the choice of window.
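The phase-pull effect described above can be checked numerically. A minimal sketch (the two phases and the triangular crossfade weights are illustrative choices, not taken from any particular paper):

```python
import cmath

# Two unit-amplitude phasors at the same bin-centre frequency but with
# different phases, crossfaded with triangular (linear) weights as in
# overlap-add synthesis. p1 and p2 are arbitrary example phases.
p1, p2 = 0.0, 0.8

def crossfade(a):
    """Weighted sum of the two phasors at crossfade position a in [0, 1]."""
    return (1 - a) * cmath.exp(1j * p1) + a * cmath.exp(1j * p2)

# At the crossfade midpoint the resultant phase sits exactly halfway
# between p1 and p2: the instantaneous phase glides from p1 towards p2
# during the overlap, which is the frequency pull described above.
mid = crossfade(0.5)
print(cmath.phase(mid))   # (p1 + p2) / 2 = 0.4

# The amplitude also dips below 1 mid-crossfade, i.e. the sum is not a
# constant-amplitude sinusoid: there is amplitude modulation as well.
print(abs(mid))           # cos((p2 - p1) / 2), about 0.921
```

So whatever window is used, crossfading two out-of-phase copies of the same frequency produces both a phase glide and an amplitude dip; the window shape changes the weighting but cannot remove the effect.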

Secondly, assuming that I throw away any partials equal to or over the
Nyquist limit, does this form of synthesis guarantee that the resulting
signal is band-limited? I would have thought so, but a colleague
suggests that the distortion inherent in the overlapping of windows
would be a form of aliasing distortion.

Finally, if I'm going to filter the partials with a lowpass filter
before the ifft, then a seat of the pants method would be to simply
ignore any partials over and above the cutoff frequency. This could
lead to audible partials suddenly disappearing from a sound as the
filter cutoff is changed, so perhaps a better method would be to have a
transition band where partials are partially attenuated, in addition to
the pass and stop bands. But, if I wanted to simulate (say) an analogue
filter, then I should be simulating both the frequency response and the
phase response of the filter. How would this be done?
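One way to get both responses, assuming a concrete analogue prototype: evaluate the prototype's complex transfer function at each partial's frequency, then scale the partial's magnitude by |H| and offset its phase by arg(H) before writing it into the spectrum. A sketch using a one-pole RC lowpass, H(f) = 1 / (1 + j f/fc); the function names are mine, and a real analogue filter model would be higher order:

```python
import cmath

def one_pole_lowpass(f, fc):
    """Complex response of a one-pole RC lowpass: H(f) = 1 / (1 + j*f/fc)."""
    return 1.0 / (1.0 + 1j * f / fc)

def filter_partial(amp, phase, f, fc):
    """Apply the filter to one partial: scale its magnitude by |H(f)| and
    shift its phase by arg(H(f))."""
    h = one_pole_lowpass(f, fc)
    return amp * abs(h), phase + cmath.phase(h)

# At the cutoff the partial is attenuated by 3 dB and lags by 45 degrees;
# because the roll-off is gradual rather than brick-wall, partials fade
# smoothly instead of vanishing as fc is swept.
amp, ph = filter_partial(1.0, 0.0, 1000.0, 1000.0)
print(amp, ph)   # about 0.707, about -0.785 rad
```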

Thanks in anticipation,

Ross-c

From: <abariska@student.ethz.ch>
Newsgroups: comp.dsp
Subject: Re: Frequency domain synthesis
Date: Fri, 06 Jan 2006 09:32:12 -0800


Ross Clement wrote:

> Hi everyone.
Hi Ross!
> I was greatly interested in a thread a short while ago on performing
> music-orientated audio synthesis in the frequency domain.

I don't see the point in that - as you noticed, you are faced with frame
reconstruction and latency issues. Furthermore, time-domain oscillators
are cheap, simple and not restricted to the (linearly spaced) DFT
frequencies. Synthesis post-processing like filtering is also a lot
easier to do in the time domain (using standard IIR filters).

Frequency domain processing is really only interesting for:

- implementing large, huge, and astronomical FIR filters (I'm currently
  working on a 10-fold audio convolution engine allowing for 1.44 MTaps
  each, running at 24bit/96kHz), or
- implementing non-linear, spectrum-dependent processing (spectral
  subtraction and the like).

Regards,
Andor
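For reference, the kind of cheap time-domain oscillator Andor has in mind can cost as little as one multiply and one subtract per sample, e.g. the standard digital-resonator recursion (a sketch of the general technique, not Andor's code):

```python
import math

def recursive_sine(freq_hz, sample_rate, n_samples):
    """Generate sin(w*n) with the recursion y[n] = 2*cos(w)*y[n-1] - y[n-2].
    One multiply and one subtract per sample, and the frequency is not
    tied to any DFT bin spacing."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    k = 2.0 * math.cos(w)
    y1, y2 = math.sin(-w), math.sin(-2.0 * w)   # seed with y[-1], y[-2]
    out = []
    for _ in range(n_samples):
        y = k * y1 - y2
        y2, y1 = y1, y
        out.append(y)
    return out

samples = recursive_sine(325.0, 44100.0, 100)   # 325 Hz, no bin rounding
```

In practice such recursions drift slowly in amplitude due to rounding, so long-running oscillators re-normalise occasionally or use a coupled (quadrature) form.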
Hi. The reasons I'm interested in frequency domain synthesis are these:

(i) I am *assuming* that it will be possible to make the synthesiser
fully bandlimited, as any partials at or above the Nyquist limit can
simply be thrown away. Certainly for the synthesis libraries I'm using
at the moment, the bandlimited time domain oscillators sound quite
different from their non-bandlimited equivalents, so something is being
lost in the band-limiting. I would like to hear what frequency domain
oscillators sound like.

(ii) I am very interested in looking at the analysis and resynthesis of
voice in the long term. I *believe* that additive synthesis would be a
good way of doing this. Frequency domain synthesis is a far quicker way
of doing additive synthesisers which use many, many partials. I can see
some "problems" with frequency domain synthesis, e.g. implementing an
echo or a reverb would be a complete pain. But for single instruments,
frequency domain synthesis is certainly interesting. Also if I wanted a
general synthesiser which would require echo, reverb, etc., I'd need a
separate fft process for each effect send. But certainly I wish to
continue my current test program so that the technique is in my
arsenal.

(iii) I'm also very interested in looking at frequency domain
filtering of signals. Filtering is easier in the time domain if you
have a suitable IIR design, but filter *design* is much easier in the
frequency domain, as you can apply your desired frequency and (I
presume) phase responses to the partials directly, without any
stability problems. Again, I would like to hear a variety of frequency
domain filters in a musical context.

I have a test engine ready for when I work out exactly how to create
the spectrum for my first stab at it. Using portaudio and the fftw3
libraries, only 2% of my CPU power is taken up by the inverse fft. So
there's a lot of power left over for synthesis.

Perhaps I'm misguided in doing what I'm doing, but I'm strongly of the
opinion that sometimes it's a good idea to try things out: even if you
don't get a usable "product" at the end, you frequently learn useful
things during the journey. Especially "non-book" style knowledge.

Cheers,

Ross-c

Ross Clement wrote:

> (ii) I am very interested in looking at the analysis and resynthesis
> of voice in the long term. I *believe* that additive synthesis would
> be a good way of doing this. Frequency domain synthesis is a far
> quicker way of doing additive synthesisers which use many, many
> partials.

You will want to check out the "FFT-1" technique described here:

http://www.cnmat.berkeley.edu/~adrian/FFT-1/FFT-1_ICSPAT.html

But note, Ircam patented it.

Richard Dobson
In article <1136557854.000028.168040@g47g2000cwa.googlegroups.com>,
Ross Clement <clemenr@wmin.ac.uk> wrote:
> I was greatly interested in a thread a short while ago on performing
> music-orientated audio synthesis in the frequency domain.
>
> Initially I was confused about the low frequency accuracy of the
> resynthesis. E.g. to keep the latency down to 1/50th of a second you'd
> need a window size of 1/100th of a second. For the purpose of argument,
> assume that I'm going to synthesise a single sinewave of 325 Hz. If I
> follow the advice of one of the papers I'd create the spectrum for the
> current window by finding the bin with a centre frequency closest to
> 325 Hz and assign it a complex value corresponding to the correct
> magnitude and phase. But when I perform an inverse fft on this
> spectrum, I get a sinewave at the bin's centre frequency. ...
You could try using longer FFT windows to get higher frequency
resolution, but use only a small slice of each window to keep latency
down. Note then that the phase changes will correspond to the slice
size, not the FFT window size.

For non-bin-centered frequencies, you might also want to experiment
with assigning both adjacent bins their corresponding complex
amplitudes for synthesis. The "leakage" bin energy might help smooth
your window join. IMHO. YMMV.

--
Ron Nicholson   rhn AT nicholson DOT com   http://www.nicholson.com/rhn/
#include <canonical.disclaimer>   // only my own opinions, etc.
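Ron's two-bin idea, in its simplest linear-weighting form, might look like the sketch below. A real implementation would weight the two bins by the analysis window's spectral shape rather than linearly, so treat this as illustrative only:

```python
def split_to_bins(freq_hz, sample_rate, n_fft):
    """Split one partial between the two DFT bins straddling its frequency,
    with weights from linear interpolation on the fractional bin index."""
    k = freq_hz * n_fft / sample_rate    # fractional bin index
    k0 = int(k)
    frac = k - k0
    return [(k0, 1.0 - frac), (k0 + 1, frac)]

# A 441-point FFT at 44100 Hz gives 100 Hz bin spacing, so 325 Hz falls
# at fractional bin index 3.25: 75% of the weight goes to bin 3, 25% to
# bin 4, and the weighted bin centre lands back on 325 Hz.
print(split_to_bins(325.0, 44100.0, 441))   # [(3, 0.75), (4, 0.25)]
```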
Ross Clement wrote in message
<1136644370.031260.88500@g47g2000cwa.googlegroups.com>...

> (ii) I am very interested in looking at the analysis and resynthesis
> of voice in the long term. I *believe* that additive synthesis would
> be a good way of doing this. Frequency domain synthesis is a far
> quicker way of doing additive synthesisers which use many, many
> partials.
Hi Ross,

By "far quicker" do you mean "lower CPU"? I ask because I suspect that
frequency domain synthesis will give you higher latency than time
domain additive synthesis - unless you do extreme overlapping, in
which case the CPU cost will shoot up!

Have you considered running at a lower sample rate (e.g. 11025 samples
per second) and interpolating by a factor of four using polyphase
filters? If you are using additive synthesis you should be aware that
most real-world sounds have the majority of their sinusoidal
components below 5 kHz; above that, noise modelling is probably
(computationally) cheaper.

Note too that you need to be really careful synthesising each sinusoid
with time domain additive synthesis. My application uses a wavetable
holding a single cycle of a sinusoid, but I originally came unstuck
because casting a float to an int (from phase to index) is really
expensive on a Pentium (it flushes the pipeline). Using the function
"lrintf" instead of a simple cast avoids the pipeline flush and gives
a major performance improvement.

HTH

Fraser.
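Fraser's lrintf point is specific to C on x86, but the wavetable oscillator itself can be sketched in a way that sidesteps the conversion entirely: keep the phase in an integer fixed-point accumulator and take the table index from its top bits. The details below are my own illustration, not Fraser's implementation:

```python
import math

TABLE_BITS = 12
TABLE_SIZE = 1 << TABLE_BITS       # 4096-entry single-cycle sine table
FRAC_BITS = 20                     # fractional bits of the phase accumulator
TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def wavetable_sine(freq_hz, sample_rate, n_samples):
    """Wavetable oscillator with an integer phase accumulator: the table
    index is just the accumulator's top bits, so the inner loop never
    converts float to int (the conversion lrintf works around in C)."""
    step = int(freq_hz / sample_rate * (TABLE_SIZE << FRAC_BITS))
    mask = (TABLE_SIZE << FRAC_BITS) - 1
    phase = 0
    out = []
    for _ in range(n_samples):
        out.append(TABLE[phase >> FRAC_BITS])   # truncating lookup, no cast
        phase = (phase + step) & mask           # wraps at one table cycle
    return out

samples = wavetable_sine(325.0, 44100.0, 64)
```

Without interpolation, the truncating lookup limits accuracy to roughly one table step of phase error; linear interpolation between adjacent table entries buys considerably more.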