Additive Synthesis

Additive synthesis (now more often called ``sinusoidal modeling'') was one of the first computer-music synthesis methods, and it has been a mainstay ever since. In fact, it is extensively described in the first article of the first issue of the Computer Music Journal [186]. Some of the first high-quality synthetic musical instrument tones using additive synthesis were developed in the 1960s by Jean-Claude Risset at AT&T Bell Telephone Laboratories [233,232].

Additive synthesis was historically implemented using a sum of sinusoidal oscillators modulated by amplitude and frequency envelopes over time [186], and later using an inverse FFT [35,239] when the number of sinusoids is large.
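
The oscillator-bank form can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the implementation of [186]: the function name `oscillator_bank`, the sampling rate, and the example envelopes are all assumptions made here.

```python
import numpy as np

def oscillator_bank(amps, freqs, fs):
    """Sum of sinusoidal oscillators driven by per-sample envelopes.

    amps, freqs: arrays of shape (num_partials, num_samples); freqs in Hz.
    """
    amps = np.asarray(amps, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    # Integrate each partial's instantaneous frequency to obtain its phase.
    phases = 2.0 * np.pi * np.cumsum(freqs, axis=1) / fs
    return np.sum(amps * np.sin(phases), axis=0)

fs = 8000                      # assumed sampling rate (Hz)
n = np.arange(fs)              # one second of samples
decay = np.linspace(1.0, 0.0, n.size)
# Three harmonics of 220 Hz with 1/k levels and a shared linear decay.
amps = np.array([decay / k for k in (1, 2, 3)])
freqs = np.array([np.full(n.size, 220.0 * k) for k in (1, 2, 3)])
y = oscillator_bank(amps, freqs, fs)
```

The cost grows in proportion to the number of partials times the number of samples, which is what motivates the inverse-FFT alternative when many sinusoids are needed.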

Figure G.5 shows an example from John Grey's 1975 Ph.D. thesis (Psychology) illustrating the nature of partial amplitude envelopes computed for purposes of later resynthesis.

**Figure G.5:** John Grey 1975 -- CCRMA Tech. Reports 1 & 2 (CCRMA ``STANM'' reports -- available online)

Inverse FFT Synthesis

When the number of partial overtones is large, an explicit sinusoidal oscillator bank requires a significant amount of computation, and it becomes more efficient to use the inverse FFT to synthesize large ensembles of sinusoids [35,239,143,142,139]. This method gives the added advantage of allowing non-sinusoidal components such as filtered noise to be added in the frequency domain [246,249].

Inverse-FFT (IFFT) synthesis was apparently introduced by Hal Chamberlin in his classic 1980 book ``Musical Applications of Microprocessors'' [35]. His early method consisted of simply setting individual FFT bins to the desired amplitude and phase, so that an inverse FFT would efficiently synthesize a sum of fixed-amplitude, fixed-frequency sinusoids in the time domain.
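
The bin-setting idea can be illustrated as follows. This is a minimal sketch assuming NumPy's real inverse FFT; the function name `ifft_sinusoids` and the scaling convention are choices made here, not taken from [35].

```python
import numpy as np

def ifft_sinusoids(N, partials):
    """Synthesize one length-N frame from (bin, amplitude, phase) triples.

    Each bin index k must satisfy 0 < k < N/2, so every sinusoid sits
    exactly on an FFT bin frequency 2*pi*k/N.
    """
    X = np.zeros(N // 2 + 1, dtype=complex)
    for k, a, phi in partials:
        # a*cos(2*pi*k*n/N + phi) has positive-frequency DFT value (a*N/2)*exp(j*phi).
        X[k] += 0.5 * a * N * np.exp(1j * phi)
    # irfft supplies the conjugate-symmetric negative-frequency half.
    return np.fft.irfft(X, n=N)

N = 64
x = ifft_sinusoids(N, [(3, 1.0, 0.0), (10, 0.5, np.pi / 2)])
```

Because only bin-centered frequencies are available and the amplitudes are constant across the frame, this form alone cannot follow time-varying partials.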

This idea was extended by Rodet and Depalle [239] to include shaped amplitudes in the time domain. Instead of writing isolated FFT bins, they wrote entire main lobes into the buffer, where the main lobes corresponded to the desired window shape in the time domain. (Side lobes of the window transform were neglected.) They chose the triangular window ($\hbox{asinc}^2$ main-lobe shape), thereby implementing a linear cross-fade from one frame to the next in the time domain.
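
Writing a main lobe into the buffer might look like the sketch below. It is an illustration under stated assumptions, not the method of [239]: the name `main_lobe_frame` and its parameters are hypothetical, the window transform is evaluated directly from its defining sum (a practical implementation would presumably tabulate the main lobe), and the overlap-add of successive frames is omitted.

```python
import numpy as np

def main_lobe_frame(N, f_bins, amp, phase, window, half_width):
    """One IFFT-synthesis frame containing a single windowed sinusoid.

    f_bins is the frequency in (possibly fractional) bin units. Only bins
    within half_width of f_bins are written; side lobes are neglected.
    """
    n = np.arange(N)
    X = np.zeros(N // 2 + 1, dtype=complex)
    lo = max(int(np.ceil(f_bins - half_width)), 0)
    hi = min(int(np.floor(f_bins + half_width)), N // 2)
    for k in range(lo, hi + 1):
        # Window transform sampled at the offset (k - f_bins) bins from the peak.
        W = np.sum(window * np.exp(-2j * np.pi * (k - f_bins) * n / N))
        X[k] = 0.5 * amp * np.exp(1j * phase) * W
    return np.fft.irfft(X, n=N)

N = 256
w = np.bartlett(N)   # triangular window; its main lobe spans about +/-2 bins
frame = main_lobe_frame(N, 10.3, 1.0, 0.0, w, half_width=2.0)
exact = w * np.cos(2 * np.pi * 10.3 * np.arange(N) / N)
rel_err = np.sqrt(np.sum((frame - exact) ** 2) / np.sum(exact ** 2))
```

Here `rel_err` measures what is lost by discarding the side lobes. Only a handful of bins are written per sinusoid, so the per-partial cost is small compared with running an oscillator for every output sample.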

A remaining drawback of IFFT synthesis was that inverse FFTs generate sinusoids at fixed frequencies, so that a rapid glissando may become ``stair-cased'' in the resynthesis, stepping once in frequency per output frame.

An extension of IFFT synthesis to support linear frequency sweeps was devised by Goodwin and Kogon [93]. The basic idea was to tabulate window main-lobes for a variety of sweep rates. (The phase variation across the main lobe determines the frequency variation over time, and the width of the main lobe determines its extent.) In this way, frequencies could be swept within an FFT frame instead of having to be constant with a cross-fade from one static frame to the next.

Chirplet Synthesis

Independently of Goodwin and Kogon, Marques and Almeida introduced chirplet modeling of speech in 1989 [164]. This technique is based on the interesting mathematical fact that the Fourier transform of a Gaussian-windowed chirp remains a Gaussian pulse in the frequency domain (§10.6). Instead of measuring only amplitude and phase at each spectral peak, the parameters of a complex Gaussian are fit to each peak. The (complex) parameters of each Gaussian peak in the spectral model determine a Gaussian amplitude-envelope and a linear chirp rate in the time domain. Thus, both cross-fading and frequency sweeping are handled automatically by the spectral model. A specific method for carrying this out is described in §10.6. More recent references on chirplet modeling include [197,90,91,89].
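
The underlying transform pair can be written out as a short worked equation. The symbols $\alpha$, $\beta$, and $p$ are notation introduced here for illustration and may differ from that of §10.6. For a Gaussian-windowed chirp centered at time zero,

$$x(t) = e^{-p t^2}, \quad p = \alpha - j\beta, \quad \alpha > 0 \qquad\longleftrightarrow\qquad X(\omega) = \sqrt{\frac{\pi}{p}}\; e^{-\omega^2/(4p)},$$

where the exponent separates into real and imaginary parts as

$$-\frac{\omega^2}{4p} = -\frac{\alpha\,\omega^2}{4(\alpha^2+\beta^2)} \;-\; j\,\frac{\beta\,\omega^2}{4(\alpha^2+\beta^2)}.$$

In the time domain, $\alpha$ sets the Gaussian amplitude envelope $e^{-\alpha t^2}$, and the phase $\beta t^2$ gives an instantaneous frequency $2\beta t$, that is, a linear sweep. In the frequency domain, the log-magnitude and the phase are both parabolas in $\omega$, so fitting a parabola to each near a spectral peak yields two curvatures from which $\alpha$ and $\beta$ can be solved.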

Nonparametric Spectral Peak Modeling

Beginning in 1999, Laroche and Dolson extended IFFT synthesis (§G.8.1) further by using raw spectral-peak regions from STFT analysis data [143,142,139]. By preserving the raw spectral peak (instead of modeling it mathematically as a window transform or complex Gaussian function), the original amplitude envelope and frequency variation are preserved for the signal component corresponding to the analyzed peak in the spectrum. To implement frequency-shifting, for example, the raw peaks (defined as ``regions of influence'' around a peak-magnitude bin) are shifted accordingly, preserving the original amplitude and phase of the FFT bins within each peak region.
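
A single-frame sketch of shifting raw peak regions is given below. It is an illustration only and should not be read as the algorithm of [143,142,139]: the function name `shift_peak_regions`, the midpoint rule for region boundaries, the relative threshold `min_rel`, and the restriction to integer bin shifts are all assumptions made here, and phase handling across successive frames is not attempted.

```python
import numpy as np

def shift_peak_regions(X, shift_of, min_rel=1e-6):
    """Move each spectral-peak region of a half-spectrum X by an integer bin count.

    shift_of(p) returns the integer shift for the region whose peak is at bin p.
    Bin values (amplitude and phase) inside each region are copied unchanged.
    """
    mag = np.abs(X)
    floor = min_rel * mag.max()
    # Peaks: local maxima of the magnitude spectrum above a small threshold.
    peaks = [k for k in range(1, len(X) - 1)
             if mag[k] > floor and mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]]
    Y = np.zeros_like(X)
    for i, p in enumerate(peaks):
        # Region of influence: bounded by midpoints to the neighboring peaks.
        lo = 0 if i == 0 else (peaks[i - 1] + p) // 2 + 1
        hi = len(X) - 1 if i == len(peaks) - 1 else (p + peaks[i + 1]) // 2
        s = int(shift_of(p))
        for k in range(lo, hi + 1):
            if 0 <= k + s < len(X):
                Y[k + s] += X[k]
    return Y

N = 64
n = np.arange(N)
x = np.cos(2 * np.pi * 5 * n / N) + 0.5 * np.cos(2 * np.pi * 12 * n / N)
# Move every peak to twice its frequency (bin p goes to bin 2p).
Y = shift_peak_regions(np.fft.rfft(x), lambda p: p)
y = np.fft.irfft(Y, n=N)
```

In this example each peak is moved by an amount that depends on its own frequency, which is why the spectrum is divided into regions rather than shifted as a whole.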

Efficient Specialized Methods

We have already mentioned inverse-FFT synthesis as a means of greatly decreasing the cost of additive synthesis relative to a full-blown bank of sinusoidal oscillators. This section summarizes a number of more specialized methods which reduce the computational cost of additive synthesis and are widely used.

Wavetable Synthesis

For periodic sounds, the sinusoidal components are all harmonics of some fundamental frequency. If in addition they can be constrained to vary together in amplitude over time, then they can be implemented using a single wavetable containing one period of the sound. Using interpolation (typically linear, but sometimes better), the table may be played back at any fundamental frequency, and amplitude shaping is handled by multiplying the output of the wavetable look-up by a separately generated amplitude envelope shared by all harmonics [186,167]. (The harmonics may still have arbitrary relative levels.) This form of ``wavetable synthesis'' was commonly used in the early days of computer music, and it is still commonly used for synthesizing harmonic spectra.
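
A minimal sketch of such a wavetable oscillator with linear interpolation follows. The function names, the table size of 2048, and the example harmonic levels are assumptions chosen for illustration.

```python
import numpy as np

def make_wavetable(harmonic_amps, size=2048):
    """One period of a harmonic tone; harmonic_amps[k] is the level of harmonic k+1."""
    t = np.arange(size) / size
    return sum(a * np.sin(2 * np.pi * (k + 1) * t)
               for k, a in enumerate(harmonic_amps))

def wavetable_osc(table, f0, fs, envelope):
    """Read the table at fundamental f0 (Hz) and apply one shared amplitude envelope."""
    envelope = np.asarray(envelope, dtype=float)
    L = len(table)
    # Fractional read position, in table samples, for each output sample.
    pos = (np.arange(envelope.size) * f0 * L / fs) % L
    i0 = np.floor(pos).astype(int)
    frac = pos - i0
    i1 = (i0 + 1) % L              # wrap around the end of the period
    return envelope * ((1.0 - frac) * table[i0] + frac * table[i1])

fs = 44100
table = make_wavetable([1.0, 0.5, 0.25])
y = wavetable_osc(table, 220.0, fs, np.ones(4410))
```

The per-sample cost is one interpolated look-up and one multiply regardless of how many harmonics the table contains, at the price of locking all harmonics to a single envelope.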

Note that sometimes the term ``wavetable synthesis'' is used to refer to what was originally called sampling synthesis: playback of sampled tones from memory, with looping of the steady-state portion to create an arbitrarily long sustain [165,27,107,193]. This book adheres to the original terminology. For sampling synthesis, spectral phase-modifications (Chapter 8) can be used to provide perfectly seamless loops [165].

Group-Additive Synthesis

The basic idea of group-additive synthesis [130,69] is to employ a set of wavetables, each modeling a harmonic subset of the tonal components making up the overall spectrum of the synthesized tone. Since each wavetable oscillator is independent, inharmonic sounds can be synthesized to some degree of approximation, and the amplitude envelopes are not completely locked together. It is important to be aware that human audio perception cannot tell the difference between harmonic and inharmonic partials at high frequencies (where ``high'' depends on the fundamental frequency and timbre to some extent). Thus, group-additive synthesis provides a set of intermediate points between wavetable synthesis and all-out additive synthesis.
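
The idea can be sketched with two wavetable oscillators, each carrying a subset of the harmonics and each given its own fundamental and envelope. This is an illustrative sketch rather than the method of [130,69]: the function names, the grouping of harmonics, and the 0.5% detuning of the upper group are assumptions made here.

```python
import numpy as np

def harmonic_table(harmonics, size=2048):
    """One period containing the given {harmonic_number: amplitude} subset."""
    t = np.arange(size) / size
    table = np.zeros(size)
    for k, a in harmonics.items():
        table += a * np.sin(2 * np.pi * k * t)
    return table

def play_table(table, f0, fs, envelope):
    """Linearly interpolated table look-up at fundamental f0, times envelope."""
    L = len(table)
    pos = (np.arange(len(envelope)) * f0 * L / fs) % L
    i0 = pos.astype(int)           # pos is nonnegative, so truncation is floor
    frac = pos - i0
    return envelope * ((1 - frac) * table[i0] + frac * table[(i0 + 1) % L])

def group_additive(groups, fs):
    """groups: list of (table, f0_hz, envelope); all envelopes share one length."""
    return sum(play_table(tab, f0, fs, np.asarray(env, dtype=float))
               for tab, f0, env in groups)

fs, num = 44100, 4410
low = harmonic_table({1: 1.0, 2: 0.5, 3: 0.3})
high = harmonic_table({4: 0.2, 5: 0.1})
slow = np.linspace(1.0, 0.5, num)   # lower group decays slowly
fast = np.linspace(1.0, 0.0, num)   # upper group decays faster
# The upper group is tuned slightly sharp, so its partials are inharmonic
# relative to the lower group.
y = group_additive([(low, 220.0, slow), (high, 220.0 * 1.005, fast)], fs)
```

With one group this reduces to ordinary wavetable synthesis, and with one partial per group it becomes a full oscillator bank, which is the sense in which the method interpolates between the two.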