## Additive Synthesis

Additive synthesis (now more often called ``sinusoidal modeling'') was one of the first computer-music synthesis methods, and it has been a mainstay ever since. In fact, it is extensively described in the first article of the first issue of the Computer Music Journal [186]. Some of the first high-quality synthetic musical instrument tones using additive synthesis were developed in the 1960s by Jean-Claude Risset at AT&T Bell Telephone Laboratories [233,232].

Additive synthesis was historically implemented using a sum of sinusoidal oscillators modulated by amplitude and frequency envelopes over time [186], and later using an inverse FFT [35,239] when the number of sinusoids is large.
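The oscillator-bank formulation can be sketched in a few lines of numpy. This is a minimal illustration, not any of the cited implementations; the function name and the particular envelopes are invented for the example. Each partial's instantaneous frequency is integrated to a phase, and each partial is scaled by its own amplitude envelope before summing:

```python
import numpy as np

def additive_synth(freqs, amps, fs=44100.0):
    """Sum-of-sinusoids additive synthesis (illustrative sketch).

    freqs, amps: arrays of shape (num_partials, num_samples) giving each
    partial's instantaneous frequency (Hz) and amplitude envelope.
    """
    freqs = np.atleast_2d(freqs)
    amps = np.atleast_2d(amps)
    # Integrate instantaneous frequency to obtain each oscillator's phase.
    phases = 2 * np.pi * np.cumsum(freqs, axis=1) / fs
    return np.sum(amps * np.sin(phases), axis=0)

# Example: a 440 Hz tone with three harmonics under a shared decay.
fs, n = 44100.0, 44100
t = np.arange(n) / fs
freqs = np.outer([440.0, 880.0, 1320.0], np.ones(n))
amps = np.outer([1.0, 0.5, 0.25], np.exp(-3.0 * t))
x = additive_synth(freqs, amps, fs)
```

Because each partial has its own frequency trajectory, glissandi and inharmonic spectra fall out naturally; the cost is one oscillator update per partial per sample, which motivates the inverse-FFT methods below.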

Figure G.5 shows an example from John Grey's 1975 Ph.D. thesis (Psychology) illustrating the nature of partial amplitude envelopes computed for purposes of later resynthesis.

### Inverse FFT Synthesis

When the number of partial overtones is large, an explicit sinusoidal
oscillator bank requires a significant amount of computation, and it
becomes more efficient to use the *inverse FFT* to synthesize
large ensembles of sinusoids
[35,239,143,142,139].
This method gives the added advantage of allowing non-sinusoidal
components such as filtered noise to be added in the frequency domain
[246,249].

Inverse-FFT (IFFT) synthesis was apparently introduced by Hal Chamberlin in his classic 1980 book ``Musical Applications of Microprocessors'' [35]. His early method consisted of simply setting individual FFT bins to the desired amplitude and phase, so that an inverse FFT would efficiently synthesize a sum of fixed-amplitude, fixed-frequency sinusoids in the time domain.
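The bin-writing idea can be sketched as follows (an illustrative numpy sketch, not Chamberlin's code). A real sinusoid of amplitude `a` occupies two conjugate bins, each of magnitude `a*n/2` under numpy's unnormalized-FFT convention; enforcing Hermitian symmetry guarantees a real time-domain output:

```python
import numpy as np

def ifft_synth(bins, amps, phases, n=1024):
    """Synthesize fixed-frequency sinusoids by writing FFT bins directly.

    bins:   integer bin numbers (frequencies f_k = bins[k] * fs / n)
    amps:   desired sinusoid amplitudes
    phases: desired starting phases (radians)
    """
    X = np.zeros(n, dtype=complex)
    for k, a, ph in zip(bins, amps, phases):
        # One real cosine splits into two conjugate bins of magnitude a*n/2.
        X[k] = (a * n / 2) * np.exp(1j * ph)
        X[-k] = np.conj(X[k])  # Hermitian symmetry => real inverse FFT
    return np.fft.ifft(X).real

# Two fixed sinusoids at bins 5 and 12 of a length-1024 frame:
x = ifft_synth([5, 12], [1.0, 0.5], [0.0, np.pi / 2], n=1024)
```

Note that only frequencies landing exactly on bin centers are representable this way, and amplitudes are constant within the frame; the refinements described next address both limitations.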

This idea was extended by Rodet and Depalle [239] to
include *shaped* amplitudes in the time domain. Instead of
writing isolated FFT bins, they wrote entire *main lobes* into
the buffer, where the main lobes corresponded to the desired window
shape in the time domain.^{G.7} (Side
lobes of the window transform were neglected.) They chose the
triangular window (whose transform has a squared-sinc main-lobe
shape), thereby implementing a linear cross-fade from one frame to
the next in the time domain.

A remaining drawback of IFFT synthesis was that inverse FFTs generate
sinusoids at *fixed* frequencies, so that a rapid glissando may
become ``stair-cased'' in the resynthesis, stepping once in frequency
per output frame.

An extension of IFFT synthesis to support linear *frequency
sweeps* was devised by Goodwin and Kogon [93]. The basic
idea was to tabulate window main-lobes for a variety of sweep rates.
(The phase variation across the main lobe determines the frequency
variation over time, and the width of the main lobe determines its
extent.) In this way, frequencies could be swept within an FFT frame
instead of having to be constant with a cross-fade from one static
frame to the next.

### Chirplet Synthesis

Independently of Goodwin and Kogon, Marques and Almeida introduced
*chirplet modeling* of speech in 1989 [164].
This technique is based on the interesting mathematical fact that the
Fourier transform of a Gaussian-windowed chirp remains a Gaussian
pulse in the frequency domain (§10.6). Instead of
measuring only amplitude and phase at each spectral peak, the
parameters of a complex Gaussian are fit to each peak. The (complex)
parameters of each Gaussian peak in the spectral model determine a
Gaussian amplitude-envelope *and* a linear chirp rate in the time
domain. Thus, both cross-fading and frequency sweeping are handled
automatically by the spectral model. A specific method for carrying
this out is described in §10.6.
More recent references on chirplet modeling include
[197,90,91,89].
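The underlying transform pair can be sketched as follows (a standard Gaussian-integral result, stated here in a generic parameterization rather than the specific one used in [164]):

```latex
x(t) = e^{-p t^2}, \quad \Re\{p\} > 0
\;\Longrightarrow\;
X(\omega) = \int_{-\infty}^{\infty} e^{-p t^2} e^{-j\omega t}\, dt
          = \sqrt{\frac{\pi}{p}}\; e^{-\omega^2/(4p)}.
```

Writing $p = \alpha - j\beta$ with $\alpha > 0$ makes $x(t)$ a Gaussian-windowed linear chirp with chirp rate $\beta$; since $1/(4p)$ is again complex with positive real part, $X(\omega)$ is likewise a complex Gaussian. Fitting a complex quadratic to the log amplitude-and-phase of a measured spectral peak therefore recovers both the time-domain envelope width and the chirp rate.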

### Nonparametric Spectral Peak Modeling

Beginning in 1999, Laroche and Dolson extended IFFT synthesis
(§G.8.1) further by using raw spectral-peak *regions*
from STFT analysis data
[143,142,139]. By
preserving the raw spectral peak (instead of modeling it
mathematically as a window transform or complex Gaussian function),
the *original amplitude envelope and frequency variation are
preserved* for the signal component corresponding to the analyzed
peak in the spectrum. To implement frequency-shifting, for example,
the raw peaks (defined as ``regions of influence'' around a
peak-magnitude bin) are shifted accordingly, preserving the original
amplitude and phase of the FFT bins within each peak region.
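Frequency-shifting a raw peak region can be sketched as follows. This is a deliberately crude illustration of the "region of influence" idea, not the Laroche/Dolson algorithm: real phase-vocoder-style processing also rotates phases from frame to frame and handles overlapping or out-of-range regions, which this sketch ignores.

```python
import numpy as np

def shift_peak_region(X, shift_bins):
    """Shift the region of influence around the strongest spectral peak.

    The region runs between the magnitude minima on either side of the
    peak-magnitude bin; its complex bin values are copied intact to the
    shifted location, preserving the raw amplitude and phase.
    """
    n = len(X)
    mag = np.abs(X)
    k = int(np.argmax(mag[:n // 2]))      # strongest positive-frequency peak
    lo = k
    while lo > 0 and mag[lo - 1] < mag[lo]:
        lo -= 1                            # walk down to the left minimum
    hi = k
    while hi < n // 2 - 1 and mag[hi + 1] < mag[hi]:
        hi += 1                            # walk down to the right minimum
    Y = np.zeros_like(X)
    Y[lo + shift_bins : hi + 1 + shift_bins] = X[lo : hi + 1]
    # Restore Hermitian symmetry so the inverse FFT is real.
    Y[n - 1 : n // 2 : -1] = np.conj(Y[1 : n // 2])
    return Y

# Shift a Hann-windowed sinusoid at bin 50 up by 10 bins:
n = 1024
x = np.hanning(n) * np.cos(2 * np.pi * 50 * np.arange(n) / n)
Y = shift_peak_region(np.fft.fft(x), 10)
y = np.fft.ifft(Y)
```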

### Efficient Specialized Methods

We have already mentioned inverse-FFT synthesis as a means of greatly decreasing the cost of additive synthesis relative to a full-blown bank of sinusoidal oscillators. This section summarizes several widely used, more specialized methods that reduce the computational cost of additive synthesis.

#### Wavetable Synthesis

For *periodic sounds*, the sinusoidal components are all
*harmonics* of some fundamental frequency. If in addition they
can be constrained to *vary together in amplitude* over time,
then they can be implemented using a *single wavetable*
containing one period of the sound. Amplitude shaping is handled by
multiplying the output of the wavetable look-up by an
amplitude-envelope generated separately [186,167].
Using interpolation (typically linear, but sometimes better), the
table may be played back at any fundamental frequency, and its output
is then multiplied by the amplitude envelope shared by all harmonics.
(The harmonics may still have arbitrary *relative* levels.) This
form of ``wavetable synthesis'' was widely used in the early days of
computer music, and it remains a common method for synthesizing
harmonic spectra.^{G.8}
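A wavetable oscillator of this kind can be sketched as follows (an illustrative numpy sketch; the function name, table length, and envelope are invented for the example). The table holds one period; the read pointer advances `f0 * L / fs` table samples per output sample, with linear interpolation between adjacent entries and wraparound at the table's end:

```python
import numpy as np

def wavetable_osc(table, f0, amp_env, fs=44100.0):
    """Play a one-period wavetable at fundamental f0 (Hz) with linear
    interpolation, scaled by a shared amplitude envelope."""
    n = len(amp_env)
    L = len(table)
    # Read-pointer position, in table samples, modulo the table length.
    phase = (np.arange(n) * f0 * L / fs) % L
    i0 = phase.astype(int)
    frac = phase - i0
    i1 = (i0 + 1) % L            # wrap around at the end of the table
    return amp_env * ((1 - frac) * table[i0] + frac * table[i1])

# One period holding harmonics 1-3 at relative levels 1, 0.5, 0.25:
L = 2048
ph = 2 * np.pi * np.arange(L) / L
table = np.sin(ph) + 0.5 * np.sin(2 * ph) + 0.25 * np.sin(3 * ph)
env = np.linspace(1.0, 0.0, 44100)    # simple linear decay
x = wavetable_osc(table, 440.0, env)
```

Note how cheap this is compared with a full oscillator bank: one table lookup (plus interpolation) and one multiply per sample, regardless of how many harmonics the table contains.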

Note that sometimes the term ``wavetable synthesis'' is used to refer
to what was originally called *sampling synthesis*: playback of
sampled tones from memory, with looping of the steady-state portion to
create an arbitrarily long sustain
[165,27,107,193]. This
book adheres to the original terminology. For sampling synthesis,
spectral phase-modifications (Chapter 8) can be used to provide
perfectly seamless *loops* [165].

#### Group-Additive Synthesis

The basic idea of *group-additive synthesis*
[130,69] is to employ a *set*
of wavetables, each modeling a *harmonic subset* of the tonal
components making up the overall spectrum of the synthesized tone.
Since each wavetable oscillator is independent, inharmonic sounds can
be synthesized to some degree of approximation, and the amplitude
envelopes are not completely locked together. It is important to be
aware that the human ear cannot distinguish harmonic from inharmonic
partials at high frequencies (where ``high'' depends to some extent
on the fundamental frequency and timbre).
Thus, group-additive synthesis provides a set of intermediate points
between wavetable synthesis and all-out additive synthesis.

### Further Reading, Additive Synthesis

For more about the history of additive synthesis, see the chapter on
``Sampling and Additive Synthesis'' in [235]. For
``hands-on'' introductions to additive synthesis (with examples in
software), see [216]
(`pd`),
[60] (`Csound`
[19]), or [183] (`cmusic`). A
discussion of the phase vocoder in conjunction with additive synthesis
begins in §G.10.
