### Sines + Noise Modeling

As mentioned in the introduction to this chapter, it takes many
sinusoidal components to synthesize noise well (as many as 25 per
critical band of hearing under certain conditions [85]).
When spectral peaks are that dense, they are no longer perceived
individually, and it suffices to match only their *statistics* to
a perceptually equivalent degree.

Sines+Noise (S+N) synthesis [249] generalizes the
sinusoidal signal models to include a *filtered noise component*,
as depicted in Fig.10.7. In that figure, white noise is
denoted by $u(n)$, and the slowly changing linear filter applied to
the noise at time $n$ is denoted $h_n(\cdot)$.

The time-varying spectrum of the signal is said to be made up of a
*deterministic* component (the sinusoids) and a *stochastic*
component (time-varying filtered noise) [246,249]:

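$$ s(t) = \sum_{i=1}^{N} A_i(t)\cos[\theta_i(t)] + e(t) \tag{11.21} $$

where $A_i(t)$ and $\theta_i(t)$ are the instantaneous amplitude and phase of the $i$th sinusoidal component, and $e(t)$ is the residual, or noise signal, assumed to be well modeled by filtered white noise:

$$ e(t) = \int_0^t h(t,\tau)\,u(\tau)\,d\tau \tag{11.22} $$

where $u(\tau)$ is the white noise, and $h(t,\tau)$ is the impulse response of a time-varying linear filter at time $t$. Specifically, $h(t,\tau)$ is the response at time $t$ to an impulse at time $\tau$.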
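As a rough discrete-time illustration of this deterministic-plus-stochastic decomposition, the following NumPy sketch sums a few sinusoids with time-varying amplitudes and adds filtered white noise. The function name `sines_plus_noise`, the example envelopes, and the moving-average noise filter are all assumptions made for illustration. For simplicity the noise filter is held fixed, whereas the model above lets it vary slowly with time.

```python
import numpy as np

def sines_plus_noise(amps, freqs_hz, noise_fir, fs, rng):
    """Discrete-time sketch: sum of sinusoids plus filtered white noise.

    amps, freqs_hz: arrays of shape (num_partials, num_samples) holding the
    instantaneous amplitude and frequency (Hz) of each sinusoid.
    noise_fir: FIR coefficients shaping the white noise (fixed over time here).
    """
    num_samples = amps.shape[1]
    # Instantaneous phase is the running sum (integral) of frequency.
    phases = 2.0 * np.pi * np.cumsum(freqs_hz, axis=1) / fs
    deterministic = np.sum(amps * np.cos(phases), axis=0)

    # Stochastic part: white Gaussian noise passed through a linear filter.
    u = rng.standard_normal(num_samples)
    stochastic = np.convolve(u, noise_fir)[:num_samples]
    return deterministic, stochastic, deterministic + stochastic

fs = 8000
n = np.arange(fs)  # one second of samples
amps = np.vstack([0.5 * np.exp(-3.0 * n / fs), 0.25 * np.exp(-5.0 * n / fs)])
freqs = np.vstack([np.full(fs, 440.0), np.full(fs, 880.0)])
lowpass = np.ones(8) / 8.0  # crude moving-average noise-shaping filter
det, sto, s = sines_plus_noise(amps, freqs, lowpass, fs, np.random.default_rng(0))
```

Keeping the two parts separate until the final sum mirrors the model: each can be modified independently before mixing.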

Filtering white noise to produce a desired timbre is an example of
*subtractive synthesis* [186]. Thus, in sines+noise modeling, additive
synthesis is nicely supplemented by subtractive synthesis.

#### Sines+Noise Analysis

The original sines+noise *analysis* method is shown in
Fig.10.11 [246,249]. The processing path along
the top from left to right measures the amplitude and frequency
trajectories from magnitude peaks in the STFT, as in Fig.10.10.
The peak amplitude and frequency trajectories are converted back to
the time domain by additive-synthesis (an oscillator bank or inverse
FFT), and this signal is windowed by the same analysis window and
forward-transformed back into the frequency domain. The
magnitude-spectrum of this sines-only data is then subtracted from the
originally computed magnitude-spectrum containing both peaks and
"noise". The result of this subtraction is termed the
*residual signal*. The upper spectral envelope of the residual
magnitude spectrum is measured using, *e.g.*, linear prediction or cepstral
smoothing, as discussed in §10.3 above, or by simply
connecting peaks of the residual spectrum with linear segments to form
a more traditional (in computer music) piecewise linear spectral
envelope.
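The subtraction and peak-connecting steps can be sketched as follows. This is a minimal illustration, not the original implementation: the function names `residual_magnitude` and `peak_envelope` are hypothetical, the sines-only frame is assumed to be already resynthesized (here it is simply the known test sinusoid), and negative magnitude differences are clipped to zero as one plausible convention.

```python
import numpy as np

def residual_magnitude(frame, sines_frame, window):
    """Subtract the sines-only magnitude spectrum from the original one."""
    mag_orig = np.abs(np.fft.rfft(frame * window))
    mag_sines = np.abs(np.fft.rfft(sines_frame * window))
    # A difference of magnitudes can dip below zero; clip it (an assumption).
    return np.maximum(mag_orig - mag_sines, 0.0)

def peak_envelope(mag):
    """Piecewise-linear envelope connecting local maxima of mag."""
    k = np.arange(len(mag))
    interior = (mag[1:-1] >= mag[:-2]) & (mag[1:-1] >= mag[2:])
    # Always keep the first and last bins as breakpoints.
    peaks = np.concatenate(([0], k[1:-1][interior], [len(mag) - 1]))
    return np.interp(k, peaks, mag[peaks])

rng = np.random.default_rng(0)
N = 1024
n = np.arange(N)
window = np.hanning(N)
sines = np.cos(2 * np.pi * 64 * n / N)         # one partial, exactly on bin 64
frame = sines + 0.01 * rng.standard_normal(N)  # plus a low-level noise floor
residual = residual_magnitude(frame, sines, window)
envelope = peak_envelope(residual)
```

In this toy case the partial at bin 64 is almost entirely removed from `residual`, leaving a noise-floor spectrum whose peaks are joined by straight segments in `envelope`.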

#### S+N Synthesis

A sines+noise *synthesis* diagram is shown in Fig.10.12.
The spectral-peak amplitude and frequency trajectories are possibly
modified (time-scaling, frequency scaling, virtual formants, etc.) and
then rendered into the time domain by additive synthesis. This is
termed the *deterministic part* of the synthesized signal.

The *stochastic part* is synthesized by applying the
residual-spectrum-envelope (a time-varying FIR filter) to white noise,
again after possible modifications to the envelope.

To synthesize a frame of filtered white noise, one can simply impart a
*random phase* to the spectral envelope, *i.e.*, multiply it by
$e^{j\phi(\omega_k)}$, where $\phi(\omega_k)$ is random and
uniformly distributed between $-\pi$ and $\pi$. In the time domain,
the synthesized white noise will be approximately *Gaussian* due
to the *central limit theorem* (§D.9.1). Because the
filter (spectral envelope) is changing from frame to frame through
time, it is important to use at least 50% overlap and non-rectangular
windowing in the time domain. The window can be implemented directly
in the frequency domain by convolving its transform with the complex
white-noise spectrum (§3.3.5), leaving only overlap-add to be
carried out in the time domain. If the window side-lobes can be fully
neglected, it suffices to use only the main lobe in such a convolution
[239].
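The random-phase procedure with overlap-add can be sketched as below. The function name `synthesize_noise` and the example envelope are assumptions for illustration. To keep the sketch short, the window is applied in the time domain after each inverse FFT rather than by convolution in the frequency domain, and a periodic Hann window at 50% overlap is chosen because its shifted copies sum to one.

```python
import numpy as np

def synthesize_noise(envelopes, hop, rng):
    """Overlap-add filtered-noise synthesis from magnitude envelopes.

    envelopes: array (num_frames, nfft // 2 + 1) of spectral envelopes
    (linear magnitude), one row per frame.
    hop: hop size in samples; nfft // 2 gives 50% overlap.
    """
    num_frames, num_bins = envelopes.shape
    nfft = 2 * (num_bins - 1)
    window = np.hanning(nfft + 1)[:-1]  # periodic Hann window
    out = np.zeros(hop * (num_frames - 1) + nfft)
    for m in range(num_frames):
        # Random phase, uniform on [-pi, pi), imparted to the envelope.
        phase = rng.uniform(-np.pi, np.pi, num_bins)
        spectrum = envelopes[m] * np.exp(1j * phase)
        # DC and Nyquist bins must be real for a real signal: random sign.
        spectrum[0] = envelopes[m, 0] * rng.choice([-1.0, 1.0])
        spectrum[-1] = envelopes[m, -1] * rng.choice([-1.0, 1.0])
        frame = np.fft.irfft(spectrum, nfft)
        out[m * hop : m * hop + nfft] += window * frame
    return out

nfft, hop, num_frames = 256, 128, 40
env = np.zeros((num_frames, nfft // 2 + 1))
env[:, :32] = 1.0  # hypothetical low-pass noise envelope, constant over time
noise = synthesize_noise(env, hop, np.random.default_rng(0))
```

With a time-varying `env`, each row would simply hold the (possibly modified) residual envelope measured for that frame.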

In Fig.10.12, the deterministic and stochastic components are summed after transforming to the time domain, and this is the typical choice when an explicit oscillator bank is used for the additive synthesis. When the IFFT method is used for sinusoid synthesis [239,94,139], the sum can occur in the frequency domain, so that only one inverse FFT is required.

#### Sines+Noise Summary

To summarize, sines+noise modeling is carried out by a procedure such as the following:

- Compute a *sinusoidal model* by tracking peaks across STFT frames, producing a set of amplitude envelopes $A_i(m)$ and frequency envelopes $F_i(m)$, where $m$ is the frame number and $i$ is the spectral-peak number.
- Also record *phase* for frames containing a transient.
- Subtract modeled peaks from each STFT spectrum to form a *residual spectrum*.
- Fit a smooth *spectral envelope* to each residual spectrum.
- Convert envelopes to reduced form, *e.g.*, piecewise linear segments with nonuniformly distributed breakpoints (optimized to be maximally sparse without introducing audible distortion).
- Resynthesize audio (along with any desired transformations) from the amplitude, frequency, and noise-floor-filter envelopes.
- Alter frequency trajectories slightly to hit the desired phase for transient frames (as described below Eq. (10.19)).

Because the signal model consists entirely of *envelopes*
(neglecting the phase data for transient frames), the signal model is
easily *time scaled*, as discussed further in §10.5 below.
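As a minimal sketch of why an all-envelope model is easy to time scale, frame-rate envelopes can be resampled along the frame axis by linear interpolation and then resynthesized at the original hop size. The function name `time_scale_envelopes` and the example tracks are hypothetical, and this is only one simple way to do it, not necessarily the method of §10.5.

```python
import numpy as np

def time_scale_envelopes(env, factor):
    """Stretch (factor > 1) or compress (factor < 1) frame-rate envelopes.

    env: array (num_frames, num_tracks), e.g. amplitude or frequency tracks.
    Returns an array with about num_frames * factor rows.
    """
    num_frames, num_tracks = env.shape
    new_frames = max(int(round(num_frames * factor)), 2)
    old_t = np.linspace(0.0, 1.0, num_frames)
    new_t = np.linspace(0.0, 1.0, new_frames)
    # Interpolate each track independently along the (normalized) time axis.
    return np.column_stack(
        [np.interp(new_t, old_t, env[:, i]) for i in range(num_tracks)]
    )

# Two hypothetical frequency tracks: a glide from 440 to 880 Hz and a steady 1 kHz.
freq_env = np.column_stack([np.linspace(440.0, 880.0, 50), np.full(50, 1000.0)])
stretched = time_scale_envelopes(freq_env, 2.0)
```

Because frequencies are carried as envelope values rather than as waveform samples, stretching the envelopes changes duration without transposing pitch.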

For more information on sines+noise signal modeling, see, *e.g.*,
[146,10,223,248,246,149,271]. A discussion from a historical
perspective appears in §G.11.4.
