To summarize, sines+noise modeling is carried out by a procedure such as the following:
- Compute a sinusoidal model by tracking peaks across STFT
frames, producing a set of amplitude envelopes $A_i(m)$ and
frequency envelopes $\omega_i(m)$, where $m$ is the frame number and
$i$ is the spectral-peak number.
- Also record the phase $\theta_i(m)$ for each frame
containing a transient.
- Subtract the modeled peaks from each STFT spectrum to form a
residual (noise) spectrum.
- Fit a smooth spectral envelope to each residual spectrum.
- Convert envelopes to reduced form, e.g., piecewise linear
segments with nonuniformly distributed breakpoints (optimized to be
maximally sparse without introducing audible distortion).
- Resynthesize audio (along with any desired transformations) from
the amplitude, frequency, and noise-floor-filter envelopes.
- Alter the frequency trajectories slightly so that each oscillator
hits the desired phase in transient frames (as described below
equation …).
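The sinusoidal half of the procedure above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the method described in this section: a peak picker is assumed to have already produced a list of (frequency in Hz, amplitude) pairs for each frame; `track_peaks`, `reduce_envelope`, and `synth_track` are hypothetical helpers; peaks are linked by greedy nearest-frequency matching within a hypothetical `max_jump` tolerance; breakpoints are chosen by Douglas-Peucker-style recursive splitting against a fixed error `tol` (a simple stand-in for an audibility criterion); and resynthesis uses a phase-accumulating oscillator that starts at zero phase, so it does not perform the transient phase matching of the last step. The noise-floor steps (residual subtraction and envelope fitting) are omitted.

```python
import math

# Hypothetical helpers: a sketch of sinusoidal tracking, envelope
# reduction, and resynthesis, not a reference implementation.


def track_peaks(frames, max_jump):
    """Link spectral peaks across STFT frames into tracks (greedy sketch).

    frames[m] is a list of (freq_hz, amp) peaks found in frame m.
    Returns a list of tracks; track i is a dict {m: (freq_hz, amp)}, i.e. the
    frequency and amplitude envelopes of peak i sampled at frame numbers m.
    """
    tracks = []
    active = []  # indices of tracks that received a peak in frame m - 1
    for m, peaks in enumerate(frames):
        claimed = [False] * len(peaks)
        still_active = []
        for i in active:
            f_prev = tracks[i][m - 1][0]
            best, best_dist = None, max_jump
            # Continue track i with the nearest unclaimed peak within max_jump.
            for k, (f, _) in enumerate(peaks):
                if not claimed[k] and abs(f - f_prev) <= best_dist:
                    best, best_dist = k, abs(f - f_prev)
            if best is not None:
                claimed[best] = True
                tracks[i][m] = peaks[best]
                still_active.append(i)
            # Otherwise track i ends at frame m - 1.
        # Every unclaimed peak starts a new track at frame m.
        for k, peak in enumerate(peaks):
            if not claimed[k]:
                tracks.append({m: peak})
                still_active.append(len(tracks) - 1)
        active = still_active
    return tracks


def reduce_envelope(env, tol):
    """Pick nonuniformly spaced breakpoint indices so that linear interpolation
    between them stays within tol of every envelope sample (assumed criterion:
    vertical error, Douglas-Peucker-style recursive splitting)."""
    if len(env) < 2:
        return list(range(len(env)))

    def split(lo, hi, keep):
        worst, worst_err = None, tol
        for j in range(lo + 1, hi):
            approx = env[lo] + (env[hi] - env[lo]) * (j - lo) / (hi - lo)
            if abs(env[j] - approx) > worst_err:
                worst, worst_err = j, abs(env[j] - approx)
        if worst is not None:  # segment too coarse: add a breakpoint, recurse
            split(lo, worst, keep)
            keep.append(worst)
            split(worst, hi, keep)

    keep = [0]
    split(0, len(env) - 1, keep)
    keep.append(len(env) - 1)
    return keep  # ascending indices into env


def synth_track(track, hop, fs):
    """Resynthesize one track with a phase-accumulating oscillator, linearly
    interpolating amplitude and frequency between frames (zero initial phase).
    Returns (start_sample, samples); a one-frame track yields no samples."""
    ms = sorted(track)
    phase = 0.0
    out = []
    for m0, m1 in zip(ms, ms[1:]):
        (f0, a0), (f1, a1) = track[m0], track[m1]
        n = (m1 - m0) * hop
        for k in range(n):
            t = k / n
            out.append((a0 + t * (a1 - a0)) * math.cos(phase))
            phase += 2 * math.pi * (f0 + t * (f1 - f0)) / fs
    return ms[0] * hop, out
```

With these helpers, a partial whose peak drifts by a few hertz per frame stays on one track, a peak with no predecessor within `max_jump` starts a new track, and a track with no nearby peak in the next frame simply ends. Raising `tol` trades envelope fidelity for fewer breakpoints.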
Sines+Noise+Transients Time-Frequency Maps