This section describes further details of the sines+noise+transients system of Scott Levine [132], which can be considered a modern descendant of the vocoder.
Figure 10.17 shows the time-frequency map used in the S+N+T system of Scott Levine [132]. Vertical line spacing in the time-frequency map indicates the time resolution of the underlying multiresolution STFT, and horizontal line spacing indicates its frequency resolution. The time waveform appears below the time-frequency map. For transients, an interval of data including the transient is simply encoded using MPEG-2 AAC. The transient interval in Fig. 10.17 extends from approximately 47 to 115 ms. (This interval can be tighter, as discussed further below.) Between transients, the signal model consists of sines+noise below 5 kHz and amplitude-modulated noise above 5 kHz. The spectrum from 0 to 5 kHz is divided into three octaves ("multiresolution sinusoidal modeling"). The time step-size varies from 25 ms in the low-frequency band (where the frequency resolution is highest) down to 6 ms in the third octave (where the frequency resolution is four times lower). Sines+noise modeling is carried out in the 0-5 kHz band; above 5 kHz, noise substitution is performed, as discussed further below.
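As a rough illustration of the multiresolution layout just described, the following sketch tabulates three octave bands and their time steps. The band edges (1250 and 2500 Hz) and the intermediate 12.5 ms step are assumptions made here for illustration (a step that halves per octave, consistent with the stated 25 ms and roughly 6 ms endpoints); they are not taken from the text, and the names `BANDS`, `band_for_frequency`, and `relative_frequency_resolution` are hypothetical.

```python
# Assumed band layout: three octaves spanning 0-5 kHz, with the time step
# halving (and the frequency resolution halving) in each higher octave.
# The edges and the 12.5 ms middle value are illustrative assumptions.
BANDS = [
    # (low_hz, high_hz, hop_ms)
    (0.0, 1250.0, 25.0),
    (1250.0, 2500.0, 12.5),
    (2500.0, 5000.0, 6.25),
]


def band_for_frequency(freq_hz):
    """Return (band_index, hop_ms) for freq_hz, or None above 5 kHz."""
    for index, (low_hz, high_hz, hop_ms) in enumerate(BANDS):
        if low_hz <= freq_hz < high_hz:
            return index, hop_ms
    return None  # above 5 kHz the noise-only model applies


def relative_frequency_resolution(band_index):
    """Frequency resolution relative to the lowest band (1.0 = finest)."""
    return BANDS[band_index][2] / BANDS[0][2]
```

Under these assumptions, the third octave has one quarter of the frequency resolution of the first, matching the "four times lower" figure quoted above.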
Figure 10.18 shows a similar frequency map in which the transient interval depends on frequency. This enables a tighter interval enclosing the transient, and follows audio perception more closely (see Appendix E).
Figure 10.19 illustrates the nature of the noise modeling used. The energy in each Bark band is summed, and this is used as the gain for the noise in that band at that frame time.
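The per-band energy summation can be sketched as follows for a single frame. This is a minimal illustration, not the thesis implementation: the Bark band edges are the commonly tabulated critical-band edges, the Hann window and FFT-based power spectrum are assumptions, and the function name `bark_band_energies` is hypothetical.

```python
import numpy as np

# Commonly tabulated Bark (critical-band) edges in Hz (an assumption here;
# the source does not list the edges it uses).
BARK_EDGES_HZ = np.array([
    20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500,
], dtype=float)


def bark_band_energies(frame, sample_rate, min_freq_hz=5000.0):
    """Sum one frame's power spectrum within each Bark band above min_freq_hz.

    Returns a list of (low_hz, high_hz, energy) tuples, one per band.
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    energies = []
    for low_hz, high_hz in zip(BARK_EDGES_HZ[:-1], BARK_EDGES_HZ[1:]):
        if high_hz <= min_freq_hz:
            continue  # band lies entirely in the sines+noise region
        low_hz = max(low_hz, min_freq_hz)  # clip the band straddling 5 kHz
        in_band = (freqs >= low_hz) & (freqs < high_hz)
        energies.append((low_hz, high_hz, float(power[in_band].sum())))
    return energies
```

Calling this once per frame yields, for each band, the gain sequence over time that is then used to shape the noise in that band.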
Figure 10.20 shows the frame gain versus time for a particular Bark band (top) and the piecewise linear envelope made from it (bottom). As illustrated in Figures 10.17 and 10.18, the step size for all of the Bark bands above 5 kHz is approximately 3 ms.
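One simple way to form a piecewise linear envelope from per-frame gains is to interpolate linearly between successive frame values, as sketched below. This is an assumed simplification: the envelope in the figure may be fitted with fewer line segments than there are frames, and the function name `piecewise_linear_envelope` and its parameters are hypothetical.

```python
import numpy as np


def piecewise_linear_envelope(frame_gains, hop_s, sample_rate):
    """Linearly interpolate per-frame gains to a per-sample envelope.

    frame_gains: one gain per frame for a single Bark band.
    hop_s: frame step in seconds (about 0.003 for the bands above 5 kHz).
    """
    frame_gains = np.asarray(frame_gains, dtype=float)
    frame_times = np.arange(len(frame_gains)) * hop_s
    n_samples = int(round(frame_times[-1] * sample_rate)) + 1
    sample_times = np.arange(n_samples) / sample_rate
    # np.interp draws straight line segments between the frame breakpoints.
    return np.interp(sample_times, frame_times, frame_gains)
```

The resulting envelope could then multiply band-limited noise to produce the amplitude-modulated noise for that band.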
For more information on this sines+noise+transients system, see Scott Levine's CCRMA PhD/EE thesis [132].
