Once we have our data in the form of amplitude and frequency envelopes for each filter-bank channel, we can compress them by a large factor. If there are $N$ channels, we nominally expect to be able to downsample by a factor of $N$, as discussed initially in Chapter 9 and more extensively in Chapter 11.
In early computer music [97,186], amplitude and frequency envelopes were ``downsampled'' by means of piecewise linear approximation. That is, a set of breakpoints was defined in time, and the envelope between adjacent breakpoints was represented by a linear segment. These breakpoints correspond to ``knot points'' in the context of polynomial spline interpolation. Piecewise linear approximation yielded large compression ratios for relatively steady tonal signals. For example, compression ratios of 100:1 were not uncommon for isolated ``toots'' on tonal orchestral instruments.
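As a rough illustration of breakpoint-based envelope compression, the following sketch fits a piecewise linear approximation to a synthetic amplitude envelope. The greedy breakpoint-selection rule, the function name `piecewise_linear_fit`, the tolerance, and the test envelope are all assumptions made for this example; they are not taken from the early computer-music systems cited above.

```python
import numpy as np

def piecewise_linear_fit(env, tol):
    """Greedily add breakpoints until the max absolute error is <= tol.

    Hypothetical helper for illustration only. Returns the breakpoint
    indices and the piecewise linear reconstruction of the envelope.
    """
    n = len(env)
    t = np.arange(n)
    knots = [0, n - 1]                      # always keep the endpoints
    while True:
        approx = np.interp(t, knots, env[knots])
        err = np.abs(env - approx)
        worst = int(np.argmax(err))
        if err[worst] <= tol:
            return np.array(knots), approx
        knots.append(worst)                 # new breakpoint at the worst sample
        knots.sort()

# Synthetic "toot" envelope (assumed shape): fast linear attack, slow decay.
t = np.arange(1000)
env = np.minimum(t / 50.0, 1.0) * np.exp(-t / 2000.0)

tol = 0.01
knots, approx = piecewise_linear_fit(env, tol)
ratio = len(env) / len(knots)               # envelope samples per breakpoint
print(f"{len(knots)} breakpoints for {len(env)} samples (about {ratio:.0f}:1)")
```

The ratio printed here counts only breakpoint positions against envelope samples. A real coder would also have to store each breakpoint's time and value, and the achievable ratio depends strongly on how steady the envelope is.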
A more straightforward method is to simply downsample each envelope by some factor. Since each subband is bandlimited to the channel bandwidth, we expect a downsampling factor on the order of the number of channels in the filter bank. Using a hop size $R$ in the STFT results in downsampling by the factor $R$ (as discussed in §9.8). If $N$ channels are downsampled by $N$, then the total number of samples coming out of the filter bank equals the number of samples going into the filter bank. This may be called critical downsampling, which is invariably used in filter banks for audio compression, as discussed further in Chapter 11. A benefit of converting a signal to critically sampled filter-bank form is that bits can be allocated based on the amount of energy in each subband relative to the psychoacoustic masking threshold in that band. Bit allocation is typically different for tonal and noise signals in a band [113,25,16].
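The sample-count bookkeeping behind critical downsampling can be checked numerically. The sketch below treats a length-N FFT applied to non-overlapping frames as an N-channel filter bank with hop size R = N. The rectangular window, the values of N and the signal length, and the random test signal are assumptions chosen only to keep the example short; practical audio coders use better channel filters.

```python
import numpy as np

N = 8                                  # number of channels (FFT size), assumed
R = N                                  # hop size R = N: critical downsampling
rng = np.random.default_rng(0)
x = rng.standard_normal(64 * N)        # test signal, length a multiple of N

# Rectangular window of length N with hop R = N gives non-overlapping frames.
frames = x.reshape(-1, N)

# X[m, k] is channel k of the filter bank at downsampled time index m.
X = np.fft.fft(frames, axis=1)

samples_in = x.size                    # samples entering the filter bank
samples_out = X.size                   # N channels, each with len(x) / R samples

# For real input, Hermitian symmetry makes half of the complex bins redundant,
# so the number of independent real values also matches the input length.

# Resynthesis: inverse FFT of each frame, then concatenate the frames.
y = np.fft.ifft(X, axis=1).real.reshape(-1)

print(samples_in, samples_out, np.allclose(x, y))
```

With R smaller than N the same count would exceed the input length (an oversampled filter bank), which is one reason compression systems favor the critically sampled case.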
Vocoder-Based Additive-Synthesis Limitations