Filling the FFT Input Buffer (Step 2)
The FFT size is normally chosen to be the first power of two that is at least twice the window length , with the difference filled with zeros (``zero-padded''). The reason for increasing the FFT size and filling in with zeros is that zero-padding in the time domain corresponds to interpolation in the frequency domain, and interpolating the spectrum is useful in various ways. First, the problem of finding spectral peaks which are not exact bin frequencies is made easier when the spectrum is more densely sampled. Second, plots of the magnitude of the more smoothly sampled spectrum are less likely to confuse the untrained eye. (Only signals truly periodic in samples should not be zero-padded. They should also be windowed only by the Rectangular window.) Third, for overlap-add synthesis from spectral modifications, the zero-padding allows for multiplicative modification in the frequency domain (convolutional modification in the time domain) without time aliasing in the inverse FFT. The length of the allowed convolution in the time domain (the impulse response of the effective digital filter) equals the number of extra zeros (plus one) in the zero padding.
If is the number of samples in the main lobe when the zero-padding factor is 1 ( ), then a zero-padding factor of gives samples for the same main lobe (and same main-lobe bandwidth). The zero-padding (interpolation) factor should be large enough to enable accurate estimation of the true maximum of the main lobe after it has been frequency shifted by some arbitrary amount equal to the frequency of a sinusoidal component in the input signal. We have determined by computational search that, for a rectangularly windowed sinusoid (of any frequency), quadratic frequency interpolation (using the three highest bins) yields at least (of the distance from the sinc peak to the first zero-crossing) accuracy if the zero-padding factor is 5 or higher.
As mentioned in the previous section, we facilitate phase detection by using a zero-phase window, i.e., the windowed data (using an odd length window) is centered about the time origin. A zero-centered, length data frame appears in the length FFT input buffer as shown in Fig.H.1c. The first samples of the windowed data, the ``negative-time'' portion, will be stored at the end of the buffer, from sample to , and the remaining samples, the zero- and ``positive-time'' portion, will be stored starting at the beginning of the buffer, from sample 0 to . Thus, all zero padding occurs in the middle of the FFT input buffer.
Peak Detection (Steps 3 and 4)
Choice of Hop Size