
Filling the FFT Input Buffer (Step 2)

The FFT size $N$ is normally chosen to be the first power of two that is at least twice the window length $M$, with the remaining $N-M$ samples filled with zeros ("zero-padded"). The reason for increasing the FFT size and filling in with zeros is that zero-padding in the time domain corresponds to interpolation in the frequency domain, and interpolating the spectrum is useful in various ways. First, the problem of finding spectral peaks that do not fall exactly on bin frequencies is made easier when the spectrum is more densely sampled. Second, plots of the magnitude of the more smoothly sampled spectrum are less likely to confuse the untrained eye. (The exception is a signal that is truly periodic in $M$ samples: it should not be zero-padded, and it should be windowed only by the rectangular window.) Third, for overlap-add synthesis from spectral modifications, the zero-padding allows for multiplicative modification in the frequency domain (convolutional modification in the time domain) without time aliasing in the inverse FFT. The length of the allowed convolution in the time domain (the impulse response of the effective digital filter) equals the number of zeros in the zero padding plus one, i.e., $N-M+1$.
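As a minimal sketch of these sizing rules (written in Python with NumPy, which the text itself does not use; the names `fft_size` and `max_filter_length` are hypothetical), the following chooses $N$ from $M$ and checks that a filter of length $N-M+1$, applied by multiplying spectra, produces no time aliasing:

```python
import numpy as np

def fft_size(M):
    """Smallest power of two N with N >= 2*M (the usual choice described above)."""
    N = 1
    while N < 2 * M:
        N *= 2
    return N

def max_filter_length(M, N):
    """Longest impulse response usable without time aliasing: N - M + 1."""
    return N - M + 1

M = 101                              # window length (illustrative value)
N = fft_size(M)                      # 256
L = max_filter_length(M, N)          # 156

rng = np.random.default_rng(0)
frame = rng.standard_normal(M)       # stand-in for one windowed data frame
h = rng.standard_normal(L)           # arbitrary filter of the maximum length

# Multiplying length-N spectra gives circular convolution with period N.
y_fft = np.fft.ifft(np.fft.fft(frame, N) * np.fft.fft(h, N)).real
# The linear convolution has M + L - 1 = N samples, so nothing wraps around.
y_lin = np.convolve(frame, h)
no_aliasing = np.allclose(y_fft, y_lin)
```

With a filter one sample longer, the linear convolution would have $N+1$ samples and its last sample would wrap around onto the first in the inverse FFT.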

If $K$ is the number of samples in the main lobe when the zero-padding factor is 1 ($N=M$), then a zero-padding factor of $N/M$ gives $KN/M$ samples for the same main lobe (and the same main-lobe bandwidth). The zero-padding (interpolation) factor $N/M$ should be large enough to enable accurate estimation of the true maximum of the main lobe after it has been frequency shifted by some arbitrary amount equal to the frequency of a sinusoidal component in the input signal. We have determined by computational search that, for a rectangularly windowed sinusoid (of any frequency), quadratic frequency interpolation (using the three highest bins) is accurate to within $0.1\%$ of the distance from the sinc peak to the first zero-crossing if the zero-padding factor $N/M$ is 5 or higher.
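The following sketch illustrates quadratic interpolation of a spectral peak at a zero-padding factor of 5. It is not the computational search referred to above. Several choices here are assumptions made for illustration: the test signal is a complex sinusoid (so that no negative-frequency image interferes with the peak), the parabola is fit to the dB magnitude, $M=101$, and the function name `quadratic_peak_error` is hypothetical.

```python
import numpy as np

def quadratic_peak_error(freq, M=101, zpf=5):
    """Peak-frequency error of quadratic interpolation for a rectangularly
    windowed complex sinusoid, in units of 1/M cycles per sample (the distance
    from the sinc peak to its first zero crossing)."""
    N = M * zpf                               # zero-padded FFT size
    n = np.arange(M)
    x = np.exp(2j * np.pi * freq * n)         # rectangular window: no tapering
    X = np.abs(np.fft.fft(x, N))              # fft() zero-pads x to length N
    k = int(np.argmax(X))                     # highest bin
    a, b, c = 20 * np.log10(X[k - 1:k + 2])   # three highest bins, in dB
    p = 0.5 * (a - c) / (a - 2 * b + c)       # parabola vertex offset, in bins
    f_est = (k + p) / N                       # interpolated frequency
    return abs(f_est - freq) * M

# Scan frequencies spanning several bins (values chosen arbitrarily).
errors = [quadratic_peak_error(f) for f in np.linspace(0.10, 0.12, 41)]
worst = max(errors)
```

Here `worst` is the largest error over the scanned frequencies, expressed as a fraction of the peak-to-first-zero distance, so a value of 0.001 corresponds to the $0.1\%$ figure quoted above.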

Figure 10.12: Illustration of the first two steps of PARSHL. (a) Input data. (b) Windowed input data. (c) FFT buffer with the windowed input data. (d) Resulting magnitude spectrum.

As mentioned in the previous section, we facilitate phase detection by using a zero-phase window, i.e., the windowed data (using an odd-length window) is centered about the time origin. A zero-centered, length $M$ data frame appears in the length $N$ FFT input buffer as shown in Fig. 10.12c. The first $(M-1)/2$ samples of the windowed data, the "negative-time" portion, will be stored at the end of the buffer, from sample $N-(M-1)/2$ to $N-1$, and the remaining $(M+1)/2$ samples, the zero- and "positive-time" portion, will be stored starting at the beginning of the buffer, from sample 0 to $(M-1)/2$. Thus, all zero padding occurs in the middle of the FFT input buffer.
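The buffer layout just described can be sketched as follows (Python with NumPy; the function name `fill_fft_buffer` is hypothetical and this is only one way to arrange the copy):

```python
import numpy as np

def fill_fft_buffer(xw, N):
    """Place an odd-length windowed frame xw into a length-N buffer so that
    its center sample sits at index 0 (zero-phase layout)."""
    M = len(xw)
    if M % 2 != 1 or N < M:
        raise ValueError("need an odd frame length M and N >= M")
    half = (M - 1) // 2
    buf = np.zeros(N, dtype=xw.dtype)
    buf[:half + 1] = xw[half:]     # zero- and positive-time part: 0 .. (M-1)/2
    buf[N - half:] = xw[:half]     # negative-time part: N-(M-1)/2 .. N-1
    return buf

# Small example: M = 5, N = 8, so the three zeros land in the middle.
example = fill_fft_buffer(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), 8)

# A symmetric window stored this way has a purely real spectrum (zero phase).
spectrum = np.fft.fft(fill_fft_buffer(np.hanning(5), 8))
```

For the small example, the buffer holds `[3, 4, 5, 0, 0, 0, 1, 2]`: the center sample 3 is at index 0, and the two negative-time samples 1 and 2 occupy the last two positions.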


written by Julius Orion Smith III
Julius Smith's background is in electrical engineering (BS Rice 1975, PhD Stanford 1983). He is presently Professor of Music and Associate Professor (by courtesy) of Electrical Engineering at Stanford's Center for Computer Research in Music and Acoustics (CCRMA), teaching courses and pursuing research related to signal processing applied to music and audio systems. See http://ccrma.stanford.edu/~jos/ for details.

