### Summary of STFT Computation Using FFTs

1. Read $M$ samples of the input signal $x$ into a local buffer of length $N \ge M$ which is initially zeroed:

$$\tilde{x}_m(n) \triangleq x(n + mR), \qquad n = -M_h, \ldots, -1, 0, 1, \ldots, M_h.$$

We call $x_m$ the $m$th frame of the input signal, and $\tilde{x}_m$ the $m$th time-normalized input frame (time-normalized by translating it to time zero). The frame length is $M \triangleq 2M_h + 1$, which we assume to be odd for reasons to be discussed later. The time advance $R$ (in samples) from one frame to the next is called the hop size or step size.

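2. Multiply the data frame $\tilde{x}_m$ pointwise by a length-$M$ spectrum analysis window $w$ to obtain the $m$th windowed data frame (time normalized):

$$\tilde{x}^w_m(n) \triangleq \tilde{x}_m(n)\, w(n), \qquad n = -\frac{M-1}{2}, \ldots, \frac{M-1}{2}.$$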

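3. Extend the windowed frame $\tilde{x}^w_m$ with zeros on both sides to obtain a zero-padded frame:

$$\tilde{x}^{w\prime}_m(n) \triangleq \begin{cases} \tilde{x}^w_m(n), & |n| \le \frac{M-1}{2} \\ 0, & \frac{M-1}{2} < n \le \frac{N}{2} - 1 \\ 0, & -\frac{N}{2} \le n < -\frac{M-1}{2} \end{cases} \tag{8.5}$$

where the FFT size $N$ is chosen to be a power of two larger than the frame length $M$. The number $N/M$ is the zero-padding factor. As discussed in §2.5.3, the zero-padding factor is the interpolation factor for the spectrum, i.e., each FFT bin is replaced by $N/M$ bins, interpolating the spectrum using ideal bandlimited interpolation [264], where the "band" in this case is the $M$-sample nonzero duration of $\tilde{x}^w_m$ in the time domain.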

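4. Take a length-$N$ FFT of the zero-padded frame $\tilde{x}^{w\prime}_m$ to obtain the time-normalized, frequency-sampled STFT at time $m$:

$$\tilde{X}^{w\prime}_m(\omega_k) \triangleq \sum_{n=-N/2}^{N/2-1} \tilde{x}^{w\prime}_m(n)\, e^{-j\omega_k n T} \tag{8.6}$$

where $\omega_k \triangleq 2\pi k f_s / N$, and $f_s = 1/T$ is the sampling rate in Hz. As in any FFT, we call $k$ the bin number.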

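5. If needed, time normalization may be removed using a linear phase term to yield the sampled STFT:

$$X^{w\prime}_m(\omega_k) = e^{-j\omega_k m R T}\, \tilde{X}^{w\prime}_m(\omega_k) \tag{8.7}$$

Here $\tilde{X}^{w\prime}_m$ is the time-normalized spectrum of frame $m$, $R$ is the hop size in samples, and $T$ is the sampling interval in seconds, so the phase term corresponds to a delay of $mR$ samples.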

The (continuous-frequency) STFT may be approached arbitrarily closely by using more zero padding and/or other interpolation methods.
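The five steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not a reference implementation: the function name `stft_frame` and its argument names are hypothetical, `M` denotes the (odd) frame length, `N` the FFT size, `R` the hop size, and `m` the frame index. It assumes that samples falling outside the signal are treated as zero, and that negative-time samples are wrapped to the end of the FFT buffer so that the time origin sits at buffer index 0.

```python
import numpy as np

def stft_frame(x, w, m, R, N):
    """Sketch of steps 1-5 for a single frame m.

    x : 1-D input signal
    w : analysis window of odd length M
    R : hop size in samples
    N : FFT size, N >= M
    Returns (time-normalized spectrum, spectrum with time normalization removed).
    """
    x = np.asarray(x)
    M = len(w)
    assert M % 2 == 1 and N >= M
    Mh = (M - 1) // 2

    # Step 1: read M samples centered on sample m*R (zeros outside the signal).
    n = np.arange(-Mh, Mh + 1)
    idx = n + m * R
    valid = (idx >= 0) & (idx < len(x))
    frame = np.zeros(M, dtype=complex)
    frame[valid] = x[idx[valid]]

    # Step 2: pointwise multiplication by the analysis window.
    xw = frame * w

    # Step 3: zero-pad to length N. Samples for n >= 0 go at the start of the
    # buffer; samples for n < 0 wrap around to the end (index n + N).
    buf = np.zeros(N, dtype=complex)
    buf[:Mh + 1] = xw[Mh:]
    buf[N - Mh:] = xw[:Mh]

    # Step 4: length-N FFT gives the time-normalized, frequency-sampled STFT.
    X_norm = np.fft.fft(buf)

    # Step 5: linear phase term exp(-j * w_k * m * R * T), with w_k * T = 2*pi*k/N.
    k = np.arange(N)
    X = np.exp(-2j * np.pi * k * m * R / N) * X_norm
    return X_norm, X

# Example usage with arbitrary illustrative values.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
w = np.hamming(9)
X_norm, X = stft_frame(x, w, m=5, R=4, N=16)
```

Wrapping the negative-time samples to the end of the buffer relies on the periodicity of the DFT kernel: summing over $n = -N/2, \ldots, N/2 - 1$ gives the same result as summing over $n = 0, \ldots, N-1$ once the negative-time samples are moved up by $N$.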

Note that there is no irreversible time-aliasing when the STFT frequency axis $\omega$ is sampled to the points $\omega_k$, provided the FFT size $N$ is greater than or equal to the window length $M$.
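A small numerical check can make this condition concrete. The sketch below is illustrative only: the helper name `sample_dtft` is hypothetical, and for simplicity the windowed frame is indexed from $0$ to $M-1$ rather than centered on time zero. It samples the spectrum of a length-$M$ windowed frame at $N$ points and inverts with a length-$N$ inverse FFT. For $N \ge M$ the frame is recovered exactly; for $N < M$ the result is the frame folded onto itself modulo $N$.

```python
import numpy as np

def sample_dtft(xw, N):
    """Sample the DTFT of a finite sequence xw (indices 0..M-1) at 2*pi*k/N."""
    n = np.arange(len(xw))
    k = np.arange(N)[:, None]
    return (xw[None, :] * np.exp(-2j * np.pi * k * n / N)).sum(axis=1)

rng = np.random.default_rng(0)
M = 9
# Arbitrary windowed frame: random data times a Hann window with nonzero ends.
xw = rng.standard_normal(M) * np.hanning(M + 2)[1:-1]

# Case N >= M: the inverse FFT returns the frame followed by zeros.
N = 16
rec = np.fft.ifft(sample_dtft(xw, N))
recovered = np.allclose(rec[:M], xw) and np.allclose(rec[M:], 0)

# Case N < M: the inverse FFT returns the time-aliased (folded) frame.
N_small = 6
rec_small = np.fft.ifft(sample_dtft(xw, N_small))
folded = np.zeros(N_small)
np.add.at(folded, np.arange(M) % N_small, xw)  # sum samples that share n mod N
matches_folded = np.allclose(rec_small, folded)

print(recovered, matches_folded)
```

In the second case the first $M - N$ output samples are sums of two frame samples each, so the original frame cannot be recovered from the frequency samples alone.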
