## Dual Views of the Short-Time Fourier Transform (STFT)

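In the overlap-add formulation of Chapter 8, we used a
*hopping* window to extract time-limited signals to which we
applied the DFT. Assuming for the moment that the hop size is $R=1$
(the "sliding DFT"), we have

$$X_m(\omega_k) = \sum_{n=-\infty}^{\infty} \left[x(n)\,w(n-m)\right] e^{-j\omega_k n} \tag{9.1}$$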

This is the usual definition of the Short-Time Fourier Transform (STFT) (§7.1). In this chapter, we will look at the STFT from two different points of view: the *OverLap-Add* (OLA) and *Filter-Bank Summation* (FBS) points of view. We will show that one is the Fourier dual of the other [9]. Next we will explore some implications of the filter-bank point of view and obtain some useful insights. Finally, some applications are considered.

### Overlap-Add (OLA) Interpretation of the STFT

In the OLA interpretation of the STFT, we apply a time-shifted window
$w(n-m)$ to our signal $x(n)$, selecting data near time $m$, and
compute the Fourier transform to obtain the spectrum of the $m$th
frame. As shown in Fig. 9.1, the STFT is viewed as a
*time-ordered sequence of spectra*, one per frame, with the
frames overlapping in time.
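The OLA view can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `stft_ola_view`, the odd-length window stored causally (with its middle sample treated as time 0), and the zero extension of $x$ at both ends are all assumptions made here for convenience.

```python
import numpy as np

def stft_ola_view(x, w, N):
    """Sliding STFT (hop size 1) as a time-ordered sequence of spectra.

    Returns X with X[m, k] = sum_n x(n) w(n - m) exp(-j 2 pi k n / N).
    Assumption: w has odd length and w[half] is the sample at time 0.
    """
    M = len(w)
    half = (M - 1) // 2
    # Zero-extend x so that every frame has M samples (assumed edge handling).
    xp = np.concatenate([np.zeros(half), x, np.zeros(half)])
    n_all = np.arange(-half, len(x) + half)   # absolute time of each xp sample
    k = np.arange(N)
    X = np.empty((len(x), N), dtype=complex)
    for m in range(len(x)):
        frame = xp[m:m + M] * w               # x(n) w(n - m), n = m-half .. m+half
        n = n_all[m:m + M]
        # DFT of the frame, with phase referenced to absolute time n.
        X[m] = frame @ np.exp(-2j * np.pi * np.outer(n, k) / N)
    return X
```

Each row `X[m]` is the spectrum of one frame. Because the phase is referenced to absolute time $n$, as in the definition above, the result differs from an FFT of the frame taken in isolation by a linear phase factor.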

### Filter-Bank Summation (FBS) Interpretation of the STFT

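We can group the terms in the STFT definition differently to obtain
the *filter-bank interpretation*:

$$X_m(\omega_k) = \sum_{n=-\infty}^{\infty} \left[x(n)\,e^{-j\omega_k n}\right] w(n-m) = \left[x_k * \mathrm{Flip}(w)\right](m) \tag{9.2}$$

where $x_k(n) = x(n)\,e^{-j\omega_k n}$ and $\mathrm{Flip}(w)$ denotes the time-reversed window, $w(-n)$.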

As will be explained further below (and illustrated further in Figures 9.3, 9.4, and 9.5), under the filter-bank interpretation, the spectrum of $x$ is first *rotated* along the unit circle in the $z$ plane so as to shift frequency $\omega_k$ down to 0 (via modulation by $e^{-j\omega_k n}$ in the time domain), thus forming the heterodyned signal $x_k(n) = x(n)\,e^{-j\omega_k n}$. Next, the heterodyned signal $x_k(n)$ is lowpass-filtered to a narrow band about frequency 0 (via convolving with the time-reversed window $\mathrm{Flip}(w)$). The STFT is thus interpreted as a *frequency-ordered collection of narrow-band time-domain signals*, as depicted in Fig. 9.2. In other words, the STFT can be seen as a uniform *filter bank* in which the input signal $x(n)$ is converted to a set of $N$ time-domain output signals $X_m(\omega_k)$, $k = 0, 1, \ldots, N-1$, one for each channel of the $N$-channel filter bank.

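Expanding on the previous paragraph, the STFT (9.2) is computed by the following operations:

- Frequency-shift $x(n)$ by $-\omega_k$ to get $x_k(n) = e^{-j\omega_k n}\,x(n)$.
- Convolve $x_k(n)$ with $\tilde{w} = \mathrm{Flip}(w)$, that is, $\tilde{w}(n) = w(-n)$, to get $X_m(\omega_k)$:

$$X_m(\omega_k) = \sum_{n=-\infty}^{\infty} x_k(n)\,\tilde{w}(m-n) = (x_k * \tilde{w})(m) \tag{9.3}$$

The STFT output $X_m(\omega_k)$ is regarded as a time-domain signal (time index $m$) coming out of the $k$th channel of an $N$-channel filter bank, where the channel center frequency is $\omega_k = 2\pi k/N$. Each channel output signal is a *baseband signal*; that is, it is centered about dc, with the "carrier term" $e^{j\omega_k m}$ taken out by "demodulation" (frequency-shifting). In particular, the $k$th channel signal is constant whenever the input signal happens to be a sinusoid tuned to frequency $\omega_k$ exactly.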
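The two operations above (heterodyne, then lowpass-filter with the flipped window) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `stft_fbs_channel` is hypothetical, the window has odd length and is stored causally with its middle sample at time 0, and the input is treated as zero outside its given samples.

```python
import numpy as np

def stft_fbs_channel(x, w, N, k):
    """Channel k of the sliding STFT, computed the filter-bank way."""
    half = (len(w) - 1) // 2
    n = np.arange(len(x))
    # Step 1: heterodyne, shifting frequency omega_k = 2 pi k / N down to 0.
    x_k = x * np.exp(-2j * np.pi * k * n / N)
    # Step 2: lowpass-filter by convolving with the time-reversed window.
    w_flip = w[::-1]
    y = np.convolve(x_k, w_flip)
    # Drop the delay introduced by storing the window causally.
    return y[half:half + len(x)]

# A complex sinusoid tuned exactly to omega_k gives a constant channel signal.
N, k = 8, 3
w = np.hanning(7)
x = np.exp(2j * np.pi * k * np.arange(64) / N)
y = stft_fbs_channel(x, w, N, k)
interior = y[3:-3]        # samples where the window lies fully inside x
```

Away from the edges, `interior` equals `w.sum()`, the dc gain of the window, which is the constant channel output described above for a (complex) sinusoid at the channel center frequency.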

Note that the STFT analysis window $w$ is now interpreted as (the flip
of) a lowpass-filter impulse response. Since the analysis window $w$
in the STFT is typically symmetric, we usually have
$\mathrm{Flip}(w) = w$.
This filter is effectively frequency-shifted to provide each channel
bandpass filter. If the cut-off frequency of the window transform is
$\omega_c$ (typically half a main-lobe width), then each channel
signal is approximately bandlimited to $|\omega| \le \omega_c$ and can
be downsampled significantly, ideally by a factor on the order of
$\pi/\omega_c$. This downsampling factor is
the FBS counterpart of the *hop size* $R$
in the OLA context.
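The correspondence between downsampling and hop size can be checked numerically. The sketch below is illustrative only; the function names `channel_decimated` and `frames_at_hop` are hypothetical, and the window is again assumed to have odd length with its middle sample at time 0.

```python
import numpy as np

def channel_decimated(x, w, N, k, R):
    """FBS view: run channel k at full rate, then keep every R-th sample."""
    half = (len(w) - 1) // 2
    n = np.arange(len(x))
    x_k = x * np.exp(-2j * np.pi * k * n / N)           # heterodyne
    y = np.convolve(x_k, w[::-1])[half:half + len(x)]   # lowpass filter
    return y[::R]                                       # downsample by R

def frames_at_hop(x, w, N, k, R):
    """OLA view: bin k of windowed frames taken directly at hop size R."""
    half = (len(w) - 1) // 2
    xp = np.concatenate([np.zeros(half), x, np.zeros(half)])
    out = []
    for m in range(0, len(x), R):
        n = np.arange(m - half, m + half + 1)           # absolute time indices
        out.append(np.sum(xp[m:m + len(w)] * w
                          * np.exp(-2j * np.pi * k * n / N)))
    return np.array(out)
```

For the same input, window, and $R$, the two functions return the same values, so choosing a downsampling factor in the filter-bank view amounts to choosing a hop size in the overlap-add view. As a rough guide, a Hann window of length $M$ has a main lobe about $8\pi/M$ wide, so $\omega_c \approx 4\pi/M$ suggests $R \approx M/4$.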

Figure 9.3 illustrates the filter-bank interpretation for $R=1$ (the "sliding STFT"). The input signal $x(n)$ is frequency-shifted by a different amount for each channel and lowpass filtered by the (flipped) window.

### FBS and Perfect Reconstruction

An important property of the STFT established in Chapter 8 is that
it is exactly *invertible* when the analysis window satisfies the
constant-overlap-add constraint. That is, neglecting numerical
round-off error, the inverse STFT reproduces the original input signal
exactly. This is called the *perfect reconstruction* property of
the STFT, and modern filter banks are usually designed with this
property in mind [287].

In the OLA processors of Chapter 8, perfect reconstruction was
assured by using FFT analysis windows $w$ having the
*Constant-Overlap-Add* (COLA) property at the particular hop size
used (see §8.2.1).

In the Filter Bank Summation (FBS) interpretation of the STFT
(Eq. (9.1)), it is the *analysis filter-bank frequency
responses* (shifted copies of the window transform, one centered
at each channel frequency $\omega_k$) that are constrained to be COLA. We
will take a look at this more closely below.
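As a preview, the frequency-domain overlap-add of the channel responses can be evaluated numerically. The sketch below is a hedged illustration, not the text's own procedure: the function name `fbs_overlap_add_in_frequency` and the oversampling factor `P` are assumptions, and the window is taken to be odd-length with its middle sample at time 0.

```python
import numpy as np

def fbs_overlap_add_in_frequency(w, N, P=8):
    """Sum over k of W(omega - omega_k), sampled at N*P frequencies.

    Assumes len(w) is odd, w[half] is the time-0 sample, and N*P >= len(w).
    """
    half = (len(w) - 1) // 2
    L = N * P
    buf = np.zeros(L)
    buf[:half + 1] = w[half:]      # w(0), w(1), ..., w(half)
    buf[L - half:] = w[:half]      # w(-half), ..., w(-1), wrapped around
    W = np.fft.fft(buf)            # zero-phase window transform on L bins
    # Shifting by omega_k = 2 pi k / N is a circular shift of k*P bins.
    return sum(np.roll(W, k * P) for k in range(N))

total_short = fbs_overlap_add_in_frequency(np.hanning(7), N=8)    # length <= N
total_long = fbs_overlap_add_in_frequency(np.hanning(21), N=8)    # length > N
```

When the window is no longer than $N$ samples, the shifted window transforms sum to the constant $N\,w(0)$, so `total_short` is flat across frequency. For the longer window the sum ripples with frequency, since the window is then nonzero at time lags that are nonzero multiples of $N$.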
