Sign in

username:

password:



Not a member?

Search Online Books



Search tips

Free Online Books

Sponsor

NEW! TMS320C6474: 3x the performance. 1/3 the cost. Three 1 GHz cores on 1 chip.

Chapters

Chapter Contents:

Search Spectral Audio Signal Processing

  

Book Index | Global Index


Would you like to be notified by email when Julius Orion Smith III publishes a new entry into his blog?

  

Summary of STFT Computation Using the FFT

  1. Read $ M$ samples of the input signal $ x$ into a local buffer of length $ N\geq M$ which is initially zeroed

    $\displaystyle \tilde{x}_m(n) \isdef x(n+mR), \qquad n=-M_h,
\ldots\,,-1,0,1,\ldots\,,
M_h
$

    where $ x_m$ is called the $ m$th frame of the input signal, and $ M\isdef 2M_h+1$ is the frame length (which we assume is odd for reasons to be discussed later). The time advance $ R$ (in samples) from one frame to the next is called the hop size or step size.

  2. Multiply the data frame pointwise by a length $ M$ spectrum analysis window $ w(n), n=-M_h,\ldots\,,M_h$ to obtain the $ m$th windowed data frame:

    $\displaystyle \tilde{x}_m^w(n) \isdef \tilde{x}_m(n) w(n), \qquad n=-M_h,\ldots\,,M_h
$

  3. Extend $ \tilde{x}_m^w$ with zeros on both sides to obtain a zero-padded windowed data frame:

    $\displaystyle \tilde{x}_m^{w,z}(n) \isdef \left\{\begin{array}{ll}
\tilde{x}_m^...
...frac{N}{2}}-1 \\ [5pt]
0, & -{\frac{N}{2}}\leq n < -M_h \\
\end{array}\right.
$

    where $ N$ is chosen to be a power of two larger than $ M$. The number $ N/M$ is the zero-padding factor; as discussed in §4.3, the zero-padding factor is the interpolation factor for the spectrum, i.e., each FFT bin is replaced by $ N/M$ bins, interpolating the spectrum using ideal bandlimited interpolation [243], where the ``band'' in this case is the $ M$-sample nonzero duration of $ \tilde{x}_m^w$ in the time domain.

  4. Take a length $ N$ FFT of $ \tilde{x}_m^{w,z}$ to obtain the STFT at time $ m$:

    $\displaystyle \tilde{X}_m^{w,z}(e^{j\omega_k })=\sum _{n=-N/2}^{N/2-1} \tilde{x}_m^{w,z}(n) e^{-j\omega_k n T}
$

    where $ \omega_k = 2\pi k f_s / N $, and $ f_s=1/T$ is the sampling rate in Hz. The STFT bin number is $ k$.


Order a Hardcopy of Spectral Audio Signal Processing

Previous: Practical Computation of the STFT
Next: Two Dual Interpretations of the STFT

written by Julius Orion Smith III
Julius Smith's background is in electrical engineering (BS Rice 1975, PhD Stanford 1983). He is presently Professor of Music and Associate Professor (by courtesy) of Electrical Engineering at Stanford's Center for Computer Research in Music and Acoustics (CCRMA), teaching courses and pursuing research related to signal processing applied to music and audio systems. See http://ccrma.stanford.edu/~jos/ for details.


Comments


No comments yet for this page


Add a Comment
You need to login before you can post a comment (best way to prevent spam). ( Not a member? )