Phase Vocoder Implementation via FFT

The steps normally taken by a ``phase vocoder'' to measure instantaneous amplitude and frequency for each bin of each STFT frame are as follows (as practiced at CCRMA in the 1970s and early 1980s, based on the signal-processing and computer-music literature up to that point [168,169]):

  1. Compute the STFT as described in §10.4.

  2. For each positive-frequency FFT bin $ \tilde{x}_m^\prime (e^{j\omega_k })$,

    1. Convert the complex bin value from rectangular form (real and imaginary parts) to polar form, obtaining the magnitude

      $\displaystyle A_k(m) = \vert\tilde{x}_m^\prime (e^{j\omega_k })\vert$

      and phase

      $\displaystyle \theta_k(m)=\angle \tilde{x}_m^\prime (e^{j\omega_k })$

      where $ k$ is the bin number (the frequency index, or ``channel number'' in the phase-vocoder filter bank), and $ m$ is the frame number (time index).

    2. Write out magnitude versus time $ A_k(m)$ to the output file of vocoder analysis data. $ A_k(m)$ is interpreted as an unprocessed amplitude envelope that will control the $ k$th sinusoidal oscillator in an additive-synthesis reconstruction of the signal from phase-vocoder data. It is usually further processed by converting it to a slowly varying, piecewise linear approximation with nonuniformly spaced breakpoints. In the early days of CCRMA, these were called ``seg functions'' (where ``seg'' was short for ``segment'').

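    3. Differentiate the unwrapped phase $ \theta_k(m)$ with respect to time, approximating the derivative by a first-order difference between successive frames, to obtain instantaneous frequency:

      $\displaystyle F_k(m) = \frac{\theta_k(m) - \theta_k(m-1)}{2\pi R T} \qquad \hbox{(Hz)}$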

      where $ R$ is the STFT hop size, in samples, and $ T$ is the sampling period in seconds.

    4. Write out the instantaneous-frequency signals $ F_k(m)$. In an additive-synthesis reconstruction, $ F_k(m)$ is summed with the carrier frequency of the $ k$th sinusoidal oscillator. Like $ A_k(m)$, $ F_k(m)$ was typically reduced by further processing to a slowly varying, piecewise linear form, so that both amplitude and frequency envelopes were stored as ``seg functions''.
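
The analysis steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original CCRMA code: the function names (`fft`, `wrap`, `phase_vocoder_analysis`), the Hann window, and the default FFT size and hop are all hypothetical choices, and a small pure-Python radix-2 FFT stands in for whatever FFT routine is actually used. Because step 4 adds $ F_k(m)$ to the carrier frequency of the $ k$th oscillator, the sketch assumes the phase is measured relative to the bin center frequency, so the expected advance $ \omega_k R$ is subtracted before differencing. Unwrapping is done implicitly by taking the principal value of the frame-to-frame phase change, which matches explicit unwrapping only when the true change per hop is less than $ \pi$ radians. The results are returned as lists rather than written to a file, and the reduction to ``seg functions'' is omitted.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def wrap(phi):
    """Map an angle in radians to its principal value in [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def phase_vocoder_analysis(x, fs, n_fft=256, hop=64):
    """Return (amps, freqs), each indexed as [frame][bin].

    amps[i][k]  : magnitude A_k(m) of bin k
    freqs[i][k] : instantaneous-frequency deviation F_k(m) in Hz,
                  relative to the bin center frequency k * fs / n_fft
    The first STFT frame has no predecessor, so output starts at m = 1.
    """
    T = 1.0 / fs                      # sampling period in seconds
    # Assumed analysis window: periodic Hann (not specified in the text).
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft)
              for n in range(n_fft)]
    n_bins = n_fft // 2 + 1           # bins 0..N/2 (DC and Nyquist included)
    amps, freqs = [], []
    prev_theta = None
    for start in range(0, len(x) - n_fft + 1, hop):
        # Step 1: one STFT frame.
        frame = [x[start + n] * window[n] for n in range(n_fft)]
        spectrum = fft(frame)[:n_bins]
        # Step 2.1: rectangular to polar.
        A = [abs(c) for c in spectrum]
        theta = [cmath.phase(c) for c in spectrum]
        if prev_theta is not None:
            F = []
            for k in range(n_bins):
                # Phase advance of the bin's carrier over one hop.
                carrier_advance = 2.0 * math.pi * k * hop / n_fft
                # Step 2.3: first difference of the (implicitly unwrapped)
                # phase, measured relative to the carrier.
                dtheta = wrap(theta[k] - prev_theta[k] - carrier_advance)
                F.append(dtheta / (2.0 * math.pi * hop * T))
            # Steps 2.2 and 2.4: collect the envelopes instead of writing a file.
            amps.append(A)
            freqs.append(F)
        prev_theta = theta
    return amps, freqs
```

As a usage example, a 1010 Hz cosine sampled at 8 kHz and analyzed with `n_fft=256` and `hop=64` falls nearest bin 32, whose center frequency is 1000 Hz, so `freqs[i][32]` should come out close to 10 Hz and `amps[i][32]` should stay nearly constant from frame to frame.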



written by Julius Orion Smith III
Julius Smith's background is in electrical engineering (BS Rice 1975, PhD Stanford 1983). He is presently Professor of Music and Associate Professor (by courtesy) of Electrical Engineering at Stanford's Center for Computer Research in Music and Acoustics (CCRMA), teaching courses and pursuing research related to signal processing applied to music and audio systems. See http://ccrma.stanford.edu/~jos/ for details.

