
Computing the Vocoder Parameters

For the channel vocoder, we only need to determine the amplitude of the signal in each subband. One technique is to apply an envelope follower to each subband: rectify the subband signal and then lowpass-filter the result. This yields an approximate amplitude envelope, which serves as a measure of the energy in each subband. This approach has been applied in parametric speech models.


[Figure: envelope extraction in continuous-time analog circuits.]
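As a rough discrete-time illustration of rectification followed by lowpass filtering, the sketch below full-wave rectifies a test sinusoid and smooths it with a one-pole lowpass. The function name `envelope_follower`, the 20 Hz cutoff, and the test signal are assumptions chosen for illustration, not values from the text.

```python
import math

def envelope_follower(x, fs, cutoff_hz=20.0):
    """Full-wave rectify x, then smooth with a one-pole lowpass filter."""
    # One-pole lowpass: y[n] = (1 - p) * |x[n]| + p * y[n-1]
    p = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env = []
    y = 0.0
    for sample in x:
        y = (1.0 - p) * abs(sample) + p * y
        env.append(y)
    return env

# Hypothetical test signal: a 440 Hz sinusoid of amplitude 0.5, one second long
fs = 8000.0
amp = 0.5
x = [amp * math.cos(2.0 * math.pi * 440.0 * n / fs) for n in range(8000)]
env = envelope_follower(x, fs)

# The mean of |cos| is 2/pi, so rescale the smoothed output to estimate amp
estimate = env[-1] * math.pi / 2.0
```

The lowpass cutoff trades ripple against responsiveness: a lower cutoff gives a smoother envelope but tracks amplitude changes more slowly.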

In the case of the phase vocoder, we need to determine both the amplitude and the phase of the signal in each subband. Under the assumption of no more than one time-varying sinusoid in each subband, we can represent the signal in each channel as

$\displaystyle x_k(t)=a_k(t)\cos[ \omega_kt + \phi_k(t) ]
$

where $ \omega_k$ is the fixed channel center frequency. This gives us two real signals for each vocoder channel: the instantaneous amplitude $ a_k(t)$, also called the amplitude envelope, and the instantaneous phase $ \phi_k(t)$.

In order to determine these signals, it is helpful to express the channel signal $ x_k(t)$ in its complex ``analytic'' representation. We will denote this by $ x_k^a(t)$. Ideally, the imaginary part of the analytic signal is obtained from its real part using the Hilbert transform:


[Figure: obtaining the analytic signal from its real part using the Hilbert transform.]

Practical Hilbert transformers were covered in the previous lecture. In terms of its real and imaginary parts, the analytic signal is

$\displaystyle x_k^a(t) = \Re \{ x_k^a(t) \} + j\ensuremath{\hbox{Im}}\{ x_k^a(t)\} = a_k(t)e^{j[ \omega_kt +\phi_k(t)] }
$

Hence,

\begin{eqnarray*}
a_k(t) &=& \vert x_k^a(t) \vert \\
\phi_k(t) &=& \angle x_k^a(t) - \omega_k t
= \tan^{-1}\left[ \frac{\ensuremath{\hbox{Im}}\{x_k^a(t) \}}
{\Re\{x_k^a(t) \}} \right] - \omega_kt
\end{eqnarray*}
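The following sketch illustrates these two formulas in discrete time. It assumes an idealized analytic signal built by zeroing the negative-frequency bins of a DFT (a naive stand-in for a practical Hilbert transformer), and the helper name `analytic_signal`, the sampling rate, and the test sinusoid are all hypothetical.

```python
import cmath
import math

def analytic_signal(x):
    """Analytic signal of real x, formed by zeroing negative-frequency DFT bins."""
    n = len(x)
    # Naive O(n^2) DFT, adequate for a short illustration
    spectrum = [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
                for k in range(n)]
    for k in range(n):
        if 0 < k < n / 2:
            spectrum[k] *= 2.0   # double the positive frequencies
        elif k > n / 2:
            spectrum[k] = 0.0    # remove the negative frequencies
        # k == 0 and k == n/2 (Nyquist, n even) are left unchanged
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

# Hypothetical channel signal: a_k = 0.5, phi_k = 0.3, center frequency 8 Hz
fs = 64.0
omega_k = 2.0 * math.pi * 8.0
x = [0.5 * math.cos(omega_k * m / fs + 0.3) for m in range(64)]
xa = analytic_signal(x)

# a_k(t) = |x_k^a(t)| and phi_k(t) = angle(x_k^a(t)) - omega_k * t, wrapped to [-pi, pi]
amp_est = [abs(z) for z in xa]
phase_est = [math.remainder(cmath.phase(z) - omega_k * m / fs, 2.0 * math.pi)
             for m, z in enumerate(xa)]
```

Because the test frequency falls exactly on a DFT bin, the recovered amplitude and phase are constant here; a real channel signal would give slowly varying envelopes.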

In practice, we normally work with the instantaneous frequency deviation instead of the phase:

$\displaystyle \Delta \omega_k(t) \mathrel{\stackrel{\Delta}{=}}\frac{d}{dt} \phi_k(t)
$

Since the $ k$th channel of an $ N$-channel uniform filterbank has nominal bandwidth given by $ f_s/N$, the frequency deviation usually does not exceed $ \pm f_s/(2N)$.

Note that $ x_k^a(t)$ is a narrowband signal centered about the channel frequency $ \omega_k$. It is common to heterodyne each channel output signal to ``baseband'' by shifting its spectrum by $ -\omega_k$ so as to center the channel bandwidth about zero. This is accomplished by modulating the analytic signal by $ \exp(-j\omega_k t)$ to get

$\displaystyle x_k^m(t) \mathrel{\stackrel{\Delta}{=}}e^{-j\omega_k t} x_k^a(t) = a_k(t) e^{j\phi_k(t)}
$
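A minimal sketch of this heterodyning step on sampled data, assuming the analytic channel signal is already available as a list of complex samples. The function name `heterodyne` and the numerical values are illustrative assumptions.

```python
import cmath
import math

def heterodyne(xa, omega_k, fs):
    """Shift an analytic channel signal down by omega_k (rad/s) to baseband."""
    return [z * cmath.exp(-1j * omega_k * n / fs) for n, z in enumerate(xa)]

# Hypothetical analytic channel signal with constant amplitude and phase
fs = 8000.0
omega_k = 2.0 * math.pi * 1000.0
a_k, phi_k = 0.8, -0.4
xa = [a_k * cmath.exp(1j * (omega_k * n / fs + phi_k)) for n in range(256)]

# After modulation by exp(-j*omega_k*t), only a_k * exp(j*phi_k) remains
xm = heterodyne(xa, omega_k, fs)
```

For this test signal every baseband sample equals the constant `a_k * exp(1j * phi_k)`, which matches the right-hand side of the equation above.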

Working with the baseband channel signals, we may compute the frequency deviation more easily as simply the derivative of the instantaneous phase:

$\displaystyle \Delta\omega_k(t) \mathrel{\stackrel{\Delta}{=}}\frac{d}{dt} \angle x_k^m(t) = \dot{\phi}_k(t)
$

Let $ x \mathrel{\stackrel{\Delta}{=}}\Re \{ x_k^m(t) \} $ and $ y\mathrel{\stackrel{\Delta}{=}}\ensuremath{\hbox{Im}}\{ x_k^m(t) \} $. Then we have

\begin{eqnarray*}
\dot{\phi}_k(t) &=& \frac{d}{dt}\tan^{-1}\left(\frac{y}{x}\right)
= \frac{1}{1+(y/x)^2}\,\frac{d}{dt}\left(\frac{y}{x}\right) \\
&=& \frac{x^2[\dot{y}/x - y\dot{x}/x^2]}{x^2+y^2}
= \frac{x\dot{y}-y\dot{x}}{x^2+y^2}
\end{eqnarray*}
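The final expression can be tried numerically. The sketch below assumes the time derivatives of $ x$ and $ y$ are approximated by central differences on sampled baseband data, which is one possible discretization and not one prescribed by the text. The function name `frequency_deviation` and the test values are hypothetical.

```python
import cmath
import math

def frequency_deviation(xm, fs):
    """Estimate d(phase)/dt in rad/s from complex baseband samples xm."""
    dev = []
    for n in range(1, len(xm) - 1):
        x, y = xm[n].real, xm[n].imag
        # Central differences approximate the time derivatives of x and y
        xdot = (xm[n + 1].real - xm[n - 1].real) * fs / 2.0
        ydot = (xm[n + 1].imag - xm[n - 1].imag) * fs / 2.0
        # (x*ydot - y*xdot) / (x^2 + y^2), as in the derivation above
        dev.append((x * ydot - y * xdot) / (x * x + y * y))
    return dev

# Hypothetical baseband signal: the sinusoid sits 5 Hz above the channel center
fs = 8000.0
delta_omega = 2.0 * math.pi * 5.0
xm = [0.7 * cmath.exp(1j * (delta_omega * n / fs + 0.2)) for n in range(400)]
dev = frequency_deviation(xm, fs)
```

One practical appeal of this form is that it needs no explicit arctangent, so no phase unwrapping is required before differentiating.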

For each subband, we obtain data like the following:


[Figure: amplitude envelope (top) and frequency envelope (bottom).]

Once we have data in this form, we can compress it in several ways:

  • Piecewise linear approximation
    • Large compression ratios are possible
    • Depends on the nature of the signal
  • Decimate each channel
    • Each subband is bandlimited to the channel bandwidth
    • Actually, this just gets us back to the original number of samples
      • N channels
      • decimate by N
  • Requantize the signal
    • Allocate bits depending on the amount of energy in each subband
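As an illustration of the piecewise linear approximation listed above, the sketch below keeps uniformly spaced breakpoints of an envelope and rebuilds it by linear interpolation. Uniform breakpoint spacing, the function names, and the synthetic attack/decay envelope are assumptions made for brevity; a practical coder might place breakpoints adaptively.

```python
def piecewise_linear_compress(env, hop):
    """Keep every hop-th sample of env (plus the last one) as breakpoints."""
    idx = list(range(0, len(env), hop))
    if idx[-1] != len(env) - 1:
        idx.append(len(env) - 1)
    return [(i, env[i]) for i in idx]

def piecewise_linear_expand(breakpoints):
    """Rebuild a full-rate envelope by linear interpolation between breakpoints."""
    out = []
    for (i0, v0), (i1, v1) in zip(breakpoints, breakpoints[1:]):
        for i in range(i0, i1):
            out.append(v0 + (v1 - v0) * (i - i0) / (i1 - i0))
    out.append(breakpoints[-1][1])
    return out

# Hypothetical amplitude envelope: linear attack over 100 samples, linear decay over 400
env = [i / 100.0 if i <= 100 else 1.0 - (i - 100) / 400.0 for i in range(501)]

breakpoints = piecewise_linear_compress(env, hop=50)
rebuilt = piecewise_linear_expand(breakpoints)
max_error = max(abs(a - b) for a, b in zip(env, rebuilt))
```

Here 501 envelope samples reduce to 11 breakpoints with negligible error because the envelope is itself piecewise linear; as the list notes, the achievable ratio depends on the nature of the signal.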

There are many inherent problems with this technique:

  • We assumed at most one sinusoid per subband
    • This means we need lots of filters
  • Poor model for signals with transients or sharp attacks
  • Inconvenient for inharmonic signals
  • Inefficient model for signals with noise-like qualities (e.g., flute)
  • Not an identity system
    (unless phase is retained and no data reduction is done)
  • Computationally expensive

Note: in some phase-vocoder applications, such as time-scale modification and pitch shifting, the instantaneous frequencies of the channel signals are not explicitly computed. We will return to this topic after we have introduced the Weighted Overlap-Add (WOLA) method for short-time Fourier analysis, modification, and resynthesis.



written by Julius Orion Smith III
Julius Smith's background is in electrical engineering (BS Rice 1975, PhD Stanford 1983). He is presently Professor of Music and Associate Professor (by courtesy) of Electrical Engineering at Stanford's Center for Computer Research in Music and Acoustics (CCRMA), teaching courses and pursuing research related to signal processing applied to music and audio systems. See http://ccrma.stanford.edu/~jos/ for details.

