Free Books

Cepstral Windowing

The spectral envelope obtained by cepstral windowing is defined as

$\displaystyle Y_m \eqsp \hbox{\sc DFT}[w \cdot \underbrace{\hbox{\sc DFT}^{-1}\log(\vert X_m\vert)}_{\hbox{real cepstrum}}]$ (11.2)

where $ w$ is a lowpass-window in the cepstral domain. A simple but commonly used lowpass-window is given by

$\displaystyle w(n) \eqsp \left\{\begin{array}{ll} 1, & \vert n\vert< n_c \\ [5pt] 0.5, & \vert n\vert=n_c \\ [5pt] 0, & \vert n\vert>n_c, \\ \end{array} \right.$ (11.3)

where $ n_c$ denotes the lowpass ``cut-off'' sample.

The log-magnitude spectrum of $ X_m$ is thus lowpass filtered (the real cepstrum of $ x$ is ``liftered'') to obtain a smooth spectral envelope. For periodic signals, $ n_c$ should be set below the period in samples.

Cepstral coefficients are typically used in speech recognition to characterize spectral envelopes, capturing primarily the formants (spectral resonances) of speech [227]. In audio applications, a warped frequency axis, such as the ERB scale (Appendix E), Bark scale, or Mel frequency scale is typically preferred. Mel Frequency Cepstral Coefficients (MFCC) appear to remain quite standard in speech-recognition front ends, and they are often used to characterize steady-state spectral timbre in Music Information Retrieval (MIR) applications.

Next Section:
Linear Prediction Spectral Envelope
Previous Section:
References on Estimation