PARSHL

A difficulty with the phase vocoder, as traditionally implemented, is that it uses a fixed uniform filter bank. While this works well for periodic signals, it is relatively inconvenient for inharmonic signals. An "inharmonic phase vocoder" called PARSHL was developed in 1985 to address this problem in the context of piano signal modeling [248]. PARSHL worked by tracking peaks in the short-time Fourier transform (STFT), thereby synthesizing an adaptive inharmonic FIR filter bank that replaced the fixed uniform filter bank of the vocoder. In other respects, PARSHL could be regarded as a phase-vocoder analysis program. This section is adapted from the original paper describing PARSHL [248].

The PARSHL program converted an STFT to a set of amplitude and frequency envelopes for inharmonic, quasi-sinusoidal-sum signals. Only the most prominent peaks in the spectrum of the input signal were tracked. For quasi-harmonic sounds, such as the piano, the amplitudes and frequencies were sampled approximately once per period of the lowest frequency in the analysis band. For resynthesis, PARSHL supported additive synthesis [216] using an oscillator bank, overlap-add reconstruction from the STFT, or both.
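To picture how per-frame peaks become envelopes, the following minimal sketch pairs the peak frequencies of one frame with those of the next. It is an illustration only: the greedy nearest-frequency rule, the function name `match_peaks`, and the `max_jump` threshold are assumptions made for this example, not the matching rule PARSHL itself used.

```python
def match_peaks(prev_freqs, cur_freqs, max_jump):
    """Greedy nearest-frequency matching between two frames (hypothetical rule).

    prev_freqs, cur_freqs: peak frequencies (any consistent unit, e.g. Hz).
    max_jump: largest frequency change allowed for a continuing track.
    Returns (pairs, died, born):
      pairs: sorted list of (index_in_prev, index_in_cur) for continuing tracks
      died:  indices of previous peaks with no continuation
      born:  indices of current peaks that start a new track
    """
    # Every admissible pairing, closest frequencies first.
    candidates = sorted(
        (abs(fp - fc), i, j)
        for i, fp in enumerate(prev_freqs)
        for j, fc in enumerate(cur_freqs)
        if abs(fp - fc) <= max_jump
    )
    used_prev, used_cur, pairs = set(), set(), []
    for _, i, j in candidates:
        # Each peak may belong to at most one track.
        if i not in used_prev and j not in used_cur:
            pairs.append((i, j))
            used_prev.add(i)
            used_cur.add(j)
    died = [i for i in range(len(prev_freqs)) if i not in used_prev]
    born = [j for j in range(len(cur_freqs)) if j not in used_cur]
    return sorted(pairs), died, born
```

In this sketch, a track listed in `died` would be ramped toward zero amplitude and one listed in `born` would be ramped up from zero, so that tracks never switch on or off abruptly.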

PARSHL followed the amplitude, frequency, and phase of the most prominent peaks over time in a series of spectra, computed using the Fast Fourier Transform (FFT). The synthesis part of the program used the analysis parameters, or their modification, to generate a sinewave in the output for each peak track found.
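As a rough illustration of the synthesis side, the sketch below generates one hop of output by summing one sinusoid per track, with amplitude and frequency interpolated linearly across the hop. The function name `synth_hop` and the dictionary keys are hypothetical. The phase here is obtained simply by accumulating the interpolated frequency, a simplification that ignores the measured phases.

```python
import math

def synth_hop(tracks, hop, sample_rate):
    """Sum one sinusoid per track over a single hop of `hop` samples.

    tracks: list of dicts (hypothetical layout) with keys
      "a0", "a1": linear amplitude at the start / end of the hop
      "f0", "f1": frequency in Hz at the start / end of the hop
      "phase":    phase in radians at the first sample of the hop
    Returns (buffer, end_phases); end_phases seed the next hop.
    """
    out = [0.0] * hop
    end_phases = []
    for tr in tracks:
        phase = tr["phase"]
        for n in range(hop):
            t = n / hop  # fractional position within the hop, 0 <= t < 1
            amp = tr["a0"] + (tr["a1"] - tr["a0"]) * t
            freq = tr["f0"] + (tr["f1"] - tr["f0"]) * t
            out[n] += amp * math.cos(phase)
            # Advance the phase by the instantaneous frequency (assumption:
            # phase is integrated from frequency, not interpolated from data).
            phase += 2.0 * math.pi * freq / sample_rate
        end_phases.append(phase % (2.0 * math.pi))
    return out, end_phases
```

Calling `synth_hop` once per frame and feeding each track's returned phase into the next call keeps the sinusoids continuous across hop boundaries.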

The steps carried out by PARSHL were as follows:

  1. Compute the STFT $ \tilde{x}_m^\prime (e^{j\omega_k })$ using the frame size, window type, FFT size, and hop size specified by the user.

  2. Compute the squared magnitude spectrum in dB ( $ 10\log_{10}\left\vert\tilde{x}_m^\prime (e^{j\omega_k })\right\vert^2 = 20\log_{10}\left\vert\tilde{x}_m^\prime (e^{j\omega_k })\right\vert$).

  3. Find the bin numbers (frequency samples) of the spectral peaks. Parabolic interpolation is used to refine the peak location estimates. Three spectral samples (in dB) consisting of the local peak in the FFT and the samples on either side of it suffice to determine the parabola used.

  4. The magnitude and phase of each peak are calculated from the maximum of the parabola determined in the previous step. The parabola is evaluated separately on the real and imaginary parts of the spectrum to provide a complex interpolated spectrum value.

  5. Each peak is assigned to a frequency track by matching the peaks of the previous frame with the current one. These tracks can be "started up," "turned off," or "turned on" at any frame by ramping in amplitude from or toward 0.

  6. Arbitrary modifications can be applied to the analysis parameters before resynthesis.

  7. If additive synthesis is requested, a sinewave is generated for each frequency track, and all are summed into an output buffer. The instantaneous amplitude, frequency, and phase for each sinewave are calculated by interpolating the values from frame to frame. The length of the output buffer is equal to the hop size $ R$, which is typically some fraction of the window length $ M$.

  8. Repeat from step 1, advancing $ R$ samples each iteration until the end of the input sound is reached.
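Steps 2 through 4 above can be sketched as follows. This is a minimal illustration rather than the original program: the function names, the naive DFT standing in for an FFT, and the `floor_db` threshold are all assumptions made for the example. The parabola is fitted to three dB samples to locate the peak, then parabolas through the real and imaginary parts are evaluated at that location to obtain a complex value, from which the phase is read.

```python
import cmath
import math

def dft_bins(frame, nfft):
    """Naive zero-padded DFT, bins 0..nfft//2 (a slow stand-in for an FFT)."""
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / nfft) for n, x in enumerate(frame))
        for k in range(nfft // 2 + 1)
    ]

def parabolic_peak(alpha, beta, gamma):
    """Vertex of the parabola through (-1, alpha), (0, beta), (1, gamma).

    Returns (p, height): p is the offset in bins from the center sample.
    """
    denom = alpha - 2.0 * beta + gamma
    if denom == 0.0:  # flat or collinear samples: no refinement possible
        return 0.0, beta
    p = 0.5 * (alpha - gamma) / denom
    return p, beta - 0.25 * (alpha - gamma) * p

def parabola_at(a, b, c, p):
    """Value at offset p of the parabola through (-1, a), (0, b), (1, c)."""
    return b + 0.5 * (c - a) * p + 0.5 * (a - 2.0 * b + c) * p * p

def find_peaks(spectrum, floor_db=-120.0):
    """Return (bin_location, height_db, phase) for each local dB maximum.

    spectrum: complex DFT bins; floor_db is a hypothetical rejection threshold.
    """
    db = [20.0 * math.log10(max(abs(X), 1e-12)) for X in spectrum]
    peaks = []
    for k in range(1, len(db) - 1):
        if db[k] > db[k - 1] and db[k] >= db[k + 1] and db[k] > floor_db:
            p, height_db = parabolic_peak(db[k - 1], db[k], db[k + 1])
            # Interpolate real and imaginary parts separately at offset p.
            re = parabola_at(spectrum[k - 1].real, spectrum[k].real, spectrum[k + 1].real, p)
            im = parabola_at(spectrum[k - 1].imag, spectrum[k].imag, spectrum[k + 1].imag, p)
            peaks.append((k + p, height_db, cmath.phase(complex(re, im))))
    return peaks
```

A fractional bin location `k + p` converts to a frequency in Hz as `(k + p) * sample_rate / nfft`. Keeping only the largest few entries by `height_db` would correspond to tracking just the most prominent peaks.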

The following sections provide further details:




written by Julius Orion Smith III
Julius Smith's background is in electrical engineering (BS Rice 1975, PhD Stanford 1983). He is presently Professor of Music and Associate Professor (by courtesy) of Electrical Engineering at Stanford's Center for Computer Research in Music and Acoustics (CCRMA), teaching courses and pursuing research related to signal processing applied to music and audio systems. See http://ccrma.stanford.edu/~jos/ for details.

