The simplest application of PARSHL is as an analysis tool since we can get a very good picture of the evolution of the sound in time by looking at the amplitude, frequency and phase trajectories. The tracking characteristics of the technique yield more accurate amplitudes and frequencies than if the analysis were done with an equally spaced bank of filters (the traditional STFT implementation).

In speech applications, the most common use of the STFT is for data reduction. With a set of amplitude, frequency, and phase functions, we can obtain a very accurate resynthesis of many sounds using much less information than the original sampled sounds require. From our work it is still not clear how important the phase information is in the case of resynthesis without modifications, but McAulay and Quatieri [174] have shown the importance of phase in the case of speech resynthesis.

Some of the most interesting musical applications of STFT techniques arise from their ability to separate temporal from spectral information and, within each spectrum, pitch and harmonicity from formant information. In §H.0.5, Parameter Modifications, we discussed some of them, such as time scaling and pitch transposition. This group of applications, however, offers many possibilities that still need to be carefully explored. From the few experiments we have done to date, the tools presented give good results in situations where less flexible implementations do not, namely, when the input sound has inharmonic spectra and/or rapid frequency changes.

The main characteristics that differentiate this model from the traditional ones are the selectivity of spectral information and the phase tracking. These open up new applications that are worth our attention. One of them is the use of additive synthesis in conjunction with other synthesis techniques. Since the program allows tracking of specific spectral components of a sound, we have the flexibility of synthesizing only part of a sound with additive synthesis, leaving the rest for some other technique. For example, Serra [247] has used this program in conjunction with LPC techniques to model bar percussion instruments, and Marks and Polito [163] have modeled piano tones by using it in conjunction with FM synthesis [38]. David Jaffe has had good success with birdsong, and Rachel Boughton used PARSHL to create abstractions of ocean sounds.

One of the problems encountered when using several techniques to synthesize the same sound is the difficulty of creating the perceptual fusion of the two synthesis components. By using phase information we have the possibility of matching the phases of the additive synthesis part to the rest of the sound (independently of what technique was used to generate it). This provides improved signal "splicing" capability, allowing very fast cross-fades (e.g., over one frame period).

PARSHL was originally written to properly analyze the steady state of piano sounds; it did not address modeling the attack of the piano sound for purposes of resynthesis. The phase tracking was primarily motivated by the idea of splicing the real attack (sampled waveform) to its synthesized steady state. It is well known that additive synthesis techniques have a very hard time synthesizing attacks, due both to their fast transitions and to their "noisy" characteristics. The problem is made more difficult by the fact that we are very sensitive to the quality of a sound's attack. For plucked or struck strings, if we are able to splice two or three periods, or a few milliseconds, of the original sound into our synthesized version, the quality can improve considerably, while retaining a large data-reduction factor and the possibility of manipulating the synthesized part. When this is attempted without the phase information, the splice can be very noticeable, even if we do a smooth cross-fade over a number of samples. By simply adding the phase data, the task becomes comparatively easy, and the splice is much closer to inaudible.
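
The effect of phase on a short splice can be illustrated with a single partial. The following is a minimal sketch, not PARSHL code: it cross-fades an "original" sinusoid into a "synthesized" one over two periods, once with matched phase and once with a phase error of π, and compares the signal level at the middle of the fade. All function names and parameter values (`sinusoid`, `crossfade_splice`, `rms`, the 441 Hz test frequency, the linear fade) are assumptions chosen for illustration.

```python
import math

def sinusoid(freq_hz, phase, n_samples, sample_rate):
    """Unit-amplitude cosine with the given starting phase (radians)."""
    return [math.cos(2.0 * math.pi * freq_hz * n / sample_rate + phase)
            for n in range(n_samples)]

def crossfade_splice(original, synthesized, start, length):
    """Linear cross-fade from `original` to `synthesized` over `length` samples."""
    out = []
    for n, (x, y) in enumerate(zip(original, synthesized)):
        w = min(1.0, max(0.0, (n - start) / length))  # fade weight, 0 -> 1
        out.append((1.0 - w) * x + w * y)
    return out

def rms(signal, start, length):
    """Root-mean-square level of signal[start:start+length]."""
    seg = signal[start:start + length]
    return math.sqrt(sum(v * v for v in seg) / len(seg))

# Hypothetical test setup: one 441 Hz partial, i.e. 100 samples per period.
SAMPLE_RATE = 44100
FREQ_HZ = 441.0
N_SAMPLES = 2000
PERIOD = 100
FADE_START, FADE_LEN = 900, 2 * PERIOD  # cross-fade over two periods

original = sinusoid(FREQ_HZ, 0.0, N_SAMPLES, SAMPLE_RATE)
synth_matched = sinusoid(FREQ_HZ, 0.0, N_SAMPLES, SAMPLE_RATE)
synth_mismatched = sinusoid(FREQ_HZ, math.pi, N_SAMPLES, SAMPLE_RATE)

spliced_matched = crossfade_splice(original, synth_matched, FADE_START, FADE_LEN)
spliced_mismatched = crossfade_splice(original, synth_mismatched, FADE_START, FADE_LEN)

# Measure the level over one period centered on the middle of the fade.
window_start = FADE_START + FADE_LEN // 2 - PERIOD // 2
rms_matched = rms(spliced_matched, window_start, PERIOD)
rms_mismatched = rms(spliced_mismatched, window_start, PERIOD)

print("mid-fade RMS, matched phase:   ", rms_matched)
print("mid-fade RMS, mismatched phase:", rms_mismatched)
```

With matched phase the spliced partial keeps its level through the fade, whereas with opposite phase the two components largely cancel near the middle of the fade, producing a dip in level. A real tone has many partials, each with its own phase, so this single-partial case only suggests why the phase data makes short cross-fades workable.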
